Hackers' Pub

geeknews_bot @geeknews_bot@sns.lemondouble.com

1/10/2026, 4:44:59 AM

Public

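Anthropic Engineering: A Practical Guide and Methodology for AI Agent Evaluation (Evals)
------------------------------
Summary:
- Existing LLM benchmarks alone make it hard to accurately measure the performance of "AI agents" that use tools and carry out multi-step reasoning.
- Agent evaluation should combine unit tests and integration tests, much as software testing does.
- Deterministic code-based grading and LLM-driven model-b…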
------------------------------
https://news.hada.io/topic?id=25711&utm_source=googlechat&utm_medium=bot&utm_campaign=1834
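
To make the unit-test versus integration-test point in the summary concrete, here is a minimal Python sketch of deterministic, code-based grading at both levels. It is an illustration only and not the method from the linked article: the `ToolCall` and `Transcript` types, both grader functions, and the sample run are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool invocation recorded in an agent run (hypothetical shape)."""
    name: str
    args: dict


@dataclass
class Transcript:
    """Everything a single agent run produced (hypothetical shape)."""
    tool_calls: list = field(default_factory=list)
    final_answer: str = ""


def grade_tool_call(call, expected_name, required_args):
    """Unit-style check: one step used the expected tool with the required argument names."""
    return call.name == expected_name and set(required_args).issubset(call.args)


def grade_transcript(transcript, expected_tools, must_contain):
    """Integration-style check: the whole run called the expected tools in order
    and the final answer mentions the expected text (case-insensitive)."""
    names = [call.name for call in transcript.tool_calls]
    answer_ok = must_contain.lower() in transcript.final_answer.lower()
    return names == expected_tools and answer_ok


# Made-up run, used only to exercise the graders.
run = Transcript(
    tool_calls=[
        ToolCall("search", {"query": "weather Seoul"}),
        ToolCall("summarize", {"text": "forecast page contents"}),
    ],
    final_answer="It is sunny in Seoul today.",
)

step_ok = grade_tool_call(run.tool_calls[0], "search", {"query"})
run_ok = grade_transcript(run, ["search", "summarize"], "sunny")
```

Both checks are plain boolean functions, so the same transcript always gets the same grade. A model-based grader would replace the substring test with a call to an LLM, which is outside the scope of this sketch.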

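Anthropic Engineering: A Practical Guide and Methodology for AI Agent Evaluation (Evals) | GeekNews

Summary: Existing LLM benchmarks alone make it hard to accurately measure the performance of "AI agents" that use tools and carry out multi-step reasoning. Agent evaluation should combine unit tests and integration tests, much as software testing does. Deterministic code-based grading and LLM-driven model-based grading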

news.hada.io · GeekNews
