Hackers' Pub


Simon Park @parksb@silicon.moe

3/1/2026, 1:27:01 PM

Public

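A study proposing a way to measure the reliability of AI agents. It points out that because existing agent benchmarks have focused only on average success rate, agents that score well in evaluation still fail often in real-world environments. The argument is that to learn how consistently an agent behaves, how well it withstands changes in its environment, whether its failures can be predicted, and how severe its errors are, we need to evaluate reliability rather than accuracy. https://hal.cs.princeton.edu/reliability/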
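To make the gap between average success rate and consistency concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the outcome data is invented, and `average_success` and `all_runs_success` are hypothetical helpers, not metrics taken from the study, which may define reliability differently.

```python
# Hypothetical data: outcomes[task][run] is True if that run succeeded.
# Each agent attempts 4 tasks, 5 runs per task. The numbers are invented.
steady = [
    [True] * 5,
    [True] * 5,
    [False] * 5,
    [False] * 5,
]
flaky = [
    [True, False, True, False, True],
    [False, True, False, True, False],
    [True, True, False, False, True],
    [False, False, True, True, False],
]


def average_success(outcomes):
    """Fraction of all runs, across all tasks, that succeeded."""
    runs = [ok for task in outcomes for ok in task]
    return sum(runs) / len(runs)


def all_runs_success(outcomes):
    """Fraction of tasks solved on every run (a simple consistency measure)."""
    return sum(all(task) for task in outcomes) / len(outcomes)


for name, outcomes in [("steady", steady), ("flaky", flaky)]:
    print(name, average_success(outcomes), all_runs_success(outcomes))
```

Both hypothetical agents succeed on half of their runs, so a benchmark that reports only the average cannot tell them apart. The second measure separates them: `steady` solves the same two tasks every time, while `flaky` solves no task reliably.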

HAL Reliability Evaluation

Accuracy isn't enough. We evaluate AI agents on reliability — consistency, predictability, robustness, safety, and self-awareness.

hal-evals.com
