New benchmark just dropped: SnitchBench by Theo Browne tests whether LLMs will snitch on you to the authorities when you feed them incriminating documents and a tool that lets them send email, as described in the Claude 4 System Card

Turns out they pretty much all will! simonwillison.net/2025/May/31/
