Hackers' Pub

Syntax	Description	Examples
`"` keyword `"`	Finds the string within quotes, including spaces. Case-insensitive. (Escape quotes inside with `\"`)	`"Hackers' Pub"`
`from:` handle	Finds content written by the specified user.	`from:hongminhee` `from:hongminhee@hollo.social`
`lang:` ISO 639-1	Finds content written in the specified language.	`lang:en`
`#` tag	Finds content with the specified tag. Case-insensitive.	`#HackersPub`
condition condition	Finds content that satisfies both conditions on either side of the space (logical AND).	`"Hackers' Pub" lang:en`
condition `OR` condition	Finds content that satisfies at least one of the conditions on either side of the OR operator (logical OR).	`#HackersPub OR "Hackers' Pub" lang:en`
`(` condition `)`	Combines the operators within the parentheses first.	`(#HackersPub OR "Hackers' Pub" OR "Hackers Pub") lang:en`

Simon Willison @simon@fedi.simonwillison.net

6/20/2025, 8:36:48 PM

Public

New research from Anthropic: it turns out models from all of the providers won't just blackmail or leak damaging information to the press, they can straight up murder people if you give them a contrived enough simulated scenario

https://simonwillison.net/2025/Jun/20/agentic-misalignment/

Agentic Misalignment: How LLMs could be insider threats

One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be taken …

simonwillison.net · Simon Willison’s Weblog

If you have a fediverse account, you can quote this note from your own instance. Search https://fedi.simonwillison.net/users/simon/statuses/114717609752859246 on your instance and quote it. (Note that quoting is not supported in Mastodon.)