Hackers' Pub

Hyaniner @hyaniner@mastodon.gamedev.place

7/17/2025, 10:46:31 PM

These days, I sometimes think this:

Ethics is a matter of security.
Philosophy is a matter of survival.

I read these papers yesterday, and I found them very impressive.

"Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety" https://arxiv.org/abs/2507.11473v1

"When Chain of Thought is Necessary, Language Models Struggle to Evade Monitors" https://arxiv.org/abs/2507.05246v1

"Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations"
https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.