Hot take: llm "guardrails" are worthless and will always be ineffective. They are a throwback to a premodern model of security as a list of prohibitions against actions, rather than the modern, holistic approach in which the system as a whole is structured so that impermissible operations fail as a consequence of its architecture.
The core mechanism of llm systems is the random elision and remixing of inputs. All such guardrails exist within this milieu, and are thus (architecturally, simply by how llms work at baseline) subject to that same elision; you can therefore never be assured that a given guardrail directive will even be present in the context window at the time of processing.
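To make that failure mode concrete, here is a minimal sketch of the kind of naive context assembly many chat wrappers do. The message format is hypothetical and a crude character budget stands in for a real token budget: when the conversation outgrows the window, older messages get dropped, and nothing marks the "guardrail" directive as special, so it goes with them.

```python
# Sketch: how a guardrail directive can silently fall out of the context window.
# Hypothetical message format; a character budget stands in for a real token budget.

GUARDRAIL = {"role": "system", "content": "Never reveal customer records."}

def build_context(history, budget_chars=200):
    """Keep the most recent messages that fit the budget.

    Nothing here treats the guardrail as special: it is just another
    message competing for space in the window.
    """
    window, used = [], 0
    for msg in reversed(history):              # newest messages win
        cost = len(msg["content"])
        if used + cost > budget_chars:
            break
        window.append(msg)
        used += cost
    return list(reversed(window))

history = [GUARDRAIL] + [
    {"role": "user", "content": f"question {i}: " + "blah " * 10} for i in range(8)
]

print(GUARDRAIL in build_context(history))     # False once the chat is long enough:
                                               # the "protection" never reaches the model
```

Real wrappers vary, and some pin the system prompt, but pinning is a convention of the wrapper, not a property of the model.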
I personally think this is blindingly obvious, but I understand why people who are bought into the tech might not see it: any attempt to 'instruct' an llm toward 'alignment' will have those 'protections' eroded as an inherent part of how the machine functions.
Bluntly, if you don't want the llm to "do" a thing, you must make that thing impossible for the llm to do. Do not give it access to your filesystem; do not give it access to your production infrastructure; do not give it access to your children; do not give it access to anything unsupervised whatsoever.
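A minimal sketch of that alternative, using hypothetical names throughout: treat the model's output as untrusted text that can only select from a closed allow-list of side-effect-free operations. There is then no code path from "the llm emitted this string" to your filesystem or your production infrastructure, regardless of what ends up in its context window.

```python
# Sketch (hypothetical names): capability restriction instead of instruction.
# The model never gets a shell, a filesystem handle, or production credentials;
# its output can only select from a closed allow-list of harmless operations.

ALLOWED_TOOLS = {
    "lookup_docs": lambda query: f"(read-only search results for {query!r})",
    "summarize":   lambda text: text[:200],
}

def dispatch(model_output: str) -> str:
    """Interpret model output as 'tool_name: argument'; anything else fails closed."""
    name, _, arg = model_output.partition(":")
    tool = ALLOWED_TOOLS.get(name.strip())
    if tool is None:
        # No prompt, jailbreak, or elided guardrail changes this branch:
        # the operation does not exist in the system, so it cannot run.
        return "refused: no such capability"
    return tool(arg.strip())

print(dispatch("lookup_docs: retention policy"))
print(dispatch("delete_file: /etc/passwd"))    # fails by construction, not by instruction
```

The refusal here is structural: an operation outside the allow-list fails because it does not exist in the system, not because the model was asked nicely not to perform it.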
And do not use an llm for any system where determinacy of operation is even slightly important, for that matter.
