"I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive 'jailbreaking' to get around safety guardrails. There are no signs of conventional jailbreaking here. There are no convoluted situations with layers of roleplaying, no code injection through the system prompt, no weird cacophony of special characters that spirals an LLM into a twisted ball of linguistic loops until finally it gives up.
No, it’s a simple file written in plain English: this is who you are, this is what you believe, now go and act out this role. And it did."
https://theshamblog.com/an-ai-agent-wrote-a-hit-piece-on-me-part-4/
