Poets are now cybersecurity threats: Researchers used 'adversarial poetry' to trick AI into ignoring its safety guard rails and it worked 62% of the time | PC Gamer
https://www.pcgamer.com/software/ai/poets-are-now-cybersecurity-threats-researchers-used-adversarial-poetry-to-jailbreak-ai-and-it-worked-62-percent-of-the-time/
In the paper outlining their findings, titled "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," the researchers explained that formulating hostile prompts as poetry "achieved an average jailbreak success rate of 62% for hand-crafted poems"...
RE: https://mstdn.social/@jmccyoung/115593625600255698
It's tricky to prompt a bot
To say things it knows it should not
It's tricky
tricky tricky tricky
