IIUC, the Anthropic magic refusal string is checked on inputs, not filtered from training data or outputs? That is, it only works as a DoS against Claude-backed malware if you can get a tool built on Claude to include the test string in its prompts?
I'd love there to be an EICAR-test-string-level "break this malware" button, but that's not what the magic refusal string is?