I published some notes on OpenAI's new text-to-speech and speech-to-text models. They're promising, but like other LLM-driven multi-modal models they appear to suffer from the prompt-injection-adjacent problem of mixing instructions and data in the same token stream
simonwillison.net/2025/Mar/20/

0

If you have a fediverse account, you can quote this note from your own instance. Search https://fedi.simonwillison.net/users/simon/statuses/114196695898611860 on your instance and quote it. (Note that quoting is not supported in Mastodon.)