@tomjennings I'm not a fan of the "stolen work" idea. I don't think it's accurate. The output of that indexing process isn't a verbatim copy of the original corpus of text, but something much more like a search engine index. The generated text often includes facts and ideas from the original works, but facts and ideas aren't covered by copyright or other IP protection. I agree that LLM training bots should respect robots.txt and other signals that the authors don't want their work used for training.