“ChatGPT Has Already Polluted the Internet So Badly That It's Hobbling Future AI Development”

futurism.com/chatgpt-polluted-

Also, a recurring security concern (and bandwidth abuse issue) with training data sets is that they generally don't store copies of the data and have few safeguards for checking whether it's been changed.

So, y'know, if a pre-2020 domain expires and is replaced entirely with slop, many datasets will mark the slop as pre-2020 data.
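
For illustration, here's a rough sketch of the kind of safeguard that's usually missing: record a hash of the page when it's first collected, then compare on every later fetch. The `DatasetEntry` layout, the helper names, and the example URL are all made up for this sketch; no actual dataset is claimed to work this way.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    # Hypothetical record layout, assumed for this sketch only.
    url: str
    collected: str  # date the page was originally crawled
    sha256: str     # hex digest of the bytes seen at crawl time


def make_entry(url: str, collected: str, content: bytes) -> DatasetEntry:
    """Record the URL together with a digest of what it served at crawl time."""
    return DatasetEntry(url, collected, hashlib.sha256(content).hexdigest())


def is_unchanged(entry: DatasetEntry, fetched: bytes) -> bool:
    """True only if the freshly fetched bytes match the recorded digest."""
    return hashlib.sha256(fetched).hexdigest() == entry.sha256


# Placeholder content standing in for a real page and its replacement.
original = b"<html>a human-written post from 2019</html>"
replaced = b"<html>generated filler on an expired domain</html>"

entry = make_entry("https://example.com/post", "2019-06-01", original)
same_page = is_unchanged(entry, original)
swapped_page = is_unchanged(entry, replaced)
```

Without the stored digest, the entry would still say "collected 2019" no matter what the URL serves today. An exact hash will also flag harmless changes (a new footer, a rotated ad), so a real pipeline would probably need something more forgiving.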

