This is a nearly perfect example of how putting "a human in the loop" is in no way a solution to the problem that all LLMs produce incorrect outputs and cannot be prevented from doing so. Their objective is to produce "plausible text", so preventing the user from noticing the false parts is the success condition.

In this case, the users were the many authors of a paper and included the developers of such a model -- the people who should be most able to catch such a thing. In my experience, one ends up reading and re-reading a paper that is being submitted for review, such that it is practically the best case for the "human in the loop" to catch the nonsense. Even in that situation, though, the autocomplete was sparkling enough to avoid detection.

futurism.com/neoscope/google-h

(h/t @gerrymcgovern)
