I think what a lot of AI critics are missing is that they're judging an LLM by its first draft. The first draft is *not* what terrifies me about these machines.
What terrifies me is that you can ask them "find bugs in this PR." Or "find performance flaws." Or really anything.
Then have 3 agents (ideally running different models) vote on the result. Then have another agent fix what they agreed on. Repeat until the code comes back clean.
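The loop is simple enough to sketch. Here's a minimal version, assuming hypothetical `find_bugs` and `fix` functions standing in for real LLM calls (the stub implementations below just flag `TODO` lines so the sketch runs on its own); swap in your actual model clients:

```python
from collections import Counter

def find_bugs(agent, code):
    # Hypothetical reviewer agent: returns the set of line numbers it
    # flags as buggy. Stubbed here to flag any line containing "TODO".
    return {i for i, text in enumerate(code) if "TODO" in text}

def vote(reports):
    # Keep only the bugs that a majority of the agents agree on.
    counts = Counter(bug for report in reports for bug in report)
    majority = len(reports) // 2 + 1
    return {bug for bug, n in counts.items() if n >= majority}

def fix(code, bugs):
    # Hypothetical fixer agent: patches each majority-flagged line.
    return [("pass  # fixed" if i in bugs else text)
            for i, text in enumerate(code)]

def review_loop(code, agents, max_rounds=5):
    for _ in range(max_rounds):
        reports = [find_bugs(a, code) for a in agents]
        agreed = vote(reports)
        if not agreed:
            return code  # no majority-confirmed bugs remain
        code = fix(code, agreed)
    return code

code = ["def f(x):", "    # TODO: handle None", "    return x + 1"]
clean = review_loop(code, agents=["model_a", "model_b", "model_c"])
```

The voting step is what makes this different from re-prompting one model: a hallucinated bug from a single agent gets filtered out unless a majority independently reports it.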
If you haven't tried this experiment then you haven't reached the dark night of the soul that I have.