@shriramk Shriram Krishnamurthi
I did some experiments a couple of months ago.
ChatGPT/Codex would produce code that compiled and worked, but absolutely ignored glaring issues.
Claude was insistent that by doing this ornate graph-coloring thing we could get better bounds on computation, ignoring that it had to redo the graph coloring on every iteration and so the whole thing was accidentally quadratic.
Also lots of subtle issues, or outright technically-correct-but-truly-bizarre decisions.
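
For anyone who hasn't run into the "accidentally quadratic" pattern mentioned above, here is a minimal sketch of its general shape. It is not the code from those experiments (which isn't shown here); the graph, the greedy coloring, and every function name are hypothetical and chosen only for illustration. The point is just that redoing a linear-time precomputation inside a loop over the same data multiplies the cost.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list; hypothetical representation


def greedy_coloring(graph: Graph, counter: List[int]) -> Dict[int, int]:
    """Simple greedy coloring. counter[0] counts vertices visited,
    as a rough stand-in for work done."""
    colors: Dict[int, int] = {}
    for v in graph:
        counter[0] += 1
        used = {colors[n] for n in graph[v] if n in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def recolor_every_iteration(graph: Graph, counter: List[int]) -> int:
    """Accidentally quadratic: the whole coloring is redone per vertex."""
    total = 0
    for v in graph:
        colors = greedy_coloring(graph, counter)  # recomputed each time
        total += colors[v]
    return total


def color_once(graph: Graph, counter: List[int]) -> int:
    """Same result, but the coloring is computed a single time."""
    colors = greedy_coloring(graph, counter)
    return sum(colors[v] for v in graph)


def path_graph(n: int) -> Graph:
    """Vertices 0..n-1 in a line, each linked to its neighbors."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


if __name__ == "__main__":
    g = path_graph(100)
    slow, fast = [0], [0]
    assert recolor_every_iteration(g, slow) == color_once(g, fast)
    print(slow[0], fast[0])  # vertex visits: 10000 vs 100
```

Both versions return the same answer, which is exactly why this kind of problem is easy to miss: the code compiles, runs, and is correct, and only the work counter shows the difference.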