Being on Team Words Mean Things is difficult these days, particularly when multibillion-dollar companies put out breathless press releases saying "By using our massive language model, whose training data includes every version of GCC ever released, and having it autocorrect its own output by testing it against GCC, we managed to make a C compiler that mostly works for only $20,000 in a week and gosh I have so many feelings."

I mean, what the fuck are we even doing here.

anthropic.com/engineering/buil

The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC, and only the remaining files with Claude's C Compiler. If the kernel worked, then the problem wasn’t in Claude’s subset of the files. If it broke, then it could further refine by re-compiling some of these files with GCC. This let each agent work in parallel, fixing different bugs in different files, until Claude's compiler could eventually compile all files. (After this worked, it was still necessary to apply delta debugging techniques to find pairs of files that failed together but worked independently.)
0
0
0

If you have a fediverse account, you can quote this note from your own instance. Search https://cosocial.ca/users/mhoye/statuses/116024404796795456 on your instance and quote it. (Note that quoting is not supported in Mastodon.)