Recently, I spent a lot of time reading & writing about the construct validity of LLM benchmarks for a forthcoming article. I also interviewed LLM researchers in academia & industry. The piece is more descriptive than interpretive, but if I’d had the freedom to take it where I wanted, I would’ve addressed the possibility that mental capabilities (like those that benchmarks test for) are never completely innate; they’re always a function of the tests we use to measure them ...
(1/2)