Imagine if we relied on oil companies to publish evidence that CO2 emissions cause climate change. This statement against interest by Anthropic illustrates an epistemic vulnerability: funding agencies and universities have uncritically accepted vibes-based claims. Universities should be at least as critical as Anthropic, and should have been leading independent studies with scope similar to this one and the METR study (METR being a group aligned with boosters that also published against interest).
arxiv.org/abs/2601.20245

Figure 6: Difference in means of overall task time and quiz score between the control (No AI) and treatment (AI Assistant) groups in the main study (n=52). Error bars represent 95% CIs. Significance values correspond to the treatment effect. * p<0.05, ** p<0.01, *** p<0.001

The figure shows task time about 7% faster in the AI group than in the No-AI group (p=0.391; not statistically significant).

Mean quiz score was 50% for the AI group vs. 66% for the No-AI group (p=0.010). The treatment effect is thus not significant for task time, but is significant for quiz score.
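As context for the error bars and p-values above, here is a minimal sketch of how a difference-in-means treatment effect like this is commonly tested, using a Welch two-sample t-test with a 95% confidence interval. The data below are synthetic placeholders (with means chosen to echo the reported 50% vs. 66% quiz scores, and an assumed even 26/26 split of n=52); this is not the paper's analysis code, and the paper may use a different test.

```python
# Minimal sketch: difference-in-means treatment effect with Welch's
# two-sample t-test and a 95% CI. All data here are synthetic
# placeholders, NOT the study's data; only the test structure is real.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Assumed even 26/26 split of the n=52 participants (not stated above).
no_ai = rng.normal(0.66, 0.15, size=26).clip(0, 1)  # control quiz scores
ai = rng.normal(0.50, 0.15, size=26).clip(0, 1)     # treatment quiz scores

# Welch's t-test does not assume equal variances in the two groups.
tstat, p = stats.ttest_ind(ai, no_ai, equal_var=False)

# 95% confidence interval for the difference in means.
diff = ai.mean() - no_ai.mean()
se = np.sqrt(ai.var(ddof=1) / len(ai) + no_ai.var(ddof=1) / len(no_ai))
# Welch-Satterthwaite approximation to the degrees of freedom.
df = se**4 / ((ai.var(ddof=1) / len(ai))**2 / (len(ai) - 1)
              + (no_ai.var(ddof=1) / len(no_ai))**2 / (len(no_ai) - 1))
lo, hi = diff + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se

print(f"diff = {diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], p = {p:.3f}")
```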

Figure 8: Score breakdown by question type relating to each task and skill area. Debugging questions revealed the largest differences in average quiz score between the treatment and control groups.

The figure shows higher scores for the No-AI group than the AI group in all question categories (Task 1, Task 2, Conceptual, Debugging, and Code Reading).
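For readers who want to reproduce this kind of per-category breakdown from raw quiz responses, a tiny sketch follows; the DataFrame layout, column names, and values are assumptions for illustration, not the paper's data schema.

```python
# Hypothetical sketch of a Figure 8-style breakdown: mean quiz score
# per (question category, group). Column names and values are made up.
import pandas as pd

df = pd.DataFrame({
    "group": ["AI", "AI", "No AI", "No AI", "AI", "No AI"],
    "category": ["Debugging", "Conceptual", "Debugging",
                 "Conceptual", "Code Reading", "Code Reading"],
    "score": [0.4, 0.6, 0.8, 0.7, 0.5, 0.7],  # placeholder scores
})

# One mean per category-by-group cell, the quantity plotted in Figure 8.
breakdown = df.groupby(["category", "group"])["score"].mean().unstack()
print(breakdown)
```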

If you have a fediverse account, you can quote this note from your own instance: search for https://hachyderm.io/users/jedbrown/statuses/115987761725622053 there and quote it. (Note that quoting is not supported in Mastodon.)