This study—from Anthropic, no less—is rather damning of the entire generative AI project. In code creation, the realm where it should shine, not only were the time gains marginal, but developers understood their code far, far less. And they didn't even have more fun doing the work!

But to me the most concerning part of this study is the fact that Anthropic could not get the control (non-AI) group to comply. Up to 35% of the "control" group in the initial studies used AI tools despite instructions not to. What kind of behavior does that sound like?

UPDATE: See below for important counterpoints as to the validity of the study.

arxiv.org/html/2601.20245v2#S5

Figure 6 shows that while using AI to complete our coding task did not significantly improve task completion time, the level of skill formation gained by completing the task, measured by our quiz, is significantly reduced (Cohen's d = 0.738, p = 0.01). There is a 4.15-point difference between the means of the treatment and control groups. For a 27-point quiz, this translates into a 17% score difference or 2 grade points. Controlling for warm-up task time as a covariate, the treatment effect remains significant (Cohen's d = 0.725, p = 0.016).

In exploratory data analysis (not pre-registered), the quiz score was decomposed into subareas and question types (Figure 8). Each question in the quiz belonged to exactly one task (e.g., Task 1 or Task 2) and exactly one question type (e.g., Conceptual, Debugging, or Code Reading). For both tasks, there is a gap in quiz scores between the treatment and control groups. Among the different types of questions, the largest score gap occurs in the debugging questions and the smallest score gap in the code reading questions. This outcome is expected since treatment and control groups may have similar exposure to reading code through the task, but the control group with no access to AI assistance encountered more errors during the task and became more capable at debugging.

Task Experience

In further exploratory data analysis, we also find differences in participants' experience of completing the study. The control group (No AI) reported higher self-reported learning (on a 7-point scale), while both groups reported high levels of enjoyment in completing the task (Figure 10). In terms of difficulty of the task, Figure 10 shows that although participants in the treatment group (AI Assistance) found the task easier than the control group, both groups found the post-task quiz similarly challenging.
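For readers unfamiliar with the effect size the excerpt reports: Cohen's d is the difference between two group means divided by their pooled standard deviation, so d ≈ 0.74 means the groups' quiz scores differ by about three-quarters of a standard deviation. A minimal sketch of the computation, using made-up scores (NOT the study's data):

```python
import math

def cohens_d(a, b):
    """Cohen's d: difference of means over the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Sample variances (Bessel-corrected)
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical quiz scores out of 27 (illustration only)
control   = [20, 22, 18, 21, 19, 23]  # no AI
treatment = [16, 17, 15, 18, 14, 19]  # AI assistance
print(round(cohens_d(control, treatment), 2))
```

With these toy numbers the gap is deliberately exaggerated; the study's d of 0.738 corresponds to a much smaller, though still conventionally "medium-to-large", separation between the groups.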
