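If you run a regression on just the models released in November, SWE-Bench saturates in mid-January next year. That's an extremely optimistic projection, but on the flip side, a sudden exponential grokking could happen, which is kind of scary.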

Scatter plot showing four AI models' SWE-Bench Verified scores (76.2-80.9%) from Nov 12-24, 2025, with a linear regression line projecting 100% by January 15, 2026 at the current improvement rate of 11.41 percentage points per month.
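For anyone who wants to sanity-check the extrapolation, here is a rough sketch in Python. It does not redo the regression (the individual data points are not listed here); it only extends a straight line with the slope from the plot. The function name `projected_saturation`, the 30.44-day average month, and the choice to anchor the line at the top score (80.9%) on Nov 24 are all assumptions for illustration, so the result can differ from the plot's fitted line by a day or so.

```python
from datetime import date, timedelta

# Assumption: average month length used to convert months to days.
DAYS_PER_MONTH = 30.44


def projected_saturation(anchor_date, anchor_score, slope_per_month, target=100.0):
    """Date at which a straight line through (anchor_date, anchor_score) reaches target.

    slope_per_month is in percentage points per month.
    """
    if slope_per_month <= 0:
        raise ValueError("slope must be positive for the line to reach the target")
    months_needed = (target - anchor_score) / slope_per_month
    return anchor_date + timedelta(days=round(months_needed * DAYS_PER_MONTH))


# Assumption: the line passes through the highest score on the last date in the plot.
# The real fitted line probably passes slightly elsewhere.
estimate = projected_saturation(date(2025, 11, 24), 80.9, 11.41)
print(estimate)
```

Under these assumptions the estimate lands in mid-January 2026, in line with the plot. A straight line is of course the crudest possible model here: benchmark scores cannot exceed 100%, so any real curve has to flatten before it gets there.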