์šฐ์„  ์›๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฌธ์ œ์˜ ์ •๋‹ต์„ 'moves = [[1, 0, 1], [2, 0, 2], [1, 1, 2], ...]'์˜ ํ˜•ํƒœ๋กœ ๋ชจ๋“  ์ด๋™์„ ํ•˜๋‚˜ํ•˜๋‚˜ ์ž‘์„ฑํ•˜๋„๋ก ํ”„๋กฌํ”„ํŠธ๋ฅผ ์งฐ๊ณ  ์ •๊ทœ์‹์„ ์จ์„œ ํ’€์ด ๊ณผ์ •์—์„œ ๋‹ต์•ˆ์„ ์ถ”์ถœํ–ˆ๋Š”๋ฐ, ๋ชจ๋ธ ์Šค์Šค๋กœ๊ฐ€ ํŒจํ„ด์„ ์ธ์‹ํ•˜๊ณ  ์•Œ์•„์„œ ๋Š๋Š” ๊ฒƒ[3]๊ณผ ๋ง๊ฐ€์ง€๋Š” ๊ฒƒ์„ ๊ตฌ๋ถ„ํ•˜์ง€ ๋ชปํ•œ๋‹ค๋Š” ์ง€์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

Excerpt from The Illusion of the Illusion of Thinking:

2 Models Recognize Output Constraints

A critical observation overlooked in the original study: models actively recognize when they approach output limits. A recent replication by @scaling01 on Twitter [2] captured model outputs explicitly stating โ€œThe pattern continues, but to avoid making this too long, Iโ€™ll stop hereโ€ when solving Tower of Hanoi problems. This demonstrates that models understand the solution pattern but choose to truncate output due to practical constraints.

This mischaracterization of model behavior as โ€œreasoning collapseโ€ reflects a broader issue with automated evaluation systems that fail to account for model awareness and decision-making. When evaluation frameworks cannot distinguish between โ€œcannot solveโ€ and โ€œchoose not to enumerate exhaustively,โ€ they risk drawing incorrect conclusions about fundamental capabilities.
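
To make that last distinction concrete, here is a purely hypothetical illustration (neither the responses nor the grading logic come from either paper): a grader that only checks whether a complete move list can be extracted scores a deliberately truncated answer and a genuinely broken one identically.

```python
import re

# Hypothetical responses and grading logic, for illustration only.

TRUNCATED_BUT_AWARE = (
    "The pattern continues, but to avoid making this too long, I'll stop here.\n"
    "moves = [[1, 0, 2], [2, 0, 1], [1, 2, 1], ...]"
)
GENUINELY_WRONG = "moves = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]"

def strict_grade(output: str, n_expected: int) -> bool:
    """Pass only if a complete, well-formed move list of the expected length is present."""
    match = re.search(r"moves\s*=\s*(\[\[.*?\]\])", output, re.DOTALL)
    if match is None:
        return False
    moves = re.findall(r"\[\s*\d+\s*,\s*\d+\s*,\s*\d+\s*\]", match.group(1))
    return len(moves) == n_expected

# A 3-disk puzzle needs 2**3 - 1 = 7 moves. Both responses fail, so the score
# alone cannot separate "chose not to enumerate" from "cannot solve".
print(strict_grade(TRUNCATED_BUT_AWARE, 7))  # False
print(strict_grade(GENUINELY_WRONG, 7))      # False
```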
