As an extension of the token-count issue above, it has also been pointed out that accuracy goes up when the model is asked to output code that solves the problem instead of solving the problem directly. (Reportedly, statistically significant data could not be collected because of budget constraints.) However, unlike the original paper, the prompt mentions "Tower of Hanoi" by name, which may have influenced the result.

Excerpt from the same paper:

5 Alternative Representations Restore Performance

To test whether the failures reflect reasoning limitations or format constraints, we conducted preliminary testing of the same models on Tower of Hanoi N=15 using a different representation:

Prompt: "Solve Tower of Hanoi with 15 disks. Output a Lua function that prints the solution when called."

Results: Very high accuracy across tested models (Claude-3.7-Sonnet, Claude Opus 4, OpenAI o3, Google Gemini 2.5), completing in under 5,000 tokens.

The generated solutions correctly implement the recursive algorithm, demonstrating intact reasoning capabilities when freed from exhaustive enumeration requirements.
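For reference, the recursive algorithm the excerpt refers to fits in a few lines, which is why a program can come in under 5,000 tokens while the full move list for N=15 has 2^15 - 1 = 32,767 entries. The sketch below is only an illustration in Python, not the Lua output any of the tested models actually produced; the function name `hanoi` and the peg labels "A", "B", "C" are assumptions made for this example.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Yield (disk, from_peg, to_peg) moves that transfer n disks from source to target.

    Hypothetical helper for illustration; peg labels are arbitrary.
    """
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, onto the spare peg.
    yield from hanoi(n - 1, source, spare, target)
    # Move the largest remaining disk directly to the target peg.
    yield (n, source, target)
    # Move the n-1 smaller disks from the spare peg onto the target peg.
    yield from hanoi(n - 1, spare, target, source)


moves = list(hanoi(15))
print(len(moves))  # 32767, i.e. 2**15 - 1
```

The program's length does not depend on N, whereas the enumerated answer grows exponentially with it, which is the contrast the excerpt draws between the two representations.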