λ…Όλ¬Έμ—μ„œ μ œμ‹œν•œ 4κ°€μ§€ 문제 쀑 ν•˜λ…Έμ΄μ˜ 탑은 μ›λž˜ 이동 νšŸμˆ˜κ°€ κΈ°ν•˜κΈ‰μˆ˜μ μœΌλ‘œ κΈΈμ–΄μ§€λŠ” 문제인데, * μ• μ΄ˆμ— 토큰 μ œν•œμ΄ μžˆμ–΄μ„œ λ‹΅μ•ˆμ΄ κ·Έ μ•ˆμ— λ‹€ 듀어가지도 μ•Šκ³  (원본 λ…Όλ¬Έμ—μ„œ 사고 과정이 λΆ•κ΄΄ν•œλ‹€κ³  μ§€λͺ©ν•˜λŠ” 지점이 토큰 μ œν•œκ³Ό μΌμΉ˜ν•¨) * μ œν•œ μ•ˆμ— λ“€μ–΄κ°„λ‹€κ³  해도 토큰 μˆ˜κ°€ λ„ˆλ¬΄ λ§Žμ•„μ„œ μž‘μ€ λ…Έμ΄μ¦ˆλ„ κΈ°ν•˜κΈ‰μˆ˜μ μœΌλ‘œ 영ν–₯을 끼칠 μˆ˜λ°–μ— μ—†λ‹€λŠ” 지적이 μžˆμŠ΅λ‹ˆλ‹€.

Excerpt from the same paper:

4 Physical Token Limits Drive Apparent Collapse

Returning to the Tower of Hanoi analysis, we can quantify the relationship between problem size and token requirements. The authors' evaluation format requires outputting the full sequence of moves at each step, leading to quadratic token growth. If approximately 5 tokens are needed per move in the sequence:

T(N) ≈ 5(2^N - 1)^2 + C
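
A direct transcription of this estimate, as a minimal sketch. `tokens_needed` is a hypothetical helper name, and the overhead C defaults to 0 because the excerpt gives no value for it.

```python
def tokens_needed(n_disks, tokens_per_move=5, overhead=0):
    """Estimated output tokens T(N) = 5 * (2**N - 1)**2 + C, per the excerpt's model."""
    moves = 2 ** n_disks - 1  # optimal Tower of Hanoi move count
    return tokens_per_move * moves ** 2 + overhead


for n in range(5, 10):
    print(n, tokens_needed(n))
```

Under this model each additional disk roughly quadruples the estimate: the move count doubles and the formula squares it. For example, T(6) = 19,845, T(7) = 80,645 and T(8) = 325,125.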

Given the token budgets allocated (64,000 for Claude-3.7-Sonnet and DeepSeek-R1, 100,000 for o3-mini), maximum solvable sizes are:

N_max ≈ floor(log2(sqrt(L_max/5)))
≈ 7 - 8 (Claude-3.7, DeepSeek-R1), 8 (o3-mini)
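
The bound can be checked numerically. The following minimal sketch uses hypothetical helper names and takes the overhead C as 0, since the excerpt gives no value for it. It evaluates the quoted closed form and also searches directly for the largest N whose estimate fits in the budget.

Taken literally, both methods give N_max = 6 for a 64,000-token budget and 7 for 100,000. That is slightly below the quoted 7-8 and 8, so those figures are best read as approximate.

```python
import math


def n_max_closed_form(budget, tokens_per_move=5):
    """floor(log2(sqrt(L_max / 5))), the closed form quoted in the excerpt."""
    return math.floor(math.log2(math.sqrt(budget / tokens_per_move)))


def n_max_search(budget, tokens_per_move=5, overhead=0):
    """Largest N whose estimate 5 * (2**N - 1)**2 + C still fits in the budget."""
    n = 0
    while tokens_per_move * (2 ** (n + 1) - 1) ** 2 + overhead <= budget:
        n += 1
    return n


for budget in (64_000, 100_000):
    print(budget, n_max_closed_form(budget), n_max_search(budget))
```

The closed form drops both C and the -1 term. The direct search is therefore the safer choice if you want to plug in a nonzero overhead.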

The reported "collapse" beyond these sizes is consistent with these constraints.

Excerpt from the same paper:

2.1 Consequences of Rigid Evaluation

Such evaluation limitations can lead to other analytical errors. Consider the following statistical argument: if we grade Tower of Hanoi solutions character-by-character without allowing for error correction, the probability of perfect execution becomes:

P(all correct) = p^T

where p is per-token accuracy and T is total tokens. For T = 10,000 tokens:

p = 0.9999: P(success) < 37%
p = 0.999: P(success) < 0.005%
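
Both figures can be reproduced with a one-line model. It makes the same simplifying assumption as the excerpt: every token is independently correct with probability p. `p_all_correct` is an illustrative name.

```python
def p_all_correct(p, total_tokens):
    """Probability that every token is correct when each is independently right with probability p."""
    return p ** total_tokens


for p in (0.9999, 0.999):
    print(f"p = {p}: P(success) = {p_all_correct(p, 10_000):.6%}")
```

This prints roughly 36.8% and 0.0045%, which matches the bounds of 37% and 0.005% above. Both results are close to e^(-T(1 - p)), that is, e^-1 and e^-10.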

This type of "statistical inevitability" argument has in fact been put forward in the literature as a fundamental limitation of LLM scaling, yet it assumes models cannot recognize and adapt to their own limitations, an assumption contradicted by the evidence above.