This is a great summary by Rohit Kumar Thakur of the Apple paper “The Illusion of Thinking”.
The researchers asked LLMs and LRMs (large reasoning models) to solve well-known problems like the Tower of Hanoi in configurations that the models very likely never encountered during their training (e.g. with 10 disks instead of 7) and, unsurprisingly, the models failed miserably.
If you don’t want to subscribe to Medium, there is also an archived copy that you can read: https://archive.ph/ASo9a
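
For a sense of scale on the Tower of Hanoi setup mentioned above, here is a minimal Python sketch of the textbook recursive solution. It is only an illustration, not code from the paper or the summary, and the function name `hanoi` and the peg labels are arbitrary choices. The optimal solution takes 2^n - 1 moves, so going from 7 to 10 disks raises the required move count from 127 to 1023.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Yield (from_peg, to_peg) moves that solve an n-disk Tower of Hanoi.

    Peg labels are illustrative assumptions, not taken from the paper.
    """
    if n == 0:
        return
    # Move the top n-1 disks out of the way onto the spare peg.
    yield from hanoi(n - 1, source, spare, target)
    # Move the largest remaining disk to the target peg.
    yield (source, target)
    # Move the n-1 disks from the spare peg onto the target peg.
    yield from hanoi(n - 1, spare, target, source)


for disks in (7, 10):
    moves = list(hanoi(disks))
    print(f"{disks} disks: {len(moves)} moves")
# prints "7 disks: 127 moves" and "10 disks: 1023 moves"
```

The algorithm itself is the same at every size. What changes is how long the error-free move sequence has to be.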