Top Mathematics discussions
@www.marktechpost.com
Apple researchers are challenging the perceived reasoning capabilities of Large Reasoning Models (LRMs), sparking debate within the AI community. A recent Apple paper, titled "The Illusion of Thinking," argues that these models, which emit intermediate thinking steps such as Chain-of-Thought traces, struggle with fundamental reasoning tasks. The research contends that current evaluation methods, which rely on math and code benchmarks, are insufficient: they often suffer from data contamination and fail to assess the structure or quality of the reasoning process itself.
To address these shortcomings, Apple researchers introduced controllable puzzle environments, including the Tower of Hanoi, River Crossing, Checker Jumping, and Blocks World, allowing for precise manipulation of problem complexity. These puzzles require diverse reasoning abilities, such as constraint satisfaction and sequential planning, and are free from data contamination. The Apple paper concluded that state-of-the-art LRMs ultimately fail to develop generalizable problem-solving capabilities, with accuracy collapsing to zero beyond certain complexities across different environments.
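To make the idea of a controllable puzzle environment concrete, here is a minimal sketch of a Tower of Hanoi evaluator in Python. The class and function names, and the use of disk count as the single complexity knob, are illustrative assumptions rather than the paper's actual evaluation harness.

```python
# Minimal sketch of a controllable Tower of Hanoi environment.
# The disk count N is the complexity knob: the optimal solution
# always takes exactly 2**N - 1 moves, so difficulty scales
# predictably and answers for arbitrary N are unlikely to appear
# verbatim in training data. Names here are illustrative, not
# taken from the Apple paper's harness.

class HanoiEnv:
    def __init__(self, n_disks: int):
        self.n = n_disks
        # Pegs 0..2; peg 0 starts with all disks, largest at the bottom.
        self.pegs = [list(range(n_disks, 0, -1)), [], []]

    def is_legal(self, src: int, dst: int) -> bool:
        """A move is legal if src is non-empty and its top disk is
        smaller than the top disk of dst (or dst is empty)."""
        if not self.pegs[src]:
            return False
        return not self.pegs[dst] or self.pegs[src][-1] < self.pegs[dst][-1]

    def move(self, src: int, dst: int) -> None:
        if not self.is_legal(src, dst):
            raise ValueError(f"illegal move {src}->{dst}")
        self.pegs[dst].append(self.pegs[src].pop())

    def solved(self) -> bool:
        return len(self.pegs[2]) == self.n


def score_model_output(moves: list[tuple[int, int]], n_disks: int) -> bool:
    """Replay a model's proposed move list and return True only if
    every move is legal and the final state is solved."""
    env = HanoiEnv(n_disks)
    try:
        for src, dst in moves:
            env.move(src, dst)
    except ValueError:
        return False
    return env.solved()
```

Because correctness is checked by replaying moves rather than matching a final answer string, a grader like this can verify every intermediate step, which is the property puzzle-based evaluation is meant to provide over math and code benchmarks.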
The Apple research has faced criticism, however. Professor Seok Joon Kwon argues that Apple's lack of high-performance hardware, such as the large GPU clusters operated by Google or Microsoft, may have shaped its findings. Others note that the models perform better on familiar puzzles, suggesting their success may stem from training exposure rather than genuine problem-solving skill. Critics such as Alex Lawsen and "C. Opus" argue that the results do not support claims of fundamental reasoning limitations, but instead highlight engineering constraints around token limits and evaluation methods; a back-of-the-envelope illustration of the token-limit point follows.
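The token-limit objection can be made concrete with a rough calculation: an optimal Tower of Hanoi solution for N disks has 2**N - 1 moves, so the transcript a model must emit grows exponentially with N. The tokens-per-move cost and the output budget below are illustrative assumptions, not figures from either paper.

```python
# Rough check of the token-limit objection: how many output tokens
# would a full Tower of Hanoi solution need at each disk count?
# Both constants are illustrative assumptions.

TOKENS_PER_MOVE = 10       # assumed cost of writing one move
CONTEXT_BUDGET = 64_000    # assumed output-token budget

for n in range(5, 16):
    moves = 2**n - 1
    tokens = moves * TOKENS_PER_MOVE
    flag = "exceeds budget" if tokens > CONTEXT_BUDGET else "fits"
    print(f"N={n:2d}: {moves:6d} moves = {tokens:7d} tokens ({flag})")
```

Under these assumptions, complete solutions stop fitting in the output window at around a dozen disks, so an accuracy collapse near that depth could reflect a budget ceiling rather than a reasoning ceiling, which is the core of Lawsen and "C. Opus"'s objection.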
References:
- TheSequence: The Sequence Research #663: The Illusion of Thinking, Inside the Most Controversial AI Paper of Recent Weeks
- chatgptiseatingtheworld.com: Research: Did Apple researchers overstate "The Illusion of Thinking" in reasoning models? Opus, Lawsen think so.
- www.marktechpost.com: Apple Researchers Reveal Structural Failures in Large Reasoning Models Using Puzzle-Based Evaluation
- arstechnica.com: New Apple study challenges whether AI models truly "reason" through problems
- 9to5Mac: New paper pushes back on Apple’s LLM ‘reasoning collapse’ study
Classification:
- HashTags: #AI #ReasoningModels #AppleResearch
- Company: Apple
- Target: AI Industry
- Attacker: AI
- Product: Large Language Models
- Feature: Large reasoning models
- Type: Research
- Severity: Informative