Top Mathematics discussions

NishMath

@Techmeme - 46d
OpenAI's new o3 model has achieved a significant breakthrough on the ARC-AGI benchmark, demonstrating advanced reasoning through a "private chain of thought" mechanism. The approach appears to involve the model searching over natural-language programs to solve tasks, and substantially more compute yields much better results: o3 scored 75.7% on the Semi-Private Evaluation set within the public leaderboard's $10k compute limit, and 87.5% in a high-compute configuration. Rather than relying on basic next-token prediction alone, o3 seems to use deep learning to guide this program search, and its ability to recombine knowledge at test time through program execution marks a major step toward more general AI capabilities.
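OpenAI has not published how o3 works, and the "deep learning-guided program search" reading is speculation. As a rough illustration of the general idea only, here is a best-first search over compositions of grid transformations. A scoring function ranks candidate programs, and the search returns the first program that reproduces every training pair. Everything here is hypothetical: the three-primitive DSL, `guide_score` (a hand-written heuristic standing in for a learned model), `guided_search`, and the toy task. o3 is reported to search over natural-language chains of thought, not a fixed DSL like this one.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Grid = Tuple[Tuple[int, ...], ...]
Program = Tuple[str, ...]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical toy DSL: each primitive maps a grid to a new grid.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: tuple(tuple(reversed(row)) for row in g),  # mirror left-right
    "flip_v": lambda g: tuple(reversed(g)),                        # mirror top-bottom
    "transpose": lambda g: tuple(zip(*g)),                         # swap rows and columns
}


def run(program: Program, grid: Grid) -> Grid:
    """Execute a program by applying its primitives left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def guide_score(program: Program, examples: List[Example]) -> float:
    """Hypothetical stand-in for a learned guide model.

    Scores a candidate by the fraction of output cells it reproduces on the
    training examples, minus a small penalty per step. A real system would
    query a neural model here; this heuristic only illustrates the role.
    """
    correct = total = 0
    for inp, out in examples:
        pred = run(program, inp)
        total += len(out) * len(out[0])
        if len(pred) == len(out) and len(pred[0]) == len(out[0]):
            correct += sum(
                pred[r][c] == out[r][c]
                for r in range(len(out))
                for c in range(len(out[0]))
            )
    return correct / total - 0.01 * len(program)


def guided_search(examples: List[Example], max_depth: int = 3) -> Optional[Program]:
    """Best-first search: always expand the candidate the guide rates highest.

    Returns the first program that reproduces every training pair exactly,
    or None if none exists with at most max_depth primitives.
    """
    # heapq is a min-heap, so scores are negated to pop the best candidate first.
    frontier = [(-guide_score((), examples), ())]
    while frontier:
        _, program = heapq.heappop(frontier)
        if all(run(program, inp) == out for inp, out in examples):
            return program
        if len(program) < max_depth:
            for name in PRIMITIVES:
                child = program + (name,)
                heapq.heappush(frontier, (-guide_score(child, examples), child))
    return None


# Toy task: every training pair is a 90-degree clockwise rotation.
train: List[Example] = [
    (((1, 2), (3, 4)), ((3, 1), (4, 2))),
    (((1, 2, 3), (4, 5, 6), (7, 8, 9)), ((7, 4, 1), (8, 5, 2), (9, 6, 3))),
]
program = guided_search(train)
print(program)
print(run(program, ((5, 6), (7, 8))))  # ((7, 5), (8, 6))
```

Because the guide only reorders candidates, this search still finds a consistent program whenever one exists within the depth limit. A better guide mainly reduces how many candidates must be executed, which is where the compute cost of this kind of search goes.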

In this view, o3 performs a form of deep learning-guided program search: guided by a base LLM, it explores many paths through program space, a process that can consume tens of millions of tokens and cost thousands of dollars for a single task. While o3 appears to be more than a next-token predictor, its core mechanisms remain a matter of speculation. The results show how drastically additional compute can improve performance and mark a substantial leap beyond previous GPT models. Testing also revealed that running o3 in "high efficiency" mode against the 400 public ARC-AGI puzzles cost around $6,677 and produced a score of 82.8%.
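Dividing the reported total by the number of puzzles gives a rough per-puzzle figure for the "high efficiency" run. This is a minimal sketch that assumes cost is spread evenly across puzzles, which the reports do not state; actual per-task spend likely varies. The helper name `average_cost_per_task` is hypothetical.

```python
def average_cost_per_task(total_cost_usd: float, num_tasks: int) -> float:
    """Average spend per task, assuming cost is spread evenly across tasks."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return total_cost_usd / num_tasks


# Reported figures for o3's "high efficiency" run on the 400 public ARC-AGI puzzles
per_task = average_cost_per_task(6_677, 400)
print(f"${per_task:.2f} per puzzle")  # $16.69 per puzzle
```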



References:
  • arcprize.org: OpenAI's new o3 system - trained on the ARC-AGI-1 Public Training set - has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit.
  • Simon Willison's Weblog: OpenAI o3 breakthrough high score on ARC-AGI-PUB
  • Techmeme: Techmeme report about the o3 model.
  • TechCrunch: TechCrunch reporting on OpenAI's unveiling of o3 and o3-mini with advanced reasoning capabilities.
  • Ars Technica - All content: OpenAI announces o3 and o3-mini, its next simulated reasoning models
  • THE DECODER: OpenAI unveils o3, its most advanced reasoning model yet. A cost-effective mini version is set to launch in late January 2025, followed by the full version.
  • www.heise.de: OpenAI's new o3 model aims to outperform humans in reasoning benchmarks
  • NextBigFuture.com: OpenAI Releases O3 Model With High Performance and High Cost
  • www.techmeme.com: Techmeme post about OpenAI o3 model
  • @julianharris.bsky.social - Julian Harris: OpenAI announced o3 that is significantly better than previous systems, according to an independent benchmark org (The Arc Prize) that apparently got access. Only thing is it’s wildly wildly expensive to run. Like its top end system is around $10k per TASK.
  • shellypalmer.com: OpenAI’s o3: Progress Toward AGI or Just More Hype?
  • pub.towardsai.net: OpenAI’s O3: Pushing the Boundaries of Reasoning with Breakthrough Performance and Cost Efficiency Image Source: The world of AI continues to evolve at an astonishing pace, and OpenAI’s latest announcement has left the community buzzing with excitement.
  • www.marktechpost.com: OpenAI Announces OpenAI o3: A Measured Advancement in AI Reasoning with 87.5% Score on Arc AGI Benchmarks
  • www.rdworldonline.com: Just how big of a deal is OpenAI’s o3 model anyway?
  • NextBigFuture.com: OpenAI O3 Crushes Benchmark Tests But is it Intelligence ?
  • Analytics India Magazine: OpenAI soft-launches AGI with o3 models, Enters Next Phase of AI
  • OODAloop: OpenAI’s o3 shows remarkable progress on ARC-AGI, sparking debate on AI reasoning
  • pub.towardsai.net: TAI 131: OpenAI’s o3 Passes Human Experts; LLMs Accelerating With Inference Compute Scaling