OpenAI’s new o3 model has achieved breakthrough performance on the ARC-AGI benchmark, demonstrating advanced reasoning through a ‘private chain of thought’ mechanism. The model searches over natural-language programs to solve each task, and a large increase in compute yields a substantial improvement in its score. This approach uses deep learning to guide program search, going beyond simple next-token prediction. The model’s ability to recombine knowledge at test time through program execution suggests a significant step toward more general AI capabilities.
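To make the idea of guided program search concrete, here is a toy sketch in Python. It is not o3's actual mechanism, which has not been published in detail. Everything in it is an assumption made for illustration: the four grid primitives, the `mismatch` scoring function (a hand-written stand-in for a learned model that ranks candidates), and the `search` routine, which expands the most promising candidate programs first and executes each one against the example pairs.

```python
import heapq
from itertools import count

# Hypothetical primitives operating on small integer grids (lists of lists).
PRIMITIVES = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
    "increment": lambda g: [[c + 1 for c in row] for row in g],
}


def run(program, grid):
    """Execute a program (a sequence of primitive names) on a grid."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def mismatch(program, examples):
    """Stand-in for a learned guide: count wrong cells (lower is better)."""
    total = 0
    for inp, out in examples:
        got = run(program, inp)
        same_shape = len(got) == len(out) and all(
            len(a) == len(b) for a, b in zip(got, out)
        )
        if not same_shape:
            # Wrong shape: penalise every cell of the target grid.
            total += sum(len(row) for row in out)
        else:
            total += sum(
                a != b for ra, rb in zip(got, out) for a, b in zip(ra, rb)
            )
    return total


def search(examples, max_len=4):
    """Best-first search over programs, ordered by (mismatch, length)."""
    tie = count()  # tie-breaker so the heap ordering is fully determined
    frontier = [(mismatch((), examples), 0, next(tie), ())]
    while frontier:
        score, length, _, program = heapq.heappop(frontier)
        if score == 0:
            return program  # reproduces every example output
        if length >= max_len:
            continue
        for name in PRIMITIVES:
            cand = program + (name,)
            heapq.heappush(
                frontier,
                (mismatch(cand, examples), len(cand), next(tie), cand),
            )
    return None  # nothing within max_len fits the examples
```

Given a single example pair such as `([[1, 2], [3, 4]], [[3, 1], [4, 2]])`, the search returns a short composition of primitives that reproduces the output, and that program can then be run on an unseen grid. The analogy to the description above is loose: here the candidate programs are sequences of fixed operations rather than natural-language programs, and the guide is a simple cell count rather than a neural network.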
OpenAI has released its new o3 model, which demonstrates significantly improved performance in reasoning, coding, and mathematical problem-solving compared with its previous models. The o3 model achieves 75.7% on the ARC Prize Semi-Private Evaluation in low-compute mode and 87.5% in high-compute mode. This performance comes at a high cost, however: the top-end system costs around $10,000 per task.