Top Mathematics discussions

NishMath - #aireasoning

@simonwillison.net //
Google has broadened access to its advanced AI model, Gemini 2.5 Pro, showcasing impressive capabilities and competitive pricing designed to challenge rival models like OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet. Google's latest flagship model is currently recognized as a top performer, excelling in Optical Character Recognition (OCR), audio transcription, and long-context coding tasks. Alphabet CEO Sundar Pichai highlighted Gemini 2.5 Pro as Google's "most intelligent model + now our most in demand." Demand has increased by over 80 percent this month alone across both Google AI Studio and the Gemini API.

Google's expansion includes a tiered pricing structure for the Gemini 2.5 Pro API, offering a more affordable option compared to competitors. Prompts of fewer than 200,000 tokens are priced at $1.25 per million tokens for input and $10 per million for output, while larger prompts cost $2.50 and $15 per million tokens, respectively. Although prompt caching is not yet available, its future implementation could lower costs further. Grounding with Google Search includes 500 free queries per day on the free tier and 1,500 free queries per day on the paid tier; beyond that, additional queries cost $35 per 1,000.
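
To make the tiered pricing concrete, here is a minimal sketch in plain Python (no SDK required) that estimates the cost of a single request at the rates quoted above; the helper name is illustrative, and it assumes reasoning ("thinking") tokens are billed as output tokens, as described in the references below.

    # Illustrative cost estimate for the Gemini 2.5 Pro API, using the tiered
    # rates quoted above. Assumption: "thinking" tokens count as output tokens.
    def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
        """Estimated cost in US dollars for one request."""
        if input_tokens < 200_000:
            input_rate, output_rate = 1.25, 10.00    # $ per million tokens
        else:
            input_rate, output_rate = 2.50, 15.00    # larger-prompt tier
        return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

    # Example from the Simon Willison reference below: a "hi" prompt billed as
    # 2 input tokens and 623 output tokens (613 of them "thinking" tokens).
    print(gemini_25_pro_cost(2, 623))   # ~0.0062 dollars, i.e. about 0.62 cents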

The AI research group EpochAI reported that Gemini 2.5 Pro scored 84% on the GPQA Diamond benchmark, surpassing the typical 70% score of human experts. The benchmark consists of challenging multiple-choice questions in biology, chemistry, and physics, and EpochAI's independent result corroborates Google's own benchmark claims. The model is now available as a paid offering alongside a free tier; data from the free tier may be used to improve Google's products, while paid-tier data is not. Rate limits on the paid model vary by spending tier, from 150 requests per minute at Tier 1 up to 2,000 per minute at Tier 3, while the free tier is limited to 5 requests per minute and 25 per day. Google will retire the Gemini 2.0 Pro preview entirely in favor of 2.5.

Recommended read:
References :
  • Data Phoenix: Google Unveils Gemini 2.5: Its Most Intelligent AI Model Yet
  • AI News | VentureBeat: Gemini 2.5 Pro is now available without limits and for cheaper than Claude, GPT-4o
  • Simon Willison's Weblog: Google's Gemini 2.5 Pro is currently the top model and a superb model for OCR, audio transcription and long-context coding, and you can now pay for it. The new gemini-2.5-pro-preview-03-25 model ID is priced as follows: prompts under 200,000 tokens cost $1.25/million tokens for input and $10/million for output; prompts over 200,000 tokens (up to the 1,048,576 maximum) cost $2.50/million for input and $15/million for output. This is around the same level as Gemini 1.5 Pro ($1.25/$5 for input/output below 128,000 tokens, $2.50/$10 above), cheaper than GPT-4o for shorter prompts ($2.50/$10), and cheaper than Claude 3.7 Sonnet ($3/$15). Gemini 2.5 Pro is a reasoning model, and invisible reasoning tokens are included in the output token count: a simple "hi" prompt was charged 2 input tokens and 623 output tokens, of which 613 were "thinking" tokens, which still adds up to just 0.6232 cents (less than a cent). An updated release of the llm-gemini plugin adds support for the new model: llm install -U llm-gemini, then llm -m gemini-2.5-pro-preview-03-25 hi. The model remains available for free under the previous gemini-2.5-pro-exp-03-25 model ID: llm -m gemini-2.5-pro-exp-03-25 hi. The free tier is "used to improve our products"; the paid tier is not. Rate limits for the paid model are 150 requests/minute and 1,000/day for Tier 1 (billing configured), 1,000/minute and 50,000/day for Tier 2 ($250 total spend), and 2,000/minute with unlimited daily requests for Tier 3 ($1,000 total spend), while the free tier remains limited to 5 requests per minute and 25 per day. Google will retire the Gemini 2.0 Pro preview entirely in favour of 2.5.
  • THE DECODER: Google has opened broader access to Gemini 2.5 Pro, its latest AI flagship model, which demonstrates impressive performance in scientific testing while introducing competitive pricing.
  • Bernard Marr: Google's latest AI model, Gemini 2.5 Pro, is poised to streamline complex mathematical and coding operations.
  • The Cognitive Revolution: In this illuminating episode of The Cognitive Revolution, host Nathan Labenz speaks with Jack Rae, principal research scientist at Google DeepMind and technical lead on Google's thinking and inference time scaling work.
  • bsky.app: Gemini 2.5 Pro pricing was announced today - it's cheaper than both GPT-4o and Claude 3.7 Sonnet. I've updated my llm-gemini plugin to add support for the new paid model. Full notes here:
  • Last Week in AI: Google unveils a next-gen AI reasoning model, OpenAI rolls out image generation powered by GPT-4o to ChatGPT, Tencent’s Hunyuan T1 AI reasoning model rivals DeepSeek in performance and price

Maximilian Schreiner@THE DECODER //
Google's Gemini 2.5 Pro is making waves as a top-tier reasoning model, marking a leap forward in Google's AI capabilities. Released recently, it's already garnering attention from enterprise technical decision-makers, especially those who have traditionally relied on OpenAI or Claude for production-grade reasoning. Early experiments, benchmark data, and developer reactions suggest Gemini 2.5 Pro is worth serious consideration.

Gemini 2.5 Pro distinguishes itself with its transparent, structured reasoning. Google's step-by-step training approach results in a structured chain of thought that provides clarity. The model presents ideas in numbered steps, with sub-bullets and internal logic that's remarkably coherent and transparent. This breakthrough offers greater trust and steerability, enabling enterprise users to validate, correct, or redirect the model with more confidence when evaluating output for critical tasks.

Recommended read:
References :
  • AI News | VentureBeat: Google’s Gemini 2.5 Pro is the smartest model you’re not using — and 4 reasons it matters for enterprise AI
  • Composio: Gemini 2.5 Pro vs. Claude 3.7 Sonnet (thinking) vs. Grok 3 (think)
  • thezvi.wordpress.com: Gemini 2.5 is the New SoTA
  • www.infoworld.com: Google introduces Gemini 2.5 reasoning models
  • Composio: Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison
  • Analytics India Magazine: Gemini 2.5 is better than Claude 3.7 Sonnet for coding on the Aider Polyglot leaderboard.
  • www.tomsguide.com: Surprise move comes just days after Gemini 2.5 Pro Experimental arrived for Advanced subscribers.

Ben Lorica@Gradient Flow //
Recent advancements in AI reasoning models are demonstrating step-by-step reasoning, self-correction, and multi-step decision-making, opening up application areas that require logical inference and strategic planning. These new models are evolving rapidly, driven by falling training costs that enable faster iteration and higher performance. Techniques such as model compression, quantization, and distillation also make it possible to run sophisticated models on less powerful hardware.
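
As one illustration of the compression techniques mentioned above, the sketch below shows naive post-training 8-bit quantization of a weight matrix with NumPy. It is a toy version of the general idea (store low-precision integers plus a scale factor); real frameworks use per-channel scales, calibration data, and more sophisticated schemes.

    import numpy as np

    # Toy post-training quantization: map float32 weights to int8 plus one scale.
    # This is why quantized models need less memory (1 byte per weight vs. 4).
    def quantize_int8(weights: np.ndarray):
        scale = np.abs(weights).max() / 127.0          # single scale for the tensor
        q = np.round(weights / scale).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)
    q, scale = quantize_int8(w)
    err = np.abs(w - dequantize(q, scale)).mean()
    print(f"memory: {w.nbytes} -> {q.nbytes} bytes, mean abs error {err:.5f}")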

The competitive landscape is becoming more global, and teams from countries such as China are rapidly closing the performance gap, offering diverse approaches and fostering healthy competition. OpenAI's deep research feature enables models to generate detailed reports after prolonged inference periods, competing directly with Google's Gemini 2.0. So far, spending heavily on training and inference appears to yield continuous and predictable gains in AI model performance.

Recommended read:
References :
  • Sam Altman: This article provides insights into the reasoning capabilities of the AI models and their impact on the overall technology landscape.
  • THE DECODER: This article discusses OpenAI's new reasoning models, emphasizing direct instruction over complex prompts.
  • the-decoder.com: OpenAI has published guidelines for effective use of its o-series models, emphasizing direct instruction over complex prompting techniques.
  • www.analyticsvidhya.com: OpenAI’s o1 and o3-mini are advanced reasoning models that differ from the base GPT-4 (often referred to as GPT-4o) in how they process prompts and produce answers.

@www.theverge.com //
OpenAI has recently launched its o3-mini model, the first in the o3 family, showcasing advancements in both speed and reasoning capability. The model comes in two variants: o3-mini-high, which prioritizes in-depth reasoning, and o3-mini-low, designed for quicker responses. Benchmarks indicate that o3-mini offers performance comparable to its predecessor, o1, at a significantly reduced cost, being approximately 15 times cheaper and five times faster. Notably, o3-mini is also cheaper per token than GPT-4o, although in ChatGPT it carries a usage limit of 150 messages per hour while GPT-4o is unrestricted.
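
For a rough sense of the per-token price gap described above, the snippet below compares o3-mini, GPT-4o, and o1 using the per-million-token rates quoted in the Simon Willison reference below ($1.10/$4.40, $2.50/$10, and $15/$60 for input/output respectively); it is an illustrative back-of-the-envelope calculation, not an official pricing tool.

    # Per-million-token (input, output) prices quoted in the references below.
    PRICES = {
        "o3-mini": (1.10, 4.40),
        "gpt-4o": (2.50, 10.00),
        "o1": (15.00, 60.00),
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimated dollar cost of one request at the quoted rates."""
        in_rate, out_rate = PRICES[model]
        return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

    # Example: a request with 2,000 input tokens and 1,000 output tokens.
    for name in PRICES:
        print(name, round(request_cost(name, 2_000, 1_000), 4))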

OpenAI is also now providing more detailed insights into the reasoning process of o3-mini, addressing criticism regarding transparency and competition from models like DeepSeek-R1. This includes revealing summarized versions of the chain of thought (CoT) used by the model, offering users greater clarity on its reasoning logic. OpenAI CEO Sam Altman believes that merging large language model scaling with reasoning capabilities could lead to "new scientific knowledge," hinting at future advancements beyond current limitations in inventing new algorithms or fields.
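
For developers who want to try the high- and low-effort behavior described above from code, a minimal sketch using the openai Python SDK might look like the following; it assumes the chat completions endpoint accepts a reasoning_effort parameter for o-series models and that an OPENAI_API_KEY environment variable is set, so check the official documentation for the current interface.

    # Minimal sketch, assuming the openai Python SDK (>= 1.x) and that o-series
    # models accept a reasoning_effort parameter ("low", "medium", "high").
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort="high",  # trade latency and cost for deeper reasoning
        messages=[
            {"role": "user", "content": "How many primes are there below 100?"},
        ],
    )
    print(response.choices[0].message.content)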

Recommended read:
References :
  • techcrunch.com: OpenAI on Friday launched a new AI "reasoning" model, o3-mini, the newest in the company's o family of reasoning models.
  • www.theverge.com: o3-mini should outperform o1 and provide faster, more accurate answers.
  • community.openai.com: Today we’re releasing the latest model in our reasoning series, OpenAI o3-mini, and you can start using it now in the API.
  • Techmeme: OpenAI launches o3-mini, its latest reasoning model that it says is largely on par with o1 and o1-mini in capability, but runs faster and costs less.
  • simonwillison.net: OpenAI's o3-mini costs $1.10 per 1M input tokens and $4.40 per 1M output tokens, cheaper than GPT-4o, which costs $2.50 and $10, and o1, which costs $15 and $60.
  • community.openai.com: This article discusses the release of OpenAI's o3-mini model and its capabilities, including its ability to search the web for data and return what it found.
  • futurism.com: This article discusses the release of OpenAI's o3-mini reasoning model, aiming to improve the performance of large language models (LLMs) by handling complex reasoning tasks. This new model is projected to be an advancement in both performance and cost efficiency.
  • the-decoder.com: This article discusses how OpenAI's o3-mini reasoning model is poised to advance scientific knowledge through the merging of LLM scaling and reasoning capabilities.
  • www.analyticsvidhya.com: This blog post highlights the development and use of OpenAI's reasoning model, focusing on its increased performance and cost-effectiveness compared to previous generations. The emphasis is on its use for handling complex reasoning tasks.
  • AI News | VentureBeat: OpenAI is now showing more details of the reasoning process of o3-mini, its latest reasoning model. The change was announced on OpenAI’s X account and comes as the AI lab is under increased pressure by DeepSeek-R1, a rival open model that fully displays its reasoning tokens.
  • Composio: This article discusses OpenAI's o3-mini model and its performance in reasoning tasks.
  • composio.dev: This article discusses OpenAI's release of the o3-mini model, highlighting its improved speed and efficiency in AI reasoning.
  • THE DECODER: Training larger and larger language models (LLMs) with more and more data hits a wall.
  • Analytics Vidhya: OpenAI's o3-mini is not even a week old and it's already a favorite amongst ChatGPT users.
  • slviki.org: OpenAI unveils o3-mini, a faster, more cost-effective reasoning model
  • singularityhub.com: This post talks about improvements in LLMs, focusing on the new o3-mini model from OpenAI.
  • computational-intelligence.blogspot.com: This blog post summarizes various AI-related news stories, including the launch of OpenAI's o3-mini model.
  • www.lemonde.fr: OpenAI's new o3-mini model is designed to be faster and more cost-effective than prior models.