Top Mathematics discussions

NishMath - #openai

@the-decoder.com //
OpenAI has launched its latest AI models, o3 and o4-mini, marking a significant advancement in artificial intelligence. These models are designed with improved capabilities in reasoning and problem-solving, going beyond previous models by being able to manipulate and reason with images. This allows them to analyze visual inputs and use them to solve complex problems, opening up new possibilities for AI in various fields.

OpenAI is emphasizing tool use with these new models. For the first time, its reasoning models can use and combine every tool within ChatGPT. This includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, o3 and o4-mini are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats. This typically happens in under a minute, enabling the models to solve more complex problems.

The most striking feature of these models is their ability to "think with images," going beyond simply seeing them to manipulate and reason about them as part of the problem-solving process. This unlocks a new class of problem-solving that blends visual and textual reasoning. For example, o3 can analyze a physics poster from a decade-old internship, navigate its complex diagrams independently, and even identify that the final result wasn’t present in the poster itself.

References :
  • Simon Willison's Weblog: OpenAI are really emphasizing tool use with these: For the first time, our reasoning models can agentically use and combine every tool within ChatGPT—this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, these models are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems.
  • the-decoder.com: OpenAI’s new o3 and o4-mini models reason with images and tools
  • venturebeat.com: OpenAI launches o3 and o4-mini, AI models that ‘think with images’ and use tools autonomously
  • www.analyticsvidhya.com: o3 and o4-mini: OpenAI’s Most Advanced Reasoning Models
  • www.tomsguide.com: OpenAI's o3 and o4-mini models
  • Maginative: OpenAI’s latest models—o3 and o4-mini—introduce agentic reasoning, full tool integration, and multimodal thinking, setting a new bar for AI performance in both speed and sophistication.
  • THE DECODER: OpenAI’s new o3 and o4-mini models reason with images and tools
  • Analytics Vidhya: o3 and o4-mini: OpenAI’s Most Advanced Reasoning Models
  • www.zdnet.com: These new models are the first to independently use all ChatGPT tools.
  • The Tech Basic: OpenAI recently released its new AI models, o3 and o4-mini, to the public. Smart tools employ pictures to address problems through pictures, including sketch interpretation and photo restoration.
  • thetechbasic.com: OpenAI’s new AI Can “See” and Solve Problems with Pictures
  • www.marktechpost.com: OpenAI Introduces o3 and o4-mini: Progressing Towards Agentic AI with Enhanced Multimodal Reasoning
  • MarkTechPost: OpenAI Introduces o3 and o4-mini: Progressing Towards Agentic AI with Enhanced Multimodal Reasoning
  • analyticsindiamag.com: Access to o3 and o4-mini is rolling out today for ChatGPT Plus, Pro, and Team users.
  • THE DECODER: OpenAI is expanding its o-series with two new language models featuring improved tool usage and strong performance on complex tasks.
  • gHacks Technology News: OpenAI released its latest models, o3 and o4-mini, to enhance the performance and speed of ChatGPT in reasoning tasks.
  • www.ghacks.net: OpenAI Launches o3 and o4-Mini models to improve ChatGPT's reasoning abilities
  • Data Phoenix: OpenAI releases new reasoning models o3 and o4-mini amid intense competition
  • Shelly Palmer: OpenAI Quietly Reshapes the Landscape with o3 and o4-mini

@bdtechtalks.com //
Alibaba has recently launched Qwen-32B, a new reasoning model that performs on par with DeepSeek's R1 model. This is a notable achievement in AI, particularly for smaller models. The Qwen team showed that reinforcement learning on a strong base model can unlock reasoning capabilities in smaller models, bringing their performance on par with giant models.

Qwen-32B not only matches but also surpasses models like DeepSeek-R1 and OpenAI's o1-mini across key industry benchmarks, including AIME24, LiveBench, and BFCL. This is significant because Qwen-32B achieves this level of performance with only approximately 5% of the parameters used by DeepSeek-R1, resulting in lower inference costs without compromising on quality or capability. Groq offers developers Qwen QwQ 32B on GroqCloud™, running the 32B-parameter model at ~400 T/s (tokens per second). The model is proving very competitive on reasoning benchmarks and is one of the most widely used open source models.
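The "approximately 5%" parameter figure and the ~400 T/s throughput can be sanity-checked with a few lines of arithmetic. The following is a minimal Python sketch, not a benchmark: the 671B total parameter count for DeepSeek-R1 is an assumption that this digest does not state, and the function names are hypothetical helpers introduced only for illustration.

```python
# Figures used for the check. Only the 32B size and the ~400 T/s rate come
# from the text above; the DeepSeek-R1 size is an outside assumption.
QWEN_PARAMS_B = 32            # billions of parameters (from the text)
DEEPSEEK_R1_PARAMS_B = 671    # billions of parameters (assumption)
GROQ_TOKENS_PER_SEC = 400     # "~400 T/s" as quoted above


def parameter_ratio(small_b: float, large_b: float) -> float:
    """Return the smaller model's size as a fraction of the larger one."""
    return small_b / large_b


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Estimate the time to emit num_tokens at a steady generation rate."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    ratio = parameter_ratio(QWEN_PARAMS_B, DEEPSEEK_R1_PARAMS_B)
    print(f"Parameter ratio: {ratio:.1%}")
    seconds = generation_seconds(2000, GROQ_TOKENS_PER_SEC)
    print(f"2,000 tokens at ~400 T/s: {seconds:.1f} s")
```

Under these assumptions the ratio comes out just under 5%, consistent with the figure quoted above, and a 2,000-token reasoning trace would take about five seconds at the quoted rate.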

The Qwen-32B model was explicitly designed for tool use and for adapting its reasoning based on environmental feedback, a significant advantage for AI agents that need to reason, plan, and adapt based on context. It outperforms R1 and o1-mini on the Berkeley Function Calling Leaderboard (BFCL). These results reinforce the team's point that RL on a strong base model can bring smaller models on par with giant ones.

References :
  • Last Week in AI: LWiAI Podcast #202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors
  • Groq: A Guide to Reasoning with Qwen QwQ 32B
  • Last Week in AI: #202 - Qwen-32B, Anthropic's $3.5 billion, LLM Cognitive Behaviors
  • Sebastian Raschka, PhD: This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling that have emerged since the release of DeepSeek R1.
  • Analytics Vidhya: China is rapidly advancing in AI, releasing models like DeepSeek and Qwen to rival global giants.
  • Last Week in AI: Alibaba’s New QwQ 32B Model is as Good as DeepSeek-R1
  • Maginative: Despite having far fewer parameters, Qwen’s new QwQ-32B model outperforms DeepSeek-R1 and OpenAI’s o1-mini in mathematical benchmarks and scientific reasoning, showcasing the power of reinforcement learning.

Soumyadeep Sarkar@The Tech Portal //
References: ai fray , Fello AI , Towards AI ...
OpenAI is reportedly preparing to launch specialized AI agents with premium pricing. These AI agents are designed to handle complex, high-value tasks for professionals and organizations, targeting corporations, high-level consultants, elite tech companies, and research-heavy institutions. Leaked pricing suggests a range from $2,000 per month for a "high-income knowledge worker agent" to $20,000 per month for a "PhD-level research agent".

These AI agents are not typical chatbots; they are custom-built for specialized tasks. For example, a knowledge worker agent could assist top-tier consultants with complex research, while a software developer agent could aid experienced programmers with coding. In related legal news, Musk's request for a preliminary injunction against OpenAI was denied in the Northern District of California, with the judge finding that Musk and his co-plaintiffs failed to meet their burden of proof on their antitrust allegations.

References :
  • ai fray: Musk request for preliminary injunction against OpenAI denied in Northern District of California
  • Fello AI: OpenAI Is About To Launch a $20,000 AI Assistant That Replaces Entire Research Teams
  • THE DECODER: OpenAI reportedly prepares to launch AI agents with monthly fees up to $20,000
  • Towards AI: OpenAI is reportedly planning to introduce high-cost AI "agents" tailored for specialized tasks, with pricing reaching up to $20,000 per month. These agents are designed for various professional applications, from sales lead management to PhD-level research.
  • MarkTechPost: OpenAI's specialized AI agents aim to target various professions, such as sales lead management and PhD-level research.
  • The Tech Portal: OpenAI releases new APIs and tools for businesses to create AI agents
  • TestingCatalog: OpenAI released new tools and APIs for AI agent development
  • Gradient Flow: Deep Dive into OpenAI’s Agent Ecosystem
  • Developer Tech News: OpenAI launches tools to build AI agents faster
  • www.itpro.com: OpenAI wants to simplify how developers build AI agents

Michal Langmajer@Fello AI //
OpenAI has announced the release of GPT-4.5, its latest language model, which the company calls its 'last non-chain-of-thought model.' According to OpenAI, GPT-4.5 offers substantial enhancements over its predecessors, particularly in advanced reasoning, problem-solving, and contextual understanding. Sam Altman, CEO of OpenAI, described it as the "first model that feels like talking to a thoughtful person," noting moments of astonishment at the quality of advice received from the AI.

However, the rollout is facing challenges due to GPU shortages. Altman stated that OpenAI is "out of GPUs," leading to a staggered release initially limited to ChatGPT Pro subscribers, who pay $200 a month. GPT-4.5 is already available to developers across all paid API tiers, and OpenAI plans to expand access to the Plus and Team tiers next week, when tens of thousands of GPUs are expected to arrive and ease the supply constraints. Although GPT-4.5 is not a reasoning model, OpenAI estimates that it is 30 times more expensive to run than GPT-4o.

References :
  • Fello AI: OpenAI’s GPT‑4.5 Finally Arrived: Can It Beat Grok 3 and Claude 3.7?
  • Shelly Palmer: Shelly Palmer discusses the release of OpenAI's GPT-4.5.
  • Analytics Vidhya: Everything You Need to Know About OpenAI’s GPT-4.5
  • www.tomshardware.com: Tom's Hardware reports on Sam Altman's statement about GPU shortages delaying the GPT-4.5 release.
  • venturebeat.com: VentureBeat reports OpenAI releases GPT-4.5 claiming 10X efficiency over GPT-4, but says it’s ‘not a frontier model’
  • Gradient Flow: Scaling Up, Costs Up: GPT-4.5 and the Intensifying AI Competition
  • Pivot to AI: OpenAI releases GPT-4.5 with ridiculous prices for a mediocre model
  • Techstrong.ai: TechStrong.ai article on OpenAI's GPT-4.5 AI model.
  • THE DECODER: OpenAI has released GPT-4.5 as a "Research Preview".
  • eWEEK: OpenAI releases GPT-4.5, a “Warm” Generative AI Model, for Paid Plans and APIs
  • www.windowscentral.com: Sam Altman on GPT-4.5: Expensive, yet the closest thing to a thoughtful conversational partner we've seen
  • THE DECODER: OpenAI's largest model GPT-4.5 delivers on vibes instead of benchmarks
  • 9to5Mac: OpenAI announces GPT-4.5, ChatGPT’s largest and best model for chat
  • www.engadget.com: OpenAI's new GPT-4.5 model is a better, more natural conversationalist
  • The Verge: Anthropic’s new ‘hybrid reasoning’ AI model is its smartest yet
  • THE DECODER: OpenAI has presented its largest language model to date. According to Mark Chen, Chief Research Officer at OpenAI, GPT 4.5 shows that the scaling of AI models has not yet reached its limits.
  • The Verge: OpenAI is launching GPT-4.5 today, its newest and largest AI language model.
  • NextBigFuture.com: OpenAI GPT 4.5 Has BIG Coding Improvement – Claims Scaling Still Works – Expensive
  • TechCrunch: OpenAI unveils GPT-4.5 ‘Orion,’ its largest AI model yet
  • PCMag Middle East ai: Reporting that OpenAI has launched GPT-4.5 but is limiting it to priciest tiers due to GPU shortages.
  • Analytics Vidhya: Two days ago, on 27 Feb 2025, OpenAI dropped GPT-4.5, expectations were sky-high. But instead of a groundbreaking leap forward, we got a model prioritizing emotional intelligence over raw reasoning power.
  • AI News | VentureBeat: GPT-4.5 for enterprise: Do its accuracy and knowledge justify the cost?
  • Windows Report: OpenAI released GPT-4.5, but it’s not much of an upgrade from GPT-4o. After DeepSeek was unleashed into the world, everyone wondered what OpenAI would do now that another AI company had developed an extremely powerful model at a tiny fraction of the budget.
  • iHLS: OpenAI Unveils GPT-4.5
  • Data Phoenix: OpenAI releases the long-awaited GPT-4.5/Orion, its last non-chain-of-thought model
  • Towards AI: GPT-4.5: The Next Evolution in AI
  • Towards AI: Towards AI article on TAI #142: GPT-4.5 Released.
  • www.marketingaiinstitute.com: [The AI Show Episode 138]: Introducing GPT-4.5, Claude 3.7 Sonnet, Alexa+, Deep Research Now in ChatGPT Plus & How AI Is Disrupting Writing
  • Analytics Vidhya: Now, this is a shocker, despite a lot of backlash on the cost of GPT 4.5, it becomes #1 in the Chatbot Arena LLM Leaderboard! Securing over 3,200+ votes, OpenAI’s latest model has emerged as number one across all evaluation categories, prominently excelling in Style Control and Multi-Turn interactions.

@ncatlab.org //
References: nLab
Microsoft has announced a significant breakthrough in quantum computing with its new Majorana 1 chip. The processor is built upon a novel "Topological Core" architecture and is designed to scale to as many as one million qubits. The chip relies on a new material called a topoconductor, which Microsoft describes as the world’s first topological conductor and which harnesses topological superconductivity to control Majorana particles. This approach promises more stable and reliable qubits, the fundamental building blocks of quantum computers. Microsoft also claims that quantum computers built on this architecture could potentially help break down microplastics into harmless byproducts or create self-healing materials for applications in construction, manufacturing, and healthcare.

Microsoft's Majorana 1 chip represents a paradigm shift in quantum computing technology, a development with far-reaching implications for industries and cybersecurity. By using topological qubits, Majorana 1 is designed to be inherently more stable and less prone to errors than current qubit technologies. While Microsoft touts this development as progress and hopes quantum computing will be used to benefit humanity, some experts warn that it could become a new tool for breaking existing encryption methods. Despite these potential risks, Microsoft is committed to developing a scalable quantum computing prototype, which would solidify its role at the forefront of quantum innovation.

References :
  • nLab: quantum Fourier transform

iHLS News Desk@iHLS //
OpenAI launched its latest model, the o3-mini, last Friday. It is the first member of the o3 family of models. According to benchmarks, o3-mini performs comparably to the o1 model while being roughly 15 times cheaper and about five times faster. There are two specialized variants of o3-mini: o3-mini-high, which takes more time to reason and gives more in-depth answers, and o3-mini-low, which prioritizes speed for quicker responses.

Along with o3-mini, OpenAI has also launched Deep Research, a ChatGPT feature for Pro subscribers that carries out in-depth research tasks. The o3-mini is a streamlined version of o3, OpenAI’s most advanced AI model, and focuses on efficiency and speed. It offers advanced reasoning capabilities, enabling it to break down complex problems and provide effective solutions.

References :
  • composio.dev: OpenAI launched its latest model, the o3-mini, last Friday.
  • SingularityHub: OpenAI launched its latest model, the o3-mini, last Friday.
  • Composio: OpenAI launched its latest model, the o3-mini, last Friday. It is the first member of the o3 family of models.
  • Analytics Vidhya: 5 o3-mini Prompts to Try Out Today
  • Tao of Mac: This blog post contains notes on various topics, including OpenAI's o3-mini model.
  • AI GPT Journal: OpenAI's latest model, o3-mini, is a faster and more cost-effective reasoning model. It's also available through the OpenAI API.
  • shellypalmer.com: OpenAI's o3-mini model is faster and more cost-effective.
  • techcrunch.com: OpenAI now reveals more of its o3-mini models' thought process.
  • bdtechtalks.com: OpenAI reveals o3's reasoning process to bridge gap with DeepSeek-R1
  • futurism.com: In a statement posted on his arch-nemesis Elon Musk's social network, Altman wrote circuitously around the previously slated launch of o3, the latest awkwardly-named version of its frontier reasoning model that is said to cost more than $1,000 worth of computing power per query. "In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3," the statement reads. OpenAI is not alone in this trend. After a period of intense excitement surrounding its GPT-3, OpenAI has increasingly had to contend with a flood of similarly capable models, particularly from China, where the government has provided billions of dollars in research funds. While many researchers and investors thought OpenAI had a firm grip on the field, with GPT-4 and the planned release of o3, the company has faced challenges with DeepSeek, a Chinese AI model that appears to be as powerful as OpenAI's latest models but far less expensive to use.
  • Xbox Wire: OpenAI's o3 model has enhanced reasoning capabilities and provides detailed reports.
  • the-decoder.com: OpenAI's o3 model is generating increasingly sophisticated code, but safeguards against malicious exploitation are critical. The article discusses the need for further research and safety precautions to ensure responsible and ethical development of such advanced technology.

@www.theverge.com //
OpenAI has recently launched its o3-mini model, the first in its o3 family, showcasing advancements in both speed and reasoning capabilities. The model comes in two variants: o3-mini-high, which prioritizes in-depth reasoning, and o3-mini-low, designed for quicker responses. Benchmarks indicate that o3-mini offers comparable performance to its predecessor, o1, but at a significantly reduced cost, being approximately 15 times cheaper and five times faster. Notably, o3-mini is also cheaper per token than GPT-4o, although it carries a usage limit of 150 messages per day while GPT-4o is described as unrestricted.

OpenAI is also now providing more detailed insights into the reasoning process of o3-mini, addressing criticism regarding transparency and competition from models like DeepSeek-R1. This includes revealing summarized versions of the chain of thought (CoT) used by the model, offering users greater clarity on its reasoning logic. OpenAI CEO Sam Altman believes that merging large language model scaling with reasoning capabilities could lead to "new scientific knowledge," hinting at future advancements beyond current limitations in inventing new algorithms or fields.

References :
  • techcrunch.com: OpenAI on Friday launched a new AI "reasoning" model, o3-mini, the newest in the company's o family of reasoning models.
  • www.theverge.com: o3-mini should outperform o1 and provide faster, more accurate answers.
  • community.openai.com: Today we’re releasing the latest model in our reasoning series, OpenAI o3-mini, and you can start using it now in the API.
  • Techmeme: OpenAI launches o3-mini, its latest reasoning model that it says is largely on par with o1 and o1-mini in capability, but runs faster and costs less.
  • simonwillison.net: OpenAI's o3-mini costs $1.10 per 1M input tokens and $4.40 per 1M output tokens, cheaper than GPT-4o, which costs $2.50 and $10, and o1, which costs $15 and $60.
  • community.openai.com: This article discusses the release of OpenAI's o3-mini model and its capabilities, including its ability to search the web for data and return what it found.
  • futurism.com: This article discusses the release of OpenAI's o3-mini reasoning model, aiming to improve the performance of large language models (LLMs) by handling complex reasoning tasks. This new model is projected to be an advancement in both performance and cost efficiency.
  • the-decoder.com: This article discusses how OpenAI's o3-mini reasoning model is poised to advance scientific knowledge through the merging of LLM scaling and reasoning capabilities.
  • www.analyticsvidhya.com: This blog post highlights the development and use of OpenAI's reasoning model, focusing on its increased performance and cost-effectiveness compared to previous generations. The emphasis is on its use for handling complex reasoning tasks.
  • AI News | VentureBeat: OpenAI is now showing more details of the reasoning process of o3-mini, its latest reasoning model. The change was announced on OpenAI’s X account and comes as the AI lab is under increased pressure by DeepSeek-R1, a rival open model that fully displays its reasoning tokens.
  • Composio: This article discusses OpenAI's o3-mini model and its performance in reasoning tasks.
  • composio.dev: This article discusses OpenAI's release of the o3-mini model, highlighting its improved speed and efficiency in AI reasoning.
  • THE DECODER: Training larger and larger language models (LLMs) with more and more data hits a wall.
  • Analytics Vidhya: OpenAI’s o3-mini is not even a week old and it’s already a favorite amongst ChatGPT users.
  • slviki.org: OpenAI unveils o3-mini, a faster, more cost-effective reasoning model
  • singularityhub.com: This post talks about improvements in LLMs, focusing on the new o3-mini model from OpenAI.
  • computational-intelligence.blogspot.com: This blog post summarizes various AI-related news stories, including the launch of OpenAI's o3-mini model.
  • www.lemonde.fr: OpenAI's new o3-mini model is designed to be faster and more cost-effective than prior models.
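
The per-token list prices quoted in the simonwillison.net entry above make the cost comparison easy to check. The following is a minimal Python sketch that uses only those quoted prices; the helper names `request_cost` and `price_multiple` are hypothetical and introduced here purely for illustration.

```python
# USD per 1M tokens as (input, output), taken from the simonwillison.net
# entry in the reference list above.
PRICES = {
    "o3-mini": (1.10, 4.40),
    "gpt-4o": (2.50, 10.00),
    "o1": (15.00, 60.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted list prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def price_multiple(expensive: str, cheap: str) -> tuple[float, float]:
    """Return how many times pricier `expensive` is for (input, output) tokens."""
    exp_in, exp_out = PRICES[expensive]
    cheap_in, cheap_out = PRICES[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


if __name__ == "__main__":
    for other in ("o1", "gpt-4o"):
        in_mult, out_mult = price_multiple(other, "o3-mini")
        print(f"{other} vs o3-mini: {in_mult:.1f}x input, {out_mult:.1f}x output")
    # Example request: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

At these list prices o1 works out to roughly 13.6 times the cost of o3-mini for both input and output tokens, in the same range as the "approximately 15 times cheaper" figure, while GPT-4o is a little over twice the price of o3-mini.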

@the-decoder.com //
OpenAI's o3 model is facing scrutiny after achieving record-breaking results on the FrontierMath benchmark, an AI math test developed by Epoch AI. It has emerged that OpenAI quietly funded the development of FrontierMath, and had prior access to the benchmark's datasets. The company's involvement was not disclosed until the announcement of o3's unprecedented performance, where it achieved a 25.2% accuracy rate, a significant jump from the 2% scores of previous models. This lack of transparency has drawn comparisons to the Theranos scandal, raising concerns about potential data manipulation and biased results. Epoch AI's associate director has admitted the lack of transparency was a mistake.

The controversy has sparked debate within the AI community, with questions being raised about the legitimacy of o3's performance. While OpenAI claims the data wasn't used for model training, concerns linger as six mathematicians who contributed to the benchmark said that they were not aware of OpenAI's involvement or the company having exclusive access. They also indicated that had they known, they might not have contributed to the project. Epoch AI has said that an "unseen-by-OpenAI hold-out set" was used to verify the model's capabilities. Now, Epoch AI is working on developing new hold-out questions to retest the o3 model's performance, ensuring OpenAI does not have prior access.

References :
  • Analytics India Magazine: The company has had prior access to datasets of a benchmark the o3 model scored record results on. 
  • the-decoder.com: OpenAI's involvement in funding FrontierMath, a leading AI math benchmark, only came to light when the company announced its record-breaking performance on the test.
  • THE DECODER: OpenAI's involvement in funding FrontierMath, a leading AI math benchmark, only came to light when the company announced its record-breaking performance on the test. Now, the benchmark's developer Epoch AI acknowledges they should have been more transparent about the relationship.
  • LessWrong: Some lessons from the OpenAI-FrontierMath debacle
  • Pivot to AI: OpenAI o3 beats FrontierMath — because OpenAI funded the test and had access to the questions