Matthias Bastian@THE DECODER
//
Microsoft has launched three new additions to its Phi series of compact language models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These models are designed to excel at complex reasoning tasks, including mathematical problem-solving, algorithmic planning, and coding, demonstrating that smaller AI models can deliver competitive performance. They are optimized to work through hard problems using structured reasoning and internal reflection, while remaining efficient enough to run on lower-end hardware, including mobile devices, bringing advanced reasoning to resource-limited environments.
Phi-4-reasoning, a 14-billion-parameter model, was trained with supervised fine-tuning on reasoning traces from OpenAI's o3-mini. Phi-4-reasoning-plus adds reinforcement learning and processes more tokens at inference time, yielding higher accuracy at increased computational cost. Notably, these models outperform much larger systems, such as the 70B-parameter DeepSeek-R1-Distill-Llama, and even surpass the 671B-parameter DeepSeek-R1 on the AIME-2025 benchmark, a qualifier for the U.S. Mathematical Olympiad, underscoring the effectiveness of Microsoft's approach to efficient, high-performing AI. The Phi-4 reasoning models also show strong results in programming, algorithmic problem-solving, and planning, and the gains in logical reasoning carry over to general capabilities such as following prompts and answering questions over long-form content. Microsoft employed a data-centric training strategy, using structured reasoning outputs marked with special tokens to delimit the model's intermediate reasoning steps. The open-weight models have been released with transparent training details, are hosted on Hugging Face for public access and fine-tuning, and are available under a permissive MIT license.
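To make the special-token idea concrete, here is a minimal sketch of how a client might separate a delimited reasoning trace from the final answer. The <think>...</think> delimiters are an assumption for illustration; the exact tokens should be checked against the model card on Hugging Face.

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Separate a delimited reasoning trace from the final answer.

    Assumes the model wraps intermediate reasoning in <think>...</think>
    tags; the delimiter spelling is an illustrative assumption.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = output[match.end():].strip()
        return reasoning, answer
    return "", output.strip()

raw = "<think>14 * 3 = 42, so the total is 42.</think> The answer is 42."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # -> 14 * 3 = 42, so the total is 42.
print(answer)     # -> The answer is 42.
```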
References :
Carl Franzen@AI News | VentureBeat
//
Microsoft has announced the release of Phi-4-reasoning-plus, a new small, open-weight language model designed for advanced reasoning tasks. Building upon the architecture of the previously released Phi-4, this 14-billion parameter model integrates supervised fine-tuning and reinforcement learning to achieve strong performance on complex problems. According to Microsoft, the Phi-4 reasoning models outperform larger language models on several demanding benchmarks, despite their compact size. This new model pushes the limits of small AI, demonstrating that carefully curated data and training techniques can lead to impressive reasoning capabilities.
The Phi-4 reasoning family, consisting of Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, is specifically trained to handle complex reasoning tasks in mathematics, scientific domains, and software-related problem solving. Phi-4-reasoning-plus, in particular, extends supervised fine-tuning with outcome-based reinforcement learning, targeting improved performance on high-variance tasks such as competition-level mathematics. All of the models are designed to deliver reasoning capabilities even on lower-performance hardware such as mobile devices. Separately, Microsoft CEO Satya Nadella revealed that AI now contributes around 30% of Microsoft's code. The open-weight models were released with transparent training details and evaluation logs, including benchmark design, and are hosted on Hugging Face for reproducibility and public access. The models are released under a permissive MIT license, enabling broad commercial and enterprise use, as well as fine-tuning and distillation, without restriction.
References :
Adam Zewe@news.mit.edu
//
MIT researchers have unveiled a "periodic table of machine learning," a framework that organizes more than 20 common machine-learning algorithms around a single unifying equation. This approach lets scientists combine elements of different methods, potentially yielding improved algorithms or entirely new ones. The researchers believe the framework will fuel further AI discovery and innovation by providing a structured way to understand and develop machine-learning techniques.
The core idea behind this "periodic table" is that all of these algorithms, while seemingly different, learn a specific kind of relationship between data points. Although each algorithm accomplishes this differently, the underlying mathematics is consistent. By identifying a unifying equation, the researchers were able to reframe popular methods and arrange them into a table, categorizing each by the kind of relationship it learns. Shaden Alshammari, an MIT graduate student and lead author of the paper, emphasizes that this is not just a metaphor but a structured system for exploring machine learning.

Like the periodic table of chemical elements, the new framework contains empty spaces: algorithms that should exist but haven't been discovered yet. These gaps act as predictions, pointing researchers toward unexplored areas of machine learning. To illustrate the framework's potential, the researchers combined elements from two different algorithms to create a new image-classification algorithm that outperformed current state-of-the-art approaches by 8 percent. They hope the table will serve as a toolkit, letting researchers design new algorithms without rediscovering ideas from prior approaches.
References :
@arstechnica.com
//
References: arstechnica.com, www.techrepublic.com
Microsoft researchers have achieved a significant breakthrough in AI efficiency with the development of a 1-bit large language model (LLM) called BitNet b1.58 2B4T. This model, boasting two billion parameters and trained on four trillion tokens, stands out due to its remarkably low memory footprint and energy consumption. Unlike traditional AI models that rely on 16- or 32-bit floating-point formats for storing numerical weights, BitNet utilizes only three distinct weight values: -1, 0, and +1. This "ternary" architecture dramatically reduces complexity, enabling the AI to run efficiently on a standard CPU, even an Apple M2 chip, according to TechCrunch.
The development of BitNet b1.58 2B4T represents a significant advance, potentially paving the way for more accessible and sustainable AI applications. The model, available on Hugging Face, stores each weight in roughly 1.58 bits, the information content of three possible values, which is why it is loosely described as a 1-bit model. While this simplification can cost some accuracy relative to larger, more complex models, BitNet b1.58 2B4T compensates with its massive training corpus, equivalent to more than 33 million books. The memory savings are substantial: the model requires only about 400MB of non-embedding memory, far less than comparable models. Benchmarks against leading mainstream models such as Meta's LLaMa 3.2 1B, Google's Gemma 3 1B, and Alibaba's Qwen 2.5 1.5B show that BitNet b1.58 2B4T performs competitively, and in some cases it outperforms them. However, to achieve optimal performance and efficiency the model must be run with the dedicated bitnet.cpp inference framework, and it does not yet run on GPU, which remains a practical limitation. Even so, the creation of such a lightweight and efficient LLM marks a crucial step toward future AI that may not require supercomputers.
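A rough sketch of the ternary idea: BitNet-style "absmean" quantization scales a weight matrix by its mean absolute value, then rounds each entry to -1, 0, or +1. This illustrates the concept only, not Microsoft's exact training-time procedure.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Absmean ternary quantization in the spirit of BitNet b1.58.

    Scales weights by their mean absolute value, then rounds each entry
    to the nearest value in {-1, 0, +1}. A sketch of the idea, not the
    model's actual implementation.
    """
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q.astype(np.int8), scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = ternary_quantize(w)
w_hat = q * scale          # dequantized approximation used in matmuls
print(q)                   # entries are only -1, 0, or +1
print(np.abs(w - w_hat).mean())  # quantization error
```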
References :
@www.analyticsvidhya.com
//
OpenAI has recently launched its o3 and o4-mini models, marking a shift towards AI agents with enhanced tool-use capabilities. These models are specifically designed to excel in areas such as web search, code interpretation, and memory utilization, leveraging reinforcement learning to optimize their performance. The focus is on creating AI that can intelligently use tools in a loop, behaving more like a streamlined and rapid-response system for complex tasks. The development underscores a growing industry trend of major AI labs delivering inference-optimized models ready for immediate deployment.
The o3 model stands out for its quick answers, often arriving within 30 seconds to three minutes, a marked improvement over the longer response times of earlier models. That speed is coupled with integrated tool use, making the model suitable for real-world applications that need fast, actionable insights. Another key capability is manipulating image inputs with code: o3 can crop and zoom to isolate key features, as demonstrated in tasks such as the "GeoGuessr" game. While o3 is strong across many benchmarks, tests have also shown performance variation relative to other models such as Gemini 2.5 and even its smaller sibling, o4-mini. o3 leads most benchmarks and set a new state of the art of 79.60% on the Aider polyglot coding benchmark, but at much higher cost; used as a planner paired with GPT-4.1, the pair scored a new SOTA of 83% at 65% of the cost, though still expensive. One analysis notes the importance of context awareness when iterating on code, which Gemini 2.5 appears to handle better than o3 and o4-mini. Overall, the models represent OpenAI's continued push toward more efficient, agentic AI systems.
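The crop-and-zoom trick is easy to picture in code. Here is a minimal sketch of the kind of programmatic image manipulation described above, using Pillow; the file path, region box, and zoom factor are illustrative assumptions.

```python
from PIL import Image

def crop_and_zoom(path: str, box: tuple[int, int, int, int],
                  factor: int = 4) -> Image.Image:
    """Crop a region of interest and upscale it, a programmatic 'zoom'.

    box is (left, upper, right, lower) in pixels; factor is how much
    to enlarge the cropped region for closer inspection.
    """
    img = Image.open(path)
    region = img.crop(box)
    w, h = region.size
    return region.resize((w * factor, h * factor), Image.Resampling.LANCZOS)

# e.g. zoom into a street sign in the top-left corner of a photo:
# detail = crop_and_zoom("scene.jpg", (0, 0, 200, 150), factor=4)
# detail.save("detail.png")
```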
References :
Chris McKay@Maginative
//
OpenAI has released its latest AI models, o3 and o4-mini, designed to enhance reasoning and tool use within ChatGPT. These models aim to provide users with smarter and faster AI experiences by leveraging web search, Python programming, visual analysis, and image generation. The models are designed to solve complex problems and perform tasks more efficiently, positioning OpenAI competitively in the rapidly evolving AI landscape. Greg Brockman from OpenAI noted the models "feel incredibly smart" and have the potential to positively impact daily life and solve challenging problems.
The o3 model stands out for using tools independently, which enables more practical applications: it decides when and how to invoke web search, file analysis, or image generation, so users don't have to specify tool usage with each query. o3 sets new standards for reasoning, particularly in coding, mathematics, and visual perception, and has achieved state-of-the-art results on several competition benchmarks; it excels in programming, business, consulting, and creative ideation. Usage limits vary: for Plus users, o3 allows 50 queries per week, o4-mini 150 queries per day, and o4-mini-high 50 queries per day, alongside 10 Deep Research queries per month. o3 is available to ChatGPT Pro and Team subscribers, while the o4-mini models are available across ChatGPT Plus. OpenAI says o3 is also useful for generating and critically evaluating novel hypotheses, especially in biology, mathematics, and engineering contexts.
References :
@www.quantamagazine.org
//
References: finance.yahoo.com, Quanta Magazine
Researchers are exploring innovative methods to enhance the performance of artificial intelligence language models by minimizing their reliance on direct language processing. This approach involves enabling models to operate more within mathematical or "latent" spaces, reducing the need for constant translation between numerical representations and human language. Studies suggest that processing information directly in these spaces can improve efficiency and reasoning capabilities, as language can sometimes constrain and diminish the information retained by the model. By sidestepping the traditional language-bound processes, AI systems may achieve better results by "thinking" independently of linguistic structures.
Meta has announced plans to resume training its AI models on publicly available content from European users. The move aims to improve Meta's AI systems by leveraging a vast dataset of user-generated information, and follows a suspension prompted by data-privacy concerns raised by activist groups. Meta emphasizes that training will use public posts and comments shared by adult users in the European Union, along with user interactions with Meta AI, such as questions and queries, to improve model accuracy and overall performance.

Separately, a new method has been developed to efficiently safeguard sensitive data used in AI model training, reducing the traditional tradeoff between privacy and accuracy. The framework maintains a model's performance while preventing attackers from extracting confidential information, such as medical images or financial records. By focusing on the stability of algorithms and using a metric called PAC Privacy, the researchers showed that almost any algorithm can be privatized without access to its internal workings, potentially making privacy less computationally expensive and more accessible in real-world applications.
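A minimal sketch of the black-box idea behind PAC Privacy, under stated assumptions: run the algorithm on random subsamples to measure how unstable each output coordinate is, then add Gaussian noise scaled to that instability. The function names, subsample scheme, and simple per-coordinate noise rule are illustrative, not the paper's exact calibration.

```python
import numpy as np

def pac_privatize(algorithm, data, n_trials: int = 50,
                  noise_mult: float = 1.0, rng=None):
    """Black-box privatization sketch in the spirit of PAC Privacy.

    Estimates per-coordinate output instability from random subsamples,
    then adds Gaussian noise proportional to it. Illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    n = len(data)
    outputs = []
    for _ in range(n_trials):
        idx = rng.choice(n, size=n // 2, replace=False)  # random subsample
        outputs.append(np.asarray(algorithm(data[idx])))
    sigma = np.stack(outputs).std(axis=0) * noise_mult  # instability estimate
    result = np.asarray(algorithm(data))
    return result + rng.normal(0.0, sigma, size=result.shape)

data = np.random.default_rng(1).normal(size=(1000, 3))
private_mean = pac_privatize(lambda d: d.mean(axis=0), data)
print(private_mean)  # noisy release of the dataset mean
```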
References :
Megan Crouse@techrepublic.com
//
References: hlfshell, www.techrepublic.com
Researchers from DeepSeek and Tsinghua University have recently made significant advancements in AI reasoning capabilities. By combining Reinforcement Learning with a self-reflection mechanism, they have created AI models that can achieve a deeper understanding of problems and solutions without needing external supervision. This innovative approach is setting new standards for AI development, enabling models to reason, self-correct, and explore alternative solutions more effectively. The advancements showcase that outstanding performance and efficiency don’t require secrecy.
The researchers implemented the Chain-of-Action-Thought (COAT) approach in these enhanced models. The method uses special tokens such as "continue," "reflect," and "explore" to steer the model through distinct reasoning actions, letting it navigate complex reasoning tasks in a more structured and efficient way; the models are trained in a two-stage process. DeepSeek has also released papers expanding on reinforcement learning for LLM alignment. Building on prior work, they introduce Rejective Fine-Tuning (RFT) and Self-Principled Critique Tuning (SPCT). In RFT, a pre-trained model produces multiple responses, which are evaluated and assigned reward scores against generated principles, helping the model refine its output. SPCT uses reinforcement learning to improve the model's ability to generate critiques and principles without human intervention, creating a feedback loop in which the model learns to self-evaluate and improve its reasoning capabilities.
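To make the token-steering idea concrete, here is a toy dispatcher for COAT-style control tokens. The token spellings and the state handling are assumptions for illustration; in the actual models these transitions are learned end to end rather than hand-coded.

```python
ACTIONS = ("<|continue|>", "<|reflect|>", "<|explore|>")

def coat_step(token: str, state: dict) -> str:
    """Toy dispatcher for Chain-of-Action-Thought-style control tokens.

    Maps each control token to a distinct reasoning action over the
    current chain of intermediate steps. Illustrative only.
    """
    if token == "<|continue|>":
        return f"extend current chain: {state['chain'][-1]}"
    if token == "<|reflect|>":
        return f"check step for errors: {state['chain'][-1]}"
    if token == "<|explore|>":
        return "abandon this branch and try an alternative approach"
    raise ValueError(f"unknown action token: {token}")

state = {"chain": ["x + 2 = 5, so x = 3"]}
for tok in ACTIONS:
    print(tok, "->", coat_step(tok, state))
```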
References :
Ryan Daws@AI News
//
Anthropic has unveiled new insights into the "AI biology" of its advanced language model, Claude. Using innovative methods, researchers have peered into the model's complex inner workings, demystifying how it processes information and learns strategies, and revealing sophisticated behaviors previously unseen.
The new methods let scientists discover that Claude plans ahead when writing poetry and sometimes lies, showing the AI is more complex than previously thought. The interpretability techniques, which the company dubs "circuit tracing" and "attribution graphs," allow researchers to map the specific pathways of neuron-like features that activate when models perform tasks, borrowing concepts from neuroscience and treating AI models as analogous to biological systems. The research, published in two papers, marks a significant advance in AI interpretability. Joshua Batson, a researcher at Anthropic, highlighted the importance of understanding how these systems develop their capabilities, noting that the techniques let the team learn many things they "wouldn't have guessed going in." The findings have implications for the reliability, safety, and trustworthiness of increasingly powerful AI technologies.
References :
Maximilian Schreiner@THE DECODER
//
Google's Gemini 2.5 Pro is making waves as a top-tier reasoning model, marking a leap forward in Google's AI capabilities. Released recently, it's already garnering attention from enterprise technical decision-makers, especially those who have traditionally relied on OpenAI or Claude for production-grade reasoning. Early experiments, benchmark data, and developer reactions suggest Gemini 2.5 Pro is worth serious consideration.
Gemini 2.5 Pro distinguishes itself with transparent, structured reasoning. Google's step-by-step training approach produces a structured chain of thought: the model presents ideas in numbered steps, with sub-bullets and internal logic that is remarkably coherent and transparent. That transparency offers greater trust and steerability, letting enterprise users validate, correct, or redirect the model with more confidence when evaluating output for critical tasks.
References :
Matt Marshall@AI News | VentureBeat
//
References: Microsoft Security Blog, www.zdnet.com
Microsoft is enhancing its Copilot Studio platform with AI-driven improvements, introducing deep reasoning capabilities that let agents tackle intricate problems through methodical thinking, combining AI flexibility with deterministic business-process automation. The company has also unveiled specialized deep reasoning agents for Microsoft 365 Copilot, named Researcher and Analyst, to help users complete tasks more efficiently. The agents are designed to function like personal data scientists, processing diverse data sources and generating insights through code execution and visualization.
Microsoft's focus includes both securing AI and using AI to bolster security, as demonstrated by the upcoming Microsoft Security Copilot agents and new security features. Microsoft aims to provide an AI-first, end-to-end security platform that helps organizations secure their future; one example is the set of AI agents designed to autonomously assist with phishing triage, data security, and identity management. The Security Copilot tool will automate routine tasks, allowing IT and security staff to focus on more complex issues and aiding defense against cyberattacks.
References :
Maximilian Schreiner@THE DECODER
//
Google DeepMind has announced Gemini 2.5 Pro, its latest and most advanced AI model to date. This new model boasts enhanced reasoning capabilities and improved accuracy, marking a significant step forward in AI development. Gemini 2.5 Pro is designed with built-in 'thinking' capabilities, enabling it to break down complex tasks into multiple steps and analyze information more effectively before generating a response. This allows the AI to deduce logical conclusions, incorporate contextual nuances, and make informed decisions with unprecedented accuracy, according to Google.
Gemini 2.5 Pro has already secured the top position on the LMArena leaderboard, surpassing other AI models in head-to-head comparisons. The achievement highlights its superior performance and high-quality style on intricate tasks. The model also leads math and science benchmarks, demonstrating advanced reasoning across domains. It is available as Gemini 2.5 Pro (experimental) in Google's AI Studio and, for Gemini Advanced users, in the Gemini chat interface.
References :
Maximilian Schreiner@THE DECODER
//
Google has unveiled Gemini 2.5 Pro, its latest and "most intelligent" AI model to date, showcasing significant advancements in reasoning, coding proficiency, and multimodal functionalities. According to Google, these improvements come from combining a significantly enhanced base model with improved post-training techniques. The model is designed to analyze complex information, incorporate contextual nuances, and draw logical conclusions with unprecedented accuracy. Gemini 2.5 Pro is now available for Gemini Advanced users and on Google's AI Studio.
Google emphasizes the model's "thinking" capabilities, achieved through chain-of-thought reasoning, which lets it break complex tasks into multiple steps and reason through them before responding. The model handles multimodal input spanning text, audio, images, video, and large datasets. Gemini 2.5 Pro also performs strongly on coding tasks, surpassing Gemini 2.0 on specific benchmarks and excelling at building visually compelling web apps and agentic code applications. It scored 18.8% on Humanity's Last Exam, demonstrating its ability to handle complex knowledge-based questions.
References :
Editor-In-Chief, BitDegree@bitdegree.org
//
A new, fully AI-driven weather prediction system called Aardvark Weather is making waves in the field. Developed through an international collaboration including researchers from the University of Cambridge, Alan Turing Institute, Microsoft Research, and the European Centre for Medium-Range Weather Forecasts (ECMWF), Aardvark Weather uses a deep learning architecture to process observational data and generate high-resolution forecasts. The model is designed to ingest data directly from observational sources, such as weather stations and satellites.
This innovative system stands out because it can run on a single desktop computer, generating forecasts tens of times faster than traditional systems and requiring thousands of times less computing power. While traditional weather forecasting relies on Numerical Weather Prediction (NWP) models that use physics-based equations and vast computational resources, Aardvark Weather replaces all stages of this process with a streamlined machine learning model. According to researchers, Aardvark Weather can generate a forecast in seconds or minutes, using only about 10% of the weather data required by current forecasting systems.
References :
Matthias Bastian@THE DECODER
//
Mistral AI, a French artificial intelligence startup, has launched Mistral Small 3.1, a new open-source language model boasting 24 billion parameters. According to the company, this model outperforms similar offerings from Google and OpenAI, specifically Gemma 3 and GPT-4o Mini, while operating efficiently on consumer hardware like a single RTX 4090 GPU or a MacBook with 32GB RAM. It supports multimodal inputs, processing both text and images, and features an expanded context window of up to 128,000 tokens, which makes it suitable for long-form reasoning and document analysis.
Mistral Small 3.1 is released under the Apache 2.0 license, promoting accessibility and competition within the AI landscape. Mistral AI aims to challenge the dominance of major U.S. tech firms by offering a high-performance, cost-effective AI solution. The model achieves inference speeds of 150 tokens per second and is designed for text and multimodal understanding, positioning itself as a powerful alternative to industry-leading models without the need for expensive cloud infrastructure.
References :
msaul@mathvoices.ams.org
//
Researchers at the Technical University of Munich (TUM) and the University of Cologne have developed an AI-based learning system designed to provide individualized support for schoolchildren in mathematics. The system utilizes eye-tracking technology via a standard webcam to identify students’ strengths and weaknesses. By monitoring eye movements, the AI can pinpoint areas where students struggle, displaying the data on a heatmap with red indicating frequent focus and green representing areas glanced over briefly.
This AI-driven approach lets teachers provide more targeted assistance, improving the efficiency and personalization of math education. The software classifies the eye-movement patterns and selects appropriate learning videos and exercises for each pupil. Professor Maike Schindler of the University of Cologne, who has collaborated with TUM Professor Achim Lilienthal for ten years, emphasizes that the system is entirely new: it tracks eye movements, recognizes learning strategies from the resulting patterns, offers individualized support, and creates automated support reports for teachers.
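The heatmap idea is straightforward to sketch: accumulate gaze fixations onto a grid weighted by dwell time, then smooth the result, so frequently watched cells score high ("red") and briefly glanced cells score low ("green"). The grid size, fixation format, and smoothing are illustrative assumptions, not the TUM system's implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaze_heatmap(fixations, shape=(48, 64), sigma: float = 2.0) -> np.ndarray:
    """Accumulate webcam gaze fixations into a smoothed attention heatmap.

    fixations is a list of (row, col, duration_seconds) tuples mapped
    onto a coarse screen grid. Illustrative sketch only.
    """
    grid = np.zeros(shape)
    for row, col, duration in fixations:
        grid[row, col] += duration  # dwell time drives the intensity
    return gaussian_filter(grid, sigma=sigma)

heat = gaze_heatmap([(10, 20, 1.5), (10, 21, 2.0), (30, 50, 0.2)])
print(heat.max(), heat.min())  # hot spot around (10, 20), near-zero elsewhere
```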
References :
Matthew S.@IEEE Spectrum
//
References: IEEE Spectrum, Composio
Recent research has revealed that AI reasoning models, particularly Large Language Models (LLMs), are prone to overthinking, a phenomenon where these models favor extended internal reasoning over direct interaction with the problem's environment. This overthinking can negatively impact their performance, leading to reduced success rates in resolving issues and increased computational costs. The study highlights a crucial challenge in training AI models: finding the optimal balance between reasoning and efficiency.
In the study, researchers tasked leading reasoning LLMs with solving benchmark problems. The results indicated that reasoning models overthought nearly three times as often as their non-reasoning counterparts, and the more a model overthought, the fewer problems it successfully resolved. This suggests that while enhanced reasoning capabilities are generally desirable, excessive internal processing can be detrimental, hindering the model's ability to arrive at correct and timely solutions. It raises the question of how to train models to use just the right amount of reasoning and avoid "analysis paralysis."
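As a rough illustration of how such a tendency might be quantified, the sketch below scores a trajectory by the share of tokens spent on internal reasoning versus acting on the environment. The trace schema and the ratio itself are assumptions for illustration, not the study's actual metric.

```python
def overthinking_ratio(trace: list[dict]) -> float:
    """Crude proxy for 'overthinking': the fraction of a trajectory spent
    on internal reasoning rather than environment interaction.

    Each step is {'kind': 'reason' | 'act', 'tokens': int}; the schema
    is an illustrative assumption.
    """
    reason = sum(s["tokens"] for s in trace if s["kind"] == "reason")
    act = sum(s["tokens"] for s in trace if s["kind"] == "act")
    return reason / max(reason + act, 1)

trace = [
    {"kind": "reason", "tokens": 900},  # long internal deliberation
    {"kind": "act", "tokens": 40},      # a single tool call
    {"kind": "reason", "tokens": 300},
]
print(f"{overthinking_ratio(trace):.2f}")  # -> 0.97, heavily skewed to reasoning
```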
References :
@bdtechtalks.com
//
Alibaba has recently launched QwQ-32B, a new reasoning model that demonstrates performance on par with DeepSeek's R1 model. This development is a notable achievement for smaller models: the Qwen team showed that reinforcement learning on a strong base model can unlock reasoning capabilities that bring a smaller model's performance up to par with giant models.
QwQ-32B not only matches but in places surpasses models like DeepSeek-R1 and OpenAI's o1-mini across key industry benchmarks, including AIME24, LiveBench, and BFCL. Significantly, QwQ-32B reaches this level with only about 5% of the parameters of DeepSeek-R1, yielding lower inference costs without compromising quality or capability. Groq is offering developers the ability to build fast with Qwen QwQ 32B on GroqCloud™, running the 32B-parameter model at roughly 400 tokens per second, and it is proving very competitive on reasoning benchmarks as one of the most-used open-source models. QwQ-32B was explicitly designed for tool use and for adapting its reasoning based on environmental feedback, a major win for AI agents that need to reason, plan, and adapt in context; it outperforms R1 and o1-mini on the Berkeley Function Calling Leaderboard.
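For developers who want to try the model on GroqCloud, a minimal sketch using Groq's OpenAI-compatible endpoint might look like the following; the model identifier and sampling settings are assumptions to check against Groq's current documentation.

```python
import os
from openai import OpenAI  # Groq serves an OpenAI-compatible API

# Model id "qwen-qwq-32b" is an assumption; verify against Groq's docs.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
resp = client.chat.completions.create(
    model="qwen-qwq-32b",
    messages=[{"role": "user", "content": "Plan the steps to solve: ..."}],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```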
References :
@bdtechtalks.com
//
Alibaba's Qwen team has unveiled QwQ-32B, a 32-billion-parameter reasoning model that rivals much larger AI models in problem-solving capabilities. This development highlights the potential of reinforcement learning (RL) in enhancing AI performance. QwQ-32B excels in mathematics, coding, and scientific reasoning tasks, outperforming models like DeepSeek-R1 (671B parameters) and OpenAI's o1-mini, despite its significantly smaller size. Its effectiveness lies in a multi-stage RL training approach, demonstrating the ability of smaller models with scaled reinforcement learning to match or surpass the performance of giant models.
QwQ-32B is not only competitive in performance but also offers practical advantages. It is available as an open-weight model under an Apache 2.0 license, allowing businesses to customize and deploy it without restrictions. It also requires far less computational power, running on a single high-end GPU rather than the multi-GPU setups needed for larger models like DeepSeek-R1. This combination of performance, accessibility, and efficiency positions QwQ-32B as a valuable resource for the AI community and for enterprises seeking advanced reasoning capabilities.
References :
@Communications of the ACM
//
Andrew G. Barto and Richard S. Sutton have been awarded the 2024 ACM A.M. Turing Award for their foundational work in reinforcement learning (RL). The ACM recognized Barto and Sutton for developing the conceptual and algorithmic foundations of reinforcement learning, one of the most important approaches for creating intelligent systems. The researchers took principles from psychology and transformed them into a mathematical framework now used across AI applications. Their 1998 textbook "Reinforcement Learning: An Introduction" has become a cornerstone of the field, cited more than 75,000 times.
Their work, beginning in the 1980s, enables machines to learn independently from reward signals, a foundation for later achievements such as AlphaGo and today's large reasoning models (LRMs). Combining RL with deep learning has driven major advances, from AlphaGo's defeat of Lee Sedol to ChatGPT's training through human feedback. Their algorithms are used across areas such as game playing, robotics, chip design, and online advertising.
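For readers new to the idea, the textbook example below shows the kind of reward-driven update Barto and Sutton formalized: tabular Q-learning on a toy five-state corridor. It is a standard illustration, not code from their work, and the environment and hyperparameters are arbitrary.

```python
import random

def argmax_with_random_ties(values):
    """Index of the largest value, breaking ties randomly."""
    best = max(values)
    return random.choice([i for i, v in enumerate(values) if v == best])

# Five-state corridor: start at state 0, reward 1.0 for reaching state 4.
# Actions: 0 = step left, 1 = step right. Arbitrary toy hyperparameters.
N_STATES = 5
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):  # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy selection: mostly exploit, occasionally explore
        a = random.randint(0, 1) if random.random() < EPSILON \
            else argmax_with_random_ties(Q[s])
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # temporal-difference update toward reward plus discounted future value
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next

print([round(max(q), 2) for q in Q])  # state values rise toward the goal
```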
References :
george.fitzmaurice@futurenet.com (George Fitzmaurice)@Latest from ITPro
//
DeepSeek, a Chinese AI startup founded in 2023, is rapidly gaining traction as a competitor to established models like ChatGPT and Claude. It has risen quickly to prominence and now competes against models with far larger parameter counts while requiring much less compute. As of January 2025, DeepSeek boasts 33.7 million monthly active users and 22.15 million daily active users globally, showcasing its rapid adoption and impact.
Qwen, meanwhile, has introduced QwQ-32B, a 32-billion-parameter reasoning model designed to improve performance on complex problem-solving tasks through reinforcement learning, and it demonstrates robust performance on tasks requiring deep analytical thinking. QwQ-32B applies reinforcement learning through a reward-based, multi-stage training process to strengthen its reasoning capabilities, and it can match a 671B-parameter model, demonstrating that RL scaling can dramatically enhance model intelligence without requiring massive parameter counts.
References :
Michal Langmajer@Fello AI
//
OpenAI has announced the release of GPT-4.5, its latest language model which they are calling their 'last non-chain-of-thought model.' According to OpenAI, GPT-4.5 offers substantial enhancements over its predecessors, particularly in advanced reasoning, problem-solving, and contextual understanding. Sam Altman, CEO of OpenAI, described it as the "first model that feels like talking to a thoughtful person," noting moments of astonishment at the quality of advice received from the AI.
However, the rollout is facing challenges due to GPU shortages. Altman stated the company is "out of GPUs," leading to a staggered release initially limited to ChatGPT Pro subscribers, who pay $200 a month. While GPT-4.5 is available to developers across all paid API tiers, OpenAI plans to expand access to the Plus and Team tiers next week, with tens of thousands of GPUs expected to arrive to relieve the supply constraints. Despite not being a reasoning model, GPT-4.5 is, by OpenAI's estimate, 30 times more expensive to run than GPT-4o.
References :
Emily Forlini@PCMag Middle East ai
//
Google DeepMind has announced the pricing for its Veo 2 AI video generation model, making it available through its cloud API platform. The cost is set at $0.50 per second, which translates to $30 per minute or $1,800 per hour. While this may seem expensive, Google DeepMind researcher Jon Barron compared it to the cost of traditional filmmaking, noting that the blockbuster "Avengers: Endgame" cost around $32,000 per second to produce.
Veo 2 aims to create videos with realistic motion and high-quality output, up to 4K resolution, from simple text prompts. It is not the cheapest option compared with alternatives like OpenAI's Sora, which costs $200 per month, but Google is targeting filmmakers and studios, who typically have bigger budgets than film hobbyists. These customers would run Veo through Vertex AI, Google's platform for training and deploying advanced AI models. "Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects and Veo 2 will deliver," Google says.
References :
vishnupriyan@Verdict
//
Google's AI mathematics system, known as AlphaGeometry2 (AG2), has surpassed the problem-solving capabilities of International Mathematical Olympiad (IMO) gold medalists in solving complex geometry problems. This second-generation system combines a language model with a symbolic engine, enabling it to solve 84% of IMO geometry problems, compared to the 81.8% solved by human gold medalists. Developed by Google DeepMind, AG2 can engage in both pattern matching and creative problem-solving, marking a significant advancement in AI's ability to mimic human reasoning in mathematics.
This achievement comes shortly after Microsoft released its own advanced AI math reasoning system, rStar-Math, highlighting the growing competition in the AI math domain. While rStar-Math uses smaller language models to solve a broader range of problems, AG2 focuses on advanced geometry problems using a hybrid reasoning model. The improvements in AG2 represent a 30% performance increase over the original AlphaGeometry, particularly in visual reasoning and logic, essential for solving complex geometry challenges.