@www.theverge.com
//
OpenAI has launched o3-mini, the first model in its o3 family, with gains in both speed and reasoning capability. The model comes in two variants: o3-mini-high, which prioritizes in-depth reasoning, and o3-mini-low, designed for quicker responses. Benchmarks indicate that o3-mini performs on par with its predecessor, o1, at a significantly reduced cost: roughly 15 times cheaper and five times faster. Notably, o3-mini is also cheaper than GPT-4o, although ChatGPT caps it at 150 messages per hour while GPT-4o remains unrestricted, so the cost advantage comes with a usage trade-off.
OpenAI is also now providing more detailed insight into o3-mini's reasoning process, responding to criticism over transparency and to competition from models like DeepSeek-R1. The model now surfaces summarized versions of its chain of thought (CoT), giving users greater clarity into its reasoning. OpenAI CEO Sam Altman believes that merging large language model scaling with reasoning capabilities could lead to "new scientific knowledge," hinting at future advances beyond current limits on inventing new algorithms or fields.
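The high/low split described above is exposed in OpenAI's Chat Completions API through the `reasoning_effort` parameter, which accepts "low", "medium", or "high" for o3-mini. A minimal sketch of building such a request body (construction only; actually sending it requires an API key and the `openai` client or an HTTP call):

```python
# Sketch: selecting o3-mini's reasoning depth via the reasoning_effort
# parameter, the API-side counterpart of the o3-mini-high / o3-mini-low split.

def build_o3_mini_request(prompt: str, effort: str = "medium") -> dict:
    """Build a Chat Completions request body for o3-mini."""
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

# "high" trades latency for deeper reasoning; "low" does the opposite.
request = build_o3_mini_request("Prove that sqrt(2) is irrational.", "high")
```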
Classification:
@bdtechtalks.com
//
Alibaba has launched QwQ-32B, a new reasoning model whose performance is on par with DeepSeek's R1 model. This is a notable achievement in the field, particularly for smaller models: the Qwen team showed that reinforcement learning on a strong base model can unlock reasoning capabilities in smaller models, bringing their performance on par with giant models.
QwQ-32B not only matches but in places surpasses DeepSeek-R1 and OpenAI's o1-mini across key industry benchmarks, including AIME24, LiveBench, and BFCL. This is significant because QwQ-32B does so with only about 5% of DeepSeek-R1's parameters, yielding lower inference costs without compromising quality or capability. Groq is offering developers the ability to build fast with QwQ-32B on GroqCloud™, running the 32B-parameter model at roughly 400 tokens per second, where it has become one of the most widely used open-source models. QwQ-32B was also explicitly designed for tool use and for adapting its reasoning based on environmental feedback, a major win for AI agents that need to reason, plan, and adapt based on context; it outperforms R1 and o1-mini on the Berkeley Function Calling Leaderboard.
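GroqCloud exposes an OpenAI-compatible Chat Completions endpoint, so the tool-use workload that BFCL measures can be sketched as a standard function-calling request. The model id and the `get_weather` tool below are illustrative assumptions, not taken from the article; only the request body is built here, since sending it requires a Groq API key:

```python
# Sketch: a function-calling request for QwQ-32B on Groq's OpenAI-compatible
# endpoint. The tool definition is a hypothetical example; the model id is
# assumed from Groq's model listing.

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_tool_call_request(question: str) -> dict:
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example tool
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "qwen-qwq-32b",  # assumed Groq model id
        "messages": [{"role": "user", "content": question}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```

If the model elects to use the tool, the response carries a `tool_calls` entry with JSON arguments (e.g. `{"city": "Paris"}`) that the agent executes before feeding the result back, which is exactly the reason-plan-adapt loop described above.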
Classification:
@bdtechtalks.com
//
Alibaba's Qwen team has unveiled QwQ-32B, a 32-billion-parameter reasoning model that rivals much larger AI models in problem-solving capability. The release highlights the potential of reinforcement learning (RL) for enhancing AI performance: QwQ-32B excels at mathematics, coding, and scientific reasoning, outperforming models like DeepSeek-R1 (671B parameters) and OpenAI's o1-mini despite its significantly smaller size. Its effectiveness lies in a multi-stage RL training approach, demonstrating that smaller models trained with scaled reinforcement learning can match or surpass giant models.
QwQ-32B is not only competitive on performance but also offers practical advantages. It is released as an open-weight model under the Apache 2.0 license, allowing businesses to customize and deploy it without restriction. It also requires significantly less compute, running on a single high-end GPU compared with the multi-GPU setups needed for larger models like DeepSeek-R1. This combination of performance, accessibility, and efficiency positions QwQ-32B as a valuable resource for the AI community and for enterprises seeking advanced reasoning capabilities.
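The single-GPU claim can be sanity-checked with back-of-envelope arithmetic on weight memory alone (ignoring KV cache and activations); the precisions used below are standard assumptions, not figures from the article:

```python
# Rough weight-memory footprint of a dense model at a given precision.
# Ignores KV cache, activations, and serving overhead.

def weight_memory_gb(n_params_billions: float, bytes_per_param: float) -> float:
    return n_params_billions * 1e9 * bytes_per_param / 1024**3

qwq_fp16 = weight_memory_gb(32, 2.0)    # ~60 GB: fits an 80 GB-class GPU
qwq_int4 = weight_memory_gb(32, 0.5)    # ~15 GB: fits a consumer GPU
r1_fp16 = weight_memory_gb(671, 2.0)    # ~1250 GB: hence multi-GPU serving
```

The gap between roughly 60 GB and roughly 1.25 TB of weights is what separates single-GPU deployment from a multi-node serving cluster.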
Classification: Blogs