@the-decoder.com - 40d
DeepSeek has unveiled its v3 large language model (LLM), a significant advancement in AI. The model was trained on 14.8 trillion tokens using 2,788,000 H800 GPU hours at a cost of approximately $5.576 million, a figure remarkably lower than that of other models of similar capability. Training involved both supervised fine-tuning and reinforcement learning, and the resulting model achieves benchmark results comparable to Claude 3.5 Sonnet. DeepSeek v3 is a Mixture-of-Experts (MoE) model with 671 billion total parameters, of which 37 billion are activated for each token.
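To put the training figures in proportion, the short Python sketch below derives a few ratios from the numbers quoted above. The derived values (cost per GPU hour, GPU hours per billion tokens, share of active parameters) are simple arithmetic on those figures, not numbers reported separately in the summary.

```python
# Figures quoted in the summary above.
gpu_hours = 2_788_000          # H800 GPU hours
total_cost_usd = 5_576_000     # approximate training cost
training_tokens = 14.8e12      # 14.8 trillion tokens
total_params_b = 671           # total parameters, in billions
active_params_b = 37           # parameters activated per token, in billions

# Derived ratios (plain arithmetic, not independently reported values).
usd_per_gpu_hour = total_cost_usd / gpu_hours
gpu_hours_per_billion_tokens = gpu_hours / (training_tokens / 1e9)
active_fraction = active_params_b / total_params_b

print(f"Implied cost per GPU hour: ${usd_per_gpu_hour:.2f}")
print(f"GPU hours per billion tokens: {gpu_hours_per_billion_tokens:.1f}")
print(f"Share of parameters active per token: {active_fraction:.1%}")
```

The last ratio shows why an MoE model can be cheap to run relative to its size: only about one parameter in eighteen takes part in processing any given token.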
The release of DeepSeek v3 also includes API access, with pricing that is highly competitive with others in the market. Input is priced at $0.27 per million tokens ($0.07 on cache hits) and output at $1.10 per million tokens. For comparison, Claude 3.5 Sonnet charges $3 per million input tokens and $15 per million output tokens. Combined with its strong performance, these prices suggest DeepSeek v3 is set to disrupt the market on both model quality and affordability. The model was also released as fully open-source, with all associated papers and training frameworks provided to the research community.
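As a rough illustration of what the price gap means in practice, the sketch below costs out one workload under both price lists. The per-token prices are the ones quoted above; the workload size (10 million input tokens, 2 million output tokens) and the helper name `cost_usd` are assumptions made for this example. Where no cache-hit price is quoted, the sketch falls back to the regular input price.

```python
# USD per million tokens, as quoted in the summary above.
PRICES = {
    "DeepSeek v3": {"input": 0.27, "cached_input": 0.07, "output": 1.10},
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00},
}

def cost_usd(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the API cost of a workload (hypothetical helper)."""
    price = PRICES[model]
    # Fall back to the normal input price when no cache-hit price is quoted.
    cached_rate = price.get("cached_input", price["input"])
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total = (
        uncached * price["input"]
        + cached * cached_rate
        + output_tokens * price["output"]
    )
    return total / 1_000_000

# Assumed workload: 10M input tokens, 2M output tokens, no cache hits.
deepseek = cost_usd("DeepSeek v3", 10_000_000, 2_000_000)
claude = cost_usd("Claude 3.5 Sonnet", 10_000_000, 2_000_000)

print(f"DeepSeek v3:       ${deepseek:.2f}")
print(f"Claude 3.5 Sonnet: ${claude:.2f}")
print(f"Price ratio:       {claude / deepseek:.1f}x")
```

For this assumed workload the quoted prices work out to roughly $4.90 versus $60.00, about a twelvefold difference; a different input/output mix would shift the ratio.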
@pub.towardsai.net - 36d
Recent developments in AI agent frameworks are paving the way for more efficient and scalable applications. The Jido framework, built in Elixir, is designed to run thousands of agents using minimal resources. Each agent requires only 25 KB of memory at rest, enabling large-scale deployment without heavy infrastructure. This could significantly reduce the cost and complexity of running many agents in parallel, a common challenge in current agent frameworks. Jido also relies on Elixir's concurrency features and OTP architecture to let agents dynamically manage their own workflows and sub-agents.
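A back-of-the-envelope calculation shows what the 25 KB figure implies at scale. The sketch below is plain arithmetic in Python, not Jido code; it assumes 1 KB = 1,000 bytes, counts only the quoted at-rest footprint, and uses agent counts chosen purely for illustration.

```python
BYTES_PER_AGENT_AT_REST = 25 * 1_000  # 25 KB per agent, as quoted above

def at_rest_megabytes(agent_count):
    """Total at-rest memory in MB for a given number of idle agents."""
    return agent_count * BYTES_PER_AGENT_AT_REST / 1_000_000

# Hypothetical fleet sizes.
for count in (1_000, 10_000, 100_000):
    print(f"{count:>7,} agents: {at_rest_megabytes(count):,.0f} MB at rest")
```

Memory used while an agent is actively working would come on top of this and is not covered by the quoted figure.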
The core of Jido centers on four key concepts: Actions, Workflows, Agents, and Sensors. Actions represent small, reusable tasks, while Workflows chain these Actions together to achieve broader goals. Agents are stateful entities that can plan and execute these Workflows. The focus is on creating a system in which agents can, to a degree, manage themselves without constant human intervention. Jido offers a practical approach to building autonomous, distributed systems through functional programming principles and dynamic error handling.
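The relationship between Actions, Workflows, and Agents described above can be sketched in a few lines. The example below is a conceptual illustration in Python, not Jido's actual Elixir API: every name in it (`add_one`, `double`, `run_workflow`, `Agent.plan`, `Agent.execute`) is hypothetical, and the planning rule is a toy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]
Action = Callable[[State], State]  # a small, reusable task: state in, new state out

def add_one(state: State) -> State:
    return {**state, "value": state["value"] + 1}

def double(state: State) -> State:
    return {**state, "value": state["value"] * 2}

def run_workflow(actions: List[Action], state: State) -> State:
    """A workflow is a chain of actions applied in order."""
    for action in actions:
        state = action(state)
    return state

@dataclass
class Agent:
    """A stateful entity that plans a workflow and then executes it."""
    state: State
    history: List[str] = field(default_factory=list)

    def plan(self, target: int) -> List[Action]:
        # Toy planner: double while that stays within the target, else add one.
        value = self.state["value"]
        steps: List[Action] = []
        while value < target:
            if value > 0 and value * 2 <= target:
                steps.append(double)
                value *= 2
            else:
                steps.append(add_one)
                value += 1
        return steps

    def execute(self, workflow: List[Action]) -> None:
        self.state = run_workflow(workflow, self.state)
        self.history.extend(action.__name__ for action in workflow)

agent = Agent(state={"value": 1})
agent.execute(agent.plan(target=10))
print(agent.state, agent.history)
```

Actions here are pure functions that return a new state rather than mutating the old one, which mirrors the functional style the summary attributes to Jido.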
@gregrobison.medium.com - 47d
Recent articles are exploring the practical applications of mathematics across various fields and skill levels. One focus is on linear regression, with detailed guides covering the theory, implementation from scratch using tools like NumPy, and even a demonstration of a needlessly complicated tool for the same purpose. These resources are aimed at helping people understand the math behind machine learning algorithms and their real-world applications, such as predicting housing prices, forecasting sales, and analyzing scientific data.
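For readers who want to see what "from scratch" means here, the following is a minimal sketch of simple (one-variable) linear regression using the closed-form least-squares solution. It uses only the Python standard library to stay dependency-free; NumPy would express the same sums as vectorized array operations. The function name `fit_line` and the data points are made up for illustration and are not taken from the articles.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 3x + 2.
xs = [1, 2, 3, 4]
ys = [5, 8, 11, 14]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction at x=5: {slope * 5 + intercept:.2f}")
```

With real data such as housing prices the points would not fall exactly on a line, and the same formulas would return the line that minimizes the sum of squared residuals.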
The use of Large Language Models (LLMs) in mathematics education is another key area of investigation. While LLMs demonstrate remarkable capabilities in language generation, they struggle with direct mathematical problem-solving; instead, they often write code to solve mathematical problems indirectly. They can still be powerful tools for students. For instance, ChatGPT can solve math problems with step-by-step instructions, and it can read research papers and explain them. Additionally, topics such as multinomial distributions and real-world mathematical analysis drawn from The Cat in the Hat are being explored to further enhance mathematical understanding.
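Solving a problem "through code" typically means turning a formula into a short program and letting the interpreter do the arithmetic. As an illustration tied to one of the topics mentioned above, the sketch below evaluates the multinomial probability mass function, P = n! / (k1! ... km!) * p1^k1 ... pm^km. The function name and the example numbers are assumptions for this sketch, not taken from the articles.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` outcomes per category in sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient: n! / (k1! * k2! * ... * km!)
    coefficient = factorial(n) // prod(factorial(k) for k in counts)
    return coefficient * prod(p ** k for p, k in zip(probs, counts))

# Hypothetical example: 4 trials over 3 categories.
probability = multinomial_pmf(counts=[2, 1, 1], probs=[0.5, 0.3, 0.2])
print(f"{probability:.4f}")
```

With two categories the same function reduces to the binomial distribution, which makes a convenient sanity check.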