Top Mathematics discussions

NishMath

@the-decoder.com - 40d
DeepSeek has unveiled its V3 large language model (LLM), a significant advance in AI. The model was trained on 14.8 trillion tokens using 2,788,000 H800 GPU hours, at an estimated cost of about $5.576 million. That is well below what models of similar capability are reported to have cost, although the figure assumes a rental price of $2 per GPU hour and covers only the final training run, excluding earlier research and ablation experiments. DeepSeek V3's post-training combined supervised fine-tuning and reinforcement learning, and the model achieves benchmark results comparable to Claude 3.5 Sonnet. It is a Mixture-of-Experts (MoE) model with 671 billion total parameters, of which 37 billion are activated for each token.
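The headline numbers above follow from simple arithmetic. The sketch below multiplies the reported GPU hours by an assumed $2-per-GPU-hour rental rate (the rate DeepSeek's technical report uses for its estimate, not a measured expense) and also computes what share of the MoE model's parameters is active for any single token. The constant and function names are illustrative.

```python
# Rough arithmetic behind the DeepSeek V3 figures quoted above.
# Assumption: H800 rental price of $2 per GPU hour, the rate used to
# convert GPU hours into dollars for the headline estimate.

GPU_HOURS = 2_788_000          # total H800 GPU hours reported
PRICE_PER_GPU_HOUR = 2.00      # assumed rental price in USD

TOTAL_PARAMS_B = 671           # total parameters, in billions
ACTIVE_PARAMS_B = 37           # parameters activated per token, in billions


def training_cost_usd(gpu_hours: int, price_per_hour: float) -> float:
    """Convert GPU hours into a dollar figure at a flat hourly rate."""
    return gpu_hours * price_per_hour


def active_fraction(active: float, total: float) -> float:
    """Share of the MoE model's parameters used for a single token."""
    return active / total


if __name__ == "__main__":
    cost = training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR)
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Estimated training cost: ${cost / 1e6:.3f}M")  # $5.576M
    print(f"Active parameters per token: {frac:.1%}")      # 5.5%
```

Only about 5.5% of the weights participate in each token's forward pass, which is a large part of why an MoE model of this size can be trained and served relatively cheaply.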

The release also includes API access at highly competitive prices: input costs $0.27 per million tokens ($0.07 on cache hits) and output costs $1.10 per million tokens. By comparison, Claude 3.5 Sonnet charges $3 per million input tokens and $15 per million output tokens. Combined with its strong performance, this pricing suggests DeepSeek V3 could disrupt the market on both quality and affordability. The model was also released as open source, together with a technical paper describing its architecture and training framework for the research community.
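To make the price gap concrete, the sketch below estimates spend for a hypothetical workload using only the per-million-token prices quoted above. The workload sizes, the cache-hit rate, and the model keys in `PRICES` are illustrative assumptions. Since no cache-hit price is quoted for Claude 3.5 Sonnet, its regular input price is reused for that case.

```python
# Compare API spend for a hypothetical workload using the list prices
# quoted above (USD per million tokens). Workload sizes and the
# cache-hit rate are illustrative assumptions, not measured usage.

PRICES = {
    # model key (hypothetical label): (input on cache miss, input on cache hit, output)
    "deepseek-v3": (0.27, 0.07, 1.10),
    # No cache-hit price is quoted above for Claude 3.5 Sonnet,
    # so the regular input price is reused for both cases.
    "claude-3.5-sonnet": (3.00, 3.00, 15.00),
}


def workload_cost(model: str, input_m: float, output_m: float,
                  cache_hit_rate: float = 0.0) -> float:
    """Cost in USD for `input_m` million input and `output_m` million output tokens."""
    miss, hit, out = PRICES[model]
    # Blend the two input prices according to the share of cached input.
    input_rate = cache_hit_rate * hit + (1 - cache_hit_rate) * miss
    return input_m * input_rate + output_m * out


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens, 2M output tokens, no cache hits.
    for model in PRICES:
        cost = workload_cost(model, input_m=10, output_m=2)
        print(f"{model}: ${cost:.2f}")
```

For this workload the sketch gives $4.90 for DeepSeek V3 versus $60.00 for Claude 3.5 Sonnet, roughly a twelvefold difference, and the gap widens further as the cache-hit rate rises.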



References:
  • Hacker News: DeepSeek v3 beats Claude sonnet 3.5 and way cheaper
  • THE DECODER: Deepseek V3 emerges as China's most powerful open-source language model to date
  • github.com: DeepSeek_V3.pdf
  • www.marktechpost.com: The field of Natural Language Processing (NLP) has made significant strides with the development of large-scale language models (LLMs).
@decodebuzzing.medium.com - 2d
DeepSeek, a Chinese AI startup, has rapidly gained attention in the AI industry with its R1 model, which challenges established players such as ChatGPT and Qwen. Benchmark tests have revealed both strengths and weaknesses in DeepSeek's performance on coding, mechanics, and algorithmic-precision tasks, while highlighting its potential as a cost-effective AI solution. However, concerns have emerged about the model's safety guardrails: researchers reported a "100 percent attack success rate" in bypassing its protections against malicious prompts.
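Attack success rate is commonly reported as the fraction of adversarial test prompts for which a model produces the disallowed output. A minimal sketch, assuming each test outcome is recorded as a simple pass/fail flag (the `AttackResult` type and the sample outcomes are hypothetical), shows why a "100 percent" figure means every tested prompt got through:

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    prompt_id: str
    succeeded: bool  # True if the model complied with the malicious prompt


def attack_success_rate(results: list[AttackResult]) -> float:
    """Fraction of adversarial prompts that bypassed the model's guardrails."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.succeeded for r in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes for illustration only: every prompt succeeds.
    results = [AttackResult(f"p{i}", True) for i in range(5)]
    print(f"ASR: {attack_success_rate(results):.0%}")  # ASR: 100%
```

A single refused prompt would pull the rate below 100%, so the reported figure implies the guardrails blocked none of the prompts in that particular test set; it says nothing about prompts outside the set.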

DeepSeek's rise has also drawn scrutiny over potential violations of U.S. export restrictions. The U.S. government is investigating whether DeepSeek obtained restricted Nvidia GPUs through intermediaries in Singapore, whose share of Nvidia's revenue has grown significantly. In addition, Texas Governor Greg Abbott has banned DeepSeek and other Chinese-backed AI platforms from state government-issued devices, citing security concerns. Despite these challenges, DeepSeek-R1 models are now available on Amazon Web Services (AWS), allowing users to deploy and scale generative AI applications built on them.


