DeepSeek launches the largest LLM in open-source history… “Caught up with GPT-4o”


(Photo = Shutterstock)

China’s DeepSeek has unveiled ‘DeepSeek-V3’, the largest open-source large language model (LLM) ever released. The company emphasized that the model outperforms existing open-source models such as Meta’s ‘Llama 3.1 405B’ and Alibaba’s ‘Qwen 2.5 72B’, and even surpasses OpenAI’s ‘GPT-4o’.

On the 26th (local time), DeepSeek released ‘DeepSeek-V3’, an open-source LLM with 671 billion parameters. That is more than 1.5 times the size of Llama 3.1 405B, which has 405 billion parameters, making it the largest open-source model released to date.

DeepSeek-V3 can perform a wide range of text tasks from descriptive prompts, including coding, translation, and writing essays and emails.

In addition, the model is subdivided into several expert networks according to the characteristics of the task, using a ‘Mixture of Experts (MoE)’ architecture that maximizes efficiency by activating only the experts suited to each query. As a result, only about 37 billion of the 671 billion parameters are activated per token, significantly reducing inference cost and memory usage while maintaining performance. A simplified sketch of this routing idea appears below.
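To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer in PyTorch. It is a generic illustration of the mechanism described above, not DeepSeek-V3’s actual architecture (which uses far more experts, shared experts, and custom load balancing); all layer sizes and the expert count are invented toy values.

# Minimal, illustrative Mixture-of-Experts layer with top-k routing.
# NOT DeepSeek-V3's implementation; all sizes are made-up toy values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The router scores every expert for each token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = self.router(x)                # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token; the rest stay inactive,
        # which is why only a fraction of total parameters is used per token.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)      # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(2, 5, 64)                 # toy batch of token embeddings
print(moe(tokens).shape)                       # torch.Size([2, 5, 64])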

It is pre-trained on 14.8 trillion tokens and supports a context window of up to 128,000 tokens. Training was carried out in an NVIDIA ‘H800’ GPU-based data center at a relatively small cost of roughly $5.57 million (about KRW 8.2 billion). This is considered very economical compared with Llama 3.1, whose training cost is estimated at more than $500 million (about KRW 730 billion).

Technological innovation also stands out. ‘Multi-head Latent Attention (MLA)’ technology repeatedly extracts the most important details from the text, reducing the chance of missing key information, while the ‘Multi-Token Prediction (MTP)’ function improves inference speed by generating multiple tokens at once.
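For the MTP idea, here is a minimal sketch assuming independent parallel prediction heads, one per lookahead position. DeepSeek’s published MTP design chains small sequential modules instead, so this only illustrates the general concept of proposing several tokens per forward pass; all dimensions are toy values.

# Minimal sketch of multi-token prediction (MTP): alongside the usual
# next-token head, extra heads predict tokens further ahead, so several
# tokens can be proposed per forward pass. Generic illustration only,
# not DeepSeek-V3's actual MTP module; all sizes are toy values.
import torch
import torch.nn as nn

class ToyMTPHead(nn.Module):
    def __init__(self, d_model=64, vocab=1000, n_future=2):
        super().__init__()
        # One output head per future position (t+1, t+2, ...).
        self.heads = nn.ModuleList([nn.Linear(d_model, vocab) for _ in range(n_future)])

    def forward(self, hidden):                 # hidden: (batch, seq, d_model)
        # Each head produces logits for a different lookahead offset.
        return [head(hidden) for head in self.heads]

mtp = ToyMTPHead()
hidden = torch.randn(2, 5, 64)                 # toy transformer hidden states
logits_t1, logits_t2 = mtp(hidden)             # predictions for t+1 and t+2
print(logits_t1.shape, logits_t2.shape)        # (2, 5, 1000) each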

Benchmark results (Photo = DeepSeek)

In terms of performance, it has established itself as one of the most powerful open-source models. It recorded excellent scores on various benchmarks, particularly those centered on Chinese and math; notably, it scored 90.2 on the MATH-500 test, surpassing Qwen 2.5 (80 points) by a wide margin. Aside from the English-centered SimpleQA and FRAMES, it outperformed GPT-4o on most benchmarks. Only Anthropic’s ‘Claude 3.5 Sonnet’ remained competitive, recording higher scores on certain tests such as MMLU-Pro, IF-Eval, and GPQA-Diamond.

DeepSeek is one of China’s leading open-source powerhouses. Last month, it was the first to release an open-source reasoning model that surpasses ‘o1-preview’.

DeepSeek-V3 is currently available on Hugging Face and GitHub.

Reporter Park Chan cpark@aitimes.com
