‘DeepSeek-R2’


(Photo: Shutterstock)

Details about DeepSeek’s latest reasoning model ‘DeepSeek-R2’, said to be in the early stages of launch, are circulating on the web. If the information is accurate, DeepSeek is likely to shock Western countries once again: this time the model was reportedly trained on Chinese-made Huawei chips, and runs inference at less than 1% of the cost of OpenAI’s ‘GPT-4o’.

Chinese media, including the South China Morning Post, reported on the 29th that information about DeepSeek-R2 had been leaked online.

The information first appeared on a Chinese community site on the 25th. A user going by ‘Hot Spot Chaser’ laid out the details in a post titled ‘DeepSeek-R2: Unit Price Down 97.3%, Imminent Launch, Core Specs’.

The post cited three core areas of technical innovation in R2: architecture, data engineering, and hardware utilization.

First, the architecture is said to adopt DeepSeek’s own ‘Hybrid MoE 3.0’, a design that combines reasoning and non-reasoning models. The model reportedly has 1.2 trillion parameters in total, of which 78 billion are activated per token under the mixture-of-experts routing scheme.

That is nearly double the 670 billion parameters of the R1 model, and it would make R2 the first of the company’s released models to exceed one trillion parameters. In other words, it would be the largest model DeepSeek has ever built.
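To put the claimed mixture-of-experts figures in perspective, a quick back-of-the-envelope calculation (the parameter counts are taken from the leak; the arithmetic is ours):

```python
# Back-of-the-envelope check on the leaked MoE figures.
# Total/active parameter counts are the leak's claims, not verified specs.
total_params = 1.2e12    # 1.2 trillion total parameters (claimed)
active_params = 78e9     # 78 billion activated per token (claimed)
r1_params = 670e9        # R1 total parameters, as cited in the article

active_fraction = active_params / total_params
growth_vs_r1 = total_params / r1_params

print(f"active fraction per token: {active_fraction:.1%}")  # 6.5%
print(f"size vs. R1: {growth_vs_r1:.2f}x")                  # 1.79x
```

Only about 6.5% of the weights would be active for any given token, which is how a 1.2-trillion-parameter model could still be cheap to serve.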

In a test reportedly conducted on Alibaba Cloud, R2 cut token costs by 97.3% compared with ‘GPT-4o’ when processing long-text inference. Specifically, DeepSeek’s stated token pricing is $0.07 for input and $0.27 for output.

DeepSeek-R1 costs $0.07 for input and $1.10 for output, so R2 would be cheaper still.
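Taking the leaked prices at face value (the pricing unit is not specified in the leak, so only the ratios are meaningful), the claimed savings work out as follows:

```python
# Compare the leaked R2 prices against R1 and the 97.3% GPT-4o claim.
# All dollar figures are the leak's claims; the unit is unspecified.
r1_in, r1_out = 0.07, 1.10   # R1 input/output token cost (claimed)
r2_in, r2_out = 0.07, 0.27   # R2 input/output token cost (claimed)

output_saving = 1 - r2_out / r1_out
print(f"output cost reduction vs. R1: {output_saving:.0%}")  # 75%

# If R2 output really costs 97.3% less than GPT-4o, the implied
# GPT-4o output price on the same (unknown) unit would be:
implied_gpt4o_out = r2_out / (1 - 0.973)
print(f"implied GPT-4o output cost: ${implied_gpt4o_out:.2f}")  # $10.00
```

Input pricing would be unchanged from R1; the claimed advantage is entirely on the output side.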

The post also claims that ‘distillation’ was used. The model was reportedly trained on a 5.2-petabyte (PB) high-quality dataset covering domains such as finance and law, raising accuracy to 89.7%.

In addition, an in-house distributed training framework reportedly achieved 82% utilization on a cluster of Huawei ‘Ascend 910B’ chips, with computing power of 512 petaflops at FP16 precision, reaching 91% of a comparable NVIDIA ‘A100’ cluster.
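Assuming the 91% figure refers to peak FP16 throughput, the size of the implied A100 comparison cluster can be estimated. The A100’s dense FP16 peak of 312 TFLOPS is a public spec we are supplying, not a figure from the leak:

```python
# Estimate what the claimed 512 PFLOPS at 91%-of-A100 would imply.
ascend_cluster_pflops = 512   # claimed FP16 throughput of the Ascend 910B cluster
ratio_vs_a100 = 0.91          # claimed fraction of the A100 cluster's throughput
a100_fp16_tflops = 312        # A100 dense FP16 peak (public spec, our assumption)

a100_cluster_pflops = ascend_cluster_pflops / ratio_vs_a100
num_a100s = a100_cluster_pflops * 1000 / a100_fp16_tflops
print(f"implied A100 cluster: {a100_cluster_pflops:.0f} PFLOPS "
      f"≈ {num_a100s:.0f} chips")
```

On those assumptions the comparison cluster would be on the order of 1,800 A100s; the leak gives no chip counts, so this is purely illustrative.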

It is not known whether DeepSeek trained the model on Huawei chips alone or in combination with NVIDIA chips. Nevertheless, it is striking that Chinese chips reportedly delivered performance comparable to NVIDIA’s established flagship.

R2 is also said to be multimodal. Its visual-language component, a ‘ViT-Transformer’ hybrid architecture, reportedly scored 11.6% higher than OpenAI’s CLIP model on object segmentation. Applied to medical image analysis, it is said to have achieved 98.1% accuracy in chest X-ray reading.

Lastly, the post emphasized that quantization compression can reduce the model size by 83% with less than 2% loss in accuracy, making edge deployment feasible.

Chinese media noted that all of this is only speculation and that the actual details cannot be confirmed. But if it is true, they stressed, the model could surprise the world.

In particular, it would mean that China has managed to catch up with NVIDIA despite US government export controls.

By Dae-jun Lim, reporter ydj@aitimes.com
