DeepSeek, whose ‘V3’ model has been regarded as the world’s best open source model, has now released the ‘R1’ series, a family of reasoning models that competes with OpenAI’s ‘o1’, as open source.
DeepSeek announced on the 20th (local time) that it has officially released the open source reasoning models ▲R1 ▲R1-Zero ▲R1-Distill.
R1 and R1-Zero are fine-tuned versions of ‘DeepSeek-V3’ and each contain 671 billion parameters. Both models adopt a ‘Mixture of Experts (MoE)’ architecture and are designed to activate only about 37 billion of those parameters per token. In other words, they maintain high performance while reducing inference cost and memory usage.
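The sketch below illustrates how top-k routing in an MoE layer activates only a few experts per token, which is what keeps inference cost far below the full parameter count. All sizes and names here are illustrative assumptions, not DeepSeek’s actual implementation.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Illustrative Mixture-of-Experts layer: a router picks the top-k experts
    per token, so only a fraction of all parameters do work on each forward pass."""
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                           nn.Linear(4 * dim, dim)) for _ in range(num_experts)]
        )

    def forward(self, x):                                # x: (tokens, dim)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Each token passes through only top_k of the num_experts networks;
        # the remaining experts' parameters stay idle for that token.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512]) -- only 2 of 8 experts ran per token
```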
Reasoning-specialized LLMs are generally trained using two methods: reinforcement learning (RL) and supervised fine-tuning (SFT). RL trains the AI to perform tasks through trial and error, while SFT improves output quality by providing examples of the task.
Although DeepSeek omitted SFT during the development of R1-Zero, the model still acquired key reasoning skills, such as decomposing complex tasks into simpler substeps. R1-Zero recorded performance comparable to o1 on the AIME 2024 reasoning benchmark.
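Training with RL alone requires a reward signal that does not depend on curated example outputs, and DeepSeek’s report describes rule-checkable rewards such as answer correctness and output format. Below is a toy sketch of what such a rule-based reward might look like; the tag names and score values are assumptions for illustration, not DeepSeek’s actual reward code.

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward of the kind used to RL-train a reasoning model:
    no learned reward model, just mechanically checkable rules."""
    reward = 0.0
    # Format reward: the completion should wrap its reasoning in think tags.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.1
    # Accuracy reward: the final boxed answer must match the reference.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward

sample = "<think>2 + 2 = 4</think> The answer is \\boxed{4}."
print(rule_based_reward(sample, "4"))  # 1.1
```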
DeepSeek explained, “This is the first published research to demonstrate that the reasoning ability of an LLM can be induced by RL alone, without SFT.”
However, R1-Zero had limitations in output quality: responses were sometimes repetitive, hard to read, or mixed languages. To address these issues, DeepSeek developed the R1 model.

R1 is an improved version of R1-Zero with a modified training workflow, which adds back the SFT stage that was omitted during R1-Zero’s development. DeepSeek announced that this significantly improved output quality.
Benchmark results showed that R1 outperformed the o1 model in many areas, and even where o1 scored higher, the gap was within 5%.
In addition to its high performance, R1 is offered through DeepSeek’s API at a cost 90-95% lower than that of o1.
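Because DeepSeek’s API is OpenAI-compatible, calling R1 looks roughly like the following; the base URL and model name match DeepSeek’s public documentation, but verify them against the current docs before relying on them.

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; 'deepseek-reasoner' is the
# documented model name for R1 (check the current docs before use).
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)
print(response.choices[0].message.content)
```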
DeepSeek also released the ‘R1-Distill’ model family, which offers excellent hardware efficiency at the cost of some performance, as open source. The family includes ▲R1-Distill-Qwen-1.5B ▲R1-Distill-Qwen-7B ▲R1-Distill-Llama-8B ▲R1-Distill-Qwen-14B ▲R1-Distill-Qwen-32B ▲R1-Distill-Llama-70B.
These models were developed by fine-tuning Meta’s ‘Llama’ and Alibaba’s ‘Qwen’ on data distilled from R1. Notably, R1-Distill-Qwen-1.5B can run on a laptop, and R1-Distill-Qwen-32B has surpassed OpenAI’s o1-mini on several benchmarks.
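Distillation in this sense means generating reasoning traces with the large R1 model and fine-tuning a smaller base model on them with an ordinary supervised objective. A minimal sketch of that loop follows; the student model, the single training example, and the hyperparameters are all hypothetical stand-ins, not DeepSeek’s training setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical distillation sketch: fine-tune a small student on text produced
# by the R1 teacher. 'teacher_outputs' stands in for R1-generated reasoning traces.
student_name = "Qwen/Qwen2.5-1.5B"  # assumed student base model
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

teacher_outputs = [
    "Question: What is 12 * 13?\n<think>12*13 = 12*10 + 12*3 = 156</think>\nAnswer: 156"
]

student.train()
for text in teacher_outputs:
    batch = tokenizer(text, return_tensors="pt")
    # Standard SFT objective: next-token cross-entropy on the teacher's trace,
    # so the student imitates the large model's reasoning style and answers.
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```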
The R1 series is currently available on Hugging Face, where the model weights and code can be downloaded, and it can be tested through the API or the DeepSeek chat platform.
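For example, one of the smaller distilled checkpoints can be pulled straight from Hugging Face with the transformers library; the repository ID below follows DeepSeek’s published naming, but check the model card for exact usage and recommended generation settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small enough to try on a laptop; repo ID follows DeepSeek's published naming.
repo = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tokenizer("What is the derivative of x^3?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```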
Reporter Park Chan cpark@aitimes.com