Chinese AI startup DeepSeek, known for challenging leading AI vendors with open source technologies, has just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1.
Based on the recently introduced DeepSeek-V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI’s frontier reasoning LLM, on math, coding and reasoning tasks. The best part? It does so at a far more tempting price, coming in 90-95% cheaper than o1.
This release marks a major step forward for open source. It shows that open models are further narrowing the gap with closed commercial models in the race toward artificial general intelligence (AGI). To show off the prowess of its work, DeepSeek also used R1 to distill six Llama and Qwen models, taking their performance to new levels. In one case, the distilled version of Qwen-1.5B outperformed much larger models, GPT-4o and Claude 3.5 Sonnet, in select math benchmarks.
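Distillation here boils down to supervised fine-tuning of a small student model on reasoning traces generated by the larger R1 teacher. The announcement does not ship a distillation script, so the following is a minimal sketch, assuming a hypothetical `r1_traces.jsonl` file of teacher outputs and an illustrative Qwen base checkpoint:

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative student base; DeepSeek's actual distilled models
# (e.g. DeepSeek-R1-Distill-Qwen-1.5B) start from Qwen/Llama checkpoints.
STUDENT = "Qwen/Qwen2.5-1.5B"

tokenizer = AutoTokenizer.from_pretrained(STUDENT)
model = AutoModelForCausalLM.from_pretrained(STUDENT)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical file: each line holds a prompt and an R1-generated reasoning trace.
with open("r1_traces.jsonl") as f:
    samples = [json.loads(line) for line in f]

model.train()
for sample in samples:
    text = sample["prompt"] + sample["teacher_trace"]
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
    # Standard causal-LM loss: the student learns to reproduce the teacher's trace.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Notably, the distilled models are produced with supervised fine-tuning alone, with no RL applied to the student, which is part of what makes the approach so cheap to reproduce.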
These distilled models, as well as the main R1, have been open-sourced and are available on Hugging Face under an MIT license.
What does DeepSeek-R1 bring?
The focus is on artificial general intelligence (AGI), a level of AI that can perform intellectual tasks like humans. Many teams are doubling down on efforts to improve the reasoning capabilities of models. OpenAI took the first notable step in the field with its o1 model, which uses a chain-of-thought reasoning process to work through a problem. Through reinforcement learning, or reward-based optimization, o1 learns to hone its chain of thought and refine the strategies it uses, ultimately learning to recognize and correct its mistakes, or try new approaches when the current ones do not work.
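The reward-driven idea can be caricatured in a few lines: sample several chains of thought, score each one, and push the model toward the higher-scoring ones. The sketch below is purely illustrative; the sampler and reward are toy stand-ins, not o1’s or DeepSeek’s actual machinery:

```python
import random

def sample_chain(prompt: str) -> str:
    # Stand-in for the model sampling one chain of thought (purely illustrative).
    return random.choice([
        "120 / 1.5 = 80, so the answer is 80",
        "120 * 1.5 = 180, so the answer is 180",
    ])

def reward(chain: str, gold: str) -> float:
    # Reward-based optimization scores each chain, e.g. by answer correctness.
    return 1.0 if chain.endswith(gold) else 0.0

# The RL loop in caricature: sample chains, score them, and nudge the policy
# toward high-reward chains (the actual policy update is omitted here).
prompt, gold = "A train travels 120 km in 1.5 hours. Average speed?", "80"
chains = [sample_chain(prompt) for _ in range(8)]
best = max(chains, key=lambda c: reward(c, gold))
print(best)
```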
Now, building on this line of work, DeepSeek has released DeepSeek-R1, which uses a combination of RL and supervised fine-tuning to handle complex reasoning tasks and match the performance of o1.
In testing, DeepSeek-R1 scored 79.8% on the AIME 2024 math test and 97.3% on MATH-500. It also achieved a rating of 2,029 on Codeforces, better than 96.3% of human programmers. For comparison, o1-1217 scored 79.2%, 96.4% and 96.6% respectively on these benchmarks.
It also demonstrated strong general knowledge, with 90.8% accuracy on MMLU, just behind o1’s 91.8%.
The training pipeline
DeepSeek-R1’s reasoning performance marks a big win for the Chinese startup in the US-dominated AI space, especially since the entire work is open source, including how the company trained the whole thing.
However, the work is not as straightforward as it sounds.
According to the paper describing the research, DeepSeek-R1 was developed as an improved version of DeepSeek-R1-Zero, a revolutionary model trained solely on reinforcement learning.
The company first used DeepSeek-V3-Base as the base model, developing its reasoning capabilities without any supervised data, essentially focusing only on its self-evolution through a purely RL-based process of trial and error. This capability, which emerges intrinsically from the training itself, ensures that the model can solve increasingly complex reasoning tasks by leveraging extended test-time compute to explore and refine its thought processes in greater depth.
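Because there is no supervised data, the rewards have to be checkable automatically. The paper describes simple rule-based signals, an accuracy reward for verifiably correct answers plus a format reward tied to a think-then-answer template; the matching logic below is a simplified sketch of that idea, not DeepSeek’s exact code:

```python
import re

def format_reward(completion: str) -> float:
    # Rewards completions that follow the think-then-answer template.
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    # Rewards a verifiably correct final answer (e.g. for math problems).
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

sample = "<think>120 / 1.5 = 80</think> <answer>80</answer>"
print(accuracy_reward(sample, "80") + format_reward(sample))  # 2.0
```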
“During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors,” the researchers note in the paper. “After thousands of RL steps, DeepSeek-R1-Zero exhibits superb performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.”
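The pass@1 and majority-vote numbers in that quote correspond to two evaluation modes: scoring a single sample per problem versus sampling many answers and keeping the most common one. A quick sketch of the majority-vote step:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    # Return the most frequent final answer among k sampled completions.
    return Counter(answers).most_common(1)[0][0]

# e.g. final answers extracted from 8 sampled chains for one AIME problem
print(majority_vote(["204", "204", "126", "204", "204", "99", "204", "126"]))  # 204
```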
However, while it showed improved performance, including behaviors such as reflection and exploring alternatives, the initial model did display some issues, including poor readability and language mixing. To address these, the company built on the work done for R1-Zero, using a multi-stage approach that combines supervised learning and reinforcement learning, and thus arrived at the improved R1 model.
“Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model,” the researchers explained. “Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.”
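Put together, the quote describes a four-stage recipe. The outline below is a schematic restatement of those stages in code form; every function is a placeholder that just returns a tag so the script runs end to end, not DeepSeek’s implementation:

```python
# Placeholder stages, for structure only.
def sft(base, data):         return f"sft({base})"
def reasoning_rl(ckpt):      return f"rl({ckpt})"
def rejection_sample(ckpt):  return ["rejection-sampled SFT example"]

cold_start_data = ["long, readable CoT example"]                 # Stage 1 inputs
v3_supervised = ["writing / factual QA / self-cognition example"]

ckpt = sft("DeepSeek-V3-Base", cold_start_data)   # 1. cold-start SFT
ckpt = reasoning_rl(ckpt)                         # 2. reasoning-oriented RL
new_sft = rejection_sample(ckpt) + v3_supervised  # 3. rejection sampling + mixed data,
ckpt = sft("DeepSeek-V3-Base", new_sft)           #    then retrain from the base model
deepseek_r1 = reasoning_rl(ckpt)                  # 4. final RL over all scenarios
```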
Much more affordable than o1
In addition to performance that nearly matches OpenAI’s o1 across benchmarks, the new DeepSeek-R1 is also very affordable. Specifically, where OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, DeepSeek Reasoner, which is based on the R1 model, costs just $0.55 per million input tokens and $2.19 per million output tokens.
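To make the pricing gap concrete, here is a back-of-the-envelope comparison using the published per-million-token rates; the request size is made up, and the exact saving varies with the input/output mix:

```python
def cost_usd(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    # Rates are dollars per million tokens.
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A made-up request: 10K input tokens, 5K output tokens.
o1 = cost_usd(10_000, 5_000, in_rate=15.00, out_rate=60.00)
r1 = cost_usd(10_000, 5_000, in_rate=0.55, out_rate=2.19)
print(f"o1: ${o1:.3f}, DeepSeek Reasoner: ${r1:.3f}, saving: {1 - r1 / o1:.1%}")
# -> o1: $0.450, DeepSeek Reasoner: $0.016, saving: 96.3% for this mix
```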
The model can be tested as “DeepThink” on the DeepSeek chat platform, which is similar to ChatGPT. Interested users can access the model weights and code repository via Hugging Face, under an MIT license, or use the API for direct integration.
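For direct integration, DeepSeek’s API follows the OpenAI-compatible convention, so a minimal call looks like the sketch below; the base URL and model name are taken from DeepSeek’s public documentation at the time of writing and may change:

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; substitute your own key.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1-backed reasoning model
    messages=[{"role": "user", "content": "How many primes are there below 50?"}],
)
print(response.choices[0].message.content)
```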