Chinese AI startup DeepSeek, known for challenging leading AI vendors with open source technologies, has just dropped another bombshell: a new open reasoning LLM called DeepSeek-R1.
Based on the recently introduced DeepSeek-V3 mixture-of-experts model, DeepSeek-R1 matches the performance of o1, OpenAI’s frontier reasoning LLM, on math, coding and reasoning tasks. The best part? It does so at a far more tempting price, coming in 90-95% cheaper than o1.
This release marks a major step forward for open source. It shows that open models are further narrowing the gap with closed commercial models in the race toward artificial general intelligence (AGI). To show off the prowess of its work, DeepSeek also used R1 to distill six Llama and Qwen models, taking their performance to new levels. In one case, the distilled version of Qwen-1.5B outperformed much larger models, GPT-4o and Claude 3.5 Sonnet, in select math benchmarks.
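Distillation here boils down to supervised fine-tuning of a small student model on reasoning traces generated by the larger R1 teacher. The announcement does not ship a distillation script, so the following is a minimal sketch, assuming a hypothetical `r1_traces.jsonl` file of teacher outputs and an illustrative Qwen base checkpoint:

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative student base; DeepSeek's actual distilled models
# (e.g. DeepSeek-R1-Distill-Qwen-1.5B) start from Qwen/Llama checkpoints.
STUDENT = "Qwen/Qwen2.5-1.5B"

tokenizer = AutoTokenizer.from_pretrained(STUDENT)
model = AutoModelForCausalLM.from_pretrained(STUDENT)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical file: each line holds a prompt and an R1-generated reasoning trace.
with open("r1_traces.jsonl") as f:
    samples = [json.loads(line) for line in f]

model.train()
for sample in samples:
    text = sample["prompt"] + sample["teacher_trace"]
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
    # Standard causal-LM loss: the student learns to reproduce the teacher's trace.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Notably, the distilled models are produced with supervised fine-tuning alone, with no RL applied to the student, which is part of what makes the approach so cheap to reproduce.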
These distilled models, as well as the main R1, have been open-sourced and are available on Hugging Face under an MIT license.
What does DeepSeek-R1 bring?
The focus is on artificial general intelligence (AGI), a level of AI that can perform intellectual tasks like humans. Many teams are doubling down on efforts to improve the reasoning capabilities of models. OpenAI took the first notable step in the field with its o1 model, which uses a chain-of-thought reasoning process to work through a problem. Through reinforcement learning, or reward-based optimization, o1 learns to hone its chain of thought and refine the strategies it uses, ultimately learning to recognize and correct its mistakes, or try new approaches when the current ones do not work.
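The reward-driven idea can be caricatured in a few lines: sample several chains of thought, score each one, and push the model toward the higher-scoring ones. The sketch below is purely illustrative; the sampler and reward are toy stand-ins, not o1’s or DeepSeek’s actual machinery:

```python
import random

def sample_chain(prompt: str) -> str:
    # Stand-in for the model sampling one chain of thought (purely illustrative).
    return random.choice([
        "120 / 1.5 = 80, so the answer is 80",
        "120 * 1.5 = 180, so the answer is 180",
    ])

def reward(chain: str, gold: str) -> float:
    # Reward-based optimization scores each chain, e.g. by answer correctness.
    return 1.0 if chain.endswith(gold) else 0.0

# The RL loop in caricature: sample chains, score them, and nudge the policy
# toward high-reward chains (the actual policy update is omitted here).
prompt, gold = "A train travels 120 km in 1.5 hours. Average speed?", "80"
chains = [sample_chain(prompt) for _ in range(8)]
best = max(chains, key=lambda c: reward(c, gold))
print(best)
```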
Now, building on this line of work, DeepSeek has released DeepSeek-R1, which uses a combination of RL and supervised fine-tuning to handle complex reasoning tasks and match the performance of o1.
In testing, DeepSeek-R1 scored 79.8% on the AIME 2024 math test and 97.3% on MATH-500. It also achieved a rating of 2,029 on Codeforces, better than 96.3% of human programmers. For comparison, o1-1217 scored 79.2%, 96.4% and 96.6% respectively on these benchmarks.
It also demonstrated strong general knowledge, with 90.8% accuracy on MMLU, just behind o1’s 91.8%.
The training pipeline
DeepSeek-R1’s reasoning performance marks a big win for the Chinese startup in the US-dominated AI space, especially since the entire work is open source, including how the company trained the whole thing.
However, the work is not as straightforward as it sounds.
According to the paper describing the research, DeepSeek-R1 was developed as an improved version of DeepSeek-R1-Zero, a revolutionary model trained solely on reinforcement learning.
The company first used DeepSeek-V3-Base as the base model, developing its reasoning capabilities without any supervised data, essentially focusing only on its self-evolution through a purely RL-based process of trial and error. This capability, which emerges intrinsically from the training itself, ensures that the model can solve increasingly complex reasoning tasks by leveraging extended test-time compute to explore and refine its thought processes in greater depth.
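Because there is no supervised data, the rewards have to be checkable automatically. The paper describes simple rule-based signals, an accuracy reward for verifiably correct answers plus a format reward tied to a think-then-answer template; the matching logic below is a simplified sketch of that idea, not DeepSeek’s exact code:

```python
import re

def format_reward(completion: str) -> float:
    # Rewards completions that follow the think-then-answer template.
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    # Rewards a verifiably correct final answer (e.g. for math problems).
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

sample = "<think>120 / 1.5 = 80</think> <answer>80</answer>"
print(accuracy_reward(sample, "80") + format_reward(sample))  # 2.0
```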
“During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors,” the researchers note in the paper. “After thousands of RL steps, DeepSeek-R1-Zero exhibits superb performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.”
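The pass@1 and majority-vote numbers in that quote correspond to two evaluation modes: scoring a single sample per problem versus sampling many answers and keeping the most common one. A quick sketch of the majority-vote step:

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    # Return the most frequent final answer among k sampled completions.
    return Counter(answers).most_common(1)[0][0]

# e.g. final answers extracted from 8 sampled chains for one AIME problem
print(majority_vote(["204", "204", "126", "204", "204", "99", "204", "126"]))  # 204
```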
However, while it showed improved performance, including behaviors such as reflection and exploring alternatives, the initial model did display some issues, including poor readability and language mixing. To address these, the company built on the work done for R1-Zero, using a multi-stage approach that combines supervised learning and reinforcement learning, and thus arrived at the improved R1 model.
“Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model,” the researchers explained. “Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.”
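Put together, the quote describes a four-stage recipe. The outline below is a schematic restatement of those stages in code form; every function is a placeholder that just returns a tag so the script runs end to end, not DeepSeek’s implementation:

```python
# Placeholder stages, for structure only.
def sft(base, data):         return f"sft({base})"
def reasoning_rl(ckpt):      return f"rl({ckpt})"
def rejection_sample(ckpt):  return ["rejection-sampled SFT example"]

cold_start_data = ["long, readable CoT example"]                 # Stage 1 inputs
v3_supervised = ["writing / factual QA / self-cognition example"]

ckpt = sft("DeepSeek-V3-Base", cold_start_data)   # 1. cold-start SFT
ckpt = reasoning_rl(ckpt)                         # 2. reasoning-oriented RL
new_sft = rejection_sample(ckpt) + v3_supervised  # 3. rejection sampling + mixed data,
ckpt = sft("DeepSeek-V3-Base", new_sft)           #    then retrain from the base model
deepseek_r1 = reasoning_rl(ckpt)                  # 4. final RL over all scenarios
```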
Much more affordable than o1
In addition to performance that nearly matches OpenAI’s o1 across benchmarks, the new DeepSeek-R1 is also very affordable. Specifically, where OpenAI o1 costs $15 per million input tokens and $60 per million output tokens, DeepSeek Reasoner, which is based on the R1 model, costs just $0.55 per million input tokens and $2.19 per million output tokens.
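To make the pricing gap concrete, here is a back-of-the-envelope comparison using the published per-million-token rates; the request size is made up, and the exact saving varies with the input/output mix:

```python
def cost_usd(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    # Rates are dollars per million tokens.
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A made-up request: 10K input tokens, 5K output tokens.
o1 = cost_usd(10_000, 5_000, in_rate=15.00, out_rate=60.00)
r1 = cost_usd(10_000, 5_000, in_rate=0.55, out_rate=2.19)
print(f"o1: ${o1:.3f}, DeepSeek Reasoner: ${r1:.3f}, saving: {1 - r1 / o1:.1%}")
# -> o1: $0.450, DeepSeek Reasoner: $0.016, saving: 96.3% for this mix
```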
The model can be tested as “DeepThink” on the DeepSeek chat platform, which is similar to ChatGPT. Interested users can access the model weights and code repository via Hugging Face, under an MIT license, or use the API for direct integration.
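For direct integration, DeepSeek’s API follows the OpenAI-compatible convention, so a minimal call looks like the sketch below; the base URL and model name are taken from DeepSeek’s public documentation at the time of writing and may change:

```python
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; substitute your own key.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1-backed reasoning model
    messages=[{"role": "user", "content": "How many primes are there below 50?"}],
)
print(response.choices[0].message.content)
```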