The Qwen team, a division of Chinese e-commerce giant Alibaba developing its growing family of open-source large language models (LLMs), has introduced QwQ-32B, a new 32-billion-parameter reasoning model designed to improve performance on complex problem-solving tasks through reinforcement learning (RL).
The model is available as open weights on Hugging Face and on ModelScope under an Apache 2.0 license. This means it is available for commercial and research uses, so enterprises can employ it immediately to power their products and applications (even those they charge customers to use).
It can also be accessed by individual users via Qwen Chat.
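For teams that want to try it locally, the open weights can be pulled with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming the repository id "Qwen/QwQ-32B" and a machine with enough GPU memory to hold the 32B weights (or a quantized variant):

```python
# Minimal sketch: loading the open-weight QwQ-32B checkpoint with transformers.
# The repository id "Qwen/QwQ-32B" is an assumption; adjust it to the actual Hub listing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the precision published with the weights
    device_map="auto",    # spread layers across available GPUs
)
```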
Qwen-with-Questions was Alibaba's answer to OpenAI's original o1 reasoning model
QwQ, short for Qwen-with-Questions, was first introduced by Alibaba in November 2024 as an open-source model aimed at competing with OpenAI's o1-preview.
At launch, the model was designed to enhance logical reasoning and planning by reviewing and refining its own responses during inference, a technique that made it especially effective in math and coding tasks.
The initial version of QwQ featured 32 billion parameters and a 32,000-token context length, with Alibaba highlighting its ability to outperform o1-preview on mathematical benchmarks such as AIME and MATH, as well as scientific reasoning tasks such as GPQA.
Despite its strengths, QwQ's early iterations struggled with programming benchmarks like LiveCodeBench, where OpenAI's models maintained an edge. In addition, as with many emerging reasoning models, QwQ faced challenges such as language mixing and occasional circular reasoning loops.
However, Alibaba's decision to release the model under an Apache 2.0 license ensured that developers and enterprises could adapt and commercialize it freely, distinguishing it from proprietary alternatives like OpenAI's o1.
Since QwQ's initial release, the AI landscape has evolved rapidly. The limitations of traditional LLMs have become more apparent, with scaling laws yielding diminishing returns in performance improvements.
This shift has fueled interest in large reasoning models (LRMs), a new category of AI systems that use inference-time reasoning and self-reflection to improve accuracy. These include OpenAI's o3 series and the wildly successful DeepSeek-R1 from rival Chinese lab DeepSeek, an offshoot of Hong Kong quantitative analysis firm High-Flyer Capital Management.
A new report from web traffic analytics and research firm SimilarWeb found that since R1's launch in January 2025, DeepSeek has shot up the charts to become the most visited AI model website behind OpenAI.
QwQ-32B, Alibaba's latest iteration, builds on these advances by integrating RL and structured self-questioning, positioning it as a serious competitor in the growing field of reasoning-focused AI.
Scaling up performance with multi-stage reinforcement learning
Traditional instruction-tuned models often struggle with difficult reasoning tasks, but the Qwen team's research suggests that RL can significantly improve a model's ability to solve complex problems.
QwQ-32B builds on this idea by implementing a multi-stage RL training approach to improve mathematical reasoning, coding proficiency and general problem-solving.
The model has been benchmarked against leading alternatives such as DeepSeek-R1, o1-mini and DeepSeek-R1-Distill-Qwen-32B, demonstrating competitive results despite having fewer parameters than some of these models.

For example, while DeepSeek-R1 operates with 671 billion parameters (with 37 billion activated), QwQ-32B achieves comparable performance with a much smaller footprint, typically requiring 24 GB of vRAM on a GPU (Nvidia's H100s have 80 GB), compared with more than 1,500 GB of vRAM to run the full DeepSeek R1 (16 Nvidia A100 GPUs), highlighting the efficiency of Qwen's RL approach.
QwQ-32B follows a causal language model architecture and includes several optimizations (see the code sketch after this list for a way to verify them):
- 64 transformer layers with RoPE, SwiGLU, RMSNorm and attention QKV bias;
- Grouped-query attention (GQA) with 40 attention heads for queries and 8 for key-value pairs;
- Extended context length of 131,072 tokens, allowing better handling of long-sequence inputs;
- Multi-stage training, including pretraining, supervised fine-tuning and RL.
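These architectural details can be checked directly from the published configuration. The snippet below is a minimal sketch, assuming the "Qwen/QwQ-32B" repository id and standard transformers-style configuration field names:

```python
# Minimal sketch: reading the architecture fields listed above from the model config.
# Repository id and attribute names are assumptions based on standard transformers conventions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/QwQ-32B")
print(config.num_hidden_layers)        # expected: 64 transformer layers
print(config.num_attention_heads)      # expected: 40 query heads (GQA)
print(config.num_key_value_heads)      # expected: 8 key-value heads (GQA)
print(config.max_position_embeddings)  # extended context window
```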
The RL process for QwQ-32B was executed in two phases (a simplified sketch of the verifier idea appears after the list):
- Math and coding focus: The model was trained using an accuracy verifier for mathematical reasoning and a code-execution server for coding tasks. This approach ensured that generated answers were validated for correctness before being reinforced.
- General capability enhancement: In a second phase, the model received reward-based training using general reward models and rule-based verifiers. This stage improved instruction following, human alignment and agent reasoning without compromising its math and coding capabilities.
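Alibaba has not published the training code itself, but the verifier idea behind the first phase can be illustrated with a short sketch. The reward functions below are hypothetical stand-ins for Qwen's accuracy verifier and code-execution server, not the team's actual implementation:

```python
# Illustrative sketch of outcome-based rewards: answers earn reward only when they verify.
# Both functions are hypothetical placeholders, not Qwen's training code.
import subprocess
import tempfile

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Accuracy verifier: reward 1.0 only if the final answer matches the reference."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(generated_code: str, test_code: str) -> float:
    """Execution verifier: reward 1.0 only if the generated code passes the supplied tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=30)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```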
What it means for enterprise decision-makers
For enterprise leaders, including CEOs, CTOs, IT leaders, team managers and AI application developers, QwQ-32B represents a potential shift in how AI can support business decision-making and technical innovation.
With its RL-driven reasoning capabilities, the model can deliver more accurate, structured and context-aware insights, making it valuable for use cases such as automated data analysis, strategic planning, software development and intelligent automation.
Companies looking to deploy AI solutions for complex problem-solving, coding assistance, financial modeling or customer service automation may find QwQ-32B's efficiency an attractive option. In addition, its open availability allows organizations to fine-tune and customize the model for domain-specific applications without proprietary restrictions, making it a flexible choice for enterprise AI strategies.
The fact that it comes from a Chinese e-commerce giant may raise security and bias concerns for some non-Chinese users, especially when using the Qwen Chat interface. But as with DeepSeek-R1, the fact that the model is available on Hugging Face for download, offline use, and fine-tuning or retraining suggests that these concerns can be overcome fairly easily. And it is a viable alternative to DeepSeek-R1.
Early reactions from AI power users and influencers
The release of QwQ-32B has already attracted attention from the AI research and development community, with several developers and industry professionals sharing their initial impressions on X (formerly Twitter):
- Hugging Face's Vaibhav Srivastav (@reach_vb) highlighted QwQ-32B's inference speed via the provider Hyperbolic Labs, calling it "blazingly fast" and comparable to top-tier models. He also noted that the model "beats DeepSeek-R1 and OpenAI o1-mini, with an Apache 2.0 license."
- AI news and rumor publisher Chubby (@kimmonismus) was impressed by the model's performance, stressing that QwQ-32B sometimes outperforms DeepSeek-R1 despite being 20 times smaller. "Holy moly! Qwen cooked!" they wrote.
- Yuchen Jin (@yuchenj_uw), co-founder and CTO of Hyperbolic Labs, celebrated the release by noting the efficiency gains: "Small models are so powerful! Alibaba Qwen released QwQ-32B, a reasoning model that beats DeepSeek-R1 (671B) and OpenAI o1-mini!"
- Another Hugging Face team member, Erik Kaunismäki (@erikkaum), highlighted the ease of deployment, sharing that the model is available for one-click deployment on Hugging Face endpoints, making it accessible to developers without extensive setup.
Agentic capabilities
QwQ-32B incorporates agentic capabilities, allowing it to dynamically adjust its reasoning processes based on environmental feedback.
For optimal performance, the Qwen team recommends the following inference settings (a sample generation call using them appears after the list):
- Temperature: 0.6
- TopP: 0.95
- TopK: between 20 and 40
- YaRN scaling: recommended for handling sequences longer than 32,768 tokens
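In practice, these settings map directly onto standard sampling arguments. The snippet below is a minimal sketch using the transformers text-generation pipeline with the recommended values, assuming a recent transformers release that accepts chat-style messages (the repository id and prompt are illustrative):

```python
# Minimal sketch: generating with the recommended sampling settings.
# Repository id "Qwen/QwQ-32B" and the prompt are assumptions for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/QwQ-32B", torch_dtype="auto", device_map="auto")
messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x? Reason step by step."}]
result = generator(
    messages,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,  # recommended temperature
    top_p=0.95,       # recommended TopP
    top_k=30,         # within the recommended 20-40 TopK range
)
print(result[0]["generated_text"][-1]["content"])
```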
The model supports deployment using vLLM, a high-throughput inference framework. However, current vLLM implementations only support static YaRN scaling, which maintains a fixed scaling factor regardless of input length.
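For serving, the same sampling values can be passed through vLLM's offline Python API. The sketch below is illustrative (repository id, prompt and context limit are assumptions); where long-context YaRN scaling is enabled, current vLLM builds apply it statically:

```python
# Minimal sketch: running QwQ-32B under vLLM with the recommended sampling settings.
# Repository id and max_model_len are illustrative; rope/YaRN scaling in current vLLM
# is static, i.e. a fixed factor regardless of input length.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/QwQ-32B", max_model_len=32768)
params = SamplingParams(temperature=0.6, top_p=0.95, top_k=30, max_tokens=4096)
outputs = llm.generate(["Explain why the sum of two odd numbers is always even."], params)
print(outputs[0].outputs[0].text)
```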
Future developments
The Qwen team sees QwQ-32B as the first step in scaling RL to enhance reasoning capabilities. Looking ahead, the team plans to:
- Explore scaling RL further to improve model intelligence;
- Integrate agents with RL for long-horizon reasoning;
- Continue developing foundation models optimized for RL;
- Move toward artificial general intelligence (AGI) through more advanced training techniques.
With QwQ-32B, the Qwen team positions RL as a key driver of the next generation of AI models, demonstrating that scaling can produce highly capable and efficient reasoning systems.