The entire AI landscape shifted in January 2025 after a then-little-known Chinese AI startup, DeepSeek (a subsidiary of the Hong Kong-based quantitative analysis firm High-Flyer Capital Management), publicly launched its powerful open-source reasoning model, DeepSeek R1, catching U.S. giants such as Meta off guard.
While DeepSeek’s models quickly spread among researchers and businesses, Meta was reportedly sent into panic mode upon learning that this new R1 model had been trained for a fraction of the cost of many other leading models yet outclassed them, reportedly for as little as several million dollars, roughly what Meta pays some of its own AI team leads.
Meta’s AI strategy up to that point had been built on releasing best-in-class open-source models under its “Llama” brand for researchers and businesses to build on freely (at least, if they had fewer than 700 million monthly users; above that threshold, they are supposed to contact Meta for special licensing terms).
However, DeepSeek R1’s surprisingly strong performance on a much smaller budget reportedly shook the company’s leadership and forced a kind of reckoning, with the latest version of Llama, 3.3, released only a month earlier in December 2024, already looking outdated.
Now we know the fruits of that reckoning: today, Meta founder and CEO Mark Zuckerberg took to his Instagram account to announce a new Llama 4 series of models, with two of them, the 400-billion-parameter Llama 4 Maverick and the 109-billion-parameter Llama 4 Scout, available today for developers to download and start using or fine-tuning right now on llama.com and the AI code-sharing community Hugging Face.
A massive 2-trillion-parameter Llama 4 Behemoth is also being previewed today, although Meta’s blog post on the releases said it is still being trained and gave no indication of when it might ship. (Recall that parameters are the settings that govern a model’s behavior, and that more of them generally signals a more powerful and more complex model all around.)
One hallmark of these models is that they are all multimodal: trained on, and therefore capable of receiving and generating, text, video and imagery (though audio was not mentioned).
Another is that they have incredibly long context windows: 1 million tokens for Llama 4 Maverick and 10 million for Llama 4 Scout, equivalent to roughly 1,500 and 15,000 pages of text, respectively, all of which the model can handle in a single input/output interaction. That means a user could theoretically upload or paste up to 7,500 pages’ worth of text and receive that much back from Llama 4 Scout, which would be handy for information-dense fields such as medicine, science, engineering, mathematics, literature and more.
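For readers curious about the arithmetic behind those page estimates, here is a quick back-of-the-envelope sketch, assuming roughly 0.75 English words per token and about 500 words per page (rule-of-thumb figures, not Meta’s):

```python
# Back-of-the-envelope conversion from context-window tokens to pages of prose.
# Assumptions (not from Meta): ~0.75 English words per token, ~500 words per page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

def tokens_to_pages(tokens: int) -> float:
    """Rough estimate of how many pages of text fit in a given token budget."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

print(tokens_to_pages(1_000_000))   # ~1,500 pages (Llama 4 Maverick)
print(tokens_to_pages(10_000_000))  # ~15,000 pages (Llama 4 Scout)
```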
Here is what we have learned about this release so far:
All-in on mixture-of-experts
All three models use the “mixture-of-experts” (MoE) architecture approach popularized in earlier model releases from OpenAI and Mistral, which essentially combines multiple smaller models specialized (“experts”) in different tasks, subjects and media formats into a unified, larger whole model. Each Llama 4 release is thus said to be a mixture of 128 different experts, and is more efficient to run because only the expert needed for a particular task, plus a “shared” expert, handles each token, rather than the entire model having to run for every one.
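To make the routing idea concrete, here is a minimal, toy sketch of top-1 expert routing with an always-on shared expert; the sizes and structure are hypothetical and far simpler than Meta’s actual implementation:

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router sends each token to one routed
    expert, and a 'shared' expert always runs, so only a small subset of all
    parameters is active per token."""

    def __init__(self, d_model: int = 64, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.shared_expert = nn.Linear(d_model, d_model)  # always active

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        top_expert = self.router(x).argmax(dim=-1)  # one expert id per token
        routed = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_expert == i
            if mask.any():
                routed[mask] = expert(x[mask])  # only routed tokens touch expert i
        return self.shared_expert(x) + routed

layer = TinyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

In a full-scale model, the router is trained jointly with the experts and load-balancing terms keep tokens spread across them; this sketch omits those details and only illustrates why a fraction of the parameters runs per token.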
As the Llama 4 blog notes:
As a result, while all parameters are stored in memory, only a subset of the total parameters is activated while serving these models. This improves inference efficiency by lowering model serving costs and latency; Llama 4 Maverick can be run on a single [Nvidia] H100 DGX host for easy deployment, or with distributed inference for maximum efficiency.
Scout and Maverick are publicly available for self-hosting, while no hosted API or pricing tiers have been announced for official Meta infrastructure. Instead, Meta is focusing on distribution through open downloads and integration with Meta AI in WhatsApp, Messenger, Instagram and on the web.
Meta estimates the inference cost for Llama 4 Maverick at $0.19 to $0.49 per 1 million tokens (using a 3:1 blend of input and output). That makes it far cheaper than proprietary models like GPT-4o, which is estimated to cost $4.38 per million tokens, based on community benchmarks.
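As a quick illustration of how such a blended figure is computed, here is a minimal sketch of the 3:1 input-to-output weighting; the per-token prices in the example are placeholders, not published Llama 4 rates:

```python
# Blended cost per 1M tokens with a 3:1 input:output traffic mix.
# The 3:1 ratio is Meta's stated assumption; the prices below are purely
# illustrative placeholders, not published Llama 4 rates.
def blended_cost_per_million(input_price: float, output_price: float,
                             input_share: float = 0.75) -> float:
    """Weighted-average price per 1M tokens given an input/output split."""
    return input_share * input_price + (1 - input_share) * output_price

# e.g. hypothetical $0.12 per 1M input tokens and $0.60 per 1M output tokens
print(round(blended_cost_per_million(0.12, 0.60), 2))  # 0.24, within the $0.19-$0.49 band
```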
The three Llama 4 models, especially Maverick and Behemoth, are explicitly designed for reasoning, coding and step-by-step problem solving, although they do not appear to exhibit the chains of thought of dedicated reasoning models such as the OpenAI “o” series or DeepSeek R1.
Instead, they seem designed to compete more directly with “classic”, non-reasoning multimodal LLMs such as OpenAI’s GPT-4o and DeepSeek V3, with the exception of Llama 4 Behemoth, which does appear to threaten DeepSeek R1 (more on this below!)
In addition, for Llama 4, Meta built custom post-training pipelines focused on improving reasoning, such as:
- Removing more than 50% of “easy” prompts during supervised fine-tuning.
- Adopting a continuous reinforcement learning loop with progressively harder prompts.
- Using pass@k evaluation and curriculum sampling to strengthen performance in math, logic and coding (the standard pass@k metric is sketched just after this list).
- Implementing MetaP, a new technique that lets engineers tune hyperparameters (such as per-layer learning rates) on one model and apply them to other model sizes and token types while preserving the intended model behavior.
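Before turning to MetaP in more detail, here is a minimal sketch of the standard unbiased pass@k estimator referenced above (as defined in OpenAI’s HumanEval work); this is the general metric, not Meta’s specific evaluation harness:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 200 samples per problem, 37 of them correct
print(round(pass_at_k(200, 37, 1), 3))   # 0.185 (equals c/n for k=1)
print(round(pass_at_k(200, 37, 10), 3))  # ~0.87
```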
MetaP is of particular interest because it could be used in the future to set hyperparameters on one model and then derive many other types of models from it, increasing training efficiency.
As my VentureBeat colleague and LLM expert Ben Dickson said of the new MetaP technique: “This can save a lot of time and money.”
This is especially critical when training models as large as Behemoth, which uses 32K GPUs and FP8 precision, achieving 390 TFLOPs/GPU over more than 30 trillion tokens, more than double the Llama 3 training data.
In other words: researchers can tell the model broadly how they want it to behave, then apply those settings to larger and smaller versions of the model and across different forms of media.
A powerful, but not yet the most powerful, model family
In his announcement video on Instagram (a Meta subsidiary, naturally), Meta CEO Mark Zuckerberg said the company’s goal is to “build the world’s leading AI, open source it, and make it universally accessible so that everyone in the world benefits … I’ve said for a while that I think open-source AI is going to become the leading models, and with Llama 4, that is starting to happen.”
It is a clearly carefully worded statement, as is Meta’s blog post calling Llama 4 Scout “the best multimodal model in the world in its class and more powerful than all previous-generation Llama models” (emphasis added by me).
In other words, these are very powerful models, near the top of the heap compared with others in their parameter-size class, but not necessarily setting new performance records. Still, Meta was keen to tout the models its new Llama 4 family beats, among them:
Llama 4 Behemoth
- Outperforms GPT-4.5, Gemini 2.0 Pro and Claude Sonnet 3.7 on:
- MATH-500 (95.0)
- GPQA Diamond (73.7)
- MMLU Pro (82.2)
Llama 4 Maverick
- Beats GPT-4o and Gemini 2.0 Flash on most multimodal reasoning benchmarks:
- ChartQA, DocVQA, MathVista, MMMU
- Competitive with DeepSeek V3.1 (45.8B params) while using less than half the active parameters (17B)
- Benchmark scores:
- ChartQA: 90.0 (vs. GPT-4o’s 85.7)
- DocVQA: 94.4 (vs. 92.8)
- MMLU Pro: 80.5
- Cost efficiency: $0.19 to $0.49 per 1 million tokens

Llama 4 Scout
- Matches or beats models like Mistral 3.1, Gemini 2.0 Flash-Lite and Gemma 3 on:
- DocVQA: 94.4
- MMLU Pro: 74.3
- MathVista: 70.7
- Unmatched 10M-token context length, ideal for long documents, codebases or multi-turn analysis
- Designed for efficient deployment on a single H100 GPU

But after all that, how does Llama 4 stack up against DeepSeek?
Of course, there is an entirely different class of heavy reasoning models such as DeepSeek R1, the OpenAI “o” series (such as o1), Gemini 2.0 and Claude Sonnet.
Using the highest-end benchmarked model, Llama 4 Behemoth, and comparing it to the initial DeepSeek R1 release chart for the R1-32B and OpenAI o1 models, here is how Llama 4 Behemoth stacks up:
Benchmark | Llama 4 Behemoth | DeepSeek R1 | OpenAI o1-1217 |
---|---|---|---|
MATH-500 | 95.0 | 97.3 | 96.4 |
GPQA Diamond | 73.7 | 71.5 | 75.7 |
MMLU | 82.2 | 90.8 | 91.8 |
What can we conclude?
- MATH-500: Llama 4 Behemoth trails DeepSeek R1 and OpenAI o1 slightly.
- GPQA Diamond: Behemoth is ahead of DeepSeek R1, but behind OpenAI o1.
- MMLU: Behemoth trails both, but still outperforms Gemini 2.0 Pro and GPT-4.5.
Takeaway: while DeepSeek R1 and OpenAI o1 edge out Behemoth on a few metrics, Llama 4 Behemoth remains highly competitive and performs at or near the top of the reasoning leaderboard in its class.
Safety and political “bias”
Meta also emphasized model alignment and safety, introducing tools such as Llama Guard, Prompt Guard and CyberSecEval to help developers detect unsafe inputs/outputs or adversarial prompts, and implementing Generative Offensive Agent Testing (GOAT) for automated red-teaming.
The company also claims Llama 4 shows substantial improvement on “political bias”, saying that “specifically, [leading LLMs] historically have leaned left when it comes to debated political and social topics”, and that Llama 4 does better at courting the right wing … in keeping with Zuckerberg’s embrace of Republican U.S. President Donald J. Trump and his party following the 2024 election.
Where Llama 4 stands so far
Meta’s Llama 4 models bring together efficiency, openness and strong performance across multimodal and reasoning tasks.
With Scout and Maverick publicly available and Behemoth previewed as a state-of-the-art teacher model, the Llama ecosystem is positioned to offer a competitive open alternative to top-tier proprietary models from OpenAI, Anthropic, DeepSeek and Google.
Whether you are building enterprise-scale assistants, AI research pipelines or long-context analytical tools, Llama 4 offers flexible, high-performance options with a clear orientation toward reasoning-first design.