MiniMax is perhaps best known here in the United States today as the Singaporean company behind Hailuo, a realistic, high-resolution generative AI video model that rivals Runway, OpenAI’s Sora, and Luma AI’s Dream Machine.
But the company has many more tricks up its sleeve: today, for example, it announced the release and open-sourcing of the MiniMax-01 series, a new family of models designed to handle ultra-long contexts and to advance the development of AI agents.
The series includes MiniMax-Text-01, a foundational large language model (LLM), and MiniMax-VL-01, a visual multimodal model.
A massive context window
The LLM, MiniMax-Text-01, is particularly notable for allowing up to 4 million tokens in its context window, the equivalent of a small library’s worth of books. The context window is how much information the LLM can handle in a single input/output exchange, with words and concepts represented as numerical “tokens,” the LLM’s own internal mathematical abstraction of the data it was trained on.
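To get a feel for what a token is, here is a quick sketch using OpenAI’s open-source tiktoken tokenizer. MiniMax uses its own tokenizer, so real counts will differ, but the idea is the same:

```python
# Illustrative only: counting tokens with OpenAI's tiktoken library.
# MiniMax's own tokenizer will produce different counts; this just
# shows how text maps to tokens and how fast a budget is consumed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "MiniMax-Text-01 efficiently processes up to 4 million tokens."
tokens = enc.encode(text)
print(len(tokens))  # roughly one token per word or word fragment

# At ~0.75 words per token, a 4M-token window holds on the order of
# 3 million English words -- several thousand pages of text.
```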
And while Google previously led the pack with its Gemini 1.5 Pro model and its 2-million-token context window, MiniMax has now doubled that figure.
As MiniMax posted on its official X account today: “MiniMax-01 efficiently processes up to 4 million tokens, 20 to 32 times the capacity of other leading models. We believe MiniMax-01 is poised to support the anticipated surge in agent-related applications in the coming year, as agents increasingly require extended context-handling capabilities and sustained memory.”
The models are now available for download on Hugging Face and GitHub under a custom MiniMax license. Users can also try them directly in Hailuo AI Chat (a ChatGPT/Gemini/Claude competitor), and third-party developers can build their own apps on top of them via MiniMax’s application programming interface (API).
MiniMax offers text processing and multimodal APIs at competitive prices:
- $0.2 per 1 million input tokens
- $1.1 per 1 million output tokens
For comparison, OpenAI’s GPT-4o costs $2.50 per 1 million input tokens through its API, 12.5 times more expensive.
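As a back-of-the-envelope illustration of that gap, using only the prices quoted above (and setting aside that GPT-4o’s actual context window is far smaller than 4 million tokens):

```python
# Cost comparison using the per-1M-token input prices quoted above.
# Note: GPT-4o cannot actually accept a 4M-token prompt; this only
# illustrates the price ratio at MiniMax's maximum window size.
minimax_input_per_1m = 0.20   # USD per 1M input tokens
gpt4o_input_per_1m = 2.50     # USD per 1M input tokens

tokens = 4_000_000  # one maximally full MiniMax-Text-01 context window
print(f"MiniMax input cost: ${minimax_input_per_1m * tokens / 1_000_000:.2f}")
print(f"GPT-4o input cost:  ${gpt4o_input_per_1m * tokens / 1_000_000:.2f}")
print(f"Ratio: {gpt4o_input_per_1m / minimax_input_per_1m:.1f}x")  # 12.5x
```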
MiniMax has also integrated a Mixture of Experts (MoE) framework with 32 experts to optimize scalability. This design balances compute and memory efficiency while maintaining competitive performance on key benchmarks.
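For readers unfamiliar with MoE, the sketch below shows the general idea in PyTorch: a learned gate routes each token to a small number of experts, so only a fraction of the model’s total parameters is active per token. The layer sizes, top-k value, and routing loop here are illustrative, not MiniMax’s actual implementation.

```python
# Minimal top-2 mixture-of-experts routing sketch (NOT MiniMax's code).
# A gate picks a few experts per token, which is why a 456B-parameter
# MoE model can activate only a fraction of its weights per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=32, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)  # routing scores per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        scores = self.gate(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```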
Innovating with the Lightning Attention architecture
At the heart of MiniMax-01 is the Lightning Attention mechanism, an innovative alternative to traditional Transformer attention.
This design significantly reduces computational complexity. The models include 456 billion parameters, of which 45.9 billion are activated per inference.
Unlike previous architectures, Lightning Attention uses a mix of linear and traditional SoftMax layers, achieving near-linear complexity for long inputs. SoftMax layers, for those new to the concept like me, transform input numbers into probabilities that sum to 1, so that the LLM can approximate the most likely meaning of the input.
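Here is a minimal PyTorch comparison of the two flavors: standard softmax attention builds an n × n score matrix, which is quadratic in sequence length, while the textbook linear-attention trick reorders the matrix products so cost grows roughly linearly with n. This illustrates the general principle, not MiniMax’s actual Lightning Attention kernels.

```python
# Generic sketch: softmax attention (quadratic in sequence length n)
# vs. linear attention (reordered products, ~linear in n). This is the
# textbook formulation, not MiniMax's Lightning Attention kernels.
import torch

n, d = 1024, 64
q, k, v = (torch.randn(n, d) for _ in range(3))

# Softmax attention: materializes an n x n score matrix -> O(n^2 * d).
scores = (q @ k.T) / d**0.5
probs = scores.softmax(dim=-1)          # each row sums to 1
out_softmax = probs @ v                 # (n, d)

# Linear attention: with a positive feature map phi, compute
# phi(q) @ (phi(k).T @ v). The (d x d) summary is built once,
# so total cost is O(n * d^2) -- linear in sequence length.
phi = lambda x: torch.nn.functional.elu(x) + 1
kv = phi(k).T @ v                       # (d, d) summary of keys/values
z = phi(k).sum(dim=0)                   # (d,) normalizer
out_linear = (phi(q) @ kv) / (phi(q) @ z).unsqueeze(-1)

print(out_softmax.shape, out_linear.shape)  # both (n, d)
```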
MiniMax has rebuilt its training and inference frameworks to support the Lightning Attention architecture. Key improvements include:
- Optimized MoE all-to-all communication: reduces inter-GPU communication overhead.
- Varlen ring attention: minimizes computational waste when processing long sequences.
- Efficient kernel implementations: custom CUDA kernels improve Lightning Attention performance.
These advancements make MiniMax-01 models practical for real-world applications while remaining affordable.
Performance and benchmarks
On mainstream text and multimodal benchmarks, MiniMax-01 rivals leading models like GPT-4 and Claude-3.5, with particularly strong results on long-context evaluations. Notably, MiniMax-Text-01 achieved 100% accuracy on the Needle-In-A-Haystack task with a 4-million-token context.
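For context, a needle-in-a-haystack test buries a single fact at a random depth in a long filler document and checks whether the model can retrieve it. A toy sketch of how such a prompt is constructed (not the published benchmark code):

```python
# Toy illustration of a needle-in-a-haystack evaluation (not the actual
# benchmark): hide one "needle" fact at a random depth in a long filler
# context, then ask the model to retrieve it.
import random

def build_haystack(filler_sentences: int, needle: str) -> str:
    filler = ["The sky was a pleasant shade of blue that day."] * filler_sentences
    filler.insert(random.randrange(len(filler)), needle)  # random depth
    return " ".join(filler)

needle = "The secret passcode is 7421."
prompt = (
    build_haystack(50_000, needle)
    + "\n\nQuestion: What is the secret passcode? Answer with the number only."
)
# A model with a large enough context window should answer "7421"
# regardless of where the needle was inserted; reported accuracy is
# averaged over many random depths and context lengths.
```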
The models also demonstrate minimal performance degradation as input length increases.
MiniMax plans regular updates to expand the models’ capabilities, including code and multimodal improvements.
The company views open source as a step toward creating foundational AI capabilities for the evolving AI agent landscape.
As 2025 promises to be a transformative year for AI agents, the need for sustained memory and efficient inter-agent communication is growing. MiniMax’s innovations are designed to address these challenges.
Open to collaboration
MiniMax invites developers and researchers to explore the capabilities of MiniMax-01. Beyond open-sourcing the models, its team welcomes technical suggestions and collaboration inquiries at model@minimaxi.com.
With its commitment to cost-effective and scalable AI, MiniMax is positioned as a key player in shaping the era of AI agents. The MiniMax-01 series offers developers an exciting opportunity to push the boundaries of what long-context AI can achieve.