Researchers at Alibaba Group have developed a new approach that could dramatically reduce the cost and complexity of training AI systems to search for information, eliminating the need for expensive commercial search engines altogether.
The technique, called "ZeroSearch," allows large language models (LLMs) to develop advanced search capabilities through a simulation approach rather than interacting with real search engines during training. This innovation could save companies significant API expenses while giving them greater control over how AI systems learn to retrieve information.
"Reinforcement learning [RL] training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability," the researchers write in their paper published on arXiv this week. "To address these challenges, we introduce ZeroSearch, a reinforcement learning framework that incentivizes the search capabilities of LLMs without interacting with real search engines."
Alibaba just dropped ZeroSearch on Hugging Face

Incentivize the search capability of LLMs without searching pic.twitter.com/qfnijno3lh

— AK (@_akhaliq) May 8, 2025
How ZeroSearch trains AI to search without search engines
The problem ZeroSearch solves is significant. Companies developing AI assistants that can autonomously search for information face two major challenges: the unpredictable quality of documents returned by search engines during training, and the prohibitively high cost of making hundreds of thousands of API calls to commercial search engines like Google.
Alibaba's approach begins with a lightweight supervised fine-tuning process that transforms an LLM into a retrieval module capable of generating both relevant and irrelevant documents in response to a query. During reinforcement learning training, the system uses what the researchers call a "curriculum-based rollout strategy" that gradually degrades the quality of the generated documents.
"Our key insight is that LLMs have acquired extensive world knowledge during large-scale pretraining and are capable of generating relevant documents given a search query," the researchers explain. "The primary difference between a real search engine and a simulation LLM lies in the textual style of the returned content."
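To make the curriculum idea concrete, here is a minimal sketch of how a simulation LLM could be steered between relevant and noisy documents, with the noise share rising over training. The function names, prompts, and the linear schedule are illustrative assumptions, not details taken from the paper:

```python
import random

def simulate_search(llm, query, noise_prob):
    """Ask the fine-tuned LLM to play search engine: with probability
    noise_prob it is prompted for misleading documents, otherwise for
    relevant ones. (Prompt wording is hypothetical.)"""
    style = "noisy" if random.random() < noise_prob else "useful"
    prompt = f"Generate five {style} documents for the query: {query}"
    return llm(prompt)

def curriculum_noise(step, total_steps, start=0.0, end=0.5):
    """Linearly raise the share of low-quality documents so the policy
    model faces progressively harder retrieval conditions."""
    return start + (end - start) * step / total_steps
```

The policy model being trained would call `simulate_search` at each rollout, with `curriculum_noise` supplying a harder mix of documents as training progresses.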
Outperforming Google at a fraction of the cost
In comprehensive experiments across seven question-answering datasets, ZeroSearch not only matched but often exceeded the performance of models trained with real search engines. Remarkably, a 7B-parameter retrieval module achieved performance comparable to Google Search, while a 14B-parameter module even outperformed it.
The cost savings are substantial. According to the researchers' analysis, training with approximately 64,000 search queries using Google Search via SerpAPI would cost about $586.70, while using a 14B-parameter simulation LLM on four A100 GPUs costs only $70.80, a reduction of 88%.
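The 88% figure follows directly from the two quoted costs:

```python
google_cost = 586.70  # ~64,000 queries via SerpAPI (Google Search)
sim_cost = 70.80      # 14B-parameter simulation LLM on four A100 GPUs

reduction = (google_cost - sim_cost) / google_cost
print(f"{reduction:.0%}")  # → 88%
```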
"This demonstrates the feasibility of using a well-trained LLM as a substitute for real search engines in reinforcement learning setups," the paper notes.
What this means for the future of AI development
This breakthrough marks a major shift in how AI systems can be trained. ZeroSearch shows that AI can improve without depending on external tools such as search engines.
The impact could be substantial for the AI industry. Until now, training advanced AI systems often required expensive API calls to services controlled by big technology companies. ZeroSearch changes that equation by allowing AI to simulate search rather than use real search engines.
For smaller AI companies and startups with limited budgets, this approach could level the playing field. The high cost of API calls has been a major barrier to developing sophisticated AI assistants. By cutting these costs by nearly 90%, ZeroSearch makes advanced AI training more accessible.
Beyond cost savings, the technique gives developers more control over the training process. With real search engines, the quality of returned documents is unpredictable. With simulated search, developers can precisely control the information the AI sees during training.
The technique works across multiple model families, including Qwen-2.5 and LLaMA-3.2, with both base and instruction-tuned variants. The researchers have released their code, datasets, and pre-trained models on GitHub and Hugging Face, allowing other researchers and companies to implement the approach.
As large language models continue to evolve, techniques like ZeroSearch suggest a future in which AI systems can develop increasingly sophisticated capabilities through self-simulation rather than relying on external services, potentially reshaping the economics of AI development and reducing dependence on large technology platforms.
The irony is clear: in teaching AI to search without search engines, Alibaba may have created a technology that makes traditional search engines less necessary for AI development. As these systems become more self-sufficient, the technology landscape could look very different in just a few years.