Pruna AI, a European startup that works on compression algorithms for AI models, is making its optimization framework open source on Thursday.
Pruna AI has created a framework that applies several efficiency methods, such as caching, pruning, quantization and distillation, to a given AI model.
“We also standardize saving and loading the compressed models, applying combinations of these compression methods, and also evaluating your compressed model after you compress it,” Pruna AI co-founder and CTO John Rachwan told TechCrunch.
In particular, Pruna AI’s framework can assess whether there is significant quality loss after compressing a model, along with the performance gains that you get.
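As a rough illustration of one of the methods mentioned above, here is a minimal, self-contained sketch of int8 weight quantization together with the kind of trade-off evaluation the article describes. This is not Pruna AI's actual implementation or API; the function names and the symmetric per-tensor scheme are assumptions chosen for clarity.

```python
import random

def quantize_int8(weights):
    """Map float weights onto 255 signed int8 levels (symmetric, per-tensor).

    Hypothetical helper for illustration, not Pruna AI's real API.
    """
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

random.seed(0)
weights = [random.gauss(0.0, 0.5) for _ in range(1000)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Evaluate the compression trade-off: int8 storage is 4x smaller than
# float32, at the cost of a small per-weight round-trip error.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"size ratio: 4x, max round-trip error: {max_err:.4f}")
```

Rounding to the nearest quantization level bounds the per-weight error by half the scale, which is the kind of quality-loss metric a framework can report alongside the speed and memory gains.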
“If I were to use a metaphor, we are similar to how Hugging Face standardized transformers and diffusers: how to call them, how to save them, how to load them, etc. We are doing the same, but for efficiency methods,” he added.
Large AI labs have already been using various compression methods. For example, OpenAI has relied on distillation to create faster versions of its flagship models.
That is likely how OpenAI developed GPT-4 Turbo, a faster version of GPT-4. Similarly, the Flux.1 Schnell image generation model is a distilled version of Black Forest Labs’ Flux.1 model.
Distillation is a technique used to extract knowledge from a large AI model with a “teacher-student” setup. Developers send requests to a teacher model and record the outputs. The answers are sometimes compared against a dataset to see how accurate they are. These outputs are then used to train the student model, which learns to approximate the teacher’s behavior.
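The teacher-student loop described above can be sketched in a few lines. This toy example, using only the standard library, treats an "expensive" function as the teacher and fits a one-parameter linear student to its recorded outputs; real distillation works on neural networks and loss functions, but the flow (query teacher, record outputs, train student on them) is the same.

```python
import math
import random

def teacher(x):
    """Stand-in for a large model: an accurate but expensive function."""
    return math.tanh(2.0 * x)

# Step 1: send requests to the teacher model and record its outputs.
random.seed(1)
inputs = [random.uniform(-0.3, 0.3) for _ in range(200)]
targets = [teacher(x) for x in inputs]

# Step 2: train a much smaller "student" (here just y = a*x) to
# approximate the teacher's behavior on the recorded outputs.
a = 0.0
lr = 0.5
for _ in range(300):
    grad = sum(2 * (a * x - t) * x for x, t in zip(inputs, targets)) / len(inputs)
    a -= lr * grad

# On this narrow input range tanh(2x) is nearly linear with slope 2,
# so the student's learned slope should land just below 2.
print(f"learned slope: {a:.2f}")
```

The student never sees the teacher's internals, only its input/output behavior, which is why distillation works even when the teacher is a closed model behind an API.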
“For big companies, what they usually do is build this stuff in-house. And what you can find in the open source world is usually based on single methods. For example, let’s say one quantization method for LLMs, or one caching method for diffusion models,” Rachwan said. “But you cannot find a tool that aggregates all of them, makes them all easy to use and combine together. And this is the big value that Pruna is bringing right now.”
While Pruna AI supports any kind of model, from large language models to diffusion models, speech-to-text models and computer vision models, the company is focusing more on image and video generation models at the moment.
Some of Pruna AI’s existing users include Scenario and PhotoRoom. In addition to the open source edition, Pruna AI has an enterprise offering with advanced optimization features, including an optimization agent.
“The most exciting feature that we are releasing soon will be a compression agent,” Rachwan said. “Basically, you give it your model, you say: ‘I want more speed but don’t drop my accuracy by more than 2%.’ And then, the agent will just do its magic.”
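An agent like the one Rachwan describes is, at its core, a constrained search: try compression configurations, then keep the fastest one whose accuracy drop stays within the user's budget. Below is a minimal sketch of that selection logic; the configuration names and the accuracy/speedup numbers are invented for illustration (in practice they would come from benchmarking each compressed model).

```python
# Hypothetical benchmark results for a few compression configurations.
configs = [
    {"name": "baseline",          "accuracy": 0.900, "speedup": 1.0},
    {"name": "int8 quantization", "accuracy": 0.890, "speedup": 2.1},
    {"name": "quant + pruning",   "accuracy": 0.870, "speedup": 3.4},
    {"name": "quant + caching",   "accuracy": 0.885, "speedup": 2.8},
]

def pick_config(configs, max_accuracy_drop=0.02):
    """Return the fastest config whose accuracy drop stays within budget."""
    baseline = configs[0]["accuracy"]
    allowed = [c for c in configs
               if baseline - c["accuracy"] <= max_accuracy_drop]
    return max(allowed, key=lambda c: c["speedup"])

best = pick_config(configs)
print(best["name"])  # fastest option within the 2% accuracy budget
```

With a 2% budget, the "quant + pruning" config is rejected (3% drop) even though it is fastest, and the agent settles on "quant + caching". A real agent would also decide which methods to try and how to tune them, but the user-facing contract is exactly this speed-versus-accuracy constraint.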
Pruna AI charges by the hour for its pro version. “It’s similar to how you would think of a GPU when you rent a GPU on AWS or any cloud service,” Rachwan said.
And if your model is a critical part of your AI infrastructure, you will end up saving a lot of money on inference with the optimized model. For example, Pruna AI has made a Llama model eight times smaller without too much loss using its compression framework. Pruna AI hopes its customers will think of its compression framework as an investment that pays for itself.
Pruna AI raised a $6.5 million seed funding round a few months ago. Investors in the startup include EQT Ventures, Daphni, Motier Ventures and Kima Ventures.