Pruna AI, a European startup known for its advanced AI model compression tools, has officially open-sourced its model optimization framework. Starting Thursday, developers worldwide can access this innovative tool designed to make AI models leaner and faster without compromising quality.
The company’s framework combines several cutting-edge efficiency techniques, including caching, pruning, quantization, and distillation. By integrating these methods, Pruna AI enables developers to compress large AI models while maintaining control over performance and accuracy.
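To make one of those techniques concrete: quantization shrinks a model by storing its weights at lower numeric precision. The following is a minimal illustrative sketch in plain Python (not Pruna's own API) showing symmetric int8 quantization of a weight vector:

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus a shared scale factor
    (symmetric quantization)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.05, 0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each int8 value occupies 1 byte instead of 4 for fp32: a 4x size
# reduction, at the cost of small rounding errors in the restored weights.
```

Production frameworks apply the same idea per layer or per channel, and measuring the resulting accuracy drop is exactly the evaluation step Pruna says it standardizes.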
According to John Rachwan, Pruna AI’s co-founder and CTO, the platform also standardizes the process of saving, loading, and evaluating compressed models. “We make it simple to apply multiple compression techniques and then assess if your model still performs well after compression,” Rachwan shared in an interview with TechCrunch.
What sets Pruna AI apart is its ability to measure any performance gains and detect quality loss—giving developers crucial insights into how well their optimized model performs in real scenarios.
Rachwan compared Pruna’s framework to Hugging Face’s influence on transformer models. “Hugging Face standardized calling, saving, and loading transformers and diffusers. We’re doing the same, but focused on AI model efficiency,” he explained.

Large AI labs like OpenAI have been using techniques like distillation to build lighter, faster models. For example, OpenAI likely used distillation when developing GPT-4 Turbo, a speed-optimized version of GPT-4. Similarly, Black Forest Labs distilled their Flux.1 image generation model into the Flux.1-schnell version for enhanced speed.
Distillation works through a teacher-student approach, where developers run prompts through a large model (teacher) and use its outputs to train a smaller, more efficient model (student) that mimics the teacher’s behavior with minimal accuracy loss.
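The heart of that training step is a loss that pushes the student's output distribution toward the teacher's. Here is a minimal sketch in plain Python of the standard distillation loss, the KL divergence between temperature-softened softmax outputs; real pipelines backpropagate this through the student model on large batches:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature
    softens the distribution, exposing more of the teacher's
    relative preferences between classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the softened teacher and student
    distributions. Minimizing it trains the student to mimic
    the teacher's behavior."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what lets a much smaller model absorb the larger one's behavior with minimal accuracy loss.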
However, most open-source tools available today only focus on individual compression methods—like a single quantization technique for large language models or basic caching for diffusion models. This fragmented approach forces companies to either build custom solutions in-house or struggle with incomplete tools.
“What’s missing is a comprehensive framework that unifies these methods and makes them easy to use together. That’s the gap Pruna AI is filling,” Rachwan said.
Currently, the platform supports a wide range of models, from large language models (LLMs) to diffusion models, speech-to-text systems, and computer vision models. However, Pruna AI is honing its focus on the booming image and video generation sectors, where demand for efficiency is growing fast.
Early adopters of Pruna’s framework include Scenario and PhotoRoom, two companies known for their creative AI applications. Alongside its open-source version, Pruna AI offers a premium enterprise plan packed with advanced features, including an AI-driven compression agent.
One of the standout upcoming features is the optimization agent, designed to automate the compression process based on user-defined goals. “Just feed your model into the agent and tell it, ‘I want more speed but no more than a 2% drop in accuracy,’ and it will handle the rest. Developers won’t need to lift a finger,” Rachwan explained.
Pruna AI’s enterprise offering operates on a pay-as-you-go model, similar to cloud services like AWS. “Think of it like renting a GPU—our users pay by the hour,” Rachwan added.
For businesses running AI models at scale, this optimization framework offers serious cost savings. Pruna AI recently compressed a Llama model to one-eighth of its original size with minimal accuracy loss—dramatically cutting inference costs and boosting speed.
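The article doesn't say which precisions were involved, but an 8x reduction is what you'd get going from 32-bit to 4-bit weights. A quick back-of-the-envelope calculation, assuming an illustrative 7-billion-parameter model:

```python
PARAMS = 7e9  # illustrative: a 7B-parameter Llama variant

def model_size_gb(bits_per_param):
    """Approximate weight storage in gigabytes at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

fp32_size = model_size_gb(32)   # 28.0 GB at full precision
int4_size = model_size_gb(4)    # 3.5 GB at 4-bit precision
print(fp32_size / int4_size)    # prints 8.0
```

Smaller weights mean less GPU memory per instance and fewer or cheaper GPUs per request, which is where the inference savings come from.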
Pruna AI views its optimization framework as a strategic investment, especially for AI startups and companies where model performance directly impacts revenue. The long-term savings on compute resources could easily outweigh the initial costs.
The startup recently secured $6.5 million in seed funding from top-tier investors including EQT Ventures, Daphni, Motier Ventures, and Kima Ventures. With this fresh capital, Pruna AI aims to accelerate its product development and scale its platform to support even more AI use cases.