Sakana AI’s CycleQD outperforms traditional fine-tuning methods for multi-skill language models
Researchers at Sakana AI have developed a resource-efficient framework that can create hundreds of language models specializing in different tasks. Called CycleQD, the technique uses evolutionary algorithms to combine the skills of different models without the need for expensive and slow training processes.
CycleQD can create swarms of task-specific agents that offer a more sustainable alternative to the current paradigm of increasing model size.
Rethinking model training
Large language models (LLMs) have shown remarkable capabilities in various tasks. However, training LLMs to master multiple skills remains a challenge. When fine-tuning models, engineers must balance data from different skills and ensure that one skill doesn’t dominate the others. Current approaches often involve training ever-larger models, which leads to increasing computational demands and resource requirements.
“We believe rather than aiming to develop a single large model to perform well on all tasks, population-based approaches to evolve a diverse swarm of niche models may offer an alternative, more sustainable path to scaling up the development of AI agents with advanced capabilities,” the Sakana researchers write in a blog post.
To create populations of models, the researchers took inspiration from quality diversity (QD), an evolutionary computing paradigm that focuses on discovering a diverse set of solutions from an initial population sample. QD aims to create specimens with diverse “behavior characteristics” (BCs), which represent different skill domains. It achieves this through evolutionary algorithms (EAs) that select parent specimens and apply crossover and mutation operations to create new samples.
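To make the idea concrete, here is a minimal, self-contained sketch of a MAP-Elites-style quality-diversity loop on a toy numerical problem. It only illustrates the general QD recipe of an archive of niche elites refined through crossover and mutation; the toy quality and behavior functions are illustrative placeholders, not Sakana’s implementation.

```python
# Toy quality-diversity (MAP-Elites-style) loop. Individuals are 2-D vectors,
# "quality" is closeness to the origin, and the behavior characteristic (BC)
# is the discretized angle of the vector. Purely illustrative.
import math
import random

def quality(x):
    return -(x[0] ** 2 + x[1] ** 2)          # higher is better

def behavior(x):
    return round(math.atan2(x[1], x[0]), 1)  # coarse "skill niche"

def crossover(a, b):
    return [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2]

def mutate(x):
    return [v + random.gauss(0, 0.1) for v in x]

archive = {}  # BC bin -> best (elite) individual found in that niche

def try_insert(x):
    bc = behavior(x)
    if bc not in archive or quality(x) > quality(archive[bc]):
        archive[bc] = x

for _ in range(10):                           # seed population
    try_insert([random.uniform(-1, 1), random.uniform(-1, 1)])

for _ in range(1000):                         # evolutionary loop
    parent_a, parent_b = random.sample(list(archive.values()), 2)
    try_insert(mutate(crossover(parent_a, parent_b)))

print(f"{len(archive)} behavior niches discovered")
```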
CycleQD
CycleQD incorporates QD into the post-training pipeline of LLMs to help them learn new, complex skills. CycleQD is useful when you have multiple small models that have been fine-tuned for very specific skills, such as coding or performing database and operating system operations, and you want to create new variants that have different combinations of those skills.
In the CycleQD framework, each of these skills is treated either as a behavior characteristic or as the quality metric that the next generation of models is optimized for. In each generation, the algorithm focuses on one specific skill as its quality metric while using the other skills as BCs.
“This ensures every skill gets its moment in the spotlight, allowing the LLMs to grow more balanced and capable overall,” the researchers explain.
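Conceptually, the rotation could look like the snippet below, where the skill serving as the quality metric changes each generation and the remaining skills define the behavior niches. The skill names are the examples from this article; the round-robin schedule is an assumption for illustration, not Sakana’s exact implementation.

```python
# Hypothetical illustration of cycling the quality metric among skills.
skills = ["coding", "database_ops", "os_ops"]

for generation in range(6):
    quality_skill = skills[generation % len(skills)]       # optimized this round
    bc_skills = [s for s in skills if s != quality_skill]  # define the niches
    print(f"gen {generation}: quality={quality_skill}, BCs={bc_skills}")
```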
CycleQD starts with a set of expert LLMs, each specialized in a single skill. The algorithm then applies “crossover” and “mutation” operations to add new, higher-quality models to the population. Crossover combines the characteristics of two parent models to create a new model, while mutation makes random changes to the model to explore new possibilities.
The crossover operation is based on model merging, a technique that combines the parameters of two LLMs to create a new model with combined skills. This is a cost-effective and quick method for developing well-rounded models without the need to fine-tune them.
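As a rough illustration, a merging-based crossover step can be as simple as interpolating the parameters of two parents that share an architecture. The uniform mixing coefficient and the helper names below are assumptions for illustration; CycleQD’s actual merging recipe is more involved.

```python
# Simplified sketch of crossover via parameter merging: the child's weights
# are an elementwise mix of the two parents'. Assumes both models share an
# architecture and that parameters are floating-point tensors.
import copy
import torch

def merge_models(parent_a, parent_b, alpha=0.5):
    state_a, state_b = parent_a.state_dict(), parent_b.state_dict()
    merged_state = {
        name: alpha * weight_a + (1 - alpha) * state_b[name]
        for name, weight_a in state_a.items()
    }
    child = copy.deepcopy(parent_a)   # same architecture as the parents
    child.load_state_dict(merged_state)
    return child
```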
The mutation operation uses singular value decomposition (SVD), a factorization method that breaks down any matrix into simpler components, making it easier to understand and manipulate its elements. CycleQD uses SVD to break down the model’s skills into fundamental components or sub-skills. By tweaking these sub-skills, the mutation process creates models that explore new capabilities beyond those of their parent models. This helps the models avoid getting stuck in predictable patterns and reduces the risk of overfitting.
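A simplified sketch of such a mutation on a single weight matrix: decompose it with SVD, jitter the singular values, and reconstruct. Perturbing only the singular values and the noise scale used here are assumptions for illustration; CycleQD’s exact mutation operator may differ.

```python
# SVD-based mutation sketch: decompose a weight matrix, perturb the strength
# of each component ("sub-skill"), and rebuild the matrix.
import numpy as np

def svd_mutate(weight, noise_scale=0.01, rng=np.random.default_rng()):
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    # Jitter each singular value independently.
    S_mutated = S * (1.0 + noise_scale * rng.standard_normal(S.shape))
    return (U * S_mutated) @ Vt   # equivalent to U @ diag(S_mutated) @ Vt

# Example: mutate a random 4x3 matrix
W = np.random.default_rng(0).standard_normal((4, 3))
W_new = svd_mutate(W)
```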
Evaluating CycleQD’s performance
The researchers applied CycleQD to a set of Llama 3-8B expert models fine-tuned for coding, database operations and operating system operations. The goal was to see if the evolutionary method could combine the skills of the three models to create a superior model.
The results showed that CycleQD outperformed traditional fine-tuning and model merging methods across the evaluated tasks. Notably, a model fine-tuned on all datasets combined performed only marginally better than the single-skill expert models, despite being trained on more data. Moreover, the traditional training process is much slower and more expensive. CycleQD was also able to create various models with different performance levels on the target tasks.
“These results clearly show that CycleQD outperforms traditional methods, proving its effectiveness in training LLMs to excel across multiple skills,” the researchers write.
The researchers believe that CycleQD has the potential to enable lifelong learning in AI systems, allowing them to continuously grow, adapt and accumulate knowledge over time. This can have direct implications for real-world applications. For example, CycleQD can be used to continuously merge the skills of expert models instead of training a large model from scratch.
Another exciting direction is the development of multi-agent systems, where swarms of specialized agents evolved through CycleQD can collaborate, compete and learn from one another.
“From scientific discovery to real-world problem-solving, swarms of specialized agents could redefine the limits of AI,” the researchers write.