Skip to yearly menu bar Skip to main content


Poster

FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization

Hao Chen · Shell Xu Hu · Wayne Luk · Timothy Hospedales · Hongxiang Fan


Abstract:

Model merging has emerged as a promising approach for multi-task learning (MTL) in large language models (LLMs), providing a training- and data-efficient alternative to conventional fine-tuning. However, with the rapid development of the open-source AI ecosystem and the increasing availability of fine-tuned LLMs, existing model merging methods face two key limitations: (i) they are primarily designed for in-house fine-tuned models, making them less adaptable to diverse model sources with partially unknown model and task information, (ii) they struggle to scale effectively when merging numerous model checkpoints.To address these challenges, we formulate model merging as a constrained optimization problem and introduce a novel approach: Frank-Wolfe Merging (FW-Merging). Inspired by the Frank-Wolfe optimization, our approach iteratively selects the most relevant model parameters to minimize a linear approximation of the objective function, merging them through a predefined merging function. The objective function is designed to capture the desired behavior of the target merged model, while the fine-tuned candidate models defines the constraint set.More importantly, FW-Merging serves as an orthogonal technique to existing merging methods, seamlessly integrating with them to further enhance performance.Our experiments show that FW-Merging scales across diverse model sources, remaining stable with 16 irrelevant models and improving by 15.3% with 16 relevant models on 20 CV tasks, all while maintaining constant memory overhead—unlike the linear overhead of data-informed methods.Compared with the state-of-the-art methods, FW-Merging surpasses the data-free merging method by 32.8% and outperforms the data-informed Adamerging by 8.39% when merging 20 ViT models. Our code is attached with this submission.

Live content is unavailable. Log in and register to view live content