2025-04-29: Our paper has been accepted by IJCAI 2025. Congratulations!
2025-03-31: Released a prototype system for parameter-efficient and gradient-projection methods: a comprehensive benchmark against 10+ state-of-the-art efficient fine-tuning approaches.
2024-12-30: *Theoretical Insights into Fine-Tuning Attention Mechanism*.
```bash
# choose the target method_name and modules
bash EfficientFT/sh/roberta-base-peft.sh
bash EfficientFT/sh/llama-peft.sh

# GaLore
bash EfficientFT/sh/roberta_galore.sh
```
😊 Some Results
📝 Citation
```bibtex
@article{yao2024theoretical,
  title={Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization},
  author={Yao, Xinhao and Qian, Hongjin and Hu, Xiaolin and Xu, Gengze and Liu, Yong and Liu, Wei and Luan, Jian and Wang, Bin},
  journal={arXiv preprint arXiv:2410.02247},
  year={2024}
}
```
RUC & Xiaomi: Efficient Fine-Tuning 🙌🎉
📰 News
🎯 Introduction and Target
(1) Our insights (paper, in progress):
From the traditional statistical learning viewpoint, performance can be decomposed into the sum of optimization error and generalization error. On the generalization (storage-friendly) side, Theorem 1 (information-theoretic generalization bounds) shows that, for the same rank r, fine-tuning Wq and Wv consistently achieves results comparable to, or even surpassing, fine-tuning Wq, Wk, and Wv. This reduces the number of trainable parameters at the same r, improves the generalization bound, and can save memory. On the optimization (time-friendly) side, we analyze the learning dynamics of fine-tuning the attention mechanism: Theorem 2 shows that feature learning in the attention mechanism is efficient when the learning rate for Wv is set much larger than that for Wq and Wk. Building on these experimental and theoretical insights, one can design new algorithms that improve the storage and time efficiency of fine-tuning.
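The optimization-side insight above can be sketched as an optimizer configuration that gives Wv a larger learning rate than Wq and Wk. A minimal sketch in plain Python, assuming PyTorch-style `(name, parameter)` pairs where the value projection contains `v_proj` in its name (a common Hugging Face naming convention); the helper name and the 10x scale are illustrative assumptions, not values prescribed by the paper:

```python
def build_param_groups(named_params, base_lr=1e-4, v_lr_scale=10.0):
    """Split parameters into two optimizer groups: W_v gets a larger learning rate.

    named_params: iterable of (name, parameter) pairs, as from model.named_parameters().
    """
    v_group, other_group = [], []
    for name, param in named_params:
        if "v_proj" in name:          # value projection W_v
            v_group.append(param)
        else:                         # W_q, W_k, and all remaining parameters
            other_group.append(param)
    return [
        {"params": other_group, "lr": base_lr},
        {"params": v_group, "lr": base_lr * v_lr_scale},
    ]
```

The returned list can be passed directly to a PyTorch optimizer, e.g. `torch.optim.AdamW(build_param_groups(model.named_parameters()))`.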
(2) Target:
This project conducts comprehensive benchmarking of the following 10+ efficient fine-tuning methods.
Notably, our proposed approach is orthogonal to these methods and can be combined with any of them.
📖 10+ Efficient Fine-Tuning Methods
⚙️ Install
🚀 Quick Start
Get Dataset
Usage
Ensure the scripts have execute permissions.
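For example, assuming the launcher scripts live under `EfficientFT/sh/` (the paths used in the usage commands), execute permissions can be granted in one step:

```shell
# make all launcher scripts under EfficientFT/sh/ executable
chmod +x EfficientFT/sh/*.sh
```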
Full Fine-Tuning, LoRA, AdaLoRA, DoRA, PiSSA, rsLoRA, OLoRA, EVA, SIFT
GaLore.
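For context, the low-rank adaptation family above (LoRA and its variants) shares one core update rule: the frozen weight W is augmented by a trainable low-rank product scaled by alpha/r. A dependency-free sketch of that standard formulation (all function names are illustrative):

```python
def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the effective LoRA weight.

    W: d_out x d_in frozen weight; B: d_out x r and A: r x d_in trainable factors.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Combined with Theorem 1 above, applying such adapters only to Wq and Wv (rather than all of Wq, Wk, Wv) keeps the same r while using fewer trainable parameters.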