2025-04-29: Our paper has been accepted by IJCAI 2025. Congratulations!
2025-03-31: Released a prototype system for parameter-efficient and gradient-projection methods: a comprehensive benchmark against 10+ state-of-the-art efficient fine-tuning approaches.
2024-12-30: *Theoretical Insights into Fine-Tuning Attention Mechanism*.
```bash
# choose the target method_name and modules
bash EfficientFT/sh/roberta-base-peft.sh
bash EfficientFT/sh/llama-peft.sh

# GaLore
bash EfficientFT/sh/roberta_galore.sh
```
😊 Some Results
📝 Citation
```bibtex
@article{yao2024theoretical,
  title={Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization},
  author={Yao, Xinhao and Qian, Hongjin and Hu, Xiaolin and Xu, Gengze and Liu, Yong and Liu, Wei and Luan, Jian and Wang, Bin},
  journal={arXiv preprint arXiv:2410.02247},
  year={2024}
}
```
RUC & Xiaomi: Efficient Fine-Tuning 🙌🎉
📰 News
🎯 Introduction and Target
(1) Our insights (paper, in progress):
From the traditional statistical learning viewpoint, performance can be decomposed into the sum of optimization error and generalization error. On the generalization (storage-friendly) side, Theorem 1 (information-theoretic generalization bounds) shows that, for the same rank r, fine-tuning Wq and Wv consistently achieves results comparable to, or even surpassing, fine-tuning Wq, Wk, and Wv. This reduces the number of trainable parameters at the same r, improves the generalization bound, and can save memory. On the optimization (time-friendly) side, we analyze the learning dynamics of fine-tuning the attention mechanism: Theorem 2 shows that feature learning in the attention mechanism is efficient when the learning rate for Wv is set much larger than that for Wq and Wk. Building on these experimental and theoretical insights, one can design new algorithms that improve the storage and time efficiency of fine-tuning.
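The optimization-side insight above can be sketched as an optimizer configuration that gives Wv a larger learning rate than Wq and Wk. A minimal sketch in plain Python, assuming PyTorch-style `(name, parameter)` pairs where the value projection contains `v_proj` in its name (a common Hugging Face naming convention); the helper name and the 10x scale are illustrative assumptions, not values prescribed by the paper:

```python
def build_param_groups(named_params, base_lr=1e-4, v_lr_scale=10.0):
    """Split parameters into two optimizer groups: W_v gets a larger learning rate.

    named_params: iterable of (name, parameter) pairs, as from model.named_parameters().
    """
    v_group, other_group = [], []
    for name, param in named_params:
        if "v_proj" in name:          # value projection W_v
            v_group.append(param)
        else:                         # W_q, W_k, and all remaining parameters
            other_group.append(param)
    return [
        {"params": other_group, "lr": base_lr},
        {"params": v_group, "lr": base_lr * v_lr_scale},
    ]
```

The returned list can be passed directly to a PyTorch optimizer, e.g. `torch.optim.AdamW(build_param_groups(model.named_parameters()))`.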
(2) Target:
This project conducts comprehensive benchmarking of the following 10+ efficient fine-tuning methods.
Notably, our proposed approach is orthogonal to these methods and can be combined with any of them.
📖 10+ Efficient Fine-Tuning Methods
⚙️ Install
🚀 Quick Start
Get Dataset
Usage
Ensure the scripts have execute permissions.
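For example, assuming the launcher scripts live under `EfficientFT/sh/` (the paths used in the usage commands), execute permissions can be granted in one step:

```shell
# make all launcher scripts under EfficientFT/sh/ executable
chmod +x EfficientFT/sh/*.sh
```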
Full Fine-Tuning, LoRA, AdaLoRA, DoRA, PiSSA, rsLoRA, OLoRA, EVA, SIFT
GaLore.
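For context, the low-rank adaptation family above (LoRA and its variants) shares one core update rule: the frozen weight W is augmented by a trainable low-rank product scaled by alpha/r. A dependency-free sketch of that standard formulation (all function names are illustrative):

```python
def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A), the effective LoRA weight.

    W: d_out x d_in frozen weight; B: d_out x r and A: r x d_in trainable factors.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
```

Combined with Theorem 1 above, applying such adapters only to Wq and Wv (rather than all of Wq, Wk, Wv) keeps the same r while using fewer trainable parameters.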