TRL - Transformers Reinforcement Learning
A comprehensive library to post-train foundation models
🎉 What’s New
TRL v1: We released TRL v1 — a major milestone that marks a real shift in what TRL is. Read the blog post to learn more.
Overview
TRL is a cutting-edge library designed for post-training foundation models using advanced techniques like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), and Direct Preference Optimization (DPO). Built on top of the 🤗 Transformers ecosystem, TRL supports a variety of model architectures and modalities, and can be scaled up across various hardware setups.
Highlights
Trainers: Various fine-tuning methods are easily accessible via trainers like SFTTrainer, GRPOTrainer, DPOTrainer, RewardTrainer, and more.
Efficient and scalable: Each trainer scales from a single GPU to multi-node setups through distributed training methods like DDP, DeepSpeed ZeRO, and FSDP.
Command Line Interface (CLI): A simple interface lets you fine-tune models without needing to write code.
Installation
Python Package
Install the library using pip:
pip install trl
From source
If you want to use the latest features before an official release, you can install TRL from source:
pip install git+https://github.com/huggingface/trl.git
Repository
If you want to use the examples, you can clone the repository with the following command:
git clone https://github.com/huggingface/trl.git
Quick Start
For more flexibility and control over training, TRL provides dedicated trainer classes to post-train language models or PEFT adapters on a custom dataset. Each trainer in TRL is a light wrapper around the 🤗 Transformers trainer and natively supports distributed training methods like DDP, DeepSpeed ZeRO, and FSDP.
SFTTrainer
Here is a basic example of how to use the SFTTrainer:
from trl import SFTTrainer
from datasets import load_dataset
dataset = load_dataset("trl-lib/Capybara", split="train")
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
)
trainer.train()
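For reference, a minimal sketch of the conversational record format such datasets use — each example is a "messages" list of role/content turns. The field values below are illustrative, not taken from the real trl-lib/Capybara dataset:

```python
# A conversational SFT example: a "messages" list of role/content dicts.
# Contents here are made up for illustration.
example = {
    "messages": [
        {"role": "user", "content": "What is TRL?"},
        {"role": "assistant", "content": "A library for post-training foundation models."},
    ]
}

# Turns alternate between user and assistant roles.
roles = [turn["role"] for turn in example["messages"]]
```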
GRPOTrainer
GRPOTrainer implements the Group Relative Policy Optimization (GRPO) algorithm, which is more memory-efficient than PPO and was used to train DeepSeek AI's R1.
DPOTrainer
DPOTrainer implements the popular Direct Preference Optimization (DPO) algorithm that was used to post-train Llama 3 and many other models. Here is a basic example of how to use the DPOTrainer:
from datasets import load_dataset
from trl import DPOTrainer
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
trainer = DPOTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=dataset,
)
trainer.train()
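Preference datasets like trl-lib/ultrafeedback_binarized pair a preferred ("chosen") and a dispreferred ("rejected") completion for the same prompt. A minimal sketch of that shape (field values are illustrative, not taken from the real dataset):

```python
# A DPO preference pair: two completions of the same prompt, one preferred.
# Contents here are made up for illustration.
example = {
    "chosen": [
        {"role": "user", "content": "Name a prime number."},
        {"role": "assistant", "content": "7"},
    ],
    "rejected": [
        {"role": "user", "content": "Name a prime number."},
        {"role": "assistant", "content": "9"},
    ],
}

# Each record carries exactly these two preference fields.
pair_fields = sorted(example)
```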
RewardTrainer
Here is a basic example of how to use the RewardTrainer:
from trl import RewardTrainer
from datasets import load_dataset
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")
trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
)
trainer.train()
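Unlike SFT, DPO, and reward modeling, the GRPOTrainer mentioned earlier optimizes against reward functions you supply as plain callables that return one score per sampled completion. A minimal sketch of that signature, using a toy reward (the function name and reward are illustrative, not a real training signal):

```python
# A GRPO-style reward function receives the generated completions (plus any
# extra dataset columns via **kwargs) and returns one float per completion.
# Toy reward: favor completions with more distinct characters.
def reward_num_unique_chars(completions, **kwargs):
    return [float(len(set(completion))) for completion in completions]

# Scores for two candidate completions of the same prompt.
scores = reward_num_unique_chars(completions=["aaaa", "abcd"])
```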
Command Line Interface (CLI)
You can use the TRL Command Line Interface (CLI) to quickly get started with post-training methods like Supervised Fine-Tuning (SFT) or Direct Preference Optimization (DPO):
SFT:
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B --dataset_name trl-lib/Capybara
DPO:
trl dpo --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct --dataset_name trl-lib/ultrafeedback_binarized
Read more about CLI in the relevant documentation section or use --help for more details.
Development
If you want to contribute to trl or customize it to your needs, make sure to read the contribution guide and make a dev install:
git clone https://github.com/huggingface/trl.git
cd trl/
pip install -e .[dev]
Experimental
A minimal incubation area is available under trl.experimental for unstable / fast-evolving features. Anything there may change or be removed in any release without notice.
Example:
from trl.experimental.new_trainer import NewTrainer
Read more in the Experimental docs.
Citation
@software{vonwerra2020trl,
  title = {{TRL: Transformers Reinforcement Learning}},
  author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url = {https://github.com/huggingface/trl},
  year = {2020}
}
License
This repository’s source code is available under the Apache-2.0 License.