Text-to-Speech (TTS) is now supported, including sesame/csm-1b, and STT with openai/whisper-large-v3.
Qwen3 is now supported. Qwen3-30B-A3B fits on 17.5GB VRAM.
Introducing Dynamic 2.0 quants that set new benchmarks on 5-shot MMLU & Aider Polyglot.
EVERYTHING is now supported - all models (TTS, BERT, Mamba), full fine-tuning (FFT), etc. Multi-GPU is now supported. Enable FFT with full_finetuning = True, 8-bit with load_in_8bit = True.
📣 Introducing Long-context Reasoning (GRPO) in Unsloth. Train your own reasoning model with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
📣 Llama 4 by Meta, including Scout & Maverick are now supported.
📣 We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta’s Llama 3.3 (70B) on an 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
Check Your GPU and CUDA Version: Run nvidia-smi to confirm that your NVIDIA GPU is detected and note the CUDA version shown in the output. If nvidia-smi does not work, reinstall the latest NVIDIA drivers.
Install PyTorch: Install the Windows pip build of PyTorch that matches your CUDA version. Use Install PyTorch to select the correct command for your system, then verify that PyTorch can see your GPU.
import torch
print(torch.cuda.is_available())  # should print True
A = torch.ones((10, 10), device="cuda")  # allocate a tensor on the GPU
B = torch.ones((10, 10), device="cuda")
A @ B  # a small matmul on the GPU should complete without errors
Install Unsloth: Only install Unsloth after PyTorch is working correctly.
pip install unsloth
Advanced/Troubleshooting
For advanced installation instructions or if you see weird errors during installations:
First try using an isolated environment, then pip install unsloth
Check if xformers succeeded with python -m xformers.info. See https://github.com/facebookresearch/xformers. Another option is to install flash-attn for Ampere GPUs and skip xformers.
For GRPO runs, check whether pip install vllm succeeds.
Double check that your versions of Python, CUDA, CUDNN, torch, triton, and xformers are compatible with one another. The PyTorch Compatibility Matrix may be useful.
Finally, install bitsandbytes and check it with python -m bitsandbytes
Conda Installation (Optional)
⚠️Only use Conda if you have it. If not, use Pip. We support python=3.10,3.11,3.12,3.13.
Advanced Pip Installation
⚠️ Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command is different for torch 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.10 and for different CUDA versions.
For other torch versions, we support torch211, torch212, torch220, torch230, torch240, torch250, torch260, torch270, torch280, torch290, torch2100, and for CUDA versions, we support cu118, cu121, and cu124. For Ampere devices (A100, H100, RTX 3090) and above, use cu118-ampere, cu121-ampere, or cu124-ampere. Note: torch 2.10 only supports CUDA 12.6, 12.8, and 13.0.
For example, run the below manually in a Python REPL to print the optimal pip installation command for your torch and CUDA versions:
try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
import re
v = V(re.match(r"[0-9\.]{3,}", torch.__version__).group(0))
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
USE_ABI = torch._C._GLIBCXX_USE_CXX11_ABI
if cuda not in ("11.8", "12.1", "12.4", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v < V('2.3.0'): x = 'cu{}{}-torch220'
elif v < V('2.4.0'): x = 'cu{}{}-torch230'
elif v < V('2.5.0'): x = 'cu{}{}-torch240'
elif v < V('2.5.1'): x = 'cu{}{}-torch250'
elif v <= V('2.5.1'): x = 'cu{}{}-torch251'
elif v < V('2.7.0'): x = 'cu{}{}-torch260'
elif v < V('2.7.1'): x = 'cu{}{}-torch270'
elif v < V('2.8.0'): x = 'cu{}{}-torch271'
elif v < V('2.8.9'): x = 'cu{}{}-torch280'
elif v < V('2.9.1'): x = 'cu{}{}-torch290'
elif v < V('2.9.2'): x = 'cu{}{}-torch291'
elif v < V('2.10.1'): x = 'cu{}{}-torch2100'
else: raise RuntimeError(f"Torch = {v} too new!")
if v > V('2.6.9') and cuda not in ("11.8", "12.6", "12.8", "13.0"): raise RuntimeError(f"CUDA = {cuda} not supported!")
if v >= V('2.10.0') and cuda not in ("12.6", "12.8", "13.0"): raise RuntimeError(f"Torch 2.10 requires CUDA 12.6, 12.8, or 13.0! Got CUDA = {cuda}")
x = x.format(cuda.replace(".", ""), "")  # "-ampere" suffix intentionally disabled: is_ampere detection is broken due to flash-attn
print(f'pip install --upgrade pip && pip install --no-deps git+https://github.com/unslothai/unsloth-zoo.git && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git" --no-build-isolation')
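The pip-extras tag that the script above prints follows a simple naming pattern: CUDA digits, an optional Ampere suffix, then the torch version digits. A minimal sketch of that pattern (`extras_tag` is a hypothetical helper for illustration, not part of Unsloth's API):

```python
def extras_tag(torch_version: str, cuda: str, ampere: bool = False) -> str:
    # "cu" + CUDA digits, e.g. "12.1" -> "cu121"
    cu = "cu" + cuda.replace(".", "")
    # Optional Ampere suffix, e.g. "cu118-ampere"
    suffix = "-ampere" if ampere else ""
    # "torch" + version digits, e.g. "2.4.0" -> "torch240"
    torch_tag = "torch" + torch_version.replace(".", "")
    return f"{cu}{suffix}-{torch_tag}"

print(extras_tag("2.4.0", "12.1"))        # cu121-torch240
print(extras_tag("2.4.0", "11.8", True))  # cu118-ampere-torch240
```

The real script additionally validates that the torch/CUDA pairing is supported before emitting a tag.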
Docker Installation
You can use our pre-built Docker container with all dependencies to use Unsloth instantly with no setup required.
Read our guide.
For our most detailed benchmarks, read our Llama 3.3 Blog.
Benchmarking of Unsloth was also conducted by 🤗Hugging Face.
We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):
| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|-------|------|------------------|-------------------|-------------------|------------------------|
| Llama 3.3 (70B) | 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
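The setup described above can be summarized as a small configuration sketch. This is illustrative only, not the actual benchmark harness; the `_proj` module names assume standard Hugging Face Llama layer naming, while the text simply says q, k, v, o, gate, up, down:

```python
# Illustrative summary of the benchmark setup (not the actual harness).
benchmark_config = {
    "dataset": "Alpaca",
    "batch_size": 2,
    "gradient_accumulation_steps": 4,
    "lora_rank": 32,
    # QLoRA applied on all linear layers:
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
}

# Effective batch size seen by the optimizer:
effective_batch = (benchmark_config["batch_size"]
                   * benchmark_config["gradient_accumulation_steps"])
print(effective_batch)  # 8
```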
Context length benchmarks
Llama 3.1 (8B) max. context length
We tested Llama 3.1 (8B) Instruct and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.
| GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
|----------|--------------------------|--------------------|
| 8 GB | 2,972 | OOM |
| 12 GB | 21,848 | 932 |
| 16 GB | 40,724 | 2,551 |
| 24 GB | 78,475 | 5,789 |
| 40 GB | 153,977 | 12,264 |
| 48 GB | 191,728 | 15,502 |
| 80 GB | 342,733 | 28,454 |
Llama 3.3 (70B) max. context length
We tested Llama 3.3 (70B) Instruct on an 80GB A100 and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.
| GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
|----------|--------------------------|--------------------|
| 48 GB | 12,106 | OOM |
| 80 GB | 89,389 | 6,916 |
Citation
You can cite the Unsloth repo as follows:
@software{unsloth,
author = {Daniel Han and Michael Han and Unsloth team},
title = {Unsloth},
url = {https://github.com/unslothai/unsloth},
year = {2023}
}
Train gpt-oss, DeepSeek, Gemma, Qwen & Llama 2x faster with 70% less VRAM!
✨ Train for Free
Notebooks are beginner friendly. Read our guide. Add dataset, run, then deploy your trained model.
⚡ Quickstart
Linux or WSL
Windows
For Windows, pip install unsloth works only if you have PyTorch installed. Read our Windows Guide.
Docker
Use our official Unsloth Docker image, the unsloth/unsloth container. Read our Docker Guide.
AMD, Intel, Blackwell & DGX Spark
For RTX 50x, B200, 6000 GPUs: pip install unsloth. Read our guides for: Blackwell and DGX Spark.
To install Unsloth on AMD and Intel GPUs, follow our AMD Guide and Intel Guide.
🦥 Unsloth News
Click for more news
🔗 Links and Resources
⭐ Key Features
💾 Install Unsloth
You can also see our docs for more detailed installation and updating instructions here.
Unsloth supports Python 3.13 or lower.
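That constraint can be sketched as a quick guard (`unsloth_python_ok` is a hypothetical helper for illustration, assuming the 3.10–3.13 range stated in this guide):

```python
import sys

def unsloth_python_ok(version_info=sys.version_info) -> bool:
    # Per this guide, Unsloth supports Python 3.10 through 3.13.
    return (3, 10) <= tuple(version_info)[:2] <= (3, 13)

print(unsloth_python_ok((3, 12, 0)))  # True
print(unsloth_python_ok((3, 14, 0)))  # False
```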
Pip Installation
Install with pip (recommended) for Linux devices:
To update Unsloth:
See here for advanced pip install instructions.
Windows Installation
For this method, we will be utilizing Anaconda. You can view the full guide with screenshots here.
Install Miniconda (or Anaconda): Miniconda is recommended. Install Miniconda or Anaconda, then open Anaconda PowerShell Prompt to continue.
Create a Conda Environment: Create and activate a fresh Python 3.12 environment for Unsloth.
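A typical way to do this, assuming standard conda usage (the environment name unsloth_env is illustrative; see the full guide for the exact command):

```shell
conda create --name unsloth_env python=3.12 -y
conda activate unsloth_env
```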
Check Your GPU and CUDA Version: Run nvidia-smi to confirm that your NVIDIA GPU is detected and note the CUDA version shown in the output. If nvidia-smi does not work, reinstall the latest NVIDIA drivers.
Install PyTorch: Install the Windows pip build of PyTorch that matches your CUDA version. Use Install PyTorch to select the correct command for your system, then verify that PyTorch can see your GPU.
Install Unsloth: Only install Unsloth after PyTorch is working correctly.
Advanced/Troubleshooting
For advanced installation instructions or if you see weird errors during installations:
First try using an isolated environment, then pip install unsloth
Install torch and triton. Go to https://pytorch.org to install them. For example: pip install torch torchvision torchaudio triton
Confirm that CUDA is installed correctly. Try nvcc. If that fails, install cudatoolkit or CUDA drivers.
Install xformers manually, then check that it succeeded with python -m xformers.info. See https://github.com/facebookresearch/xformers. Another option is to install flash-attn for Ampere GPUs and skip xformers.
For GRPO runs, check whether pip install vllm succeeds.
Double check that your versions of Python, CUDA, CUDNN, torch, triton, and xformers are compatible with one another. The PyTorch Compatibility Matrix may be useful.
Finally, install bitsandbytes and check it with python -m bitsandbytes
Conda Installation (Optional)
⚠️ Only use Conda if you have it. If not, use Pip. We support python=3.10, 3.11, 3.12, 3.13.
Use nvidia-smi to get the correct CUDA version, e.g. 13.0, which becomes cu130.
If you're looking to install Conda in a Linux environment, read here, or run the below 🔽
Advanced Pip Installation
⚠️ Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command is different for torch 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.10 and for different CUDA versions.
For other torch versions, we support torch211, torch212, torch220, torch230, torch240, torch250, torch260, torch270, torch280, torch290, torch2100, and for CUDA versions, we support cu118, cu121, and cu124. For Ampere devices (A100, H100, RTX 3090) and above, use cu118-ampere, cu121-ampere, or cu124-ampere. Note: torch 2.10 only supports CUDA 12.6, 12.8, and 13.0.
For example, if you have torch 2.4 and CUDA 12.1, use:
Another example, if you have torch 2.9 and CUDA 13.0, use:
Another example, if you have torch 2.10 and CUDA 12.6, use:
And other examples:
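The example commands themselves were dropped from this copy. Assuming they follow the pattern printed by the Python version-picker script in this guide (and that the cu126 tag mirrors the documented cu130 naming), they would look like:

```shell
# Tag format is cu{CUDA digits}-torch{version digits}; verify with the version-picker script.
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"   # torch 2.4 + CUDA 12.1
pip install "unsloth[cu130-torch290] @ git+https://github.com/unslothai/unsloth.git"   # torch 2.9 + CUDA 13.0
pip install "unsloth[cu126-torch2100] @ git+https://github.com/unslothai/unsloth.git"  # torch 2.10 + CUDA 12.6
```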
Or, run the below in a terminal to get the optimal pip installation command:
Or, run the below manually in a Python REPL:
Docker Installation
You can use our pre-built Docker container with all dependencies to use Unsloth instantly with no setup required. Read our guide.
This container requires installing NVIDIA’s Container Toolkit.
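A typical invocation might look like the following; the flags are assumptions based on standard Docker plus NVIDIA Container Toolkit usage, so see the Docker guide for the canonical command:

```shell
# Expose all GPUs to the container and forward Jupyter Lab's port.
docker run --gpus all -p 8888:8888 unsloth/unsloth
```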
Access Jupyter Lab at http://localhost:8888 and start fine-tuning!
📜 Documentation
Unsloth example code to fine-tune gpt-oss-20b:
💡 Reinforcement Learning
RL methods including GRPO, GSPO, FP8 training, DrGRPO, DAPO, PPO, Reward Modelling, and Online DPO all work with Unsloth.
Read our Reinforcement Learning Guide or our advanced RL docs for batching, generation & training parameters.
List of RL notebooks:
🥇 Performance Benchmarking
Thank You to