TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation

| 📑 Paper | 🤗 Hugging Face | 🌐 Blog |

TinyR1 Team

Introduction

We introduce our first-generation reasoning model, Tiny-R1-32B-Preview, which outperforms the 70B model DeepSeek-R1-Distill-Llama-70B on math (AIME 2024) and coding (LiveCodeBench) and nearly matches the full R1 model in math (78.1 vs. 79.8 on AIME 2024).

We applied supervised fine-tuning (SFT) to DeepSeek-R1-Distill-Qwen-32B in three target domains (Mathematics, Code, and Science) using the 360-LLaMA-Factory training framework, producing three domain-specific models. Questions from open-source data served as seeds, and the responses for the mathematics, coding, and science tasks were generated by R1. We then used the Mergekit tool from the Arcee team to combine the three models into Tiny-R1-32B-Preview, which demonstrates strong overall performance. For more technical details, please refer to our technical report.

Evaluation

| Model | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
| --- | --- | --- | --- |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 57.2 | 62.1 |
| DeepSeek-R1-Distill-Llama-70B | 70.0 | 57.5 | 65.2 |
| DeepSeek-R1 | 79.8 | 65.9 | 71.5 |
| Tiny-R1-32B-Preview (Ours) | 78.1 | 61.6 | 65.0 |

All scores are reported as pass@1. For AIME 2024 we sample 16 responses and for GPQA-Diamond we sample 4 responses, and in both cases we report the average accuracy over all sampled responses to obtain a more stable estimate.
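The averaging described above can be sketched as follows. This is a minimal illustration, not the team's evaluation code: the function name `average_pass_at_1` and the input layout (one list of correctness flags per question) are assumptions made for the example.

```python
from statistics import mean


def average_pass_at_1(correct_flags_per_question):
    """Return pass@1 (in percent) averaged over several samples per question.

    correct_flags_per_question: list with one entry per question; each entry
    is a list of booleans, one per sampled response (hypothetical layout).
    """
    per_question_accuracy = [
        mean(1.0 if is_correct else 0.0 for is_correct in flags)
        for flags in correct_flags_per_question
    ]
    # With the same number of samples for every question, the mean of the
    # per-question accuracies equals the accuracy over all sampled responses.
    return 100.0 * mean(per_question_accuracy)


# Toy data: two questions, four sampled responses each.
toy_results = [
    [True, True, False, True],     # 3 of 4 correct
    [False, False, True, False],   # 1 of 4 correct
]
toy_score = average_pass_at_1(toy_results)
```

Averaging over several samples reduces the variance that a single sampled response per question would introduce at temperature 0.6.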

We merged the three models, each trained separately on one domain, into a single model. The comparison results are shown below.

| Model | Math (AIME 2024) | Coding (LiveCodeBench) | Science (GPQA-Diamond) |
| --- | --- | --- | --- |
| Math-Model | 73.1 | - | - |
| Code-Model | - | 63.4 | - |
| Science-Model | - | - | 64.5 |
| Merged-Model (Tiny-R1-32B-Preview) | 78.1 | 61.6 | 65.0 |

Getting Started

Branch Train

For multi-node training, please first fill in the train/hostfile file. For single-node training, this step is not required.

Note
About hostfile:
Each line in the hostfile specifies a node, formatted as <hostname> slots=<num_slots>, where <hostname> is the name of the node and <num_slots> is the number of GPUs available on that node. Here is an example:
worker-0 slots=8  
worker-1 slots=8  
For more details, please refer to the DeepSpeed official documentation.
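As a quick sanity check before launching a multi-node run, the hostfile format described above can be parsed with a few lines of Python. This is only a sketch: `parse_hostfile` is a hypothetical helper written for this example, not part of DeepSpeed or this repository, and it handles only the `<hostname> slots=<num_slots>` form shown in the note.

```python
def parse_hostfile(text):
    """Parse '<hostname> slots=<num_slots>' lines into {hostname: num_slots}."""
    nodes = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"unexpected hostfile line: {raw_line!r}")
        hostname, slots_field = fields
        key, _, value = slots_field.partition("=")
        if key != "slots" or not value.isdigit():
            raise ValueError(f"unexpected slots field: {slots_field!r}")
        nodes[hostname] = int(value)
    return nodes


example_hostfile = "worker-0 slots=8\nworker-1 slots=8\n"
nodes = parse_hostfile(example_hostfile)
total_gpus = sum(nodes.values())  # GPUs available across all listed nodes
```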

Installation

To install the required dependencies, run:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

Math Model SFT

Hint: In each of the commands below, replace the value of BASE_MODEL with the actual path to the base model, e.g., "/model/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B".

BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
  --model "$BASE_MODEL" \
  --data-id-path "data/open-r1-math-default-0223.json" \
  --output-dir "model_output/branch-math-model" \
  --model-max-length 16384 \
  --learning-rate 1e-5 \
  --lr-scheduler-type constant_with_warmup \
  --num-train-epochs 5 \
  --save-steps 200 \
  --gradient-accumulation-steps 3 \
  --template qwen \
  --packing_type "packing"

Science Model SFT

BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
  --model "$BASE_MODEL" \
  --data-id-path "data/OpenThoughts-science-with-wrong5k-r1,s1_science_3k-r1,s1_1k-r1" \
  --output-dir "model_output/branch-science-model" \
  --model-max-length 16384 \
  --learning-rate 1e-5 \
  --lr-scheduler-type cosine \
  --num-train-epochs 5 \
  --save-steps 200 \
  --gradient-accumulation-steps 1 \
  --packing_type "neatpacking" \
  --template qwen

Code Model SFT

BASE_MODEL="/path/to/base-model/"
bash train/run.sh \
  --model "$BASE_MODEL" \
  --data-id-path "data/openthoughts-16kseq-0218.json" \
  --output-dir "model_output/branch-code-model" \
  --model-max-length 16384 \
  --learning-rate 1e-5 \
  --lr-scheduler-type constant_with_warmup \
  --num-train-epochs 15 \
  --save-steps 200 \
  --gradient-accumulation-steps 3 \
  --packing_type "neatpacking" \
  --template qwen
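The three SFT commands share most flags and differ only in dataset, output directory, scheduler, epoch count, gradient accumulation, and packing type. The sketch below collects those values in one place and rebuilds the command line for a given branch. The names `COMMON_FLAGS`, `BRANCH_FLAGS`, and `build_sft_command` are illustrative and not part of the repository; all flag names and values are copied from the commands above.

```python
import shlex

# Flags that are identical in all three branch commands.
COMMON_FLAGS = {
    "--model-max-length": "16384",
    "--learning-rate": "1e-5",
    "--save-steps": "200",
    "--template": "qwen",
}

# Flags that differ per branch.
BRANCH_FLAGS = {
    "math": {
        "--data-id-path": "data/open-r1-math-default-0223.json",
        "--output-dir": "model_output/branch-math-model",
        "--lr-scheduler-type": "constant_with_warmup",
        "--num-train-epochs": "5",
        "--gradient-accumulation-steps": "3",
        "--packing_type": "packing",
    },
    "science": {
        "--data-id-path": "data/OpenThoughts-science-with-wrong5k-r1,s1_science_3k-r1,s1_1k-r1",
        "--output-dir": "model_output/branch-science-model",
        "--lr-scheduler-type": "cosine",
        "--num-train-epochs": "5",
        "--gradient-accumulation-steps": "1",
        "--packing_type": "neatpacking",
    },
    "code": {
        "--data-id-path": "data/openthoughts-16kseq-0218.json",
        "--output-dir": "model_output/branch-code-model",
        "--lr-scheduler-type": "constant_with_warmup",
        "--num-train-epochs": "15",
        "--gradient-accumulation-steps": "3",
        "--packing_type": "neatpacking",
    },
}


def build_sft_command(branch, base_model):
    """Return the argument list for one branch's train/run.sh invocation."""
    command = ["bash", "train/run.sh", "--model", base_model]
    for flag, value in {**COMMON_FLAGS, **BRANCH_FLAGS[branch]}.items():
        command += [flag, value]
    return command


# Print a shell-quoted command line for the math branch.
print(shlex.join(build_sft_command("math", "/path/to/base-model/")))
```

Keeping the per-branch differences in a single table makes it easier to see that, for example, only the code branch trains for 15 epochs and only the science branch uses a cosine schedule.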

Merge

Installation

To reproduce the merged qihoo360/TinyR1-32B-Preview model, first install the bundled mergekit using the commands below.

git clone https://github.com/TinyR1-32B-Preview.git
cd TinyR1-32B-Preview/mergekit/
pip install -e .

If you encounter the error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

you can resolve it by following these steps:

Update the package list and install the virtual environment package:

apt-get update -y
apt-get install python3-venv -y

Create and activate a virtual environment:

python3.10 -m venv eval
source eval/bin/activate

After activating the virtual environment, reinstall the required packages. This approach isolates your Python environment from the global packages, thereby preventing dependency conflicts.

Once the installation succeeds, run the merge script:

sh sh/tinyr1_merge.sh  [/path/to/math-model]  [/path/to/science-model]  [/path/to/code-model]  [/path/to/output-model-dir]

The following parameters are mandatory:

[/path/to/math-model]: the path to the math domain model that has been fine-tuned via SFT.
[/path/to/science-model]: the path to the science domain model that has been fine-tuned via SFT.
[/path/to/code-model]: the path to the code domain model that has been fine-tuned via SFT.
[/path/to/output-model-dir]: the path where the merged model will be saved.
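Because all four arguments are positional, it is easy to pass them in the wrong order. The following sketch checks that the three input directories exist and assembles the command in the order given above (math, science, code, output). `build_merge_command` is a hypothetical wrapper written for this example; only the `sh sh/tinyr1_merge.sh` invocation and its argument order come from this README.

```python
from pathlib import Path


def build_merge_command(math_model, science_model, code_model, output_dir):
    """Validate the input model directories and return the merge command."""
    inputs = (
        ("math", math_model),
        ("science", science_model),
        ("code", code_model),
    )
    for label, path in inputs:
        if not Path(path).is_dir():
            raise FileNotFoundError(f"{label} model directory not found: {path}")
    # Argument order follows the usage line: math, science, code, output.
    return [
        "sh",
        "sh/tinyr1_merge.sh",
        str(math_model),
        str(science_model),
        str(code_model),
        str(output_dir),
    ]


# To launch the merge from the mergekit directory, one could then run:
#   import subprocess
#   subprocess.run(build_merge_command(math, science, code, out), check=True)
```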

Evaluation

We test the resulting models on three kinds of benchmarks: Math Reasoning, Code Reasoning, and Scientific Reasoning.

Math Reasoning

AIME24
AIME25

Scientific Reasoning

GPQA-Diamond

Code Reasoning

LiveCodeBench (2408-2502)

Math Reasoning

The evaluation code is modified from Qwen2.5-Math. In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in math_evaluation.

The system prompt for evaluation is set to:

Please reason step by step, and put your final answer within \boxed{{}}.
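Since this prompt asks the model to put its final answer inside `\boxed{...}`, scoring requires pulling that answer out of the response. The sketch below shows one simple way to do so with brace matching. It is an illustrative helper (`extract_last_boxed` is a name chosen here), not the extraction logic used by the Qwen2.5-Math-based evaluation code.

```python
def extract_last_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1          # we are inside the opening brace of \boxed{
    collected = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(collected)
        collected.append(ch)
    return None        # braces never closed (e.g. truncated output)


sample = r"First guess \boxed{1}, but the final answer is \boxed{\frac{1}{2}}."
final_answer = extract_last_boxed(sample)
```

Counting braces rather than using a simple regular expression keeps nested LaTeX such as `\frac{1}{2}` intact.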

Scientific Reasoning

The evaluation code is modified from FuseO1-Preview. In our evaluation, we set the temperature to 0.6 and max_tokens to 32768. We provide an example for reproducing our results in science_evaluation.

The system prompt for evaluation is set to:

You are a helpful and harmless assistant. You should think step-by-step.

Code Reasoning

The evaluation code is modified from FuseO1-Preview. In our evaluation, we set the temperature to 0.6, the top-p to 0.95, and max_tokens to 32768. We provide an example for reproducing our results in code_lcb_evaluation.

The system prompt for evaluation is set to:

A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
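A response that follows this prompt can be split into its reasoning and answer parts using the two tag pairs. The snippet below is a minimal sketch under the assumption that the model emits the tags exactly as written in the prompt; `split_reasoning_and_answer` is a hypothetical helper and is not taken from the evaluation code.

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning_and_answer(response):
    """Return (reasoning, answer); either element is None if its tags are missing."""
    think_match = THINK_PATTERN.search(response)
    answer_match = ANSWER_PATTERN.search(response)
    reasoning = think_match.group(1).strip() if think_match else None
    answer = answer_match.group(1).strip() if answer_match else None
    return reasoning, answer


example_response = (
    "<think> Sort the list, then take the middle element. </think> "
    "<answer> def median(xs): return sorted(xs)[len(xs) // 2] </answer>"
)
reasoning, answer = split_reasoning_and_answer(example_response)
```

Returning `None` for a missing section lets the caller decide how to treat responses that were cut off at max_tokens before the closing tag.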

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qihoo360/TinyR1-32B-Preview"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

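# Raw string: otherwise Python would interpret \b (in \boxed) and \f (in \frac) as escape sequences.
prompt = r"Please reason step by step, and put your final answer within \boxed{}. Solve the integral:  \[I = \int \frac{x^2}{(x+1)^3} \,dx\]"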
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4000
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

Citation

@misc{tinyr1proj,
      title={TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation}, 
      author={TinyR1 Team},
      year={2025},
      url={https://arxiv.org/abs/2503.04872}, 
}