Light-IF is a powerful instruction-following large language model (LLM) series that leverages Preview-Checking reasoning to handle complex instructions with generalizable behavior — all trained with less than $3,000 in compute.
🧪 Benchmarks
SuperCLUE-CPIF
In the latest SuperCLUE-CPIF evaluation, Light-IF-14B (shown as 360zhinao3-o1.5 in the figure below) reached the domestic SOTA, outperforming ERNIE-X1.1 and DeepSeek-V3.2-Exp-Thinking.
SuperCLUE-CPIF (Chinese Precise Instruction Following) is a benchmark designed to assess how well large language models (LLMs) can accurately follow complex, multi-constraint instructions in Chinese.
Light-IF-14B 🤗 is the most powerful 14B instruction-following model we have open-sourced, even outperforming Light-IF-32B.
This remarkable performance is largely attributed to our carefully designed curriculum learning strategy.
📌 Highlights
🔍 Identifies and overcomes lazy reasoning in LLMs.
🧩 Integrates Preview + Self-Checking mechanisms.
🚀 Combines Entropy-SFT and TEA-RL for robust training.
💡 Achieves state-of-the-art results on instruction benchmarks.
💰 Trained efficiently on A800 GPUs at very low cost.
🔨 Technical Overview
Light-IF addresses the challenge of poor instruction-following due to lazy reasoning. Its pipeline includes:
1. Hardness-aware Prompt Synthesis
Construct prompts with complex verifiable constraints.
Filter invalid outputs using LLMs to form high-quality datasets.
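The filtering idea above can be sketched as plain predicate checks: each verifiable constraint becomes a function of the output text, and a candidate survives only if every check passes. The helpers below (`exactly_n_lines`, `max_words`) are illustrative assumptions, not the paper's actual tooling.

```python
# Hypothetical sketch of verifiable-constraint filtering; the constraint
# helpers and names are invented for illustration.

def exactly_n_lines(n):
    """Constraint: the text must have exactly n lines."""
    return lambda text: len(text.strip().split("\n")) == n

def max_words(n):
    """Constraint: the text must contain at most n whitespace-separated words."""
    return lambda text: len(text.split()) <= n

def satisfies_all(text, constraints):
    """Keep a candidate only if it passes every verifiable check."""
    return all(check(text) for check in constraints)

constraints = [exactly_n_lines(15), max_words(300)]

candidate = "\n".join(f"line {i}" for i in range(1, 16))
print(satisfies_all(candidate, constraints))  # True: 15 lines, well under 300 words
```

In practice the paper additionally filters with LLM judges; the point here is only that hard constraints can be checked mechanically before any learning happens.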
2. Zero-RL Training
Train a base model to reject lazy thinking with length-based and correctness-based rewards.
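A minimal sketch of such a reward, assuming a simple weighted mix of a length gate (to penalize lazy, too-short reasoning) and the fraction of constraints passed; the weights and threshold are invented for illustration, not the paper's values.

```python
# Illustrative Zero-RL reward: length-based term discourages lazy reasoning,
# correctness-based term rewards constraint satisfaction. All numbers are
# assumptions for the sketch.

def zero_rl_reward(num_think_tokens, constraints_passed, total_constraints,
                   min_think_tokens=256):
    """Combine a reasoning-length gate with a constraint-satisfaction score."""
    length_reward = 1.0 if num_think_tokens >= min_think_tokens else 0.0
    correctness_reward = constraints_passed / total_constraints
    return 0.5 * length_reward + 0.5 * correctness_reward

print(zero_rl_reward(100, 3, 4))  # short reasoning: 0.5*0 + 0.5*0.75 = 0.375
print(zero_rl_reward(512, 4, 4))  # long reasoning, all constraints met: 1.0
```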
3. Entropy-Preserving SFT
Select tokens by balancing NLL and entropy.
Prevents overfitting and preserves model diversity.
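One plausible reading of this selection rule, as a sketch: score each token by trading its NLL off against its predictive entropy, and keep the SFT loss only for tokens above a threshold, so that confidently-predicted low-entropy tokens do not get further sharpened. The scoring formula, weight, and threshold here are assumptions, not the paper's exact criterion.

```python
# Hypothetical token-selection rule for entropy-preserving SFT.
# score = alpha * NLL - (1 - alpha) * entropy; keep tokens with score > 0.

def token_score(nll, entropy, alpha=0.5):
    """Balance how wrong the model is (NLL) against how uncertain it is (entropy)."""
    return alpha * nll - (1 - alpha) * entropy

def select_tokens(nlls, entropies, threshold=0.0):
    """Return a 0/1 mask over tokens whose loss is kept during SFT."""
    return [1 if token_score(n, h) > threshold else 0
            for n, h in zip(nlls, entropies)]

nlls      = [0.1, 2.3, 0.05, 1.8]  # per-token negative log-likelihood
entropies = [0.2, 0.4, 1.5,  0.3]  # per-token predictive entropy
mask = select_tokens(nlls, entropies)
print(mask)  # [0, 1, 0, 1]
```

Under this sketch, tokens the model already predicts confidently (low NLL) or is legitimately uncertain about (high entropy) are excluded from the loss, which is one way to avoid overfitting while keeping output diversity.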
4. TEA-RL (Token-wise Entropy-Adaptive RL)
Dense rewards for partially satisfying constraints.
Entropy-regularized policy gradient for stable learning.
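These two ingredients can be sketched as a dense reward equal to the fraction of constraints satisfied, plus an entropy bonus added to the return before the policy-gradient update. The weighting and the form of the bonus are illustrative assumptions, not the paper's exact objective.

```python
# Illustrative TEA-RL-style shaping: partial credit for partially satisfied
# constraints, plus an entropy bonus for training stability. beta is an
# invented coefficient for the sketch.

def dense_reward(checks_passed, total_checks):
    """Partial credit: fraction of verifiable constraints satisfied."""
    return checks_passed / total_checks

def regularized_objective(reward, policy_entropy, beta=0.01):
    """Entropy-regularized return used in the policy-gradient update."""
    return reward + beta * policy_entropy

r = dense_reward(3, 5)  # 0.6: a partially correct response still earns reward
print(regularized_objective(r, policy_entropy=2.5))
```

The dense term gives a learning signal even when only some constraints are met, while the entropy term discourages premature collapse onto a narrow set of responses.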
The overall framework of the proposed method:
💻 Quick Usage
The following code snippet illustrates how to use the model to generate content from given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "qihoo360/Light-IF-32B"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "Please help me write a poem with a total of 15 lines and no more than 300 words. The poem should be divided into 4 stanzas, each beginning with a **highlighted subtitle**."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Split the output at the </think> token (id 151668) to separate the
# thinking content from the final answer
try:
    # rindex-style search for the last occurrence of 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
⚙️ Training Cost

| Model | GPUs | Hours | Cost (USD) |
| --- | --- | --- | --- |
| Light-IF-1.7B | A800×4 | 10 | ~$342 |
| Light-IF-32B | A800×88 | 30 | ~$2,800 |
📜 License
This repository is licensed under the Apache 2.0 License.
Citation
```bibtex
@article{Light-IF,
  title={Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following},
  author={Light-IF Team},
  journal={arXiv preprint arXiv:2508.03178},
  year={2025}
}
```
Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking
Aug. 13 update: SOTA 14B
Light-IF was accepted as an Oral presentation at AAAI-26.