AReaL is an open-source, fully asynchronous reinforcement learning training system
for large reasoning and agentic models, developed by members from Tsinghua IIIS and
the AReaL Team at Ant Group. AReaL builds on the open-source project ReaLHF, and we are
fully committed to open-source principles: we provide the training details, data, and
infrastructure required to reproduce our results, along with the models themselves.
AReaL aims to help
everyone build their own AI agents easily and affordably. Our team loves milk tea
because it’s delicious, customizable, and affordable—we hope you enjoy our project just
as much as you’d enjoy real milk tea. Cheers!
📈 Scalability: Through algorithm-system co-design, AReaL delivers stable fully
asynchronous RL training with industry-leading speed. AReaL seamlessly adapts to
diverse computational environments, scaling from a single node to 1,000+ GPUs.
✨ Cutting-Edge Performance: AReaL produces state-of-the-art math, coding, and search
agents with exceptional capabilities.
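To make "fully asynchronous" concrete, here is a minimal sketch of the general idea: rollout generation and training run as independent tasks connected by a queue, so generation never waits for a training step to finish. This is not AReaL's implementation. The names `Trajectory`, `rollout_worker`, `trainer`, and `run` are hypothetical, and the `max_staleness` filter is an assumption about one common way to keep off-policy data in check, not a description of AReaL's algorithm.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Trajectory:
    sample_id: int
    policy_version: int  # weights version that produced this sample


async def rollout_worker(queue, state, num_samples):
    """Generate samples continuously; never blocks on a training step."""
    for sample_id in range(num_samples):
        await asyncio.sleep(0)  # stand-in for LLM generation
        await queue.put(Trajectory(sample_id, state["version"]))
    await queue.put(None)  # sentinel: no more samples


async def trainer(queue, state, batch_size, max_staleness):
    """Consume samples, skip overly stale ones, bump the version per batch."""
    batch, steps, dropped = [], 0, 0
    while True:
        item = await queue.get()
        if item is None:
            break
        # Hypothetical staleness rule: drop samples from weights that are
        # more than `max_staleness` versions behind the current ones.
        if state["version"] - item.policy_version > max_staleness:
            dropped += 1
            continue
        batch.append(item)
        if len(batch) == batch_size:
            state["version"] += 1  # stand-in for one optimizer step
            steps += 1
            batch = []
    return steps, dropped, len(batch)


async def run(num_samples=8, batch_size=4, max_staleness=1):
    state = {"version": 0}
    queue = asyncio.Queue(maxsize=2 * batch_size)
    producer = asyncio.create_task(rollout_worker(queue, state, num_samples))
    result = await trainer(queue, state, batch_size, max_staleness)
    await producer
    return result
```

Calling `asyncio.run(run())` returns a tuple of training steps taken, samples dropped as stale, and samples left over in an incomplete batch. A synchronous system would instead alternate strictly between generating a full batch and training on it.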
[2026/01/01] Happy New Year! Thanks to the outstanding contribution from
@HwVanICI, we are excited to officially announce stable support for AReaL training on
Ascend NPU devices! The code is actively maintained and continuously updated in the
ascend branch. Check out our documentation to get started, and feel free to report any
issues!
[2025/08/30] Introducing ASearcher, a state-of-the-art search agent built with
AReaL’s end-to-end asynchronous RL training. Check out the paper and
the open-source repository!
📋 Previous Releases
[2025/07/31] (AReaL-lite) We introduce AReaL-lite, a lightweight version of
AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite
features an algorithm-first API design that prioritizes ease of use and algorithm
development, while natively supporting fully asynchronous agentic RL. With 80% fewer
lines of code, AReaL-lite maintains 90% of AReaL’s performance and core functionality.
Check out our AReaL-lite design documentation and the quickstart guide to begin your
journey with AReaL-lite!
[2025/06/03] (v0.3, boba²) We release boba² (double-boba) for fully
asynchronous RL training, which achieves a 2.77× speedup while delivering comparable or
superior training performance compared to synchronous systems. Furthermore,
asynchronous RL significantly simplifies multi-turn agentic RL training setup! Check out
our v0.3 overview blog and the research paper.
[2025/03/31] (v0.2, boba) Introducing our milestone release—boba! Please call it
A-ReaL-boba! This release features significantly faster training with SGLang support and
state-of-the-art 7B and 32B models for mathematical reasoning. Check out our
v0.2 technical blog.
[2025/02/24] (v0.1) Our initial release includes reproducible results for 1.5B and
7B Large Reasoning Models (LRMs). Check out our
v0.1 technical blog.
We warmly welcome contributions from the community! Whether you’re fixing bugs, adding
features, improving documentation, or helping others, your contribution is valued.
Please check our Contributing Guide for detailed information.
# Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR-USERNAME/AReaL
cd AReaL
# Install in development mode
pip install -e ".[dev,docs]"
# Set up pre-commit hooks for automatic formatting
pip install pre-commit
pre-commit install
# Create a feature branch and make your changes
git checkout -b feat/gpt-o5
git add .
# The pre-commit hooks run on `git commit`; if they reformat files,
# stage the changes again and re-run the commit
git commit -m "Implement gpt-o5 training loop"
# Push the new branch and set its upstream
git push -u origin feat/gpt-o5
💬 Community & Support
GitHub Discussions - Ask questions, share ideas, and connect with the community
AReaL is under active development with planned minor releases weekly and major releases
monthly. We warmly welcome community engagement and contributions. We are also
actively hiring interns and full-time employees with open positions in both the US
and China.
🙏 Acknowledgments
We gratefully acknowledge that major contributors are from the AReaL Team at Ant Group
and the Institute for Interdisciplinary Information Sciences, Tsinghua University.
We have also received invaluable assistance from the following groups (listed
alphabetically):
The Data Intelligence Lab at Ant Research for their data support
@HwVanICI for support on vLLM, LoRA, NPU integration, and more
The Relaxed System Lab at HKUST for seamless
collaboration on numerous system-related aspects
The SGLang team for supporting custom weight
update features and their contributions during AReaL-lite development
The Super Computing Technology (SCT) team at Ant Group for their expertise in
large-scale cluster operations and maintenance
Special thanks to @Lyken17 for providing valuable suggestions throughout the
development process
@inproceedings{mei2025real,
author = {Mei, Zhiyu and Fu, Wei and Li, Kaiwei and Wang, Guangju and Zhang, Huanchen and Wu, Yi},
title = {ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation},
booktitle = {Proceedings of the Eighth Conference on Machine Learning and Systems,
MLSys 2025, Santa Clara, CA, USA, May 12-15, 2025},
publisher = {mlsys.org},
year = {2025},
}
@misc{fu2025areal,
title={AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning},
author={Wei Fu and Jiaxuan Gao and Xujie Shen and Chen Zhu and Zhiyu Mei and Chuyi He and Shusheng Xu and Guo Wei and Jun Mei and Jiashu Wang and Tongkai Yang and Binhang Yuan and Yi Wu},
year={2025},
eprint={2505.24298},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.24298},
}
AReaL: A Large-Scale Asynchronous Reinforcement Learning System
AReaL Highlights
📰 News
[2026/01/15] Congrats to our friends at CAMEL-AI for open-sourcing SETA, their terminal agent RL project trained with AReaL! Check out their training workflow and the announcement on X.
📚 Examples
🔧 Support Matrix
🧠 Algorithms
Models
transformers
Training Backends
Inference Backends
🚀 Getting Started
Our training scripts automatically download the required dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). To run on a single node:
To run on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update paths in the YAML file to point to your shared storage):
For comprehensive setup instructions, see our quickstart guide.
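To give a feel for the training signal on openai/gsm8k, here is a minimal sketch of a rule-based reward. GSM8K reference solutions end with a line of the form `#### <number>`. The sketch assumes the model is prompted to finish its completion in the same format, which may not match how AReaL's training scripts extract answers. The function names `extract_final_answer` and `gsm8k_reward` are hypothetical and are not part of AReaL's API.

```python
import re
from typing import Optional

# Matches the GSM8K final-answer marker, e.g. "#### 72" or "#### 1,234.5".
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the last '#### <number>' value with commas removed, or None."""
    matches = _FINAL_ANSWER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def gsm8k_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the final answers are numerically equal, else 0.0."""
    predicted = extract_final_answer(completion)
    expected = extract_final_answer(reference)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if float(predicted) == float(expected) else 0.0
```

A binary reward like this is simple to verify, which is part of why math datasets are a common starting point for RL experiments.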
📖 Resources
Code Walkthrough
Customization
🤝 Contributing
💬 Community & Support
🗺️ Future Roadmap
🙏 Acknowledgments
We also deeply appreciate all pioneering work from the community, particularly the ReaLHF project from OpenPsi Inc. and other outstanding projects, including but not limited to DeepScaleR, Open-Reasoner-Zero, OpenRLHF, VeRL, SGLang, QwQ, Light-R1, and DAPO.
📄 Citation