Flow-based Policy for Online Reinforcement Learning

We are delighted to introduce FlowRL, a new approach to online reinforcement learning that combines a flow-based policy representation with Wasserstein-2-regularized optimization. The result is a framework that brings generative policies into reinforcement learning.

News

[2025/06/10] 🔥 We release the PyTorch version of the code.
[2025/09/18] 🎉 Our paper has been accepted to NeurIPS 2025.

Introduction

FlowRL is an Actor-Critic framework that uses a flow-based policy representation together with Wasserstein-2-regularized optimization. By implicitly constraining the current policy to the optimal behavioral policy via the W2 distance, FlowRL achieves superior performance on challenging benchmarks such as DM_Control (Dog and Humanoid domains) and Humanoid_Bench.

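To make the two ingredients above more concrete, here is a minimal sketch, not the code in this repository. It assumes a flow-based policy draws an action by integrating a velocity field from Gaussian noise over `t` in `[0, 1]` with Euler steps, and it uses a hand-written `toy_velocity` in place of a learned network. The second function evaluates the closed-form squared W2 distance between two diagonal Gaussians, only to illustrate what the W2 distance measures; FlowRL's actual regularizer may be computed differently. All function names here are hypothetical.

```python
import math
import random


def toy_velocity(state, action, t):
    """Hypothetical velocity field v(s, a, t); a real policy would use a network.

    It points from the current action toward a state-dependent target and is
    scaled by 1 / (1 - t), so Euler integration over [0, 1) ends on the target.
    """
    target = [math.tanh(s) for s in state]
    return [(g - a) / (1.0 - t) for g, a in zip(target, action)]


def sample_action(state, velocity_fn, action_dim, num_steps=10, rng=None):
    """Draw an action by Euler-integrating da/dt = v(s, a, t) from noise."""
    rng = rng or random.Random()
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # a_0 ~ N(0, I)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        velocity = velocity_fn(state, action, t)
        action = [a + dt * v for a, v in zip(action, velocity)]
    return action


def w2_squared_diag_gaussians(mean_p, std_p, mean_q, std_q):
    """Squared W2 distance between N(mean_p, diag(std_p^2)) and N(mean_q, diag(std_q^2))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))
    std_term = sum((a - b) ** 2 for a, b in zip(std_p, std_q))
    return mean_term + std_term


if __name__ == "__main__":
    state = [0.5, -1.0]
    print(sample_action(state, toy_velocity, action_dim=2, rng=random.Random(0)))
    print(w2_squared_diag_gaussians([0.0], [1.0], [3.0], [5.0]))
```

In this toy setup every noise sample is carried to the same target, so the resulting policy is deterministic; a learned velocity field would generally map different noise samples to different actions.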
Getting Started

Set Up Conda Environment: Create and activate an environment with
```
conda create -n flowrl python=3.11
conda activate flowrl
```

Clone this Repository:
```
git clone https://github.com/bytedance/FlowRL.git
cd FlowRL
```

Install FlowRL Dependencies:
```
pip install -r requirements.txt
```
Training Examples:
- Run a single training instance:
```
python3 main.py --domain dog --task run
```
- Run parallel training:
```
bash scripts/train_parallel.sh
```
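If you prefer to drive several runs from Python, the sketch below shows one possible way to do it. It is not a description of what `scripts/train_parallel.sh` does. It only uses the `--domain` and `--task` flags shown in the single-run command above; the helper names and the example job list are hypothetical, and any task other than `dog run` is an assumption about what `main.py` accepts.

```python
import shlex
import subprocess


def build_commands(jobs, entry_point="main.py"):
    """Turn (domain, task) pairs into argument lists for the training script."""
    return [
        ["python3", entry_point, "--domain", domain, "--task", task]
        for domain, task in jobs
    ]


def launch(commands):
    """Start every command as its own process and wait for all of them."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in processes]


if __name__ == "__main__":
    # Hypothetical job list; only "dog run" appears in this README.
    jobs = [("dog", "run"), ("dog", "walk")]
    for cmd in build_commands(jobs):
        print(shlex.join(cmd))  # dry run: print the commands instead of launching
    # launch(build_commands(jobs))  # uncomment to actually start the runs
```

Printing the commands first makes it easy to check them before starting long training jobs.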

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

TODO

- Release JAX version source code

Citation

If you find FlowRL useful for your research and applications, please consider giving us a star ⭐ or citing us using:

```
@article{lv2025flow,
  title={Flow-Based Policy for Online Reinforcement Learning},
  author={Lv, Lei and Li, Yunfei and Luo, Yu and Sun, Fuchun and Kong, Tao and Xu, Jiafeng and Ma, Xiao},
  journal={arXiv preprint arXiv:2506.12811},
  year={2025}
}
```

About ByteDance Seed Team

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry’s most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
