Flow-based Policy for Online Reinforcement Learning
We are delighted to introduce FlowRL, a new approach to online reinforcement learning that combines a flow-based policy representation with Wasserstein-2-regularized optimization. The result is a promising framework for bringing generative policies into reinforcement learning.
News
[2025/06/10] 🔥 We release the PyTorch version of the code.
[2025/09/18] 🎉 Our paper has been accepted to NeurIPS 2025.
Introduction
FlowRL is an actor-critic framework that represents the policy as a flow model and regularizes policy optimization with a Wasserstein-2 (W2) term. By implicitly constraining the current policy toward the optimal behavioral policy via the W2 distance, FlowRL achieves superior performance on challenging benchmarks such as DM_Control (Dog and Humanoid domains) and Humanoid_Bench.
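To make "flow-based policy" concrete, here is a minimal sketch of how such a policy can draw an action: start from Gaussian noise and integrate a state-conditioned velocity field from t=0 to t=1 with Euler steps. This is not FlowRL's implementation. `toy_velocity`, `sample_action`, and `num_steps` are hypothetical names, the velocity field is a hand-written stand-in for a learned network, and the action dimension is assumed equal to the state dimension purely to keep the example dependency-free.

```python
import math
import random


def toy_velocity(state, action, t):
    """Hypothetical stand-in for a learned velocity network v(s, a, t).

    It pulls each action dimension toward tanh(state). `t` is unused here
    but kept in the signature because learned fields are time-dependent.
    """
    return [math.tanh(s) - a for s, a in zip(state, action)]


def sample_action(state, velocity_fn=toy_velocity, num_steps=10, rng=None):
    """Draw one action by Euler-integrating da/dt = v(s, a, t) over [0, 1]."""
    rng = rng or random.Random()
    # Start from standard Gaussian noise, one value per action dimension
    # (assumed here to match the state dimension).
    action = [rng.gauss(0.0, 1.0) for _ in state]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(state, action, t)
        action = [a + dt * v_i for a, v_i in zip(action, v)]
    return action


if __name__ == "__main__":
    # Seeded generator so repeated runs give the same sample.
    print(sample_action([0.5, -1.0], rng=random.Random(0)))
```

With this toy field the samples stay stochastic but concentrate around tanh(state). In a real flow policy the velocity field would be a trained network, and more integration steps trade compute for accuracy.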
Getting Started
Set Up Conda Environment:
Create and activate an environment with
conda create -n flowrl python=3.11
conda activate flowrl
Clone this Repository:
git clone https://github.com/bytedance/FlowRL.git
cd FlowRL
Install FlowRL Dependencies:
pip install -r requirements.txt
Training Examples:
Run a single training instance:
python3 main.py --domain dog --task run
Run parallel training:
bash scripts/train_parallel.sh
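If you want to queue several single runs back to back rather than use the parallel script, a small launcher along these lines may help. It is only a sketch: `build_command`, `run_all`, and `JOBS` are hypothetical names, it uses only the `--domain` and `--task` flags shown above, and every (domain, task) pair other than `dog`/`run` is an assumption based on DM_Control task names that `main.py` may or may not accept.

```python
import shlex
import subprocess

# (domain, task) pairs. Only ("dog", "run") comes from this README; the
# others are DM_Control names assumed, not confirmed, to work with main.py.
JOBS = [("dog", "run"), ("dog", "walk"), ("humanoid", "run")]


def build_command(domain, task):
    """Return the argv list for one training run, using only README flags."""
    return ["python3", "main.py", "--domain", domain, "--task", task]


def run_all(jobs, dry_run=True):
    """Run jobs sequentially; with dry_run=True only print the commands."""
    for domain, task in jobs:
        cmd = build_command(domain, task)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the queue as soon as one run fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(JOBS)
```

Run it from the repository root. The default dry run only prints the commands, so you can check them before calling `run_all(JOBS, dry_run=False)`.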
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
TODO
Release JAX version source code
Citation
If you find FlowRL useful for your research and applications, please consider giving us a star ⭐ or citing us using:
@article{lv2025flow,
title={Flow-Based Policy for Online Reinforcement Learning},
author={Lv, Lei and Li, Yunfei and Luo, Yu and Sun, Fuchun and Kong, Tao and Xu, Jiafeng and Ma, Xiao},
journal={arXiv preprint arXiv:2506.12811},
year={2025}
}
About ByteDance Seed Team
Founded in 2023, the ByteDance Seed Team is dedicated to crafting the industry’s most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.