👋 Hi, everyone!
We are ByteDance Seed team.


FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching

We are delighted to introduce FLAC (Field Least-Energy Actor-Critic), a likelihood-free framework for maximum entropy reinforcement learning that regulates policy stochasticity by penalizing the kinetic energy of the velocity field. FLAC integrates flow-based generative policies with principled entropy regularization — without ever computing action log-densities.


News

  • [2026/03] 🔥 We release the code for FLAC.
  • [2026/02] 🎉 We release our paper on arXiv.

Introduction

Iterative generative policies, such as diffusion models and flow matching, offer superior expressivity for continuous control but complicate Maximum Entropy Reinforcement Learning because their action log-densities are not directly accessible. FLAC addresses this challenge by formulating policy optimization as a Generalized Schrödinger Bridge (GSB) problem relative to a high-entropy reference process (e.g., uniform).
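To make the "no accessible log-density" point concrete, here is a minimal sketch of how a flow-based policy produces an action: Euler-integrate a velocity field from a reference sample at τ=0 to the action at τ=1. The `velocity_field` callable is a hypothetical stand-in for the learned network, not FLAC's actual implementation; note that nothing in this procedure evaluates log π(a|s).

```python
import numpy as np

def sample_action(velocity_field, state, action_dim, n_steps=10, rng=None):
    """Draw an action by integrating u(s, tau, x) from tau=0 to tau=1.

    The action is the terminal point X_1 of the flow, so its
    log-density is never computed -- exactly what makes MaxEnt RL
    hard for iterative generative policies.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample X_0 from a high-entropy reference (uniform on [-1, 1]^d).
    x = rng.uniform(-1.0, 1.0, size=action_dim)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        tau = k * dt
        x = x + dt * velocity_field(state, tau, x)  # Euler step
    return x
```

With a zero velocity field the flow is the identity, so the "policy" just returns the reference sample — the velocity field is what reshapes the reference into the learned action distribution.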

Under this view, the maximum-entropy principle emerges naturally as staying close to a high-entropy reference while optimizing return, without requiring explicit action densities. Kinetic energy serves as a physically grounded proxy for divergence from the reference: minimizing path-space energy bounds the deviation of the induced terminal action distribution.
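The bound behind this claim can be sketched under standard assumptions (this is a generic Girsanov-style argument, not necessarily the paper's exact statement): for a diffusion-driven policy process with drift $u_\theta$ measured against a driftless reference of the same diffusion scale $\sigma$, the path-space KL divergence equals the expected kinetic energy, and the data-processing inequality carries it down to the terminal marginals:

$$\mathrm{KL}\big(\mathbb{P}^\theta_{X_1} \,\|\, \mathbb{Q}_{X_1}\big) \;\le\; \mathrm{KL}\big(\mathbb{P}^\theta \,\|\, \mathbb{Q}\big) \;=\; \mathbb{E}_{\mathbb{P}^\theta}\!\left[\int_0^1 \frac{1}{2\sigma^2}\,\big\|u_\theta(s,\tau,X_\tau)\big\|^2\, d\tau\right]$$

So penalizing kinetic energy controls how far the induced action distribution can drift from the high-entropy reference.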

Key Features

  • Likelihood-Free: No need to compute intractable log π(a|s) for generative policies.
  • Principled: GSB theory guarantees the terminal distribution matches the Boltzmann form.

The FLAC Objective

FLAC combines GSB formulation, RL potential, and kinetic energy regularization into a single tractable objective:

$$\min_{\theta}\; J_{\text{FLAC}}(\theta) = \mathbb{E}_{\mathbb{P}^\theta} \left[ \alpha \int_0^1 \frac{1}{2} \left\| u_\theta(s, \tau, X_\tau) \right\|^2 \, d\tau \;-\; Q(s, X_1) \right]$$

The objective minimizes kinetic energy (as an entropy proxy) while maximizing return — fully tractable with no density evaluation needed.
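A Monte Carlo estimate of this objective for a single state can be sketched as follows. The `velocity_field` and `q_value` callables are hypothetical stand-ins for the learned networks; the kinetic-energy integral is approximated with a Riemann sum along the Euler-discretized flow, under the same uniform-reference assumption as above.

```python
import numpy as np

def flac_loss(velocity_field, q_value, state, action_dim, alpha=0.1,
              n_steps=10, rng=None):
    """One-sample estimate of J_FLAC(theta) for a single state.

    Accumulates alpha * integral of 0.5 * ||u||^2 d(tau) along the
    discretized flow, then subtracts Q(s, X_1). In training this
    scalar would be minimized with respect to the policy parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(-1.0, 1.0, size=action_dim)   # reference sample X_0
    dt = 1.0 / n_steps
    kinetic = 0.0
    for k in range(n_steps):
        tau = k * dt
        u = velocity_field(state, tau, x)
        kinetic += 0.5 * float(u @ u) * dt        # Riemann sum of kinetic energy
        x = x + dt * u                            # Euler step of the flow
    return alpha * kinetic - q_value(state, x)    # X_1 = x after the loop
```

Everything here is computed from forward simulation of the flow — no log-density of the terminal action ever appears, which is the sense in which the objective is likelihood-free.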

Getting Started

  1. Setup Conda Environment: Create an environment with

    conda create -n flac python=3.11
  2. Clone this Repository:

    git clone https://github.com/bytedance/FLAC.git
    cd FLAC
  3. Install FLAC Dependencies:

    pip install -r requirements.txt
  4. Training Examples:

    • Run parallel training:
      bash scripts/train_parallel.sh

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Citation

If you find FLAC useful for your research and applications, please consider giving us a star ⭐ or citing us using:

@article{lv2026flac,
  title={FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching},
  author={Lv, Lei and Li, Yunfei and Luo, Yu and Sun, Fuchun and Ma, Xiao},
  journal={arXiv preprint arXiv:2602.12829},
  year={2026}
}

About ByteDance Seed Team

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry’s most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
