FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
We are delighted to introduce FLAC (Field Least-Energy Actor-Critic), a likelihood-free framework for maximum entropy reinforcement learning that regulates policy stochasticity by penalizing the kinetic energy of the velocity field. FLAC integrates flow-based generative policies with principled entropy regularization — without ever computing action log-densities.
News
[2026/03] 🔥 We release the code for FLAC.
[2026/02] 🎉 We release our paper on arXiv.
Introduction
Iterative generative policies, such as diffusion models and flow matching, offer superior expressivity for continuous control but complicate maximum entropy reinforcement learning because their action log-densities are not directly accessible. FLAC addresses this challenge by formulating policy optimization as a Generalized Schrödinger Bridge (GSB) problem relative to a high-entropy reference process (e.g., uniform).
Under this view, the maximum-entropy principle emerges naturally as staying close to a high-entropy reference while optimizing return, without requiring explicit action densities. Kinetic energy serves as a physically grounded proxy for divergence from the reference: minimizing path-space energy bounds the deviation of the induced terminal action distribution.
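For intuition, the minimal NumPy sketch below (with a hypothetical `velocity` function standing in for the learned field u_θ, not part of the released code) shows how a flow policy produces an action: sampling is just numerical integration of a velocity field starting from a high-entropy reference, which is exactly why the log-density of the resulting action is not directly available.

```python
import numpy as np

def velocity(state, t, x):
    # Hypothetical stand-in for a learned velocity network u_theta(s, t, x):
    # a simple drift that carries x toward tanh(state) as t -> 1.
    return (np.tanh(state) - x) / max(1.0 - t, 1e-3)

def sample_action(state, n_steps=16, rng=None):
    """Draw an action by Euler-integrating the flow from a uniform reference."""
    rng = rng or np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=state.shape)  # high-entropy reference sample
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(state, k * dt, x)   # one Euler step along the path
    return x

state = np.array([0.5, -1.2])
action = sample_action(state)  # for this drift, lands exactly on tanh(state)
```

Sampling only requires forward integration; recovering log π(a|s) for the sample would require tracking the change of density through every step, which is what FLAC avoids.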
Key Features
Likelihood-Free: No need to compute intractable log π(a|s) for generative policies.
Principled: GSB theory guarantees the terminal distribution matches the Boltzmann form.
The FLAC Objective
FLAC combines the GSB formulation, the RL potential, and kinetic energy regularization into a single tractable objective:

$$\min_{\theta} J_{\mathrm{FLAC}}(\theta) = \mathbb{E}_{P_\theta}\!\left[\alpha \int_{0}^{1} \tfrac{1}{2}\,\lVert u_\theta(s, \tau, X_\tau)\rVert^{2}\, d\tau - Q(s, X_1)\right]$$

The objective minimizes kinetic energy (as an entropy proxy) while maximizing return; it is fully tractable, with no density evaluation needed.
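The objective lends itself to a simple Monte Carlo estimate: integrate the flow with Euler steps, accumulate the kinetic energy along the path, and evaluate the critic at the terminal action. The sketch below is an illustration under assumed interfaces; `velocity` and `q_value` are hypothetical callables standing in for the learned velocity field and critic, not the released implementation.

```python
import numpy as np

def flac_loss(states, velocity, q_value, alpha=0.05, n_steps=16, rng=None):
    """Monte Carlo sketch of the FLAC objective:
    alpha * E[ integral_0^1 0.5*||u||^2 dtau ] - E[ Q(s, X_1) ]."""
    rng = rng or np.random.default_rng(0)
    dt = 1.0 / n_steps
    x = rng.uniform(-1.0, 1.0, size=states.shape)  # sample the reference process
    kinetic = np.zeros(len(states))
    for k in range(n_steps):
        u = velocity(states, k * dt, x)
        kinetic += 0.5 * np.sum(u**2, axis=-1) * dt  # accumulate path energy
        x = x + dt * u                               # Euler step of the flow
    return np.mean(alpha * kinetic - q_value(states, x))

# Toy stand-ins: drift toward the zero action, quadratic critic.
states = np.zeros((4, 2))
vel = lambda s, t, x: -x
q = lambda s, a: -np.sum(a**2, axis=-1)
loss = flac_loss(states, vel, q)
```

Note that every term is computed from samples of the path alone; no log-density of the terminal action ever appears, which is the sense in which the method is likelihood-free.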
Getting Started
Setup Conda Environment: Create and activate an environment with
```shell
conda create -n flac python=3.11
conda activate flac
```
Clone this Repository:
```shell
git clone https://github.com/bytedance/FLAC.git
cd FLAC
```
Install FLAC Dependencies:
```shell
pip install -r requirements.txt
```
Training Examples: Run parallel training:
```shell
bash scripts/train_parallel.sh
```
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Citation
If you find FLAC useful for your research and applications, please consider giving us a star ⭐ or citing us using:
```bibtex
@article{lv2026flac,
  title={FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching},
  author={Lv, Lei and Li, Yunfei and Luo, Yu and Sun, Fuchun and Ma, Xiao},
  journal={arXiv preprint arXiv:2602.12829},
  year={2026}
}
```
About ByteDance Seed Team
Founded in 2023, the ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and to make significant contributions to the advancement of science and society.