👋 Hi, everyone!
We are ByteDance Seed team.


FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching

We are delighted to introduce FLAC (Field Least-Energy Actor-Critic), a likelihood-free framework for maximum entropy reinforcement learning that regulates policy stochasticity by penalizing the kinetic energy of the velocity field. FLAC integrates flow-based generative policies with principled entropy regularization — without ever computing action log-densities.


News

  • [2026/03] 🔥 We release the code for FLAC.
  • [2026/02] 🎉 We release our paper on arXiv.

Introduction

Iterative generative policies, such as diffusion models and flow matching, offer superior expressivity for continuous control but complicate Maximum Entropy Reinforcement Learning because their action log-densities are not directly accessible. FLAC addresses this challenge by formulating policy optimization as a Generalized Schrödinger Bridge (GSB) problem relative to a high-entropy reference process (e.g., uniform).
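To make the "no accessible log-density" point concrete, here is a minimal sketch of how a flow-based policy produces an action: Euler-integrate a velocity field from a reference sample at τ=0 to the action at τ=1. The `velocity_field` callable is a hypothetical stand-in for the learned network, not FLAC's actual implementation; note that nothing in this procedure evaluates log π(a|s).

```python
import numpy as np

def sample_action(velocity_field, state, action_dim, n_steps=10, rng=None):
    """Draw an action by integrating u(s, tau, x) from tau=0 to tau=1.

    The action is the terminal point X_1 of the flow, so its
    log-density is never computed -- exactly what makes MaxEnt RL
    hard for iterative generative policies.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample X_0 from a high-entropy reference (uniform on [-1, 1]^d).
    x = rng.uniform(-1.0, 1.0, size=action_dim)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        tau = k * dt
        x = x + dt * velocity_field(state, tau, x)  # Euler step
    return x
```

With a zero velocity field the flow is the identity, so the "policy" just returns the reference sample — the velocity field is what reshapes the reference into the learned action distribution.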

Under this view, the maximum-entropy principle emerges naturally as staying close to a high-entropy reference while optimizing return, without requiring explicit action densities. Kinetic energy serves as a physically grounded proxy for divergence from the reference: minimizing path-space energy bounds the deviation of the induced terminal action distribution.
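The bound behind this claim can be sketched under standard assumptions (this is a generic Girsanov-style argument, not necessarily the paper's exact statement): for a diffusion-driven policy process with drift $u_\theta$ measured against a driftless reference of the same diffusion scale $\sigma$, the path-space KL divergence equals the expected kinetic energy, and the data-processing inequality carries it down to the terminal marginals:

$$\mathrm{KL}\big(\mathbb{P}^\theta_{X_1} \,\|\, \mathbb{Q}_{X_1}\big) \;\le\; \mathrm{KL}\big(\mathbb{P}^\theta \,\|\, \mathbb{Q}\big) \;=\; \mathbb{E}_{\mathbb{P}^\theta}\!\left[\int_0^1 \frac{1}{2\sigma^2}\,\big\|u_\theta(s,\tau,X_\tau)\big\|^2\, d\tau\right]$$

So penalizing kinetic energy controls how far the induced action distribution can drift from the high-entropy reference.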

Key Features

  • Likelihood-Free: No need to compute intractable log π(a|s) for generative policies.
  • Principled: GSB theory guarantees the terminal distribution matches the Boltzmann form.

The FLAC Objective

FLAC combines GSB formulation, RL potential, and kinetic energy regularization into a single tractable objective:

$$\min_{\theta}\; J_{\text{FLAC}}(\theta) = \mathbb{E}_{\mathbb{P}^\theta} \left[ \alpha \int_0^1 \frac{1}{2} \left\| u_\theta(s, \tau, X_\tau) \right\|^2 \, d\tau \;-\; Q(s, X_1) \right]$$

The objective minimizes kinetic energy (as an entropy proxy) while maximizing return — fully tractable with no density evaluation needed.
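A Monte Carlo estimate of this objective for a single state can be sketched as follows. The `velocity_field` and `q_value` callables are hypothetical stand-ins for the learned networks; the kinetic-energy integral is approximated with a Riemann sum along the Euler-discretized flow, under the same uniform-reference assumption as above.

```python
import numpy as np

def flac_loss(velocity_field, q_value, state, action_dim, alpha=0.1,
              n_steps=10, rng=None):
    """One-sample estimate of J_FLAC(theta) for a single state.

    Accumulates alpha * integral of 0.5 * ||u||^2 d(tau) along the
    discretized flow, then subtracts Q(s, X_1). In training this
    scalar would be minimized with respect to the policy parameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(-1.0, 1.0, size=action_dim)   # reference sample X_0
    dt = 1.0 / n_steps
    kinetic = 0.0
    for k in range(n_steps):
        tau = k * dt
        u = velocity_field(state, tau, x)
        kinetic += 0.5 * float(u @ u) * dt        # Riemann sum of kinetic energy
        x = x + dt * u                            # Euler step of the flow
    return alpha * kinetic - q_value(state, x)    # X_1 = x after the loop
```

Everything here is computed from forward simulation of the flow — no log-density of the terminal action ever appears, which is the sense in which the objective is likelihood-free.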

Getting Started

  1. Setup Conda Environment: Create an environment with

    conda create -n flac python=3.11
  2. Clone this Repository:

    git clone https://github.com/bytedance/FLAC.git
    cd FLAC
  3. Install FLAC Dependencies:

    pip install -r requirements.txt
  4. Training Examples:

    • Run parallel training:
      bash scripts/train_parallel.sh

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Citation

If you find FLAC useful for your research and applications, please consider giving us a star ⭐ or citing us using:

@article{lv2026flac,
  title={FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching},
  author={Lv, Lei and Li, Yunfei and Luo, Yu and Sun, Fuchun and Ma, Xiao},
  journal={arXiv preprint arXiv:2602.12829},
  year={2026}
}

About ByteDance Seed Team

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry’s most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
