Identity-GRPO: Optimizing Multi-Human
Identity-preserving Video Generation via
Reinforcement Learning
Xiangyu Meng*, Zixian Zhang*, Zhenghao Zhang†, Junchao Liao, Long Qin, Weizhi Wang
* equal contribution
💡 Abstract
While methods like VACE and Phantom have advanced video generation for specific subjects in diverse scenarios, they struggle with multi-human identity preservation in dynamic interactions, where consistent identities across multiple characters are critical. To address this, we propose a human feedback-driven optimization pipeline for refining multi-human identity-preserving video generation. First, we construct a video reward model trained on a large-scale preference dataset containing human-annotated and synthetic distortion data, with pairwise annotations focused on maintaining human consistency throughout the video. We then introduce Identity-GRPO, a GRPO variant tailored for multi-human consistency, which greatly enhances both VACE and Phantom. Through extensive ablation studies, we evaluate the impact of annotation quality and design choices on policy optimization. Experiments show that Identity-GRPO achieves up to an 18.9% improvement in human consistency metrics over baseline methods, offering actionable insights for aligning reinforcement learning with personalized video generation.
🚀 Quick Start
1. 🐍 Environment Setup
Clone this repository and install packages.
git clone https://github.com/alibaba/identity-grpo.git
cd identity_grpo
conda create -n vace_grpo python=3.10.16
conda activate vace_grpo
pip install -e .
# Then you should download FlashAttention2 and install it.
pip install flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
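The FlashAttention wheel above is built for Python 3.10, torch 2.6, and CUDA 12 (encoded in the wheel filename), so it is worth confirming the active environment matches before installing. A quick sanity check:

```shell
# The wheel name encodes cp310 / torch2.6 / cu12; the interpreter and the
# torch build in the active env must match, or flash_attn will not import.
python3 -c "import sys; print(sys.version.split()[0])"
python3 -c "import torch; print(torch.__version__, torch.version.cuda)" \
  || echo "torch not installed yet"
```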
2. 📦 Model Download
Weights Folder Structure
Download Links
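The layout the weights should end up in can be sketched from the unzip destinations given below (an illustration inferred from those paths, not the authoritative folder structure):

```shell
# Create the directories the unzipped weights are expected to land in
# (paths taken from the instructions in this section).
mkdir -p outputs/identity_grpo/ckpt/vace
mkdir -p outputs/identity_reward
ls outputs
```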
First, you need to download Wan-AI/Wan2.1-VACE-1.3B-diffusers and Qwen/Qwen2.5-VL-3B-Instruct from Hugging Face.
Identity-GRPO LoRA weights: Link. Unzip the finetuned LoRA weights into outputs/identity_grpo/ckpt/vace/.
Identity-Reward weights: Link. Unzip the reward weights directly into outputs/identity_reward.
Then, you can set the pretrained model path and reward model path in config/dgx.py.
3. 🏁 Data Preparation
Following the format of train.csv and test.csv in dataset/generated_img, you can prepare your own dataset. Then, set your dataset path in config/dgx.py.
4. 🔄 Inference
First, modify train.lora_path in config/base.py to point to the finetuned GRPO weight path, and prepare the test data in dataset/generated_img/test.csv. Then run the script:
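Presumably the provided single-node script is the intended entry point (an assumption based on the script path referenced here):

```shell
# Assumed invocation for inference with the finetuned GRPO weights;
# adjust flags as needed for your setup.
bash scripts/single_node/test.sh
```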
You can refer to scripts/single_node/test.sh for more configuration parameters.
4.1. Identity-Reward Inference
If you only need to use the Identity-Reward model, you can refer to vace_reward/inference.py.
5. 🧠 Training
You can prepare the training data in dataset/generated_img/train.csv, and set the dataset path in config/dgx.py. Then run the script:
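Presumably the provided single-node script is the intended launch command (an assumption based on the script path referenced below):

```shell
# Assumed invocation for Identity-GRPO training on a single node;
# training hyperparameters live in config/dgx.py.
bash scripts/single_node/train.sh
```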
You can refer to scripts/single_node/train.sh. More training configuration can be found in config/dgx.py.
Using the provided configuration, the resulting ID-Consistency reward curves of Identity-GRPO on VACE-1.3B and Phantom-1.3B are shown below. Both exhibit a clear upward trend.
🎞️ Showcases
All videos are available at this Link.
🤝 Acknowledgements
This repo is based on Flow-GRPO, VideoAlign, Wan2.1, VACE, and Phantom. We thank the authors for their valuable contributions to the AIGC community.
📄 Our previous work
⭐ Citation
If you find Identity-GRPO useful for your research or projects, we would greatly appreciate it if you could cite the following paper:
@article{meng2025identity,
  title={Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning},
  author={Meng, Xiangyu and Zhang, Zixian and Zhang, Zhenghao and Liao, Junchao and Qin, Long and Wang, Weizhi},
  journal={arXiv preprint arXiv:2510.14256},
  year={2025}
}