We propose OneReward, a novel RLHF methodology for the visual domain that employs Qwen2.5-VL as a generative reward model to enhance multi-task reinforcement learning, significantly improving the policy model’s generation ability across multiple subtasks. Building on OneReward, we develop Seedream 3.0 Fill, a unified SOTA image editing model capable of effectively handling diverse tasks including image fill, image extend, object removal, and text rendering. It surpasses several leading commercial and open-source systems, including Ideogram, Adobe Photoshop, and FLUX Fill [Pro]. Finally, based on FLUX Fill [dev], we are thrilled to release FLUX.1-Fill-dev-OneReward, which outperforms the closed-source FLUX Fill [Pro] in inpainting and outpainting tasks and serves as a powerful new baseline for future research in unified image editing.
Image Fill
Image Extend with Prompt
Image Extend without Prompt
Object Removal
Seedream 3.0 Fill Performance Overview
Quick Start
Make sure transformers>=4.51.3 is installed (required for Qwen2.5-VL support)
Install the latest version of diffusers (>=0.35.0)
pip install -U diffusers
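To confirm that both minimum versions are met before loading the model, a small check along these lines may help. It is only a sketch: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical and not part of this repository, and the version parsing is deliberately simplified (a pre-release such as `0.35.0.dev0` is treated as `0.35.0`).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated in this README.
REQUIRED = {"transformers": "4.51.3", "diffusers": "0.35.0"}


def parse_version(text):
    """Turn '4.51.3' or '0.35.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum (simplified)."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(required=REQUIRED):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else f"needs >= {REQUIRED[name]}"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

Running the script prints one line per package; anything not marked `ok` can be upgraded with `pip install -U <package>`.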
The following code snippet shows how to use the model to generate images from a text prompt and an input mask. It supports inpainting (image fill), outpainting (image extend), and erasing (object removal). Because the model is fully trained, FluxFillCFGPipeline with classifier-free guidance (CFG) is required; you can find it in pipeline_flux_fill_with_cfg.py.
import torch
from diffusers import FluxTransformer2DModel
from diffusers.utils import load_image

from src.pipeline_flux_fill_with_cfg import FluxFillCFGPipeline

# Load the OneReward transformer weights.
transformer_onereward = FluxTransformer2DModel.from_pretrained(
    "bytedance-research/OneReward",
    subfolder="flux.1-fill-dev-OneReward-transformer",
    torch_dtype=torch.bfloat16,
)

# Build the CFG pipeline from FLUX.1-Fill-dev, swapping in the OneReward transformer.
pipe = FluxFillCFGPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    transformer=transformer_onereward,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Image Fill
image = load_image("assets/image.png")
mask = load_image("assets/mask_fill.png")
image = pipe(
    prompt='the words "ByteDance", and in the next line "OneReward"',
    negative_prompt="nsfw",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=1.0,
    true_cfg=4.0,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducibility
).images[0]
image.save("image_fill.jpg")
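For readers unfamiliar with the `true_cfg` and `negative_prompt` arguments above, the standard classifier-free guidance combination can be sketched as follows. This illustrates the general technique only; that FluxFillCFGPipeline combines its two predictions in exactly this way is an assumption, since the contents of pipeline_flux_fill_with_cfg.py are not shown here. The function name `true_cfg_combine` is hypothetical.

```python
def true_cfg_combine(cond, uncond, true_cfg):
    """Standard classifier-free guidance on two same-length predictions.

    cond:     prediction conditioned on the prompt
    uncond:   prediction conditioned on the negative (or empty) prompt
    true_cfg: guidance scale; 1.0 returns the conditional prediction unchanged
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    # Move away from the negative prediction, toward the conditional one.
    return [u + true_cfg * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    # Toy numbers standing in for model outputs.
    print(true_cfg_combine([1.0, 2.0], [0.0, 1.0], 4.0))
```

Under this formula a scale above 1.0 amplifies the difference between the prompt and the negative prompt, at the cost of two model evaluations per denoising step.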
Because the base model, FLUX Fill, underwent heavy SFT for object generation, it achieves only 15% usability on object removal, so the improvement on removal is not obvious. We release a LoRA for object removal separately, which may be helpful.
Code is licensed under Apache 2.0. Model is licensed under CC BY NC 4.0.
Citation
@article{gong2025onereward,
title={OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning},
author={Gong, Yuan and Wang, Xionghui and Wu, Jie and Wang, Shiyin and Wang, Yitong and Wu, Xinglong},
journal={arXiv preprint arXiv:2508.21066},
year={2025}
}
OneReward
Official implementation of OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
🚀 TODO
FLUX.1-Fill-dev[OneReward] and FLUX.1-Fill-dev[OneRewardDynamic] mask-guided edit checkpoints.
FLUX.1-dev[OneReward] text-to-image checkpoints.
You can also run the full inference demos in demo_one_reward.py and demo_one_reward_dynamic.py.
Model
FLUX.1-Fill-dev[OneReward], trained with Algorithm 1 in the paper
FLUX.1-Fill-dev[OneRewardDynamic], trained with Algorithm 2 in the paper
Multi-task Usage
Image Extend with prompt
Image Extend without prompt
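For image extend, the pipeline needs an enlarged canvas and a mask marking the new border region. The sketch below shows one way to prepare both with Pillow. It rests on several assumptions that this README does not confirm: that white mask pixels mark the area to generate (the usual diffusers fill convention), that rounding the canvas up to a multiple of 16 is appropriate, and that black is a reasonable placeholder fill. The helper names `extend_geometry` and `build_extend_inputs` are hypothetical.

```python
def extend_geometry(width, height, left=0, right=0, top=0, bottom=0, multiple=16):
    """Return (canvas_size, paste_box) for an image padded on each side.

    The canvas is rounded up to `multiple` (an assumption, not a documented
    requirement); any extra pixels are added on the right and bottom.
    """
    if min(left, right, top, bottom) < 0:
        raise ValueError("padding must be non-negative")
    new_w = width + left + right
    new_h = height + top + bottom
    new_w = -(-new_w // multiple) * multiple  # ceiling to a multiple
    new_h = -(-new_h // multiple) * multiple
    paste_box = (left, top, left + width, top + height)
    return (new_w, new_h), paste_box


def build_extend_inputs(image, left=0, right=0, top=0, bottom=0, multiple=16):
    """Build (canvas, mask) PIL images for outpainting from a PIL image."""
    from PIL import Image  # imported lazily so extend_geometry has no dependencies

    size, box = extend_geometry(
        image.width, image.height, left, right, top, bottom, multiple
    )
    canvas = Image.new("RGB", size, (0, 0, 0))   # placeholder border color
    canvas.paste(image.convert("RGB"), box[:2])  # original pixels at the offset
    mask = Image.new("L", size, 255)             # white: region to generate (assumed)
    mask.paste(0, box)                           # black: original pixels to keep
    return canvas, mask


if __name__ == "__main__":
    print(extend_geometry(512, 512, left=100, right=100))
```

The resulting `canvas` and `mask` would then be passed as `image` and `mask_image`, with `height` and `width` taken from the canvas, in the same way as the Quick Start call.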
Object Removal
Object Removal with Lora