cd src/eval/qwen-vl-utils
pip install -e .[decord]
If you find the code helpful in your research or work, please cite the following papers:
@article{li2025qinsight,
title={Q-Insight: Understanding Image Quality via Visual Reinforcement Learning},
author={Li, Weiqi and Zhang, Xuanyu and Zhao, Shijie and Zhang, Yabin and Li, Junlin and Zhang, Li and Zhang, Jian},
journal={Proceedings of the Advances in Neural Information Processing Systems (NeurIPS)},
year={2025}
}
@article{zhang2025vqinsight,
title={VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning},
author={Zhang, Xuanyu and Li, Weiqi and Zhao, Shijie and Li, Junlin and Zhang, Li and Zhang, Jian},
journal={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
year={2026}
}
@article{zhao2025reasoning,
title={Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment},
author={Zhao, Shijie and Zhang, Xuanyu and Li, Weiqi and Li, Junlin and Zhang, Li and Xue, Tianfan and Zhang, Jian},
journal={Proceedings of the International Conference on Learning Representations (ICLR)},
year={2026}
}
Q-Insight Family
🚩 Updates
🔥 Introduction
(✏️: * denotes equal contribution, # denotes project leader, † denotes corresponding author)
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
Weiqi Li, Xuanyu Zhang, Shijie Zhao#,†, Yabin Zhang, Junlin Li, Li Zhang and Jian Zhang†
PLCC comparisons between our proposed Q-Insight and existing IQA metrics (left) and three example applications of our Q-Insight (right) are presented. Q-Insight demonstrates significantly improved performance compared to existing methods, especially on out-of-domain datasets. Additionally, Q-Insight effectively supports quality score regression, image degradation perception, and zero-shot image comparison reasoning tasks.
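For readers unfamiliar with the metric, PLCC is the Pearson linear correlation coefficient between predicted quality scores and human ratings. The following is a minimal, dependency-free sketch of how such a value can be computed. The function name `plcc` and the sample numbers are illustrative assumptions, not taken from this repository, and the exact evaluation protocol behind the reported results (for example, any score fitting applied before correlation) is not specified here.

```python
import math


def plcc(predicted, reference):
    """Pearson linear correlation between two equal-length score sequences."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("PLCC is undefined when either sequence is constant")
    return cov / math.sqrt(var_p * var_r)


# Made-up predictions and mean opinion scores, for illustration only.
predicted_scores = [3.1, 4.2, 2.0, 4.8, 1.5]
mean_opinion_scores = [3.0, 4.0, 2.5, 5.0, 1.0]
print(f"PLCC = {plcc(predicted_scores, mean_opinion_scores):.4f}")
```

A value close to 1 means the predicted scores track the human ratings almost linearly.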
VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning
Xuanyu Zhang*, Weiqi Li*, Shijie Zhao#,†, Junlin Li, Li Zhang, Jian Zhang†
We propose a reasoning-style vision-language model VQ-Insight, which accurately performs AIGC video preference comparison, AIGC video multi-dimension scoring, and natural video scoring, accompanied by detailed and reasonable reasoning processes. Our VQ-Insight can be applied to post-training of video generation models and zero-shot content repairing.
Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
Shijie Zhao*,#, Xuanyu Zhang*, Weiqi Li, Junlin Li, Li Zhang, Tianfan Xue, Jian Zhang
We revisit the reasoning mechanism in MLLM-based IQA models (such as Q-Insight) and propose a CLIP-based lightweight image scorer RALI. We verify that through RL training, MLLMs leverage their reasoning capability to convert redundant visual representations into compact, cross-domain aligned text representations. This conversion is the source of the generalization exhibited by these reasoning-based IQA models. RALI uses only about 4% of Q-Insight’s parameters and inference time, while achieving comparable accuracy.
🔧 Dependencies and Installation
To run VQ-Insight, install additional packages.
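Once the extra packages are installed, a quick import check can confirm that they are visible to the current Python environment. The sketch below is only an illustration: the helper `missing_modules` is hypothetical, and the import names `qwen_vl_utils` and `decord` are assumptions about how the installed packages are exposed.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed import names for the VQ-Insight video dependencies.
required = ["qwen_vl_utils", "decord"]
missing = missing_modules(required)
print("missing modules:", ", ".join(missing) if missing else "none")
```

If a module is reported as missing, rerunning the installation step in the same environment is the first thing to try.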
⚡ Quick Inference
Demo for RALI
Please download the RALI pretrained weights from the link. After downloading, place the checkpoint under Q-Insight/checkpoints, so that the directory structure becomes:
Then run the following code:
Demo for VQ-Insight
Natural Video Scoring
AIGC Video Multi-Dimension Scoring
AIGC Video Comparison
Demo for Q-Insight
Score Regression
Degradation Perception
Image Comparison Reasoning
📖 Dataset Preparation for Training Q-Insight
Score Regression
Download the meta files from Data-DeQA-Score and the source images from the KONIQ dataset. Arrange the folders in ./src/open-r1-multimodal/data as follows:
Degradation Perception
Download the refA_sd_brief subset from KADIS-700K. Arrange the folders in ./src/open-r1-multimodal/data as follows:
Image Comparison Reasoning
Download the validation dataset of DiffIQA. Arrange the folders in ./src/open-r1-multimodal/data as follows:
Training Q-Insight
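Before launching training, it may help to confirm that the downloaded datasets actually sit under ./src/open-r1-multimodal/data. The sketch below only lists the subfolders it finds. The helper name `list_subfolders` is hypothetical, and the expected folder names are deliberately not checked because they depend on the layout described in the dataset preparation steps.

```python
from pathlib import Path


def list_subfolders(data_root):
    """Return the sorted names of the directories directly under data_root."""
    root = Path(data_root)
    if not root.is_dir():
        raise FileNotFoundError(f"data folder not found: {root}")
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    try:
        # Path taken from the dataset preparation instructions.
        print(list_subfolders("./src/open-r1-multimodal/data"))
    except FileNotFoundError as err:
        print(err)
```

Compare the printed names against the folder layout given for each task before starting a run.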
Score Regression and Degradation Perception
Image Comparison Reasoning
Training VQ-Insight and RALI
The training code of VQ-Insight and RALI will be released at RALI and VQ-Insight.
✏️ To Do List
Acknowledgement
We appreciate the released code and data of VLM-R1, DepictQA, and DeQA-Score.
Citation
If the Q-Insight Family is helpful, please ⭐ the repo.
If you find the code helpful in your research or work, please cite the Q-Insight, VQ-Insight, and RALI papers using the BibTeX entries provided in this README.