ParGo: Bridging Vision-Language with Partial and Global Views
Official PyTorch implementation of ParGo: Bridging Vision-Language with Partial and Global Views (AAAI 2025).
Paper, Model
Setup
Download Models
The LLM (internlm2-7b) and the vision encoder (eva-clip-l-14-336) need to be downloaded in advance.
Evaluation
MME Benchmark
Data
Place the benchmark data in the benchmark directory. Data structure:
Each JSON file in Data_json contains the image_name, question, and answer, e.g.:
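The exact schema may vary by benchmark, but a minimal sketch of reading such an annotation file, assuming each entry carries image_name, question, and answer fields (the sample values below are hypothetical):

```python
import json

# Hypothetical annotation entries; the real field names and values should
# match the benchmark's official format.
sample = [
    {
        "image_name": "000001.jpg",
        "question": "Is there a dog in the image? Please answer yes or no.",
        "answer": "yes",
    }
]

# Write and read back the annotation file, as the evaluation scripts would.
with open("mme_sample.json", "w") as f:
    json.dump(sample, f)

with open("mme_sample.json") as f:
    entries = json.load(f)

for e in entries:
    print(e["image_name"], "|", e["question"], "|", e["answer"])
```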
Evaluation
Step 1: Generate the response:
Step 2: Calculate the score:
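For reference, MME reports two metrics per subtask: acc (per-question accuracy) and acc+ (the fraction of images whose two paired questions are both answered correctly), with the subtask score defined as (acc + acc+) × 100. A minimal sketch of that computation over hypothetical predictions (not the repo's actual scoring script):

```python
from collections import defaultdict

def mme_score(records):
    """records: list of (image_name, prediction, answer) with yes/no labels.

    Returns (acc, acc_plus, score), where score = (acc + acc_plus) * 100,
    following the MME convention of two paired questions per image.
    """
    per_image = defaultdict(list)
    for image_name, pred, answer in records:
        per_image[image_name].append(pred.strip().lower() == answer.strip().lower())

    total = sum(len(v) for v in per_image.values())
    correct = sum(sum(v) for v in per_image.values())
    acc = correct / total
    # acc+ counts an image only if every question on it is answered correctly.
    acc_plus = sum(all(v) for v in per_image.values()) / len(per_image)
    return acc, acc_plus, (acc + acc_plus) * 100

# Hypothetical results: img1 gets both questions right, img2 one of two.
records = [
    ("img1.jpg", "yes", "yes"),
    ("img1.jpg", "no", "no"),
    ("img2.jpg", "yes", "yes"),
    ("img2.jpg", "yes", "no"),
]
acc, acc_plus, score = mme_score(records)
print(acc, acc_plus, score)  # 0.75 0.5 125.0
```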
For other benchmarks, please follow their official instructions to construct the data files; the overall pipeline is the same as for the MME benchmark.
Acknowledgement
This project is developed based on MiniGPT and BLIP2. Sincere thanks to the contributors of these excellent codebases.
If you find our code helpful to your research, please consider citing us with this BibTeX:
@misc{wang2024pargobridgingvisionlanguagepartial,
title={ParGo: Bridging Vision-Language with Partial and Global Views},
author={An-Lan Wang and Bin Shan and Wei Shi and Kun-Yu Lin and Xiang Fei and Guozhi Tang and Lei Liao and Jingqun Tang and Can Huang and Wei-Shi Zheng},
year={2024},
eprint={2408.12928},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.12928},
}
License
The source code and pretrained weights are licensed under the BSD-3-Clause license.