Generative Region-Language Pretraining for Open-Ended Object Detection
⭐ If GenerateU is helpful to your projects, please help star this repo. Thanks! 🤗
Highlight
GenerateU is accepted by CVPR 2024.
We introduce generative open-ended object detection, a more general and practical setting in which categorical information is not explicitly defined. This setting is especially meaningful for scenarios where users lack precise knowledge of object categories during inference.
Our GenerateU achieves results comparable to the open-vocabulary object detection method GLIP, even though category names are not seen by GenerateU during inference.
Results
Zero-shot domain transfer to LVIS
Visualizations
👨🏻‍🎨 Pseudo-label Examples
🎨 Zero-shot LVIS
Overview
Dependencies and Installation
Clone Repo
Create Conda Environment and Install Dependencies
Install the dependencies listed in requirements.txt.
Get Started
Prepare pretrained models
Download our pretrained models from here to the weights folder. For training, prepare the backbone weights Swin-Tiny and Swin-Large following the instructions in tools/convert-pretrained-swin-model-to-d2.py. The directory structure will be arranged as:
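The backbone conversion step can be sketched roughly as follows. This is a simplified illustration, not the actual tools/convert-pretrained-swin-model-to-d2.py (which loads a torch .pth checkpoint); the function name and checkpoint layout here are assumptions based on common detectron2-style conversion scripts:

```python
import pickle

def convert_to_d2(state_dict, out_path):
    """Wrap a backbone state dict in a detectron2-style pickled checkpoint.

    NOTE: simplified sketch; the real script loads a torch .pth file and
    may remap parameter names before pickling.
    """
    ckpt = {"model": state_dict, "matching_heuristics": True}
    with open(out_path, "wb") as f:
        pickle.dump(ckpt, f)

# Example with a dummy state dict standing in for the Swin weights:
convert_to_d2({"backbone.layer0.weight": [0.0, 1.0]}, "swin_tiny_d2.pkl")
```

The resulting .pkl can then be pointed to from the training config in place of the raw .pth file.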
Dataset preparation
VG Dataset
LVIS Dataset
(Optional) GrIT-20M Dataset
The dataset structure should look like:
Training
By default, we train GenerateU using 16 A100 GPUs. You can also train on a single node, but this might prevent you from reproducing the results presented in the paper.
Single-Node Training
When pretraining with VG, a single node is enough. On a single node with 8 GPUs, run
Multiple-Node Training
<MASTER_ADDRESS> should be the IP address of node 0. <PORT> should be the same across all nodes. If <PORT> is not specified, the program will generate a random number as <PORT>.
Evaluation
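The random-port fallback described above can be approximated with the sketch below (not the repo's actual implementation; binding to port 0 lets the OS hand back a currently unused port, which is a common way to pick one):

```python
import socket

def pick_free_port():
    """Ask the OS for a currently free TCP port (sketch of the <PORT> fallback)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 -> the OS assigns an unused port
        return s.getsockname()[1]

port = pick_free_port()
```

Whatever value is used, it must be passed identically to every node so they can rendezvous with node 0.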
To evaluate with a trained or pretrained model, run
Citation
If you find our repo useful for your research, please consider citing our paper:
@inproceedings{lin2024generateu,
  title={Generative Region-Language Pretraining for Open-Ended Object Detection},
  author={Lin, Chuang and Jiang, Yi and Qu, Lizhen and Yuan, Zehuan and Cai, Jianfei},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
Contact
If you have any questions, please feel free to reach out to me at chuang.lin@monash.edu.
Acknowledgement
This code is based on UNINEXT. Some code is borrowed from FlanT5. Thanks for their awesome work.
Special thanks to Bin Yan and Junfeng Wu for their valuable contributions.