Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP (NeurIPS 2023)
This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP.
FC-CLIP is a universal model for open-vocabulary image segmentation, consisting of a class-agnostic segmenter, an in-vocabulary classifier, and an out-of-vocabulary classifier. With everything built on a single shared frozen convolutional CLIP model, FC-CLIP not only achieves state-of-the-art performance on various open-vocabulary segmentation benchmarks, but also enjoys much lower training cost (3.2 days with 8 V100 GPUs) and testing cost compared to prior art.
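At test time, the two classifiers above produce per-mask class probabilities that are fused before prediction. The following is a minimal sketch of one common fusion strategy, geometric ensembling, where classes seen during training lean more on the in-vocabulary classifier and novel classes lean more on the frozen CLIP (out-of-vocabulary) scores. The function name, parameter names, and default exponents here are illustrative assumptions, not the exact values used in the released code.

```python
import numpy as np

def geometric_ensemble(p_in, p_out, alpha=0.4, beta=0.8, seen_mask=None):
    """Fuse in-vocabulary and out-of-vocabulary class probabilities.

    p_in, p_out : (num_masks, num_classes) softmax probabilities from the
                  two classifiers.
    seen_mask   : boolean array over classes marking those present in the
                  training vocabulary.
    alpha, beta : per-class exponents for seen vs. novel classes
                  (illustrative defaults, not the official values).
    """
    if seen_mask is None:
        seen_mask = np.zeros(p_in.shape[1], dtype=bool)
    # Seen classes weight the out-of-vocab score by alpha, novel by beta.
    w = np.where(seen_mask, alpha, beta)
    fused = p_in ** (1.0 - w) * p_out ** w  # geometric interpolation
    return fused / fused.sum(axis=-1, keepdims=True)
```

The geometric (rather than arithmetic) mean keeps a class's fused score low unless both classifiers assign it non-negligible probability, which empirically helps avoid spurious novel-class predictions.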
Installation
See installation instructions.
Getting Started
See Preparing Datasets for FC-CLIP.
See Getting Started with FC-CLIP.
We also provide a HuggingFace 🤗 demo for FC-CLIP.
Model Zoo
Results are reported on the A-847, PC-59, PC-459, PAS-21, and PAS-20 benchmarks, along with the training dataset used for each model. (The original results table did not survive extraction; see the Model Zoo for checkpoints and per-benchmark numbers.)
Citing FC-CLIP
If you use FC-CLIP in your research, please use the following BibTeX entry.

@inproceedings{yu2023fcclip,
  title={Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP},
  author={Qihang Yu and Ju He and Xueqing Deng and Xiaohui Shen and Liang-Chieh Chen},
  booktitle={NeurIPS},
  year={2023}
}
Acknowledgement
Mask2Former
ODISE