CamoFormer: Masked Separable Attention for Camouflaged Object Detection
This official repository contains the source code, prediction results, and evaluation toolbox for the paper ‘CamoFormer: Masked Separable Attention for Camouflaged Object Detection’. The technical report can be found on arXiv.
The full benchmark results can be found at One Drive, Baidu Netdisk, or Google Drive.
A Jittor version of this repository is available. Thanks to 2112529 for his contributions in implementing the Jittor version of the code.
Figure 1: Overall architecture of our CamoFormer model. First, a pretrained Transformer-based backbone is utilized to extract multi-scale features of the input image. Then, the features from the last three stages are aggregated to generate the coarse prediction. Next, the progressive refinement decoder equipped with masked separable attention (MSA) is applied to gradually polish the prediction results. All the predictions generated by our CamoFormer are supervised by the ground truth (GT).
1. 🔥 NEWS 🔥
[2022/12/09] Released the CamoFormer codebase and the full COD benchmark results (21 models).
[2022/12/08] Created the repository.
We invite everyone to contribute to making it more accessible and useful. If you have any questions about our work, feel free to contact me via e-mail (bowenyin@mail.nankai.edu.cn). If you use our code or evaluation toolbox in your research, please cite this paper (BibTeX).
Reference
You may want to cite:
@article{yin2024camoformer,
title={Camoformer: Masked separable attention for camouflaged object detection},
author={Yin, Bowen and Zhang, Xuying and Fan, Deng-Ping and Jiao, Shaohui and Cheng, Ming-Ming and Van Gool, Luc and Hou, Qibin},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2024},
publisher={IEEE}
}
2. Get Started
0. Install
1. Download Datasets and Checkpoints.
By default, put the datasets into the folder ‘dataset’.
Datasets: Baidu Netdisk, One Drive
By default, put the checkpoints into the folder ‘checkpoint’.
CamoFormer: Baidu Netdisk, One Drive
Backbone: Baidu Netdisk, One Drive
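Before moving on to testing, it can help to confirm the default layout. The following is a minimal sketch, assuming the folders ‘dataset’ and ‘checkpoint’ sit directly under the repository root as described above. The helper name `check_layout` is hypothetical and is not part of this repository.

```python
from pathlib import Path

# Default folder names taken from the download instructions above.
EXPECTED_DIRS = ("dataset", "checkpoint")


def check_layout(root="."):
    """Return the expected folders that are missing or empty under `root`."""
    root = Path(root)
    problems = []
    for name in EXPECTED_DIRS:
        folder = root / name
        # A folder counts as ready only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    missing = check_layout()
    if missing:
        print("Missing or empty folders:", ", ".join(missing))
    else:
        print("Folder layout looks ready.")
```

Run it from the repository root. It only checks that the two folders exist and are non-empty; it does not verify which files were downloaded.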
2. Test.
3. Eval.
3. Proposed CamoFormer
3.1. The F-TA in MSA:
Figure 2: Diagrammatic details of the proposed F-TA in our MSA. Our B-TA shares a similar structure except for the mask.
3.2 COD Benchmark Results:
The predictions of our CamoFormer can be found at One Drive, Baidu Netdisk, or Google Drive. The quantitative performance comparison is shown below.
Figure 3: Comparison of our CamoFormer with recent SOTA methods. ‘-R’: ResNet, ‘-C’: ConvNeXt, ‘-S’: Swin Transformer, ‘-P’: PVTv2. As can be seen, our CamoFormer-P outperforms previous methods, whether CNN- or Transformer-based. ‘↑’: the higher the better, ‘↓’: the lower the better.
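To make the ‘↓’ direction concrete, here is a minimal sketch of mean absolute error (MAE), a lower-is-better metric widely reported in COD benchmarks. This is an illustration under assumptions: the metric names in the figure are not listed in this text, and the function `mean_absolute_error` is hypothetical rather than part of the evaluation toolbox. Prediction and ground-truth maps are plain nested lists with values in [0, 1].

```python
def mean_absolute_error(pred, gt):
    """Average per-pixel absolute difference between two same-sized maps."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("maps must contain at least one pixel")
    return total / count


# Toy 2x2 example: hypothetical values, not results from the paper.
pred = [[1.0, 0.0], [0.5, 0.0]]
gt = [[1.0, 0.0], [1.0, 0.0]]
print(mean_absolute_error(pred, gt))  # 0.125
```

A perfect prediction gives 0.0, and a fully inverted binary map gives 1.0, which is why a smaller value is better.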
Acknowledgement
Thanks to mczhuge for providing a friendly codebase for binary segmentation tasks. Our code is built on it.
License
Code in this repo is for non-commercial use only.