E2STR
The official implementation of E2STR: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer (CVPR-2024) PDF
environment
Install MMOCR 1.0.0.
Install the remaining dependencies listed in requirements.txt.
data & model
Download Union14M-L from Union14M-L.
Download the MAE-pretrained ViT weights from MAERec.
Download OPT-125M.
Download all the test datasets (listed in Table 1 and Table 2) and update every data_root in configs/textrecog/base/datasets accordingly. We may upload these datasets later.
The 600k training set with character-wise annotations will be released later. The repository also runs without it: you can perform in-context training with the Transform Strategy only by setting ‘JSON FILE FOR CHARACTER-WISE POSITION INFORMATION’ to None. See also Table 4.
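A dataset config under configs/textrecog/base/datasets typically follows the MMOCR pattern sketched below; the dataset name and annotation file here are assumptions for illustration, only the data_root field matters:

```python
# Illustrative MMOCR-style dataset config; the dataset name and ann_file
# are assumptions. Point data_root at the directory where you unpacked
# the downloaded test set.
cute80_data_root = 'data/cute80'

cute80_textrecog_test = dict(
    type='OCRDataset',
    data_root=cute80_data_root,
    ann_file='textrecog_test.json',
    test_mode=True,
    pipeline=None)
```

Changing cute80_data_root is all that is needed when the data lives elsewhere on disk.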
train

stage1: vanilla STR training
Modify ‘MAE PRETRAIN WEIGHT PATH’ / ‘LM WEIGHT PATH’ / ‘CHECKPOINT SAVE PATH’ / ‘SAVE_NAME’ in configs/textrecog/icl_ocr/stage1.py, then run:
sh run_stage1.sh

stage2: in-context training
Modify ‘STAGE-1 WEIGHT PATH’ / ‘LM WEIGHT PATH’ / ‘JSON FILE FOR CHARACTER-WISE POSITION INFORMATION’ / ‘CHECKPOINT SAVE PATH’ / ‘SAVE_NAME’ in configs/textrecog/icl_ocr/stage2.py, then run:
sh run_stage2.sh
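The stage-2 placeholders map to plain Python values in the config. A sketch of what the filled-in values might look like; every variable name and path below is illustrative, so match them against the placeholder strings in the actual stage2.py:

```python
# Hypothetical filled-in placeholders for configs/textrecog/icl_ocr/stage2.py.
# The real variable names in the config may differ; locate them by the
# placeholder strings the instructions above ask you to replace.
stage1_weight = '/path/to/stage1/epoch_latest.pth'  # 'STAGE-1 WEIGHT PATH'
lm_weight = '/path/to/opt-125m'                     # 'LM WEIGHT PATH'
char_pos_json = None  # 'JSON FILE FOR CHARACTER-WISE POSITION INFORMATION'
                      # None => in-context training with Transform Strategy only
ckpt_dir = '/path/to/checkpoints'                   # 'CHECKPOINT SAVE PATH'
save_name = 'e2str_stage2'                          # 'SAVE_NAME'
```

Setting char_pos_json to None is the switch that lets stage 2 run without the 600k character-wise annotated data.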
evaluate

Construct the in-context pool (a JSON file) by randomly sampling data from any target training set. The JSON file should be structured as follows:
[
    {
        'img_path': ,
        'gt_text':
    }
]
Then modify ‘JSON FILE FOR IN-CONTEXT POOL’ in configs/textrecog/icl_ocr/stage2.py
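The in-context pool is a JSON list of {'img_path', 'gt_text'} records randomly sampled from a training set. A short script to build one; build_incontext_pool is a helper name of my own, not part of the repository:

```python
import json
import random

def build_incontext_pool(samples, pool_size, out_path, seed=0):
    """Randomly sample (img_path, gt_text) pairs from a training set
    into a JSON in-context pool."""
    rng = random.Random(seed)
    chosen = rng.sample(samples, k=min(pool_size, len(samples)))
    pool = [{'img_path': img, 'gt_text': text} for img, text in chosen]
    with open(out_path, 'w') as f:
        json.dump(pool, f, indent=2)
    return pool

# Example: build a 2-sample pool from a toy training list.
train_set = [('imgs/0001.jpg', 'HELLO'),
             ('imgs/0002.jpg', 'WORLD'),
             ('imgs/0003.jpg', 'TEXT')]
pool = build_incontext_pool(train_set, pool_size=2, out_path='icl_pool.json')
```

Point ‘JSON FILE FOR IN-CONTEXT POOL’ at the resulting file.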
Citation
If you find our models / code / papers useful in your research, please consider giving a star ⭐ and a citation 📝
@article{zhao2023multi,
  title={Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer},
  author={Zhao, Zhen and Huang, Can and Wu, Binghong and Lin, Chunhui and Liu, Hao and Zhang, Zhizhong and Tan, Xin and Tang, Jingqun and Xie, Yuan},
  journal={CVPR},
  year={2024}
}