
Bitwise Visual Tokenizer

Training and inference code for the bitwise visual tokenizer used by Infinity.

BitVAE Model Zoo

We provide pretrained Infinity tokenizer weights for you to play with. They are hosted on Hugging Face and can be downloaded from the links in the "HF weights🤗" column below:

Visual Tokenizer

| Vocabulary | Stride | IN-256 rFID ↓ | IN-256 PSNR ↑ | IN-512 rFID ↓ | IN-512 PSNR ↑ | HF weights🤗 |
|---|---|---|---|---|---|---|
| $V_d=2^{16}$ | 16 | 1.22 | 20.9 | 0.31 | 22.6 | infinity_vae_d16.pth |
| $V_d=2^{24}$ | 16 | 0.75 | 22.0 | 0.30 | 23.5 | infinity_vae_d24.pth |
| $V_d=2^{32}$ | 16 | 0.61 | 22.7 | 0.23 | 24.4 | infinity_vae_d32.pth |
| $V_d=2^{64}$ | 16 | 0.33 | 24.9 | 0.15 | 26.4 | infinity_vae_d64.pth |
| $V_d=2^{32}$ | 16 | 0.75 | 21.9 | 0.32 | 23.6 | infinity_vae_d32_reg.pth |
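As a rough guide to reading the table, the sketch below computes the token grid and the raw number of bits per image for each vocabulary size. It assumes that the stride is the spatial downsampling factor and that each token is a $d$-bit code, which is what $V_d = 2^d$ suggests; `tokens_per_image` and `bits_per_image` are hypothetical helper names, not part of the repository.

```bash
#!/usr/bin/env bash
# Illustrative arithmetic only (hypothetical helpers, not part of the repository).
# ASSUMPTIONS: stride = spatial downsampling factor; each token is a d-bit code (V_d = 2^d).

# Number of tokens for a square image with side $1 pixels at stride $2.
tokens_per_image() {
  local res=$1 stride=$2
  local side=$(( res / stride ))
  echo $(( side * side ))
}

# Raw bits needed to store one image's tokens: tokens * d.
bits_per_image() {
  local res=$1 stride=$2 d=$3
  echo $(( $(tokens_per_image "$res" "$stride") * d ))
}

# Print one line per (d, resolution) pair from the table, all at stride 16.
for d in 16 24 32 64; do
  for res in 256 512; do
    printf 'd=%-2s res=%-3s tokens=%-4s bits=%s\n' \
      "$d" "$res" "$(tokens_per_image "$res" 16)" "$(bits_per_image "$res" 16 "$d")"
  done
done
```

Under these assumptions a 256×256 image maps to a 16×16 grid and a 512×512 image to a 32×32 grid, so larger $d$ raises capacity per token without changing the number of tokens.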

Environment installation

bash scripts/prepare.sh

Download the checkpoints and labels folders from Google Drive and place them in the project root. To use our trained model weights, also download bitvae_results. The expected layout is:

${PROJECT_ROOT}
    -- bitvae
    -- bitvae_results
        -- Infinity_d16_stage1
        -- Infinity_d16_stage2
        -- Infinity_d32_stage1
        -- Infinity_d32_stage2
    -- checkpoints
    -- labels
    -- scripts
    -- test
    ...
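After downloading, you can check the layout before launching anything. A minimal sketch: `check_layout` is a hypothetical helper, and its folder list is taken from the tree above (drop bitvae_results from the list if you do not need the released weights).

```bash
#!/usr/bin/env bash
# Hypothetical helper: report any top-level folder from the expected layout that is missing.
# Returns 0 if all folders exist, 1 otherwise. Missing entries are printed to stderr.
check_layout() {
  local root=${1:-$PWD}
  local missing=0 dir
  for dir in bitvae bitvae_results checkpoints labels scripts test; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $root/$dir" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_layout "$PROJECT_ROOT" && echo "layout OK"`, where `PROJECT_ROOT` is the repository root on your machine.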

Training

Before training, generate labels/openimages/train.txt following the format of the provided labels/imagenet/val_example.txt, replacing the example paths with the real paths on your system.
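The exact line format is defined by labels/imagenet/val_example.txt, so inspect that file first (for example with `head labels/imagenet/val_example.txt`). The sketch below assumes, for illustration only, one absolute image path per line; `write_label_file` is a hypothetical helper, and you should adapt its output if the example file carries extra fields.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list image files under a directory into a label file.
# ASSUMPTION: one absolute image path per line. Compare with
# labels/imagenet/val_example.txt and adjust the format if it differs.
write_label_file() {
  local image_dir=$1 out_file=$2
  mkdir -p "$(dirname "$out_file")"
  # Resolve to an absolute path so the label file works from any working directory.
  image_dir=$(cd "$image_dir" && pwd) || return 1
  # Collect common image extensions (case-insensitive) in a stable, sorted order.
  find "$image_dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
    | sort > "$out_file"
  echo "wrote $(wc -l < "$out_file") entries to $out_file"
}
```

For example, `write_label_file /data/openimages/train labels/openimages/train.txt`, where `/data/openimages/train` stands in for your own image folder.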

Tokenizer with hidden dimension 16

bash scripts/release/train_img_d16_stage1.sh # stage 1: single-scale pre-training
bash scripts/release/train_img_d16_stage2.sh # stage 2: multi-scale fine-tuning

Tokenizer with hidden dimension 32

bash scripts/release/train_img_d32_stage1.sh # stage 1: single-scale pre-training
bash scripts/release/train_img_d32_stage2.sh # stage 2: multi-scale fine-tuning
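Stage 2 is described as fine-tuning, so it presumably should only start once stage 1 has completed. A minimal wrapper sketch that runs the two release scripts in order and stops at the first failure; `run_training` is a hypothetical name, and it only orders the runs. It does not pass checkpoints between stages, so check the release scripts for how stage 2 locates its starting weights.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run stage 1 then stage 2 for one hidden dimension,
# stopping at the first failure. It does not pass checkpoints between stages.
run_training() {
  local dim=$1 scripts_dir=${2:-scripts/release}
  case "$dim" in
    16|32) ;;
    *) echo "unsupported hidden dimension: $dim (expected 16 or 32)" >&2; return 2 ;;
  esac
  local stage script
  for stage in 1 2; do
    script="$scripts_dir/train_img_d${dim}_stage${stage}.sh"
    echo "==> $script"
    bash "$script" || { echo "stage $stage failed" >&2; return 1; }
  done
}
```

For example, `run_training 16` or `run_training 32` from the project root.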

Testing & evaluation

Before testing, generate labels/imagenet/val.txt following the format of the provided labels/imagenet/val_example.txt, replacing the example paths with the real paths on your system.
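Before an evaluation run, it can be worth confirming that every image listed in labels/imagenet/val.txt exists. A minimal sketch, assuming (illustration only) that the first whitespace-separated field on each line is the image path; `check_label_file` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count label-file entries whose image path does not exist.
# ASSUMPTION: the first whitespace-separated field on each line is the image path;
# paths containing spaces are not handled.
check_label_file() {
  local label_file=$1
  local total=0 missing=0 path _rest
  # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while read -r path _rest || [ -n "$path" ]; do
    [ -z "$path" ] && continue          # skip blank lines
    total=$(( total + 1 ))
    if [ ! -f "$path" ]; then
      missing=$(( missing + 1 ))
      echo "missing: $path" >&2
    fi
  done < "$label_file"
  echo "$missing of $total entries missing"
  [ "$missing" -eq 0 ]
}
```

For example, `check_label_file labels/imagenet/val.txt` prints a summary and returns non-zero if any listed file is absent.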

Tokenizer with hidden dimension 16

bash scripts/release/test_img_d16_stage1.sh
bash scripts/release/test_img_d16_stage2.sh

Tokenizer with hidden dimension 32

bash scripts/release/test_img_d32_stage1.sh
bash scripts/release/test_img_d32_stage2.sh
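To evaluate all four configurations in one go and keep a log per run, a loop like the following can be used. This is a sketch assuming the four test scripts above live under scripts/release; `run_all_tests` and the `logs/` directory are hypothetical choices, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run each test script, saving its output to <log_dir>/<name>.log,
# and report which ones failed. Continues past failures so every config is tried.
run_all_tests() {
  local scripts_dir=${1:-scripts/release} log_dir=${2:-logs}
  local -a failed=()
  local name script
  mkdir -p "$log_dir"
  for name in test_img_d16_stage1 test_img_d16_stage2 test_img_d32_stage1 test_img_d32_stage2; do
    script="$scripts_dir/$name.sh"
    if bash "$script" > "$log_dir/$name.log" 2>&1; then
      echo "ok:     $name"
    else
      echo "FAILED: $name (see $log_dir/$name.log)"
      failed+=("$name")
    fi
  done
  # Succeed only if no script failed.
  [ "${#failed[@]}" -eq 0 ]
}
```

Running every script even after a failure means one broken configuration does not hide results for the others.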

📖 Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@misc{Infinity,
    title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis}, 
    author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
    year={2024},
    eprint={2412.04431},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2412.04431}, 
}
@misc{VAR,
    title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
    author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
    year={2024},
    eprint={2404.02905},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2404.02905},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
