🔥 `TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction`

framework_img

[Paper] [Project Page] [Jittor Version] [Demo]

🚩 Todo List

Source code of 3D VQVAE.
Source code of 3D GPT.
Source code of 3D evaluation.
UIDs of 100k high-quality Objaverse objects.
Pretrained weights of 3D reconstruction.
Pretrained weights of image-to-3D generation.
Pretrained weights of text-to-3D generation.

BibTeX

If you find TAR3D useful for your research or applications, please give us a star and cite this paper:

@inproceedings{zhang2025tar3d,
  title={{TAR3D}: Creating High-Quality {3D} Assets via Next-Part Prediction},
  author={Zhang, Xuying and Liu, Yutong and Li, Yangguang and Zhang, Renrui and Liu, Yufei and Wang, Kai and Ouyang, Wanli and Xiong, Zhiwei and Gao, Peng and Hou, Qibin and Cheng, Ming-Ming},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={5134--5145},
  year={2025}
}

⚙️ Setup

1. Dependencies and Installation

We recommend using Python>=3.10, PyTorch>=2.1.0, and CUDA>=12.1.

conda create --name tar3d python=3.10
conda activate tar3d
pip install -U pip

# Ensure Ninja is installed
conda install Ninja

# Install the correct version of CUDA
conda install cuda -c nvidia/label/cuda-12.1.0

# Install PyTorch and xformers
# You may need to install another xformers version if you use a different PyTorch version
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install xformers==0.0.22.post7

# For Linux users: Install Triton 
pip install triton

# Install other requirements
pip install -r requirements.txt
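After installation, it can help to confirm that the installed versions meet the recommended minimums above. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `meets_minimum` are hypothetical, and the comparison only looks at the leading numeric part of a version string.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Return the leading numeric components of a version string.

    Local suffixes such as "+cu121" are ignored, so "2.1.0+cu121"
    is parsed as (2, 1, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "12.1" and "12.1.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Example with a typical PyTorch wheel version string.
print(meets_minimum("2.1.0+cu121", "2.1.0"))
```

In an activated environment, the same helper can be called with `platform.python_version()`, `torch.__version__`, and `torch.version.cuda` (the last one is `None` on CPU-only builds, so check for that first).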

2. Downloading Datasets

3. Downloading Checkpoints

We currently cannot access the checkpoints, which were stored in the Aliyun storage space used during the internship.
We will retrain the models and release new checkpoints as soon as possible.

⚡ Quick Start

1. Reconstructing a 3D Geometry with 3D VQ-VAE

python infer_vqvae.py

2. Conditional 3D Generation

python run.py --gpt-type i23d

💻 Training

1. Training 3D VQ-VAE

python train_vqvae.py --base configs/vqvae3d.yaml --gpus 0,1,2,3,4,5,6,7 --num_nodes 1

In practice, we train in two stages. We first train the encoder and decoder of our VQ-VAE as a plain VAE, without quantization.
We then add the vector quantization codebook and fine-tune the entire VQ-VAE.
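For readers unfamiliar with the codebook step, the sketch below shows the generic nearest-neighbor lookup at the heart of vector quantization. It is an illustration in plain Python under stated assumptions, not the repository's implementation: the function names, the toy 2D codebook, and the latents are all made up, and training details such as the straight-through gradient and the commitment loss are omitted.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry.

    Returns (indices, quantized), where indices[i] is the position of
    the codebook entry closest to latents[i].
    """
    indices = []
    for z in latents:
        distances = [squared_distance(z, entry) for entry in codebook]
        indices.append(distances.index(min(distances)))
    quantized = [list(codebook[i]) for i in indices]
    return indices, quantized


# Toy example: three codebook entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.9, 1.2], [-0.1, 0.2], [-0.8, 1.7]]
indices, quantized = quantize(latents, codebook)
print(indices)  # [1, 0, 2]
```

The continuous encoder outputs of the first stage become discrete indices once the codebook is added, which is why the second stage fine-tunes the whole model rather than the codebook alone.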

2. Training 3D GPT

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun \
--nnodes=1 \
--nproc_per_node=8 \
--node_rank=0 \
--master_addr='127.0.0.1' \
--master_port=29504 \
train_gpt.py \
--gpt-type i23d \
--global-batch-size 8 "$@"
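When adapting this command to a different number of GPUs, the global batch size usually has to change with it. The sketch below assumes that `--global-batch-size` is split evenly across all launched processes, which is a common convention in distributed training scripts but is not confirmed here for `train_gpt.py`. The helper name `per_gpu_batch_size` is hypothetical.

```python
def per_gpu_batch_size(global_batch_size: int, nnodes: int, nproc_per_node: int) -> int:
    """Samples each process handles per step, assuming an even split."""
    world_size = nnodes * nproc_per_node
    if global_batch_size % world_size != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by world size {world_size}"
        )
    return global_batch_size // world_size


# The command above: 1 node, 8 processes, global batch size 8.
print(per_gpu_batch_size(8, nnodes=1, nproc_per_node=8))  # 1
```

Under this assumption, the command above trains with one sample per GPU per step.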

💫 Evaluation

1. 2D Evaluation (PSNR, SSIM, CLIP-Score, LPIPS)

python eval_2d.py
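As a reference for how the simplest of these metrics is defined, here is a minimal PSNR sketch using the standard formula 10 * log10(MAX^2 / MSE). It is not taken from `eval_2d.py`. It assumes images are given as flat sequences of intensities in [0, 1], and the function name `psnr` and its `max_value` parameter are illustrative.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives about 20 dB.
print(round(psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1]), 2))  # 20.0
```

Higher PSNR means the rendered view is closer to the reference. SSIM, CLIP-Score, and LPIPS need dedicated models or windowed statistics and are not sketched here.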

2. 3D Evaluation (Chamfer Distance, F-Score)

python eval_3d.py
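The two 3D metrics compare point sets sampled from the predicted and ground-truth shapes. The sketch below uses one common definition of each, and conventions vary across papers (squared versus unsquared distances, sums versus means), so it may not match `eval_3d.py` exactly. All function names and the `threshold` parameter are illustrative, and the brute-force nearest-neighbor search is only suitable for small point sets.

```python
import math


def nearest_distances(source, target):
    """For each point in `source`, the distance to its nearest point in `target`."""
    return [min(math.dist(p, q) for q in target) for p in source]


def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance: mean nearest distance in both directions, summed."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    return sum(d_pred) / len(d_pred) + sum(d_gt) / len(d_gt)


def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    # Precision: fraction of predicted points close to the ground truth.
    precision = sum(d <= threshold for d in d_pred) / len(d_pred)
    # Recall: fraction of ground-truth points close to the prediction.
    recall = sum(d <= threshold for d in d_gt) / len(d_gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
print(chamfer_distance(pred, gt))        # 0.5
print(f_score(pred, gt, threshold=0.1))  # 0.5
```

Lower Chamfer Distance and higher F-Score indicate a closer match. The F-Score depends strongly on the chosen threshold, so results are only comparable when the same threshold is used.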

🤗 Acknowledgements

We thank the authors of the following projects for their excellent contributions to 3D generative AI!
