SAFA: Structure Aware Face Animation (3DV2021)
Official PyTorch implementation of the 3DV 2021 paper: SAFA: Structure Aware Face Animation.
Getting Started
Installation
Python 3.6 or higher is recommended.
1. Install PyTorch3D
Follow the guidance from: https://github.com/facebookresearch/pytorch3d/blob/master/INSTALL.md.
2. Install Other Dependencies
To install the other dependencies, run:
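The dependency list itself is not reproduced here; assuming the repository follows the First Order Motion Model convention of shipping a requirements.txt at the repository root (an assumption, not confirmed by this README), the usual invocation is:

```shell
# Install the remaining Python dependencies.
# requirements.txt at the repository root is assumed; adjust the path if it differs.
pip install -r requirements.txt
```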
Usage
1. Preparation
a. Download the FLAME model, choose FLAME 2020, unzip it, and put generic_model.pkl under ./modules/data.
b. Download head_template.obj, landmark_embedding.npy, uv_face_eye_mask.png and uv_face_mask.png from DECA/data, and put them under ./modules/data.
c. Download the SAFA model checkpoint from Google Drive and put it under ./ckpt.
d. (Optional, required by the face swap demo) Download the pretrained face parser from face-parsing.PyTorch and put it under ./face_parsing/cp.
2. Demos
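After these downloads, the expected layout looks roughly as follows (the checkpoint and parser filenames are placeholders, since the README does not name them):

```
modules/data/generic_model.pkl
modules/data/head_template.obj
modules/data/landmark_embedding.npy
modules/data/uv_face_eye_mask.png
modules/data/uv_face_mask.png
ckpt/<SAFA checkpoint>
face_parsing/cp/<face parser weights>
```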
We provide demos for animation and face swap.
a. Animation demo
b. Face swap demo. We adopt face-parsing.PyTorch to indicate the face regions in both the source and driving images.
For preprocessed source images and driving videos, run:
For arbitrary images and videos, we use a face detector to locate and swap the corresponding face regions. Cropped images are resized to 256×256 to fit our model.
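The crop-and-resize step can be sketched as follows. The bounding box here stands in for a face detector's output, and nearest-neighbour sampling stands in for whatever interpolation the actual code uses (both are assumptions for illustration):

```python
import numpy as np

def crop_and_resize(img, box, size=256):
    """Crop a face bounding box (x1, y1, x2, y2) out of an image and
    resize the crop to size x size with nearest-neighbour sampling.
    Illustrative sketch, not the repository's actual preprocessing."""
    x1, y1, x2, y2 = box
    face = img[y1:y2, x1:x2]
    h, w = face.shape[:2]
    ys = np.arange(size) * h // size  # row indices into the crop
    xs = np.arange(size) * w // size  # column indices into the crop
    return face[ys][:, xs]

img = np.zeros((300, 400, 3), dtype=np.uint8)   # dummy frame
out = crop_and_resize(img, (50, 40, 250, 240))  # hypothetical detector box
print(out.shape)  # (256, 256, 3)
```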
Training
We modify the distributed training framework used in the First Order Motion Model. Instead of torch.nn.DataParallel (DP), we adopt torch.nn.parallel.DistributedDataParallel (DDP) for faster training and a more balanced GPU memory load. The training procedure is divided into two steps: (1) pretrain the 3DMM estimator; (2) train end-to-end.
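The DP-to-DDP switch can be illustrated with a minimal single-process sketch, using the CPU gloo backend and a toy linear model standing in for the real networks; actual training launches one process per GPU (e.g. via torchrun) and typically uses the nccl backend:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process demo: set up the process group that DDP requires.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 2)       # toy stand-in for the SAFA networks
ddp_model = DDP(model)               # gradients are all-reduced across ranks
out = ddp_model(torch.randn(4, 10))  # forward pass works as usual
print(tuple(out.shape))              # (4, 2)

dist.destroy_process_group()
```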
3DMM Estimator Pre-training
End-to-end Training
Evaluation / Inference
Video Reconstruction
Image Animation
3D Face Reconstruction
Dataset and Preprocessing
We use VoxCeleb1 to train and evaluate our model. The original YouTube videos are downloaded, cropped and split following the instructions from video-preprocessing.
a. To obtain the facial landmark metadata from the preprocessed videos, run:
b. (Optional) Extract images from videos for 3DMM pretraining:
python extract_imgs.py
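Conceptually, such a script samples still frames from each clip, since the 3DMM estimator only needs individual images. A simplified sketch (a dummy array in place of a decoded video, and an illustrative sampling step; neither is taken from extract_imgs.py):

```python
import numpy as np

def extract_frames(video, step=25):
    """Keep every `step`-th frame of a decoded video; a sparse subset of
    frames per clip is enough for 3DMM pretraining (illustrative only)."""
    return [video[i] for i in range(0, len(video), step)]

video = np.zeros((250, 256, 256, 3), dtype=np.uint8)  # dummy decoded clip
frames = extract_frames(video)
print(len(frames))  # 10
```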
Citation
If you find our work useful for your research, please consider citing:
@article{wang2021safa,
title={SAFA: Structure Aware Face Animation},
author={Wang, Qiulin and Zhang, Lu and Li, Bo},
journal={arXiv preprint arXiv:2111.04928},
year={2021}
}
License
Please refer to the LICENSE file.
Acknowledgement
Here we provide the list of external sources that we use or adapt from: