MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration

Zhichao Wei, Qingkun Su, Long Qin, Weizhi Wang
🔥 Examples
🎇 Pipeline
We propose MM-Diff, a unified, tuning-free image personalization framework that generates high-fidelity images of both single and multiple subjects in seconds. On the left of the pipeline figure, vision-augmented text embeddings and a small set of detail-rich subject embeddings are injected into the diffusion model through a carefully designed multi-modal cross-attention mechanism. On the right, we detail our LoRA-based implementation of cross-attention, as well as the attention constraints that facilitate multi-subject generation.
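To make the pipeline description above more concrete, here is a minimal PyTorch sketch of decoupled multi-modal cross-attention with LoRA-augmented subject projections. All names and dimensions (`LoRALinear`, `MultiModalCrossAttention`, the zero-init choice) are illustrative assumptions, not the repository's actual implementation; the released code should be treated as authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Linear layer with a low-rank (LoRA) residual: y = Wx + scale * B(A(x))."""
    def __init__(self, dim_in, dim_out, rank=4, scale=1.0):
        super().__init__()
        self.base = nn.Linear(dim_in, dim_out, bias=False)
        self.lora_a = nn.Linear(dim_in, rank, bias=False)
        self.lora_b = nn.Linear(rank, dim_out, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # LoRA branch starts as a no-op
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

class MultiModalCrossAttention(nn.Module):
    """Attend image latents to text tokens and subject tokens in one pass.

    Text keys/values use plain base projections; subject keys/values go
    through LoRA-augmented projections, loosely following the decoupled
    cross-attention idea described above (a sketch, not the repo's code).
    """
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k_text = nn.Linear(dim, dim, bias=False)
        self.to_v_text = nn.Linear(dim, dim, bias=False)
        self.to_k_subj = LoRALinear(dim, dim)
        self.to_v_subj = LoRALinear(dim, dim)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, latents, text_emb, subject_emb):
        # Concatenate text and subject tokens on the sequence axis so the
        # latents attend to both conditions jointly.
        k = torch.cat([self.to_k_text(text_emb), self.to_k_subj(subject_emb)], dim=1)
        v = torch.cat([self.to_v_text(text_emb), self.to_v_subj(subject_emb)], dim=1)
        q = self.to_q(latents)

        def split(t):  # (b, n, d) -> (b, heads, n, d // heads)
            b, n, d = t.shape
            return t.view(b, n, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        b, h, n, dh = out.shape
        return self.to_out(out.transpose(1, 2).reshape(b, n, h * dh))
```

Zero-initializing `lora_b` makes the subject branch contribute nothing at the start of training, so the pretrained text cross-attention behavior is preserved until the LoRA weights are learned.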
🔧 Preparations
Environment Setup
Download Models
We provide pretrained checkpoints; download them and place them in the root directory of this project. To run the demo, you also need to download the following models:
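Assuming the checkpoints are hosted on the Hugging Face Hub, a snippet along these lines could fetch them; the `repo_id` below is a placeholder, not the project's actual hosting location.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id -- replace with wherever the MM-Diff checkpoints
# are actually hosted, and run from the project root.
snapshot_download(repo_id="your-org/MM-Diff", local_dir=".")
```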
Training Data Annotation (Optional)
We provide demo code for annotating training data in data_annotation. To avoid package conflicts, we recommend setting up a separate conda or Docker environment for it.
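As a rough illustration only: an annotation step of this kind typically detects subject regions and writes one record per image. The function names and schema below are our assumptions; the actual pipeline in data_annotation may use different tools and a different format.

```python
import json
from pathlib import Path

from PIL import Image

def annotate(image_dir: str, out_file: str, detect_subjects) -> None:
    """Write a JSON-lines file mapping each image to its subject boxes.

    `detect_subjects` is any callable returning [(x0, y0, x1, y1), ...];
    both it and this schema are placeholders for illustration.
    """
    with open(out_file, "w") as f:
        for path in sorted(Path(image_dir).glob("*.jpg")):
            image = Image.open(path).convert("RGB")
            record = {
                "image": path.name,
                "width": image.width,
                "height": image.height,
                "boxes": detect_subjects(image),
            }
            f.write(json.dumps(record) + "\n")
```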
✨ Customized Generation
Currently, we provide the two ways listed below to customize your images (a minimal Gradio sketch follows the list). We also provide some reference images in demo_data.
Use Jupyter Notebook
Start a Gradio Demo
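The Gradio entry point might look roughly like the following; `generate` is a stub and every name and label here is illustrative, not the repo's actual demo script.

```python
import gradio as gr

def generate(reference_image, prompt):
    # Stub: the real demo would run the MM-Diff pipeline here, conditioning
    # on the reference subject image and the text prompt.
    return reference_image

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Image(type="pil", label="Reference subject"),
        gr.Textbox(label="Prompt"),
    ],
    outputs=gr.Image(label="Personalized result"),
    title="MM-Diff demo (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```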
🚩 Updates

[2024/05/30] Fuse LoRA weights into the original weights to improve inference speed (see the weight-fusion sketch below).
[2024/05/29] Release an enhanced version of MM-Diff for portrait generation, employing face embeddings to improve subject fidelity.
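Fusing is possible because a LoRA layer computes y = Wx + s·B(Ax), which equals (W + s·BA)x, so folding the low-rank product into W removes the extra matmuls at inference. A minimal sketch of the general idea (names are illustrative):

```python
import torch

@torch.no_grad()
def fuse_lora(base: torch.nn.Linear, lora_a: torch.Tensor,
              lora_b: torch.Tensor, scale: float = 1.0) -> None:
    """Fold a LoRA update into the base weight in place: W <- W + scale * B @ A.

    lora_a has shape (rank, in_features); lora_b has shape (out_features, rank).
    After fusing, the LoRA branch can be dropped entirely at inference time.
    """
    base.weight += scale * (lora_b @ lora_a)
```

For LoRAs loaded through diffusers' own loaders, the library exposes a similar convenience as `pipe.fuse_lora()`.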
Citation
If you find MM-Diff useful for your research, please cite our paper:

@article{wei2024mm,
  title={MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration},
  author={Wei, Zhichao and Su, Qingkun and Qin, Long and Wang, Weizhi},
  journal={arXiv preprint arXiv:2403.15059},
  year={2024}
}
Acknowledgements
This code is built on some excellent repos, including diffusers, FastComposer, PhotoMaker and IP-Adapter. Thanks for their great work!