MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration

Zhichao Wei, Qingkun Su, Long Qin, Weizhi Wang
🔥 Examples
🎇 Pipeline
We propose MM-Diff, a unified, tuning-free image personalization framework that generates high-fidelity images of both single and multiple subjects in seconds. On the left of the pipeline figure, vision-augmented text embeddings and a small set of detail-rich subject embeddings are injected into the diffusion model through a carefully designed multi-modal cross-attention mechanism. On the right, we detail our LoRA-based implementation of cross-attention, as well as the attention constraints that facilitate multi-subject generation.
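To make the pipeline description above more concrete, here is a minimal PyTorch sketch of decoupled multi-modal cross-attention with LoRA-augmented subject projections. All names and dimensions (`LoRALinear`, `MultiModalCrossAttention`, the zero-init choice) are illustrative assumptions, not the repository's actual implementation; the released code should be treated as authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Linear layer with a low-rank (LoRA) residual: y = Wx + scale * B(A(x))."""
    def __init__(self, dim_in, dim_out, rank=4, scale=1.0):
        super().__init__()
        self.base = nn.Linear(dim_in, dim_out, bias=False)
        self.lora_a = nn.Linear(dim_in, rank, bias=False)
        self.lora_b = nn.Linear(rank, dim_out, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # LoRA branch starts as a no-op
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

class MultiModalCrossAttention(nn.Module):
    """Attend image latents to text tokens and subject tokens in one pass.

    Text keys/values use plain base projections; subject keys/values go
    through LoRA-augmented projections, loosely following the decoupled
    cross-attention idea described above (a sketch, not the repo's code).
    """
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k_text = nn.Linear(dim, dim, bias=False)
        self.to_v_text = nn.Linear(dim, dim, bias=False)
        self.to_k_subj = LoRALinear(dim, dim)
        self.to_v_subj = LoRALinear(dim, dim)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, latents, text_emb, subject_emb):
        # Concatenate text and subject tokens on the sequence axis so the
        # latents attend to both conditions jointly.
        k = torch.cat([self.to_k_text(text_emb), self.to_k_subj(subject_emb)], dim=1)
        v = torch.cat([self.to_v_text(text_emb), self.to_v_subj(subject_emb)], dim=1)
        q = self.to_q(latents)

        def split(t):  # (b, n, d) -> (b, heads, n, d // heads)
            b, n, d = t.shape
            return t.view(b, n, self.heads, d // self.heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split(q), split(k), split(v))
        b, h, n, dh = out.shape
        return self.to_out(out.transpose(1, 2).reshape(b, n, h * dh))
```

Zero-initializing `lora_b` makes the subject branch contribute nothing at the start of training, so the pretrained text cross-attention behavior is preserved until the LoRA weights are learned.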
🔧 Preparations
Environment Setup
Download Models
We provide pretrained checkpoints; download them and place them in the root directory of this project. To run the demo, you also need to download the following models:
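Assuming the checkpoints are hosted on the Hugging Face Hub, a snippet along these lines could fetch them; the `repo_id` below is a placeholder, not the project's actual hosting location.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id -- replace with wherever the MM-Diff checkpoints
# are actually hosted, and run from the project root.
snapshot_download(repo_id="your-org/MM-Diff", local_dir=".")
```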
Training Data Annotation (Optional)
We provide demo code for annotating training data in data_annotation. To avoid package conflicts, we recommend setting up a separate conda or Docker environment for it.
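As a rough illustration only: an annotation step of this kind typically detects subject regions and writes one record per image. The function names and schema below are our assumptions; the actual pipeline in data_annotation may use different tools and a different format.

```python
import json
from pathlib import Path

from PIL import Image

def annotate(image_dir: str, out_file: str, detect_subjects) -> None:
    """Write a JSON-lines file mapping each image to its subject boxes.

    `detect_subjects` is any callable returning [(x0, y0, x1, y1), ...];
    both it and this schema are placeholders for illustration.
    """
    with open(out_file, "w") as f:
        for path in sorted(Path(image_dir).glob("*.jpg")):
            image = Image.open(path).convert("RGB")
            record = {
                "image": path.name,
                "width": image.width,
                "height": image.height,
                "boxes": detect_subjects(image),
            }
            f.write(json.dumps(record) + "\n")
```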
✨ Customized Generation
Currently, we provide the two ways listed below to customize your images (a minimal Gradio sketch follows the list). We also provide some reference images in demo_data.
Use Jupyter Notebook
Start a Gradio Demo
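The Gradio entry point might look roughly like the following; `generate` is a stub and every name and label here is illustrative, not the repo's actual demo script.

```python
import gradio as gr

def generate(reference_image, prompt):
    # Stub: the real demo would run the MM-Diff pipeline here, conditioning
    # on the reference subject image and the text prompt.
    return reference_image

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Image(type="pil", label="Reference subject"),
        gr.Textbox(label="Prompt"),
    ],
    outputs=gr.Image(label="Personalized result"),
    title="MM-Diff demo (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```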
🚩 Updates

[2024/05/30] Fuse LoRA weights into the original weights to improve inference speed (see the weight-fusion sketch below).
[2024/05/29] Release an enhanced version of MM-Diff for portrait generation, employing face embeddings to improve subject fidelity.
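Fusing is possible because a LoRA layer computes y = Wx + s·B(Ax), which equals (W + s·BA)x, so folding the low-rank product into W removes the extra matmuls at inference. A minimal sketch of the general idea (names are illustrative):

```python
import torch

@torch.no_grad()
def fuse_lora(base: torch.nn.Linear, lora_a: torch.Tensor,
              lora_b: torch.Tensor, scale: float = 1.0) -> None:
    """Fold a LoRA update into the base weight in place: W <- W + scale * B @ A.

    lora_a has shape (rank, in_features); lora_b has shape (out_features, rank).
    After fusing, the LoRA branch can be dropped entirely at inference time.
    """
    base.weight += scale * (lora_b @ lora_a)
```

For LoRAs loaded through diffusers' own loaders, the library exposes a similar convenience as `pipe.fuse_lora()`.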
Citation
If you find MM-Diff useful for your research, please cite our paper:

@article{wei2024mm,
  title={MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration},
  author={Wei, Zhichao and Su, Qingkun and Qin, Long and Wang, Weizhi},
  journal={arXiv preprint arXiv:2403.15059},
  year={2024}
}
Acknowledgements
This code is built on some excellent repos, including diffusers, FastComposer, PhotoMaker and IP-Adapter. Thanks for their great work!