Loopy supports various visual and audio styles. It can generate vivid motion details from audio alone, including non-speech movements such as sighing, emotion-driven eyebrow and eye movements, and natural head movements.
* Note that all results on this page use the first frame as the reference image and are conditioned on audio only, without the need for spatial conditions as templates.
Loopy can generate motion-adapted synthesis results for the same reference image from different audio inputs, whether rapid speech, soothing speech, or realistic singing performances.
Additional results demonstrating singing
Additional results on non-human realistic images
Loopy also handles input images with side profiles effectively
More results on realistic portrait inputs
This work is intended for research purposes only. The images and audio used in these demos are from public sources. If you have any concerns, please contact us (jianwen.alan@gmail.com) and we will remove the content promptly.
Some film and interview figures in More Video Results are from CelebV-HQ. Some test audio clips and images are from EchoMimic, EMO, VASA-1, and others. Thanks to these great works!
If you find this project useful for your research, please cite us:
@article{jiang2024loopy,
  title={Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency},
  author={Jiang, Jianwen and Liang, Chao and Yang, Jiaqi and Lin, Gaojie and Zhong, Tianyun and Zheng, Yanbo},
  journal={arXiv preprint arXiv:2409.02634},
  year={2024}
}