TL;DR: AvatarGen is a 3D-aware human generative model that enables both geometry-aware human synthesis with high-fidelity appearance and disentangled animation controllability, while requiring only 2D images for training.
We generate avatars via AvatarGen and render them from 360-degree viewpoints (left). The avatars can be animated given a SMPL sequence (right).
Abstract
Unsupervised generation of 3D-aware clothed humans with diverse appearances and animatable geometry is essential for creating virtual human avatars and other AR/VR applications. However, existing methods are limited either to modeling rigid objects or lack generative ability, thus making it challenging to produce high-quality virtual humans and animate them. In this work, we propose AvatarGen, which generates 3D-aware clothed human with high-fidelity appearances and disentangled controllability, while using only 2D images for training.
Specifically, our method decomposes generative 3D human synthesis into SMPL-guided mapping from observation to canonical space and canonical human generation with a pre-defined pose and shape. Such decomposition enables the explicit driving of the canonical human to different poses and shapes, while preserving its identity.
AvatarGen further introduces a deformation network to learn residual deformations to better generate the fine-grained geometric details and pose-dependent dynamics.
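As a rough illustration of this SMPL-guided decomposition and residual deformation, the sketch below warps observation-space points into canonical space by inverting linear blend skinning and then adds a learned residual. This is our own minimal numpy stand-in, not the paper's implementation; `inverse_lbs`, `apply_residual`, and the toy `deform_net` are illustrative names.

```python
import numpy as np

def inverse_lbs(x_obs, bone_transforms, skin_weights):
    """Warp observation-space points to canonical space by inverting
    linear blend skinning (LBS).

    x_obs:           (N, 3) points in observation (posed) space.
    bone_transforms: (J, 4, 4) per-joint canonical-to-posed transforms.
    skin_weights:    (N, J) skinning weights (each row sums to 1).
    """
    # Blend the per-joint transforms with the skinning weights: (N, 4, 4).
    blended = np.einsum('nj,jab->nab', skin_weights, bone_transforms)
    # Invert each blended transform and apply it to the posed points.
    x_h = np.concatenate([x_obs, np.ones((len(x_obs), 1))], axis=1)
    x_can = np.einsum('nab,nb->na', np.linalg.inv(blended), x_h)[:, :3]
    return x_can

def apply_residual(x_can, deform_net):
    """Add a learned residual deformation for fine, pose-dependent detail."""
    return x_can + deform_net(x_can)
```

With identity bone transforms the inverse warp is the identity, which is a convenient sanity check when wiring up such a pipeline.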
To enhance the geometry quality of the generated avatars, AvatarGen leverages signed distance fields as geometric proxy, allowing for more direct regularization from the 3D geometric priors of SMPL.
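A hedged sketch of what such an SDF-prior regularization could look like: the generator's predicted signed distances are pulled toward a coarse body SDF. Here a sphere stands in for the SMPL body SDF, and both function names are ours, not AvatarGen's API.

```python
import numpy as np

def smpl_prior_sdf(x, center=np.zeros(3), radius=0.5):
    """Stand-in for the coarse SMPL body SDF (here: a sphere).
    Negative inside the surface, zero on it, positive outside."""
    return np.linalg.norm(x - center, axis=-1) - radius

def sdf_prior_loss(pred_sdf, x, w=1.0):
    """Regularize the generator's predicted SDF toward the body prior."""
    return w * np.mean(np.abs(pred_sdf - smpl_prior_sdf(x)))
```

The loss vanishes when the predicted field agrees with the prior at the sampled points, and grows with any deviation, giving the generator a direct geometric training signal.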
With these designs, AvatarGen significantly outperforms previous 3D GAN methods in terms of generation quality and controllability for 3D human avatars.
Moreover, AvatarGen is competent for various applications, e.g., single-view reconstruction, re-animation, and text-guided synthesis/editing.
Method Overview
Appearance / Geometry Visualization
Pixel-aligned RGB and geometry renderings of the generated avatars.
Novel Pose Generation
Given different SMPL sequences, AvatarGen can animate the generated avatars accordingly, while preserving their identities.
Portrait Image Reconstruction
Panels: input and inversion, geometry; novel pose, geometry.
Given a target portrait, we reconstruct its 3D-aware appearance and geometry, which can then be rendered under novel camera views and re-posed using novel SMPL parameters as control signals.
Please refer to Section 4.3 of the paper for more details.
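Single-view reconstruction of this kind is typically done by GAN inversion: optimizing a latent code until the generator reproduces the target portrait. The toy below replaces AvatarGen's 3D-aware generator with a linear map so the sketch stays self-contained and runnable; only the optimization pattern carries over.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 8))     # toy linear "generator": image = A @ z
z_true = rng.normal(size=8)
target = A @ z_true              # the portrait we want to invert

z = np.zeros(8)                  # latent code being optimized
lr = 0.01
for _ in range(5000):
    residual = A @ z - target            # reconstruction error
    grad = 2.0 * A.T @ residual          # gradient of ||A z - target||^2
    z -= lr * grad
# After inversion, z reproduces the target and, in the real model,
# could be re-rendered from novel views or re-posed with new SMPL parameters.
```

In practice the reconstruction loss would be a perceptual or pixel loss through the full generator, minimized with an autodiff optimizer rather than a hand-written gradient.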
Text-guided Synthesis/Editing
Text prompts
Light blue jeans
Long dress
Text-guided synthesis results of AvatarGen with multi-view rendering. We optimize the latent codes of the synthesized avatars with a sequence of text prompts that specify different clothing styles.
Please refer to Section 4.3 of the paper for more details.
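Text-guided editing of this kind is usually driven by optimizing the latent code against a text-image similarity loss (e.g., a CLIP-style objective). The sketch below substitutes linear stand-ins for the generator and the image encoder, and a fixed vector for the text embedding, so only the overall optimization loop reflects the described procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(16, 8))        # toy generator: latent -> "image"
B = rng.normal(size=(4, 16)) / 4.0  # toy image encoder: "image" -> embedding
t = rng.normal(size=4)              # stand-in embedding of a text prompt

z = rng.normal(size=8)              # latent code being edited
lr = 0.01
for _ in range(5000):
    e = B @ (A @ z)                     # embedding of the current synthesis
    grad = 2.0 * A.T @ B.T @ (e - t)    # gradient of ||e - t||^2 w.r.t. z
    z -= lr * grad
```

The real objective maximizes similarity between the rendered avatar's CLIP embedding and the prompt's text embedding; the squared-distance loss here is just the simplest differentiable proxy for that idea.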
Music-driven Avatar Animation
Given an audio sequence, we use the open-source music-to-dance method Bailando to generate the corresponding SMPL motion sequence, and then apply it to animate the generated avatars.
We fix the avatars' orientation and position for better visualization.
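Fixing the avatar's global orientation and position amounts to overwriting the root-joint rotation and translation of every SMPL frame with those of the first frame, so the avatar dances in place. A minimal numpy sketch, assuming the common (T, 24, 3) axis-angle pose and (T, 3) translation layout; `fix_root` is our name, not part of any released code.

```python
import numpy as np

def fix_root(poses, transl):
    """Freeze each frame's global orientation and position to frame 0.

    poses:  (T, 24, 3) per-frame SMPL axis-angle poses; joint 0 is the root.
    transl: (T, 3) per-frame global translations.
    """
    poses = poses.copy()
    transl = transl.copy()
    poses[:, 0] = poses[0, 0]   # freeze global (root) orientation
    transl[:] = transl[0]       # freeze global translation
    return poses, transl
```

The body pose (joints 1-23) is left untouched, so the dance motion itself is preserved.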
Bibtex
@article{Avatargen2023,
title={AvatarGen: A 3D Generative Model for Animatable Human Avatars},
author={Zhang, Jianfeng and Jiang, Zihang and Yang, Dingdong and Xu, Hongyi and Shi, Yichun and Song, Guoxian and Xu, Zhongcong and Wang, Xinchao and Feng, Jiashi},
journal={ArXiv},
year={2023}
}