AvatarGen: A 3D Generative Model for Animatable Human Avatars

¹National University of Singapore, ²ByteDance
ArXiv 2023
TL;DR: AvatarGen is a 3D-aware human generative model that enables not only geometry-aware human synthesis with high-fidelity appearances but also disentangled animation controllability, while only requiring 2D images for training.

AvatarGen can synthesize 3D-aware human avatars with detailed geometries and diverse appearances under disentangled control over (a) camera viewpoints, (b) human poses and (c) shapes. (d) Moreover, given the SMPL control signals, the generated avatars can be animated accordingly.

We generate avatars via AvatarGen and render them from 360-degree viewpoints (left). The avatars can be animated given an SMPL sequence (right).

Unsupervised generation of 3D-aware clothed humans with diverse appearances and animatable geometry is essential for creating virtual human avatars and other AR/VR applications. However, existing methods either are limited to modeling rigid objects or lack generative ability, which makes it challenging to produce high-quality virtual humans and animate them. In this work, we propose AvatarGen, which generates 3D-aware clothed humans with high-fidelity appearances and disentangled controllability, while using only 2D images for training. Specifically, our method decomposes generative 3D human synthesis into two parts: an SMPL-guided mapping from observation space to canonical space, and canonical human generation with a pre-defined pose and shape. Such decomposition enables the explicit driving of the canonical human to different poses and shapes while preserving its identity. AvatarGen further introduces a deformation network that learns residual deformations to better generate fine-grained geometric details and pose-dependent dynamics. To enhance the geometry quality of the generated avatars, AvatarGen leverages signed distance fields as a geometric proxy, allowing for more direct regularization from the 3D geometric priors of SMPL. With these designs, AvatarGen significantly outperforms previous 3D GAN methods in terms of generation quality and controllability for 3D human avatars. Moreover, AvatarGen supports various applications, e.g., single-view reconstruction, re-animation, and text-guided synthesis/editing.
Method Overview
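The SMPL-guided mapping from observation space to canonical space described in the abstract can be illustrated with a small inverse-skinning sketch. This is not the authors' implementation. It assumes a common formulation in which a query point borrows the skinning weights of its nearest posed body vertex and is mapped back by a weighted blend of inverse bone transforms, with the learned residual deformation stood in for by a placeholder offset. The names RigidTransform, nearest_vertex_weights, and observation_to_canonical, as well as the two-bone toy body, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class RigidTransform:
    """Canonical-to-observation transform of one bone: x_obs = R @ x_can + t."""
    rotation: List[List[float]]  # 3x3, row-major, assumed orthonormal
    translation: Vec3

    def apply_inverse(self, p: Vec3) -> Vec3:
        # For a rigid transform the inverse is R^T @ (p - t).
        d = [p[i] - self.translation[i] for i in range(3)]
        R = self.rotation
        return tuple(sum(R[r][c] * d[r] for r in range(3)) for c in range(3))


def nearest_vertex_weights(p: Vec3,
                           posed_vertices: Sequence[Vec3],
                           vertex_weights: Sequence[Sequence[float]]) -> Sequence[float]:
    """Borrow the skinning weights of the posed body vertex closest to p."""
    def sq_dist(v: Vec3) -> float:
        return sum((v[i] - p[i]) ** 2 for i in range(3))
    idx = min(range(len(posed_vertices)), key=lambda i: sq_dist(posed_vertices[i]))
    return vertex_weights[idx]


def observation_to_canonical(p: Vec3,
                             bones: Sequence[RigidTransform],
                             posed_vertices: Sequence[Vec3],
                             vertex_weights: Sequence[Sequence[float]],
                             residual: Vec3 = (0.0, 0.0, 0.0)) -> Vec3:
    """Map an observation-space point to canonical space (assumed formulation).

    The point is pulled back through every bone's inverse transform and the
    results are blended with the borrowed skinning weights. `residual` is a
    placeholder for the offset a learned deformation network would predict.
    """
    weights = nearest_vertex_weights(p, posed_vertices, vertex_weights)
    canonical = [0.0, 0.0, 0.0]
    for w, bone in zip(weights, bones):
        q = bone.apply_inverse(p)
        for i in range(3):
            canonical[i] += w * q[i]
    return tuple(c + r for c, r in zip(canonical, residual))


# Hypothetical two-bone body: bone 0 is the identity, bone 1 rotates 90 degrees
# about the z-axis and then translates by (1, 0, 0).
TOY_BONES = [
    RigidTransform([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], (0.0, 0.0, 0.0)),
    RigidTransform([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], (1.0, 0.0, 0.0)),
]
TOY_POSED_VERTICES = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]  # one vertex per bone
TOY_VERTEX_WEIGHTS = [(1.0, 0.0), (0.0, 1.0)]            # each row sums to 1

# The point (1, 1, 0) sits on bone 1 and maps back to (1, 0, 0) in canonical space.
print(observation_to_canonical((1.0, 1.0, 0.0), TOY_BONES,
                               TOY_POSED_VERTICES, TOY_VERTEX_WEIGHTS))
```

Because pose enters only through the bone transforms, the same canonical point can be reached from many observation-space poses, which is one way to read the abstract's statement that the canonical human can be driven to different poses and shapes while preserving its identity.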
Music-driven Avatar Animation

Given an audio sequence, we use the open-source audio-to-motion method Bailando to generate the corresponding SMPL sequence, and then apply it to animate the generated avatars. We fix the avatars' orientation and position for better visualization.

    title={AvatarGen: A 3D Generative Model for Animatable Human Avatars},
    author={Zhang, Jianfeng and Jiang, Zihang and Yang, Dingdong and Xu, Hongyi and Shi, Yichun and Song, Guoxian and Xu, Zhongcong and Wang, Xinchao and Feng, Jiashi},