AvatarGen: A 3D Generative Model for Animatable Human Avatars

National University of Singapore, ByteDance

We propose AvatarGen, a 3D-aware generative model that enables not only geometry-aware clothed human synthesis with high-fidelity appearance but also disentangled control over human animation, while requiring only 2D images for training.


AvatarGen can synthesize clothed 3D human avatars with detailed geometries and diverse appearances under disentangled control over camera viewpoints and human poses.

Abstract


Unsupervised generation of 3D-aware clothed humans with diverse appearances and controllable geometries is important for creating virtual human avatars and other AR/VR applications. Existing methods are either limited to modeling rigid objects or are not generative, and thus cannot synthesize high-quality virtual humans and animate them. In this work, we propose AvatarGen, the first method that enables not only geometry-aware clothed human synthesis with high-fidelity appearance but also disentangled control over human animation, while requiring only 2D images for training. Specifically, we decompose generative 3D human synthesis into a pose-guided mapping and a canonical representation with predefined human pose and shape, such that the canonical representation can be explicitly driven to different poses and shapes under the guidance of SMPL, a 3D parametric human model. AvatarGen further introduces a deformation network that learns non-rigid deformations to model fine-grained geometric details and pose-dependent dynamics. To improve the geometry quality of the generated avatars, it leverages a signed distance field as the geometric proxy, which allows more direct regularization from the 3D geometric priors of SMPL. Benefiting from these designs, our method generates animatable 3D human avatars with high-quality appearance and geometry, significantly outperforming previous 3D GANs. It also supports many applications, e.g., single-view reconstruction, re-animation, and text-guided synthesis/editing.
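
To make the pipeline above concrete, below is a minimal PyTorch sketch of the pose-guided mapping: points sampled in the posed (observation) space are mapped back to a canonical space via SMPL inverse skinning, refined by a learned residual deformation, and then queried against a canonical SDF-plus-color field. All names here (DeformationNet, CanonicalField, inverse_skinning, query_avatar) are hypothetical illustrations, and the inverse skinning is an identity stand-in; this is not the authors' released implementation.

# Minimal sketch of AvatarGen's pose-guided canonical mapping, assuming a
# simplified stand-in for SMPL inverse skinning. All module/function names
# are hypothetical, not the authors' code.
import torch
import torch.nn as nn

class DeformationNet(nn.Module):
    """Predicts a small non-rigid residual offset in canonical space,
    conditioned on the latent code and SMPL pose parameters."""
    def __init__(self, latent_dim=256, pose_dim=72, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x_canonical, z, pose):
        n = x_canonical.shape[0]
        h = torch.cat([x_canonical, z.expand(n, -1), pose.expand(n, -1)], dim=-1)
        return self.mlp(h)  # residual offset added to the coarse canonical points

class CanonicalField(nn.Module):
    """Canonical SDF + color field shared across all poses."""
    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + 3),  # signed distance + RGB
        )

    def forward(self, x, z):
        out = self.mlp(torch.cat([x, z.expand(x.shape[0], -1)], dim=-1))
        sdf, color = out[:, :1], torch.sigmoid(out[:, 1:])
        return sdf, color

def inverse_skinning(x_posed, pose):
    """Placeholder for SMPL inverse linear-blend skinning: maps points sampled
    in the posed (observation) space back to the canonical space. A real
    implementation would use SMPL skinning weights and bone transforms."""
    return x_posed  # identity stand-in, for illustration only

def query_avatar(x_posed, z, pose, deform_net, canon_field):
    """Pose-guided mapping: posed points -> coarse canonical points (SMPL
    inverse skinning) -> refined canonical points (learned residual) ->
    canonical SDF/color, which a volume renderer integrates along rays."""
    x_coarse = inverse_skinning(x_posed, pose)
    x_canonical = x_coarse + deform_net(x_coarse, z, pose)
    return canon_field(x_canonical, z)

# Toy usage: 1024 ray samples, one latent code, one SMPL pose vector.
z = torch.randn(1, 256)
pose = torch.randn(1, 72)
x = torch.rand(1024, 3) * 2 - 1
sdf, color = query_avatar(x, z, pose, DeformationNet(), CanonicalField())
print(sdf.shape, color.shape)  # torch.Size([1024, 1]) torch.Size([1024, 3])

In a full model, a volume renderer would convert the signed distances to densities and integrate colors along camera rays, with SMPL's coarse body surface regularizing the predicted SDF, as the abstract describes.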

Comparison with Baselines


Qualitative comparisons of generation results against baselines.



Applications


Human Portrait Reconstruction and Animation

Figure panels: input image and inversion, recovered geometry, novel-pose rendering, and corresponding geometry.

Given a target image, we optimize the latent code to invert it into the generator's latent space. We can then synthesize novel-view renderings with multi-view consistency and control the generation with novel poses.
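
As a rough illustration of this inversion-then-reanimation workflow, the sketch below optimizes a latent code with a simple pixel-reconstruction loss against the target image and then re-renders it from a new camera and a new SMPL pose. The ToyGenerator and its (latent, camera, pose) interface are stand-in assumptions so the loop runs end-to-end; they are not AvatarGen's actual API, and a full objective would typically add a perceptual term.

import torch
import torch.nn as nn

class ToyGenerator(nn.Module):
    """Stand-in for the AvatarGen generator: maps (latent, camera, pose)
    to an RGB image. Included only so the inversion loop runs; it is not
    the actual model."""
    def __init__(self, latent_dim=256, cam_dim=16, pose_dim=72):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Linear(latent_dim + cam_dim + pose_dim, 3 * 64 * 64)

    def mean_latent(self):
        # Average latent code, a common initialization for GAN inversion.
        return torch.zeros(1, self.latent_dim)

    def forward(self, w, camera, pose):
        h = torch.cat([w, camera, pose], dim=-1)
        return self.net(h).view(-1, 3, 64, 64)

def invert_image(G, target, camera, pose, steps=300, lr=1e-2):
    """Optimize a latent code so the rendering matches the target image."""
    w = G.mean_latent().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = (G(w, camera, pose) - target).pow(2).mean()  # pixel reconstruction
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

G = ToyGenerator()
target = torch.rand(1, 3, 64, 64)
camera, pose = torch.randn(1, 16), torch.randn(1, 72)
w = invert_image(G, target, camera, pose)
novel_view = G(w, torch.randn(1, 16), pose)    # novel-view rendering
reanimated = G(w, camera, torch.randn(1, 72))  # novel-pose re-animation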




Text-guided Synthesis/Editing

Text prompts: "light blue jeans", "long dress".

We visualize text-guided clothed human synthesis. Following StyleCLIP, we optimize the latent codes of the synthesized images with a sequence of text prompts that specify different clothing styles.
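
A hedged sketch of this StyleCLIP-style editing loop is shown below: starting from an inverted latent code, it minimizes the CLIP cosine distance between the rendering and a text prompt, while an L2 term keeps the code close to the inversion. It uses OpenAI's clip package (https://github.com/openai/CLIP); the generator interface is the same toy assumption as in the previous sketch, and CLIP's usual pixel normalization is omitted for brevity.

import torch
import clip  # https://github.com/openai/CLIP

def text_guided_edit(G, w_init, camera, pose, prompt, steps=200, lr=5e-3, lam=0.5):
    """StyleCLIP-style latent optimization against a text prompt (a sketch,
    not the paper's exact objective)."""
    model, _ = clip.load("ViT-B/32", device="cpu")
    model = model.float().eval()
    for p in model.parameters():
        p.requires_grad_(False)  # only the latent code is optimized
    text_feat = model.encode_text(clip.tokenize([prompt]))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G(w, camera, pose)
        # CLIP's visual encoder expects 224x224 inputs; bilinear resizing
        # keeps the pipeline differentiable w.r.t. the latent code.
        img = torch.nn.functional.interpolate(
            img, size=224, mode="bilinear", align_corners=False)
        img_feat = model.encode_image(img)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        clip_loss = 1.0 - (img_feat * text_feat).sum(dim=-1).mean()  # cosine distance
        anchor = (w - w_init).pow(2).mean()  # stay close to the inversion
        loss = clip_loss + lam * anchor
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

# e.g. w_edit = text_guided_edit(G, w, camera, pose, "light blue jeans")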



BibTeX

@article{zhang2022avatargen,
  author  = {Zhang, Jianfeng and Jiang, Zihang and Yang, Dingdong and Xu, Hongyi and Shi, Yichun and Song, Guoxian and Xu, Zhongcong and Wang, Xinchao and Feng, Jiashi},
  title   = {AvatarGen: A 3D Generative Model for Animatable Human Avatars},
  journal = {arXiv preprint},
  year    = {2022},
}