¹Harbin Institute of Technology ²Tsinghua University ³Communication University of China
*Corresponding author †Work done during an internship at Tsinghua University
The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences is a crucial step in bridging the gap between the virtual and real worlds. Recent head avatar techniques have explored explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representations. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches carry a heavy training burden because of the large number of points involved, and neural implicit representations are limited in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that couples a 3D Gaussian point representation with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. A Gaussian deformation field moves these points to match the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity and are rendered with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results compared with previous methods.
Overview of MonoGaussianAvatar. We model the human head as a set of learned, parametric, deformed 3D Gaussian points, each comprising a mean position, color, opacity, rotation, and scale. These parameters collectively characterize the subject's geometry and intrinsic appearance in the deformed space. The left module details the initialization of our Gaussian representation: how we obtain the initial mean position, derive the other Gaussian parameters, and transform the mean position x_c from the initialized space to the canonical space. The middle module shows how the mean position is transformed from the canonical space to the deformed space using linear blend skinning (LBS), and how the Gaussian parameter deformation field adjusts the other Gaussian parameters in the deformed space to fit that transformation. The top-right module presents the rendering process with the 3D Gaussian parameters in the deformed space. The bottom-right module demonstrates the point insertion and deletion strategy.
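To make the canonical-to-deformed step in the caption more concrete, here is a minimal sketch of generic linear blend skinning applied to Gaussian mean positions. It is not the authors' implementation: the function name `linear_blend_skinning`, the array shapes, and the toy bone transforms are assumptions chosen for illustration, and any blendshape offsets or learned skinning weights the method may use are omitted.

```python
import numpy as np


def linear_blend_skinning(points, weights, transforms):
    """Deform canonical points with a weighted blend of bone transforms.

    points:     (N, 3) canonical mean positions (hypothetical input layout)
    weights:    (N, B) skinning weights, each row assumed to sum to 1
    transforms: (B, 4, 4) homogeneous transform for each bone
    """
    n = points.shape[0]
    # Append a homogeneous coordinate so 4x4 transforms can be applied.
    homo = np.concatenate([points, np.ones((n, 1))], axis=1)  # (N, 4)
    # Per-point blended transform: sum_b w[n, b] * T[b].
    blended = np.einsum("nb,bij->nij", weights, transforms)  # (N, 4, 4)
    # Apply each blended transform to its own point.
    deformed = np.einsum("nij,nj->ni", blended, homo)  # (N, 4)
    return deformed[:, :3]


# Toy example: two bones, the identity and a shift of 2 units along x.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
shift = np.eye(4)
shift[:3, 3] = [2.0, 0.0, 0.0]
transforms = np.stack([np.eye(4), shift])
# The first point follows bone 0 only; the second is split evenly.
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
deformed = linear_blend_skinning(points, weights, transforms)
```

In this toy setup the first point stays at the origin, while the second point receives half of the 2-unit shift and moves from x = 1 to x = 2. Blending transforms per point is what lets neighboring Gaussians move smoothly with pose rather than snapping to a single rigid part.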
@article{chen2023monogaussianavatar,
title={MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar},
author={Chen, Yufan and Wang, Lizhen and Li, Qijing and Xiao, Hongjiang and Zhang, Shengping and Yao, Hongxun and Liu, Yebin},
journal={arXiv},
year={2023}
}