TY - JOUR
T1 - Learning character-agnostic motion for motion retargeting in 2D
AU - Aberman, Kfir
AU - Wu, Rundi
AU - Lischinski, Dani
AU - Chen, Baoquan
AU - Cohen-Or, Daniel
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/7
Y1 - 2019/7
N2 - Analyzing human motion is a challenging task with a wide variety of applications in computer vision and in graphics. One such application, of particular importance in computer animation, is the retargeting of motion from one performer to another. While humans move in three dimensions, the vast majority of human motions are captured using video, requiring 2D-to-3D pose and camera recovery, before existing retargeting approaches may be applied. In this paper, we present a new method for retargeting video-captured motion between different human performers, without the need to explicitly reconstruct 3D poses and/or camera parameters. In order to achieve our goal, we learn to extract, directly from a video, a high-level latent motion representation, which is invariant to the skeleton geometry and the camera view. Our key idea is to train a deep neural network to decompose temporal sequences of 2D poses into three components: motion, skeleton, and camera view-angle. Having extracted such a representation, we are able to re-combine motion with novel skeletons and camera views, and decode a retargeted temporal sequence, which we compare to a ground truth from a synthetic dataset. We demonstrate that our framework can be used to robustly extract human motion from videos, bypassing 3D reconstruction, and outperforming existing retargeting methods, when applied to videos in-the-wild. It also enables additional applications, such as performance cloning, video-driven cartoons, and motion retrieval.
KW - Motion retargeting
KW - Autoencoder
KW - Motion analysis
UR - http://www.scopus.com/inward/record.url?scp=85067833855&partnerID=8YFLogxK
U2 - 10.1145/3306346.3322999
DO - 10.1145/3306346.3322999
M3 - Article
AN - SCOPUS:85067833855
SN - 0730-0301
VL - 38
JO - ACM Transactions on Graphics
JF - ACM Transactions on Graphics
IS - 4
M1 - 75
ER -