Abstract
We present ShapeFormer, a transformer-based network that produces a distribution of object completions, conditioned on incomplete, and possibly noisy, point clouds. The resultant distribution can then be sampled to generate likely completions, each exhibiting plausible shape details while being faithful to the input. To facilitate the use of transformers for 3D, we introduce a compact 3D representation, vector quantized deep implicit function (VQDIF), that utilizes spatial sparsity to represent a close approximation of a 3D shape by a short sequence of discrete variables. Experiments demonstrate that ShapeFormer outperforms prior art for shape completion from ambiguous partial inputs in terms of both completion quality and diversity. We also show that our approach effectively handles a variety of shape types, incomplete patterns, and real-world scans.
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
Publisher | IEEE Computer Society |
Pages | 6229-6239 |
Number of pages | 11 |
ISBN (Electronic) | 9781665469463 |
DOIs | |
State | Published - 2022 |
Event | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States Duration: 19 Jun 2022 → 24 Jun 2022 |
Publication series
Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
---|---|
Volume | 2022-June |
ISSN (Print) | 1063-6919 |
Conference
Conference | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 19/06/22 → 24/06/22 |
Bibliographical note
Publisher Copyright:© 2022 IEEE.
Keywords
- 3D from multi-view and sensors
- Representation learning
- Vision + graphics