← Back

Zero123-6D: Zero-Shot Novel View Synthesis for RGB Category-Level 6D Pose Estimation

Francesco Di Felice, Alberto Remus, Stefano Gasperini, Benjamin Busam, Lionel Ott, Federico Tombari, Roland Siegwart, Carlo Alberto Avizzano

PDF

Key figure (auto-extracted from paper)

Abstract

Estimating the pose of objects through vision is essential to make robotic platforms interact with the environ- ment. Yet, it presents many challenges, often related to the lack of flexibility and generalizability of state-of-the-art solu- tions. Diffusion models are a cutting-edge neural architecture transforming 2D and 3D computer vision, outlining remarkable performances in zero-shot novel-view synthesis. Such a use case is particularly intriguing for reconstructing 3D objects. However, localizing objects in unstructured environments is rather unexplored. To this end, this work presents Zero123- 6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level, by integrating them with feature extraction techniques. Novel View Synthesis allows to obtain a coarse pose that is refined through an online optimization method introduced in this work to deal with intra- category geometric differences. In such a way, the outlined method shows reduction in data requirements, removal of the necessity of depth information in zero-shot category-level 6D pose estimation task, and increased performance, quantitatively demonstrated through experiments on the CO3D dataset.

Index terms

RGB-D Perception Deep Learning for Visual Perception Deep Learning Methods