3D Object Representations for Robot Perception
Reasoning about 3D objects is one of the most critical perception problems robots face; outside of navigation, most interactions between a robot and its environment are object-centric. Object-centric robot perception has long relied on maintaining an explicit database of 3D object models with the assumption that encountered objects will be exact copies of entries in the database; however, as robots move into unstructured environments such as human homes, the variation of encountered objects increases and maintaining an explicit object database becomes infeasible. This thesis introduces a general-purpose 3D object representation that allows the joint estimation of a previously unencountered object's class, pose, and 3D shape---crucial foundational tasks for general robot perception.
We present the first method capable of performing all three of these tasks simultaneously, Bayesian Eigenobjects (BEOs), and show that it outperforms competing approaches which estimate only object shape and class given a known object pose. BEOs use an approximate Bayesian version of Principal Component Analysis to learn an explicit low-dimensional subspace containing the 3D shapes of objects of interest, which allows for efficient shape inference at high object resolutions. We then extend BEOs to produce Hybrid Bayesian Eigenobjects (HBEOs), a fusion of linear subspace methods with modern convolutional network approaches, enabling realtime inference from a single depth image. Because HBEOs use a Convolutional Network to project partially observed objects onto the learned subspace, they allow the object to be larger and more expressive without impacting the inductive power of the model. Experimentally, we show that HBEOs offer significantly improved performance on all tasks compared to their BEO predecessors. Finally, we leverage the explicit 3D shape estimate produced by BEOs to further extend the state-of-the-art in category level pose estimation by fusing probabilistic pose predictions with a silhouette-based reconstruction prior. We also illustrate the advantages of combining both probabilistic pose estimation and shape verification, via an ablation study, and show that both portions of the system contribute to its performance. Taken together, these methods comprise a significant step towards creating a general-purpose 3D perceptual foundation for robotics systems, upon which problem-specific systems may be built.