Category-Level 6-D Object Pose Estimation With Learnable Prior Embeddings for Robotic Grasping

Sheng Yu, Di Hua Zhai*, Jian Yin, Yuanqing Xia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Category-level object pose estimation is crucial for predicting the poses of unknown objects within known categories. Methods that rely on category priors must first be trained on datasets to acquire those object priors, while methods that do not use category priors lack the relevant geometric information. To address both limitations, this article introduces PENet, a category-level object pose estimation method based on learnable priors. The method represents prior features with a learnable category prior embedding and proposes a transformer-based prior embedding deformation module that first deforms the prior embedding from a global perspective to match the actual target object. It also introduces a transformer-based correspondence module that establishes correspondence between instances and priors from a global perspective, further aligning the deformed feature embedding with the scene point cloud features. Experimental results show that the proposed method surpasses existing methods, achieving state-of-the-art performance on the dataset. The generalization ability of the method is further evaluated by applying PENet to object pose estimation on the Wild6D dataset, where it outperforms all related methods. Finally, in robotic grasping experiments on a real UR3 robot, PENet achieves a higher success rate than previous methods.
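
As a rough illustration of the two attention steps described in the abstract, the NumPy sketch below deforms a per-category prior embedding by attending over instance features, then computes a soft correspondence between observed points and the deformed prior. It is not PENet's implementation. The sizes, the random stand-in features, the omission of learned projection layers, and every name (`prior_table`, `cross_attention`, `instance_feats`) are assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    # Single-head scaled dot-product attention without learned projections
    # (a simplification; a real transformer layer would include them).
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch.
num_categories, num_prior, num_points, dim = 6, 128, 256, 32

# One embedding of shape (num_prior, dim) per category. In a trained model
# these would be learnable parameters; here they are random placeholders.
prior_table = rng.normal(size=(num_categories, num_prior, dim))

# Stand-in for per-point features of the observed scene point cloud.
instance_feats = rng.normal(size=(num_points, dim))

category_id = 2  # hypothetical category index
prior = prior_table[category_id]

# Step 1: deform the prior embedding by letting every prior slot attend
# over all instance features (global context), added as a residual.
delta, _ = cross_attention(prior, instance_feats, instance_feats)
deformed_prior = prior + delta

# Step 2: soft correspondence. Each observed point attends over the
# deformed prior slots; each row of the matrix sums to 1.
_, correspondence = cross_attention(instance_feats, deformed_prior, deformed_prior)

print(deformed_prior.shape, correspondence.shape)
```

In this simplified form the correspondence matrix has one row per observed point and one column per prior slot, so it can be read as a soft assignment of scene points to prior elements.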

Original language: English
Journal: IEEE Transactions on Industrial Electronics
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Category-level 6-D pose
  • grasping detection
  • object pose estimation
  • robot
