Category-Level 6-D Object Pose Estimation With Learnable Prior Embeddings for Robotic Grasping

Sheng Yu; Di Hua Zhai; Jian Yin; Yuanqing Xia

doi:10.1109/TIE.2025.3555019

Sheng Yu, Di Hua Zhai*, Jian Yin, Yuanqing Xia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Category-level object pose estimation is crucial for predicting the poses of unknown objects within known categories. Methods that rely on category priors must first be trained on datasets to acquire those priors, while methods that do not use category priors lack relevant geometric information. To address these challenges, this article introduces PENet, a category-level object pose estimation method based on learnable priors. The method uses a learnable category prior embedding to represent prior features and proposes a transformer-based prior embedding deformation module that first deforms the prior embedding from a global perspective to match the actual target object. It also introduces a transformer-based correspondence module that establishes correspondence between instances and priors from a global perspective, further aligning the deformed feature embedding with the scene point cloud features. Experimental results show that the proposed method surpasses existing methods, achieving state-of-the-art performance on the dataset. The generalization ability of the method is further evaluated by applying PENet to object pose estimation on the Wild6D dataset, where it outperforms all related methods. Finally, applying PENet to robotic grasping experiments on a real UR3 robot yields a higher success rate than previous methods.
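The two-stage idea in the abstract (deform a learnable prior toward the observed instance, then build correspondences between scene points and the deformed prior) can be illustrated with a toy sketch. This is not PENet's architecture: the sizes `D`, `M`, `N`, the random stand-in features, the residual update, and the projection-free single-head attention are all assumptions chosen only to make the data flow concrete.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention with no learned projections.

    Returns the attended features and the attention weights (one row per query).
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys_values))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, keys_values), weights

random.seed(0)
D, M, N = 8, 4, 6  # hypothetical sizes: feature dim, prior tokens, scene points

# Hypothetical learnable prior embedding for one category (would be trained).
prior_embedding = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]
# Stand-in for per-point features extracted from the observed point cloud.
scene_features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]

# Stage 1 (assumed form): prior tokens attend to the scene and receive a
# residual update, which plays the role of "deforming" the prior.
update, _ = cross_attention(prior_embedding, scene_features)
deformed_prior = [[p + u for p, u in zip(p_row, u_row)]
                  for p_row, u_row in zip(prior_embedding, update)]

# Stage 2 (assumed form): each scene point attends to the deformed prior,
# giving a soft N x M correspondence matrix whose rows sum to 1.
_, correspondence = cross_attention(scene_features, deformed_prior)
```

In this sketch `correspondence[i][j]` is the soft weight linking scene point `i` to prior token `j`. A real model would add learned query, key, and value projections, multiple heads, and a pose regression head, none of which are specified in the abstract.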

Original language	English
Journal	IEEE Transactions on Industrial Electronics
DOIs	http://doi.org/10.1109/TIE.2025.3555019
Publication status	Accepted/In press - 2025
Externally published	Yes

Keywords

Category-level 6-D pose
grasping detection
object pose estimation
robot

Access to Document

10.1109/TIE.2025.3555019

Cite this

@article{92ae2345be9e40399b736e399c055271,

title = "Category-Level 6-D Object Pose Estimation With Learnable Prior Embeddings for Robotic Grasping",

abstract = "Category-level object pose estimation is crucial for predicting the poses of unknown objects within known categories. While methods relying on category-level object pose estimation with category priors necessitate prior training on datasets to acquire object priors, approaches for category-level object pose estimation without category priors lack relevant geometric information. To address these challenges, this article introduces a category-level object pose estimation method, PENet, based on learnable priors. The method utilizes a learnable category prior embedding to represent prior features and proposes a transformer-based prior embedding deformation module to initially deform the prior embedding from a global perspective to match the actual target object. Additionally, it introduces a transformer-based correspondence module to establish correspondence between instances and priors from a global perspective in order to further align the deformed feature embedding with the scene point cloud features. Experimental results demonstrate that the proposed method surpasses existing methods, achieving state-of-the-art performance on the dataset. Furthermore, the generalization ability of the proposed method is evaluated by applying PENet to object pose estimation on the Wild6D dataset, where it outperforms all related methods. Finally, the application of PENet to robotic grasping experiments on a real UR3 robot results in a higher success rate compared to previous methods.",

keywords = "Category-level 6-D pose, grasping detection, object pose estimation, robot",

author = "Sheng Yu and Zhai, {Di Hua} and Jian Yin and Yuanqing Xia",

note = "Publisher Copyright: {\textcopyright} 1982-2012 IEEE.",

year = "2025",

doi = "10.1109/TIE.2025.3555019",

language = "English",

journal = "IEEE Transactions on Industrial Electronics",

issn = "0278-0046",

publisher = "IEEE Industrial Electronics Society",

}

TY - JOUR

T1 - Category-Level 6-D Object Pose Estimation With Learnable Prior Embeddings for Robotic Grasping

AU - Yu, Sheng

AU - Zhai, Di Hua

AU - Yin, Jian

AU - Xia, Yuanqing

PY - 2025

Y1 - 2025

N2 - Category-level object pose estimation is crucial for predicting the poses of unknown objects within known categories. While methods relying on category-level object pose estimation with category priors necessitate prior training on datasets to acquire object priors, approaches for category-level object pose estimation without category priors lack relevant geometric information. To address these challenges, this article introduces a category-level object pose estimation method, PENet, based on learnable priors. The method utilizes a learnable category prior embedding to represent prior features and proposes a transformer-based prior embedding deformation module to initially deform the prior embedding from a global perspective to match the actual target object. Additionally, it introduces a transformer-based correspondence module to establish correspondence between instances and priors from a global perspective in order to further align the deformed feature embedding with the scene point cloud features. Experimental results demonstrate that the proposed method surpasses existing methods, achieving state-of-the-art performance on the dataset. Furthermore, the generalization ability of the proposed method is evaluated by applying PENet to object pose estimation on the Wild6D dataset, where it outperforms all related methods. Finally, the application of PENet to robotic grasping experiments on a real UR3 robot results in a higher success rate compared to previous methods.

KW - Category-level 6-D pose

KW - grasping detection

KW - object pose estimation

KW - robot

UR - http://www.scopus.com/pages/publications/105004691253

U2 - 10.1109/TIE.2025.3555019

DO - 10.1109/TIE.2025.3555019

M3 - Article

AN - SCOPUS:105004691253

SN - 0278-0046

JO - IEEE Transactions on Industrial Electronics

JF - IEEE Transactions on Industrial Electronics

ER -
