FLPC: Fusing language and point cloud for 3D object classification

Xiaozheng Gan; Chengtian Song; Jili Li; Lizhi Pan; Keyu Xu

doi:10.1016/j.eswa.2025.128430

Xiaozheng Gan, Chengtian Song^*, Jili Li, Lizhi Pan, Keyu Xu

^*Corresponding author for this work

School of Mechatronical Engineering

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

This study enhances the accuracy of point cloud classification by introducing a novel fusion architecture that fuses language with point clouds, drawing inspiration from recent advancements in multimodal fusion. Conventional neural networks depend extensively on images as intermediaries between language and point clouds, a methodology that lacks robustness and undermines accuracy. To tackle this, we propose FLPC, a fusion method for point cloud classification that uses an attention mechanism to integrate semantic information from textual descriptions with geometric features extracted from point cloud data. Our approach leverages a pre-trained model to extract both geometric and semantic features from the input data. These features are subsequently integrated through a classifier module, which is designed to use both types of features to enhance classification performance. Within the classifier module, three distinct fusion attention architectures (CFA, SFA, PFA) are proposed. This design, which combines point cloud features with language features, results in a significant improvement in overall performance. Extensive experiments reveal that both CFA and SFA achieve competitive performance. Notably, PFA not only markedly outperforms the previous multimodal classification baseline model but also surpasses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, on the ModelNet40 benchmark, the proposed FLPC method improves on the performance of PointMLP by approximately 1.5%. Correspondingly, on the ScanObjectNN benchmark, it surpasses PointMLP by 8.7%. These results underscore the efficacy of FLPC in leveraging multimodal information for 3D classification tasks, setting a new benchmark in the field.

Original language	English
Article number	128430
Journal	Expert Systems with Applications
Volume	296
DOI	http://doi.org/10.1016/j.eswa.2025.128430
Publication status	Published - 15 Jan 2026

Cite this

@article{5866c58f2d534a06909fcd322024e710,

title = "FLPC: Fusing language and point cloud for 3D object classification",

abstract = "This study enhances the accuracy of point cloud classification by introducing novel fusion architecture that fuses language with point cloud, drawing inspiration from recent advancements in multimodal fusion. Conventional neural networks depend extensively on images as intermediaries between language and point clouds, a methodology that lacks robustness and undermines accuracy. To tackle this, we propose FLPC, a groundbreaking fusion method for point cloud classification that integrates semantic information from textual descriptions with geometric features extracted from point cloud data using an attention mechanism. Our approach leverages a pre-trained model to extract both geometric and semantic features from the input data. These features are subsequently integrated through a classifier module, which is designed to effectively utilize the two types of visual features to enhance classification performance. Within the classifier module, three distinct fusion attention architectures (CFA, SFA, PFA) are proposed. This innovative design, which combines point cloud features with language features, results in a significant improvement in overall performance. A comprehensive set of extensive experiments reveals that both CFA and SFA showcase competitive performance. Significantly, PFA not only markedly outperforms the previous multimodal classification baseline model but also eclipses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, on the ModelNet40 benchmark, the proposed FLPC method elevates the performance of PointMLP by approximately 1.5 \%. Correspondingly, on the ScanObjectNN benchmark, it surpasses PointMLP by 8.7 \%. These results underscore the efficacy of FLPC in leveraging multimodal information for 3D classification tasks, setting a new benchmark in the field.",

keywords = "Attention mechanism, Classification, Multimodal fusion, Point cloud",

author = "Xiaozheng Gan and Chengtian Song and Jili Li and Lizhi Pan and Keyu Xu",

note = "Publisher Copyright: {\textcopyright} 2025 Elsevier Ltd",

year = "2026",

month = jan,

day = "15",

doi = "10.1016/j.eswa.2025.128430",

language = "English",

volume = "296",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - FLPC: Fusing language and point cloud for 3D object classification

AU - Gan, Xiaozheng

AU - Song, Chengtian

AU - Li, Jili

AU - Pan, Lizhi

AU - Xu, Keyu

PY - 2026/1/15

Y1 - 2026/1/15

N2 - This study enhances the accuracy of point cloud classification by introducing novel fusion architecture that fuses language with point cloud, drawing inspiration from recent advancements in multimodal fusion. Conventional neural networks depend extensively on images as intermediaries between language and point clouds, a methodology that lacks robustness and undermines accuracy. To tackle this, we propose FLPC, a groundbreaking fusion method for point cloud classification that integrates semantic information from textual descriptions with geometric features extracted from point cloud data using an attention mechanism. Our approach leverages a pre-trained model to extract both geometric and semantic features from the input data. These features are subsequently integrated through a classifier module, which is designed to effectively utilize the two types of visual features to enhance classification performance. Within the classifier module, three distinct fusion attention architectures (CFA, SFA, PFA) are proposed. This innovative design, which combines point cloud features with language features, results in a significant improvement in overall performance. A comprehensive set of extensive experiments reveals that both CFA and SFA showcase competitive performance. Significantly, PFA not only markedly outperforms the previous multimodal classification baseline model but also eclipses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, on the ModelNet40 benchmark, the proposed FLPC method elevates the performance of PointMLP by approximately 1.5 %. Correspondingly, on the ScanObjectNN benchmark, it surpasses PointMLP by 8.7 %. These results underscore the efficacy of FLPC in leveraging multimodal information for 3D classification tasks, setting a new benchmark in the field.

AB - This study enhances the accuracy of point cloud classification by introducing novel fusion architecture that fuses language with point cloud, drawing inspiration from recent advancements in multimodal fusion. Conventional neural networks depend extensively on images as intermediaries between language and point clouds, a methodology that lacks robustness and undermines accuracy. To tackle this, we propose FLPC, a groundbreaking fusion method for point cloud classification that integrates semantic information from textual descriptions with geometric features extracted from point cloud data using an attention mechanism. Our approach leverages a pre-trained model to extract both geometric and semantic features from the input data. These features are subsequently integrated through a classifier module, which is designed to effectively utilize the two types of visual features to enhance classification performance. Within the classifier module, three distinct fusion attention architectures (CFA, SFA, PFA) are proposed. This innovative design, which combines point cloud features with language features, results in a significant improvement in overall performance. A comprehensive set of extensive experiments reveals that both CFA and SFA showcase competitive performance. Significantly, PFA not only markedly outperforms the previous multimodal classification baseline model but also eclipses traditional unimodal classification models, achieving state-of-the-art accuracy. Specifically, on the ModelNet40 benchmark, the proposed FLPC method elevates the performance of PointMLP by approximately 1.5 %. Correspondingly, on the ScanObjectNN benchmark, it surpasses PointMLP by 8.7 %. These results underscore the efficacy of FLPC in leveraging multimodal information for 3D classification tasks, setting a new benchmark in the field.

KW - Attention mechanism

KW - Classification

KW - Multimodal fusion

KW - Point cloud

UR - http://www.scopus.com/pages/publications/105010852452

U2 - 10.1016/j.eswa.2025.128430

DO - 10.1016/j.eswa.2025.128430

M3 - Article

AN - SCOPUS:105010852452

SN - 0957-4174

VL - 296

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 128430

ER -
