A novel multimodal computer-aided diagnostic model for pulmonary embolism based on hybrid transformer-CNN and tabular transformer

Wei Zhang, Yu Gu*, Hao Ma, Lidong Yang, Baohua Zhang, Jing Wang, Meng Chen, Xiaoqi Lu, Jianjun Li, Xin Liu, Dahua Yu, Ying Zhao, Siyuan Tang, Qun He

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Pulmonary embolism (PE) is a life-threatening clinical problem in which early diagnosis and prompt treatment are essential to reducing morbidity and mortality. Combining CT images with electronic health records (EHR) can improve computer-aided diagnosis, but many challenges remain to be addressed. The primary objective of this study is to leverage both 3D CT images and EHR data to improve PE diagnosis. First, for 3D CT images, we propose a network that combines Swin Transformers with 3D CNNs, enhanced by a Multi-Scale Feature Fusion (MSFF) module that addresses the challenges of fusing features from the different encoders. Second, we introduce a Polarized Self-Attention (PSA) module to enhance the attention mechanism within the 3D CNN. Third, for EHR data, we design a Tabular Transformer for effective feature extraction. Finally, we design and evaluate three multimodal attention fusion modules to integrate CT and EHR features, and select the most effective one for the final fusion. Experimental results on the RadFusion dataset demonstrate that our model significantly outperforms existing state-of-the-art methods, achieving an AUROC of 0.971, an F1 score of 0.926, and an accuracy of 0.920. These results underscore the effectiveness and innovation of our multimodal approach in advancing PE diagnosis.
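
For readers less familiar with the three metrics reported in the abstract, the following is a minimal sketch of how accuracy, F1 score, and AUROC are conventionally computed for a binary task such as PE versus no PE. It is not the authors' evaluation code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and none of the numbers come from the RadFusion dataset.

```python
# Illustrative only: standard definitions of the metrics named in the abstract.
# All names and the toy data below are hypothetical, not taken from the paper.

def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores):
    """Probability that a random positive case is scored above a random
    negative case (ties count as one half). This pairwise form is
    equivalent to the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = PE present, 0 = PE absent; scores are made-up model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
threshold = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= threshold else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("AUROC:", auroc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AUROC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason studies commonly report the three together.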

Original language: English
Journal: Physical and Engineering Sciences in Medicine
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • 3DCNN
  • EHR
  • Multimodal diagnoses
  • Pulmonary embolism
  • Swin transformer
