A Latent Diffusion Model for Heart Sound Synthesis

Yang Tan; Haojie Zhang; Yi Chang; Qingrong Wu; Kun Qian; Bin Hu; Bjorn W. Schuller; Yoshiharu Yamamoto

doi:10.1109/ISCAIT64916.2025.11010636

Yang Tan, Haojie Zhang, Yi Chang, Qingrong Wu, Kun Qian^*, Bin Hu^*, Bjorn W. Schuller, Yoshiharu Yamamoto

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

There are already many analyses of heart sounds used to develop diagnostic systems for the heart condition. However, the existing heart sound database state is not sufficient to train a large number of deep learning models and is not balanced - the normal heart sounds are always more present than abnormal heart sounds. Hence, we are interested in algorithms that can generate heart sounds as they can enhance the current database state. We enter the field of large-scale modelling in medical synthesis by proposing a latent diffusion model for heart sound generation, which can generate highly realistic heart sounds. We further guide the synthesis process through text prompts and labels, revealing the research field of prompted heart sound synthesis. In terms of experimental results, our proposed method achieves good results compared to existing mathematical models. At the same time, when the generated audio is used as an unlabelled dataset in semi-supervised learning, it shows an improvement effect on the classification model. This indicates that the heart sounds generated by the diffusion model bear great value.

Original language	English
Title of host publication	2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1314-1319
Number of pages	6
ISBN (Electronic)	9798331542856
DOIs	http://doi.org/10.1109/ISCAIT64916.2025.11010636
Publication status	Published - 2025
Externally published	Yes
Event	4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025 - Xi'an, China Duration: 21 Mar 2025 → 23 Mar 2025

Publication series

Name	2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

Conference

Conference	4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025
Country/Territory	China
City	Xi'an
Period	21/03/25 → 23/03/25

Keywords

Data augmentation
Heart sound
Latent diffusion models
Semi-supervised learning
Sound synthesis

Access to Document

10.1109/ISCAIT64916.2025.11010636

Cite this

Tan, Y., Zhang, H., Chang, Y., Wu, Q., Qian, K., Hu, B., Schuller, B. W., & Yamamoto, Y. (2025). A Latent Diffusion Model for Heart Sound Synthesis. In 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025 (pp. 1314-1319). (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025). Institute of Electrical and Electronics Engineers Inc. http://doi.org/10.1109/ISCAIT64916.2025.11010636

Tan, Yang ; Zhang, Haojie ; Chang, Yi et al. / A Latent Diffusion Model for Heart Sound Synthesis. 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. Institute of Electrical and Electronics Engineers Inc., 2025. pp. 1314-1319 (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025).

@inproceedings{d8982021ed464fd28d57fc13bab10492,

title = "A Latent Diffusion Model for Heart Sound Synthesis",

abstract = "There are already many analyses of heart sounds used to develop diagnostic systems for the heart condition. However, the existing heart sound database state is not sufficient to train a large number of deep learning models and is not balanced - the normal heart sounds are always more present than abnormal heart sounds. Hence, we are interested in algorithms that can generate heart sounds as they can enhance the current database state. We enter the field of large-scale modelling in medical synthesis by proposing a latent diffusion model for heart sound generation, which can generate highly realistic heart sounds. We further guide the synthesis process through text prompts and labels, revealing the research field of prompted heart sound synthesis. In terms of experimental results, our proposed method achieves good results compared to existing mathematical models. At the same time, when the generated audio is used as an unlabelled dataset in semi-supervised learning, it shows an improvement effect on the classification model. This indicates that the heart sounds generated by the diffusion model bear great value.",

keywords = "Data augmentation, Heart sound, Latent diffusion models, Semi-supervised learning, Sound synthesis",

author = "Yang Tan and Haojie Zhang and Yi Chang and Qingrong Wu and Kun Qian and Bin Hu and Schuller, {Bjorn W.} and Yoshiharu Yamamoto",

note = "Publisher Copyright: {\textcopyright} 2025 IEEE.; 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025 ; Conference date: 21-03-2025 Through 23-03-2025",

year = "2025",

doi = "10.1109/ISCAIT64916.2025.11010636",

language = "English",

series = "2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1314--1319",

booktitle = "2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025",

address = "United States",

}

Tan, Y, Zhang, H, Chang, Y, Wu, Q, Qian, K, Hu, B, Schuller, BW & Yamamoto, Y 2025, A Latent Diffusion Model for Heart Sound Synthesis. in 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025, Institute of Electrical and Electronics Engineers Inc., pp. 1314-1319, 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025, Xi'an, China, 21/03/25. http://doi.org/10.1109/ISCAIT64916.2025.11010636

A Latent Diffusion Model for Heart Sound Synthesis. / Tan, Yang; Zhang, Haojie; Chang, Yi et al.
2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. Institute of Electrical and Electronics Engineers Inc., 2025. p. 1314-1319 (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025).

TY - GEN

T1 - A Latent Diffusion Model for Heart Sound Synthesis

AU - Tan, Yang

AU - Zhang, Haojie

AU - Chang, Yi

AU - Wu, Qingrong

AU - Qian, Kun

AU - Hu, Bin

AU - Schuller, Bjorn W.

AU - Yamamoto, Yoshiharu

PY - 2025

Y1 - 2025

AB - There are already many analyses of heart sounds used to develop diagnostic systems for the heart condition. However, the existing heart sound database state is not sufficient to train a large number of deep learning models and is not balanced - the normal heart sounds are always more present than abnormal heart sounds. Hence, we are interested in algorithms that can generate heart sounds as they can enhance the current database state. We enter the field of large-scale modelling in medical synthesis by proposing a latent diffusion model for heart sound generation, which can generate highly realistic heart sounds. We further guide the synthesis process through text prompts and labels, revealing the research field of prompted heart sound synthesis. In terms of experimental results, our proposed method achieves good results compared to existing mathematical models. At the same time, when the generated audio is used as an unlabelled dataset in semi-supervised learning, it shows an improvement effect on the classification model. This indicates that the heart sounds generated by the diffusion model bear great value.

KW - Data augmentation

KW - Heart sound

KW - Latent diffusion models

KW - Semi-supervised learning

KW - Sound synthesis

UR - http://www.scopus.com/pages/publications/105010227491

U2 - 10.1109/ISCAIT64916.2025.11010636

DO - 10.1109/ISCAIT64916.2025.11010636

M3 - Conference contribution

AN - SCOPUS:105010227491

T3 - 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

SP - 1314

EP - 1319

BT - 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025

Y2 - 21 March 2025 through 23 March 2025

ER -

Tan Y, Zhang H, Chang Y, Wu Q, Qian K, Hu B et al. A Latent Diffusion Model for Heart Sound Synthesis. In 2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025. Institute of Electrical and Electronics Engineers Inc. 2025. p. 1314-1319. (2025 4th International Symposium on Computer Applications and Information Technology, ISCAIT 2025). doi: 10.1109/ISCAIT64916.2025.11010636
