Brief Review — Construction of CNNs for Abnormal Heart Sound Detection using Data Augmentation


Sik-Ho Tsang
Mar 3, 2024

Construction of CNNs for Abnormal Heart Sound Detection using Data Augmentation, by Kagoshima College
2021 IMECS (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum + Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)]

  • Annotated phonocardiograms (PCGs) are scarce. In this paper, two data augmentation (DA) methods are proposed.
  • One is Window Slicing with Spectrogram (WSS), which slices a single PCG into multiple signals and transforms each signal into a spectrogram.
  • The other is Synthetic Spectrogram based GANs (SSG), which generates synthetic spectrograms using generative adversarial networks (GANs).


  1. Overall Architecture
  2. Window Slicing with Spectrogram (WSS)
  3. Synthetic Spectrogram based GANs (SSG)
  4. Results

1. Overall Architecture

(Figure: Overall Architecture)
  • The heart sound classification process:
  1. Record a PCG at a sampling rate of 2000 Hz.
  2. Transform the test PCG into a spectrogram.
  3. The classifier labels the spectrogram as abnormal or normal.
  • ResNet-18 is used as the classifier.
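The review does not give the spectrogram settings used in step 2. As a minimal sketch, assuming a Hann-windowed short-time Fourier transform with a hypothetical FFT size of 254 and hop of 127 samples (chosen only so that a 16384-sample input yields the 128 × 128 spectrogram size stated in Section 3), the PCG-to-spectrogram step could look like this in NumPy. The function name `log_spectrogram` and the log scaling are illustrative assumptions, not the paper's code.

```python
import numpy as np


def log_spectrogram(pcg: np.ndarray, n_fft: int = 254, hop: int = 127) -> np.ndarray:
    """Log-magnitude STFT of a 1-D PCG; n_fft and hop are hypothetical choices."""
    if pcg.ndim != 1 or len(pcg) < n_fft:
        raise ValueError("expected a 1-D signal at least n_fft samples long")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(pcg) - n_fft) // hop
    # Overlapping, Hann-windowed frames: shape (n_frames, n_fft)
    frames = np.stack([pcg[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # One-sided FFT gives n_fft // 2 + 1 frequency bins per frame
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose to (frequency, time) and compress the dynamic range
    return np.log1p(magnitude).T


fs = 2000  # sampling rate from step 1
t = np.arange(16384) / fs
spec = log_spectrogram(np.sin(2 * np.pi * 50 * t))
print(spec.shape)  # (128, 128)
```

The resulting 2-D array would then be fed to ResNet-18 as a single-channel image; any normalization applied before that is likewise an assumption here.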

2. Window Slicing with Spectrogram (WSS)

(Figure: Window Slicing with Spectrogram (WSS))
  • Normally, a single PCG yields only a single spectrogram.
  • If multiple spectrograms can be obtained from a single PCG, the amount of training data increases.

Window slicing creates multiple time series by cutting a single time series into segments of a fixed length (the slice length).

  • The slice length is set to 16384 samples in this paper (8.192 s at the 2000 Hz sampling rate).
  • The movement length (stride) between consecutive slices is set to slice length × slice ratio.
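With slice length L, slice ratio r, and a recording of N samples, the stride is s = L × r, so window slicing yields floor((N − L) / s) + 1 slices. For example, a 20000-sample recording (10 s at 2000 Hz) with r = 0.2 gives a stride of 3276 samples and two slices. The sketch below assumes the stride is truncated to an integer and that recordings shorter than one slice are zero-padded; both are hypothetical choices. It also rejects a ratio of 0, since the review does not say how that setting is handled. The helper name `window_slices` is illustrative, and each slice would afterwards be converted into a spectrogram as in Section 1.

```python
import numpy as np


def window_slices(pcg: np.ndarray, slice_len: int = 16384, ratio: float = 0.2) -> list:
    """Cut one PCG into fixed-length, possibly overlapping slices (window slicing)."""
    if ratio <= 0:
        raise ValueError("this sketch only handles positive slice ratios")
    # Movement length = slice length x slice ratio, truncated to whole samples
    stride = int(slice_len * ratio)
    if len(pcg) < slice_len:
        # Assumption: zero-pad recordings shorter than one slice
        pcg = np.pad(pcg, (0, slice_len - len(pcg)))
    starts = range(0, len(pcg) - slice_len + 1, stride)
    return [pcg[s : s + slice_len] for s in starts]


slices = window_slices(np.random.randn(20000), ratio=0.2)
print(len(slices))  # 2
```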

3. Synthetic Spectrogram based GANs (SSG)

  • GAN-based DA is often used to compensate for insufficient medical data, e.g., in MRI classification [15] and CT classification [16].
  1. The trained GAN generates 100 × 10³ synthetic spectrograms.
  2. Each synthetic spectrogram (128 × 128) is transformed into a 512-dimensional vector (synthetic vector).
  3. The original (real) spectrograms used to train the GAN are transformed into vectors (original vectors) in the same way as Step 2.
  4. Each synthetic spectrogram is scored using its synthetic vector and the original vectors.
  5. The 5 × 10³ (or 10 × 10³) synthetic spectrograms with the highest scores are selected.
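The review does not describe how the 512-dimensional vectors or the score are computed, so this sketch of steps 4 and 5 takes the vectors as given and uses a hypothetical score: the maximum cosine similarity between a synthetic vector and any original vector. The function name `select_synthetic` and the scoring rule are assumptions for illustration only.

```python
import numpy as np


def select_synthetic(syn_vecs: np.ndarray, orig_vecs: np.ndarray, k: int):
    """Score synthetic vectors against original vectors and keep the top k."""
    # L2-normalise rows so that dot products become cosine similarities
    syn = syn_vecs / np.linalg.norm(syn_vecs, axis=1, keepdims=True)
    orig = orig_vecs / np.linalg.norm(orig_vecs, axis=1, keepdims=True)
    sim = syn @ orig.T  # shape (n_synthetic, n_original)
    # Hypothetical score: similarity to the closest original vector
    scores = sim.max(axis=1)
    # Indices of the k highest scores, highest first
    top = np.argsort(-scores, kind="stable")[:k]
    return top, scores[top]


orig = np.eye(3, 512)        # three toy "original vectors"
syn = np.zeros((3, 512))
syn[0, 10] = 1.0             # orthogonal to every original: score 0
syn[1, 0] = syn[1, 1] = 1.0  # halfway between two originals: score ~0.707
syn[2, 2] = 1.0              # identical to an original: score 1
idx, best = select_synthetic(syn, orig, k=2)
print(idx.tolist())  # [2, 1]
```

With the paper's numbers this would be called on 100 × 10³ synthetic vectors with k = 5 × 10³ or 10 × 10³; at that scale, computing the similarity matrix in chunks would keep memory bounded.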

4. Results

  • Five slice ratios, [0, 0.2, 0.4, 0.6, 0.8], are evaluated as the WSS parameter.

WSS improves the classifier's performance.

  • SSG_x denotes that the GANs generate x synthetic spectrograms each for the abnormal and normal classes.

SSG improves accuracy, sensitivity, and specificity. However, the accuracy of SSG is about 1% lower than that of WSS.
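Accuracy, sensitivity, and specificity follow their standard confusion-matrix definitions. Assuming abnormal is the positive class (label 1) and normal is label 0, they can be computed as below; `binary_metrics` is a hypothetical helper, not code from the paper.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity, with abnormal = 1 as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # fraction of abnormal PCGs detected
    specificity = tn / (tn + fp)  # fraction of normal PCGs correctly accepted
    return float(accuracy), float(sensitivity), float(specificity)


acc, sens, spec = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])
print(acc, spec)  # 0.75 0.8
```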
