Brief Review — Classification of Heart Sounds Using Convolutional Neural Network

497 Features + 1D-CNN

Sik-Ho Tsang
5 min read · Apr 13, 2024

Classification of Heart Sounds Using Convolutional Neural Network (497 Features + 1D-CNN), by Dalian University of Technology, RWTH Aachen University, and University of Jyväskylä
2020 MDPI Appl. Sci., Over 70 Citations (Sik-Ho Tsang @ Medium)

Phonocardiogram (PCG)/Heart Sound Classification
2013 … 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum+Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)] [DL Overview] [MFCC + k-NN / RF / ANN / SVM + Grid Search] [Long-Short Term Features (LSTF)] [WST+1D-CNN and CST+2D-CNN Ensemble] [CTENN] [Bispectrum + ViT]

  • First, 497 features are extracted from 8 domains.
  • Then, these features are fed into a 1-D convolutional neural network (CNN). To handle the class imbalance, class weights are set in the loss function.


  1. 497 Features
  2. 1D-CNN
  3. Results

1. 497 Features

Summary of 497 Features
  • 497 features are extracted in different domains.
  • (Please skip to Section 2 for a quick read.)

1.1. Time Domain

Time Domain
  • The methods in [7,29] are used to extract the features with indices 1–16.
  • Another four features (index 17–20) are added. So, 20 features in total were extracted from the time domain.

1.2. State Amplitude Domain

  • First, the absolute values of the amplitude were normalized. Then, the ratios of the absolute amplitude between different states were calculated. Furthermore, the mean and standard deviation of the ratio were taken.
  • A total of 12 amplitude features were extracted from four states.

1.3. Energy Domain

  • There are two kinds of features in the energy domain — (1) the energy ratio of the band-passed signal to the original signal; and (2) the energy ratio of one state to another.
  • The first kind uses 42 frequency bands from 10 Hz to 430 Hz ([10 20] Hz, [20 30] Hz, [30 40] Hz, …, and [420 430] Hz), giving 42 features, one per band.
  • First, a Butterworth band-pass filter is applied to the raw signal for each band. Then, the energy ratio of each band-passed signal to the original signal is calculated.
  • The energy ratio between any two states is also used as a feature. If a PCG recording has N cycles and each cycle contains n discrete time indices, then “Ratio_energy_state” is the energy of one state divided by the energy of another state, or of the whole cycle, within the same cycle.
  • Ratio_energy_stateS1_cycle is computed for each cycle i, and its mean and SD over all cycles were taken as 2 features. The energy ratios of state S1 to the S2, systole and diastole states were obtained in the same way. So, for state S1, 8 features were extracted (4 ratios × 2 statistics).
  • Similarly, 6, 4 and 2 features for the S2, systole and diastole states, respectively, are extracted.
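The state energy ratios described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code: the segmentation layout (one dict of sample lists per cycle), the helper names `energy` and `ratio_energy_state`, and the use of the population SD are all hypothetical choices. The Butterworth filtering step is left out.

```python
from statistics import mean, pstdev

STATES = ("S1", "systole", "S2", "diastole")

def energy(samples):
    """Sum of squared amplitudes of a segment."""
    return sum(x * x for x in samples)

def ratio_energy_state(cycles, num, den):
    """Mean and SD over cycles of energy(num state) / energy(den).

    `cycles` is a list of dicts mapping a state name to its samples
    (hypothetical layout). `den` may be "cycle" for the whole cycle.
    Population SD is an assumption; the paper may use the sample SD.
    """
    ratios = []
    for c in cycles:
        if den == "cycle":
            den_samples = [x for s in STATES for x in c[s]]
        else:
            den_samples = c[den]
        ratios.append(energy(c[num]) / energy(den_samples))
    return mean(ratios), pstdev(ratios)

# The 42 bands of the first feature kind: [10, 20], [20, 30], ..., [420, 430] Hz
bands = [(lo, lo + 10) for lo in range(10, 430, 10)]

# Two toy cycles (made-up numbers, for illustration only)
cycles = [
    {"S1": [1.0, -1.0], "systole": [0.1, 0.1], "S2": [0.5, -0.5], "diastole": [0.1, 0.0]},
    {"S1": [2.0, -2.0], "systole": [0.2, 0.2], "S2": [1.0, -1.0], "diastole": [0.2, 0.0]},
]
m, sd = ratio_energy_state(cycles, "S1", "S2")
```

Calling `ratio_energy_state(cycles, "S1", "cycle")`, then with `"S2"`, `"systole"` and `"diastole"` as the denominator, would give the 4 × 2 = 8 features listed for S1.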

1.4. Higher-Order Statistics Domain

  • The skewness and kurtosis of each state (s_i) are computed in every cycle.
  • For each state, the mean and SD over all cycles of both the skewness and the kurtosis are used as features (4 per state). With four states, there were 16 features in this domain.
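As a rough sketch of how the 16 higher-order statistics features could be computed, the snippet below uses the standard moment-based definitions of skewness and kurtosis. The per-cycle dict layout, the function names and the use of the population SD are assumptions made for this example, and the paper's exact normalization may differ.

```python
from statistics import mean, pstdev

STATES = ("S1", "systole", "S2", "diastole")

def skewness(x):
    """Third standardized moment (population form)."""
    m, sd = mean(x), pstdev(x)
    return sum((v - m) ** 3 for v in x) / (len(x) * sd ** 3)

def kurtosis(x):
    """Fourth standardized moment (population form, not excess kurtosis)."""
    m, sd = mean(x), pstdev(x)
    return sum((v - m) ** 4 for v in x) / (len(x) * sd ** 4)

def hos_features(cycles):
    """Per state: mean/SD of skewness and mean/SD of kurtosis over cycles."""
    feats = {}
    for state in STATES:
        sk = [skewness(c[state]) for c in cycles]
        ku = [kurtosis(c[state]) for c in cycles]
        feats[state] = (mean(sk), pstdev(sk), mean(ku), pstdev(ku))
    return feats

# Two toy cycles (made-up numbers, for illustration only)
cycles = [
    {"S1": [0.0, 1.0, 0.2, -0.8], "systole": [0.1, -0.1, 0.05, 0.0],
     "S2": [0.0, 0.6, -0.5, 0.1], "diastole": [0.02, -0.03, 0.01, 0.0]},
    {"S1": [0.1, 0.9, 0.1, -0.7], "systole": [0.0, -0.1, 0.1, 0.05],
     "S2": [0.1, 0.5, -0.6, 0.0], "diastole": [0.01, -0.02, 0.03, 0.0]},
]
feats = hos_features(cycles)
n_features = sum(len(v) for v in feats.values())
```

With four states and four statistics per state, `n_features` matches the 16 features counted above.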

1.5. Cepstrum Domain

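  • The cepstrum is obtained by DFT, log, then IDFT. The first 13 cepstral coefficients were extracted from the cepstrum of the whole PCG recording. Then, all S1 states of the recording were joined into a new discrete sequence, and the first 13 cepstral coefficients were extracted from the cepstrum of this new sequence.
  • The same operation was performed on the new discrete sequences generated by all of the S2, systole and diastole states of a PCG recording. Finally, a total of 65 (13 + 13 × 4) features was obtained.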
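The DFT, log, IDFT pipeline can be illustrated with a small dependency-free sketch. It assumes the real cepstrum (log of the magnitude spectrum), uses a naive O(n²) DFT to stay within the standard library, and adds a small `eps` to avoid log(0). None of these choices is confirmed by the paper, and the function names are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def cepstral_coefficients(x, n_coeffs=13, eps=1e-12):
    """First n_coeffs values of the real cepstrum: IDFT(log|DFT(x)|)."""
    spectrum = dft(x)
    log_mag = [math.log(abs(X) + eps) for X in spectrum]
    cepstrum = dft(log_mag, inverse=True)
    return [c.real for c in cepstrum[:n_coeffs]]

# Toy signal: two sinusoids, 64 samples (illustrative only)
signal = [math.sin(0.3 * t) + 0.5 * math.sin(0.9 * t) for t in range(64)]
coeffs = cepstral_coefficients(signal)

# Feature count: whole recording plus the four joined state sequences
n_cepstrum_features = 13 + 13 * 4
```

For real recordings an FFT routine would replace the naive `dft`; the structure of `cepstral_coefficients` would stay the same.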

1.6. Frequency Domain

  • The mean frequency spectrum of each state is computed over all cycles in a PCG recording. The spectrum values from 20 Hz to 400 Hz at 5 Hz intervals were extracted as features. Therefore, 77 features were obtained from each state.
  • A total of 308 (77 × 4) features were obtained.
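The arithmetic behind the 77 and 308 figures, and one possible way to sample a mean spectrum on that grid, can be sketched as follows. Nearest-bin sampling, the `bin_hz` parameter and the function names are assumptions for illustration, and the paper may interpolate or estimate the spectrum differently.

```python
def target_frequencies(f_lo=20, f_hi=400, step=5):
    """Frequencies 20, 25, ..., 400 Hz."""
    return list(range(f_lo, f_hi + 1, step))

def mean_spectrum(spectra):
    """Element-wise mean of the per-cycle magnitude spectra of one state."""
    return [sum(col) / len(col) for col in zip(*spectra)]

def spectrum_features(magnitudes, bin_hz):
    """Sample the spectrum at the nearest bin for each target frequency.

    magnitudes[k] is assumed to be the magnitude at k * bin_hz Hz.
    """
    return [magnitudes[round(f / bin_hz)] for f in target_frequencies()]

freqs = target_frequencies()

# Toy spectra with 1 Hz bins from 0 to 1000 Hz (made-up values)
cycle_a = [float(k) for k in range(1001)]
cycle_b = [float(k) + 2.0 for k in range(1001)]
feats = spectrum_features(mean_spectrum([cycle_a, cycle_b]), bin_hz=1.0)

n_per_state = len(feats)
n_total = n_per_state * 4  # S1, systole, S2, diastole
```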

1.7. Cyclostationarity Domain

  • Ting et al. in Reference [37] have discussed the degree of cyclostationarity, which indicates the level of signal repetition in a PCG recording.
  • Let γ(α) be the cycle frequency spectral density (CFSD) of a PCG recording at cycle frequency α, and γ(η) the CFSD at the cycle frequency η, where η is the location of the main peak of γ(α). The degree of cyclostationarity is defined from these two quantities.
  • The mean and SD of the degree of cyclostationarity over all subsequences of a recording are calculated as 2 features. The ratio of the maximum CFSD to the median CFSD (peak_sharpness) is also calculated.
  • Similarly, the mean and SD of the peak_sharpness of all of the subsequences in a PCG recording can be calculated as another 2 separate features.
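A minimal sketch of the peak_sharpness features follows. It assumes peak_sharpness denotes the ratio of the maximum CFSD to the median CFSD mentioned above, and that a CFSD curve has already been estimated for each subsequence (that estimation is not shown). The toy values and the population SD are illustrative assumptions.

```python
from statistics import mean, median, pstdev

def peak_sharpness(cfsd):
    """Ratio of the maximum to the median of one CFSD curve."""
    return max(cfsd) / median(cfsd)

def sharpness_features(cfsd_per_subsequence):
    """Mean and SD of peak_sharpness over all subsequences."""
    values = [peak_sharpness(c) for c in cfsd_per_subsequence]
    return mean(values), pstdev(values)

# Two toy CFSD curves, one per subsequence (made-up numbers)
toy = [
    [1.0, 2.0, 10.0, 2.0, 1.0],
    [1.0, 3.0, 9.0, 3.0, 1.0],
]
m, sd = sharpness_features(toy)
```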

1.8. Entropy Domain

  • Sample entropy and fuzzy measure entropy are measured.
  • 2 entropy features are also extracted in the cepstrum domain.

2. 1D-CNN

  • As shown in Figure 2, the proposed model is composed of 3 Conv-blocks, a global average pooling (GAP) layer and a classification layer with the sigmoid function.
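To make the data flow concrete, here is a dependency-free forward pass with three Conv-blocks, GAP and a sigmoid output. The contents of each Conv-block (convolution, ReLU, max pooling), the channel counts, kernel size and random weights are all assumptions for illustration, since the post does not specify them. In practice a deep-learning framework would be used instead of plain Python.

```python
import math
import random

def conv1d(x, weights, biases):
    """Valid 1-D convolution, stride 1. x: [C_in][L], weights: [C_out][C_in][K]."""
    length, k = len(x[0]), len(weights[0][0])
    out = []
    for w_out, b in zip(weights, biases):
        row = []
        for t in range(length - k + 1):
            s = b
            for c, w_c in enumerate(w_out):
                for j, w in enumerate(w_c):
                    s += w * x[c][t + j]
            row.append(s)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def max_pool(x, size=2):
    """Non-overlapping max pooling along the length axis."""
    return [[max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]
            for row in x]

def conv_block(x, weights, biases):
    """Hypothetical Conv-block: convolution, ReLU, max pooling."""
    return max_pool(relu(conv1d(x, weights, biases)))

def global_average_pool(x):
    """One average per channel."""
    return [sum(row) / len(row) for row in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_conv(rng, c_out, c_in, k):
    w = [[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(c_in)]
         for _ in range(c_out)]
    return w, [0.0] * c_out

def forward(features, params):
    x = [list(features)]  # the 497 features form a single input channel
    for w, b in params["blocks"]:
        x = conv_block(x, w, b)
    pooled = global_average_pool(x)
    z = sum(wi * pi for wi, pi in zip(params["fc_w"], pooled)) + params["fc_b"]
    return sigmoid(z)  # probability of the positive class

rng = random.Random(0)
params = {
    "blocks": [init_conv(rng, 4, 1, 3), init_conv(rng, 8, 4, 3), init_conv(rng, 8, 8, 3)],
    "fc_w": [rng.uniform(-0.1, 0.1) for _ in range(8)],
    "fc_b": 0.0,
}
features = [rng.gauss(0.0, 1.0) for _ in range(497)]
prob = forward(features, params)
```

Because GAP reduces each channel to one number, the classification layer only needs as many weights as the last block has channels, whatever the input length is.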

3. Results

  • The PhysioNet dataset is used.
  • The best result, obtained on the 4th fold, is 94%.
Class Weight
  • With class weights applied to the loss function, performance improves.
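One common way to apply class weights is a weighted binary cross-entropy, sketched below. The inverse-frequency weighting (n_samples / (n_classes × class count)) and the function names are assumptions, and the post does not state which weights the paper actually uses.

```python
import math

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of class c)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}

def weighted_bce(probs, labels, weights, eps=1e-12):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(labels)

# Toy imbalanced batch: three samples of class 0, one of class 1
labels = [0, 0, 0, 1]
weights = class_weights(labels)
loss = weighted_bce([0.5, 0.5, 0.5, 0.5], labels, weights)
```

With these weights the single minority sample contributes as much to the loss as the three majority samples together, which is the intended effect of class weighting.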


