Brief Review — Feature extraction and classification of heart sound using 1D convolutional neural networks


Sik-Ho Tsang
4 min read · Jan 11, 2024
Yuwell electronic stethoscope

Feature extraction and classification of heart sound using 1D convolutional neural networks
DAE+1D CNN, by Beijing Institute of Technology
2019 EURASIP J. ASP, Over 100 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 … 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum+Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP]

  • MFCC is commonly used as the input feature for CNNs.
  • In this paper, a denoising autoencoder (DAE) is used to extract deep features of heart sounds, which then serve as the input features of a 1D CNN.


  1. DAE+1D CNN
  2. Results


1. DAE+1D CNN

1.1. Problems of MFCC

(a) MFCC (b) Spectrogram
  • The MFCC of a heart sound is depicted in panel (a) (Fig. 5a in the paper). No intuitive frequency characteristic of the signal can be observed in it along the time axis.
  • Moreover, heart sounds are recorded in diverse acquisition environments, and the signal itself is relatively weak compared with the noise.

Therefore MFCC, which was designed as a speech feature, cannot fully represent the characteristics of heart sounds.

1.2. Denoising Autoencoder (DAE)

DAE Overall Architecture
  • In this work, a DAE network takes a spectrogram (x) as input and extracts features of the heart sound signal.
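As a rough illustration of the kind of input the DAE receives, the sketch below computes a magnitude spectrogram with a short-time Fourier transform in NumPy. The sampling rate, frame length, hop size, and the synthetic 100 Hz test tone are all assumptions chosen for illustration; this review does not state the paper's actual preprocessing settings.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

fs = 2000                                # assumed sampling rate in Hz
t = np.arange(fs) / fs                   # one second of samples
signal = np.sin(2 * np.pi * 100 * t)     # synthetic 100 Hz tone, not a real heart sound

spec = spectrogram(signal)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * fs / 256            # convert bin index to Hz (frame_len = 256)
```

Each column of `spec` is the spectrum of one short frame, so the array has a frequency axis and a time axis. In this toy case the strongest bin lies within one bin width of the 100 Hz tone.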

The encoder maps x to a hidden representation y.

The decoder then maps y back to a reconstruction vector z.

  • The entire process can be considered a reconstruction process. The loss function is either the squared difference between x and z or the cross entropy between them.
  • To prevent overfitting, noise is added to the input data (the input layer of the network), thereby making the learned encoder robust and enhancing the generalization capability of the model.
  • The DAE is trained to reconstruct a clean “repaired” input from a corrupted input signal.
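For reference, the encoder, decoder, and the two losses described above are commonly written as follows in the standard denoising-autoencoder formulation. The weights and biases W, b, W', b', the activation s, and the index k over input dimensions are assumed notation here and may differ from the paper's.

```latex
\begin{aligned}
y &= f_{\theta}(x) = s(Wx + b) && \text{(encoder)} \\
z &= g_{\theta'}(y) = s(W'y + b') && \text{(decoder)} \\
L_{2}(x, z) &= \lVert x - z \rVert^{2} && \text{(squared difference)} \\
L_{H}(x, z) &= -\sum_{k} \bigl[\, x_k \log z_k + (1 - x_k) \log (1 - z_k) \,\bigr] && \text{(cross entropy)}
\end{aligned}
```

In the denoising variant the encoder is applied to the corrupted input, while the loss is still measured against the clean x.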

After training, the feature at the middle (hidden) layer of the DAE is extracted and fed to the 1D CNN.
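The training and feature-extraction steps can be sketched with a one-hidden-layer DAE in NumPy. This is a minimal illustration, not the paper's network: the layer sizes, noise level, learning rate, and the synthetic data standing in for spectrogram frames are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """One-hidden-layer DAE trained with the squared-difference loss."""

    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, y):
        return sigmoid(y @ self.W2 + self.b2)

    def train_step(self, x_clean, noise_std=0.1, lr=0.1):
        # Corrupt the input, but measure the loss against the clean input.
        x_noisy = x_clean + rng.normal(0.0, noise_std, x_clean.shape)
        y = self.encode(x_noisy)
        z = self.decode(y)
        loss = np.mean(np.sum((z - x_clean) ** 2, axis=1))
        # Backpropagation through decoder and encoder.
        n = x_clean.shape[0]
        da2 = 2.0 * (z - x_clean) / n * z * (1.0 - z)
        da1 = (da2 @ self.W2.T) * y * (1.0 - y)
        # Plain gradient-descent updates.
        self.W2 -= lr * (y.T @ da2)
        self.b2 -= lr * da2.sum(axis=0)
        self.W -= lr * (x_noisy.T @ da1)
        self.b -= lr * da1.sum(axis=0)
        return loss

# Synthetic stand-in data in (0, 1); real inputs would be spectrogram frames.
n_samples, n_in, n_hidden = 200, 64, 32   # arbitrary illustrative sizes
latent = rng.normal(size=(n_samples, 4))
x = sigmoid(latent @ rng.normal(size=(4, n_in)) + rng.normal(size=n_in))

dae = DenoisingAutoencoder(n_in, n_hidden)
losses = [dae.train_step(x) for _ in range(300)]

# After training, the hidden activations serve as the extracted features.
features = dae.encode(x)
```

Only `encode` is needed once training is done; the decoder exists solely to provide the reconstruction target during training.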

1.3. 1D CNN

  • A 1 × 132 feature vector is used as the input to the 1D CNN.

The model uses two 1D convolutional layers followed by one dense layer.
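A forward pass through such a model can be sketched in NumPy as below. Only the 1 × 132 input and the "two 1D conv layers plus one dense layer" layout come from the text; the filter counts, kernel size, ReLU activations, two output classes, and random weights are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid 1D convolution. x: (C_in, L), w: (C_out, C_in, K), b: (C_out,)."""
    k = w.shape[2]
    out_len = x.shape[1] - k + 1
    out = np.empty((w.shape[0], out_len))
    for i in range(out_len):
        # Contract over input channels and kernel positions at offset i.
        out[:, i] = np.tensordot(w, x[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return out

def relu(a):
    return np.maximum(a, 0.0)

def softmax(a):
    e = np.exp(a - a.max())   # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical layer sizes: 8 and 16 filters, kernel size 5, 2 output classes.
w1, b1 = rng.normal(0.0, 0.1, (8, 1, 5)), np.zeros(8)
w2, b2 = rng.normal(0.0, 0.1, (16, 8, 5)), np.zeros(16)
w_fc, b_fc = rng.normal(0.0, 0.1, (2, 16 * 124)), np.zeros(2)

def forward(feature):
    h1 = relu(conv1d(feature, w1, b1))   # (8, 128)
    h2 = relu(conv1d(h1, w2, b2))        # (16, 124)
    logits = w_fc @ h2.ravel() + b_fc    # dense layer on the flattened maps
    return softmax(logits)

feature = rng.normal(size=(1, 132))      # stand-in for one DAE feature vector
probs = forward(feature)
```

The kernels slide along a single axis only, so every position in a filter refers to the same kind of quantity, which is the property the 1D design relies on.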

1.4. 2D CNN as Baseline

2D CNN as Baseline
  • The above 2D CNN model is used as the baseline for comparison.

The authors argue that 2D CNNs cannot adapt well to the 1D characteristics of speech because the two dimensions of the input have completely different physical meanings.

2. Results

2.1. Ablation Studies

Left: Different Convolution Kernels, Right: Different Features in 1D CNNs

Table 1: With the same kernel size, changing the convolution kernel from 1D to 2D reduces the recognition accuracy.

Table 2: The deep features extracted by the DAE in this study achieve favorable recognition rates.

Different Number of Convolution Layers in 1D CNN

The performance of the network remains the same when the number of layers exceeds 5.

2.2. Comparisons With Other Classification Methods

Comparisons With Other Classification Methods

Other classification methods, such as the BP neural network, reach a recognition accuracy of 89.75%; the method used in this paper increases the final recognition rate by nearly 7%.


