Brief Review — Heart Sound Multiclass Analysis Based on Raw Data and Convolutional Neural Network

Raw PCG Signal Inputs to 5-Layer 1D-CNN

Sik-Ho Tsang
3 min read · Feb 12, 2024
PCG Signal

Heart Sound Multiclass Analysis Based on Raw Data and Convolutional Neural Network
5-Layer 1D-CNN, by University of Catania
2020 IEEE Sensors Letters (Sik-Ho Tsang @ Medium)

Phonocardiogram (PCG), Heart Sound Classification
2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum + Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)]

  • Raw PCG signals are fed directly into a 5-layer 1D-CNN, which bypasses the transformation from the time domain to the frequency domain.


  1. 5-Layer 1D-CNN
  2. Results

1. 5-Layer 1D-CNN

1.1. Model Architecture

  • The neural network architecture used in this study is based on the 1-D CNN and, in particular, on the “M5 (0.5 M)” model, described in [16].
  • After some tests, 5 layers were found to be sufficient to extract the features needed for good performance in the training phase.
  • The first 4 layers are convolutions, and the last one is the output layer.
  • The first layer is the only one with a kernel size of 80 elements. The kernel size is set to 3 elements for all the other layers, in order to reduce the computational cost.
  • After each convolutional layer, two more operations take place: batch normalization (BatchNorm1D) and max pooling (MaxPool1D).
  • Batch normalization is applied after each convolutional layer, before the ReLU activation.
  • The last layer is the output layer. It performs a 1-D average pool with a kernel size of 30 elements.
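To make the layer description above more concrete, the sketch below traces how the temporal length of a signal shrinks through the network. Only the kernel sizes (80, 3, and 30) come from the text. The first-layer stride of 4, the pooling size of 4, the absence of padding, and the 32000-sample input are assumptions made for illustration, and `out_len`, `STAGES`, and `trace_lengths` are hypothetical names. Batch normalization and ReLU are left out because they do not change the length.

```python
def out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling window (floor mode)."""
    return (length + 2 * padding - kernel) // stride + 1


# (name, kernel, stride). Kernel sizes 80 / 3 / 30 come from the text;
# the strides and the pool size of 4 are ASSUMED, not stated in this review.
STAGES = [
    ("conv1", 80, 4),
    ("pool1", 4, 4),
    ("conv2", 3, 1),
    ("pool2", 4, 4),
    ("conv3", 3, 1),
    ("pool3", 4, 4),
    ("conv4", 3, 1),
    ("pool4", 4, 4),
    ("avgpool", 30, 30),  # output layer: 1-D average pool, kernel 30
]


def trace_lengths(input_len):
    """Return [(stage_name, length_after_stage), ...] for one input length."""
    lengths = []
    n = input_len
    for name, kernel, stride in STAGES:
        n = out_len(n, kernel, stride)
        lengths.append((name, n))
    return lengths


if __name__ == "__main__":
    # 32000 samples is an assumed input length, chosen only for illustration.
    for name, n in trace_lengths(32000):
        print(f"{name:8s} -> {n}")
```

Under these assumed settings, the sequence entering the output layer has 30 elements, so an average pool with a kernel of 30 reduces it to a single value per channel. A different input length or stride would give a different figure.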

1.2. Post Decision

  • A filter called the “recurrence filter” is used to improve performance by analyzing N_o successive decisions of the CNN.
  • More specifically, the filter acts on a circular vector C_k, which always contains the latest N_o decisions once a transition phase lasting W_in·(N_o − 1) seconds has elapsed.
  • The most common class in the buffer C_k is selected as the final decision, i.e. the class whose bin has the highest count in the histogram of C_k.
  • Here, the Hist function builds a histogram of the elements of C_k sorted into N_c equally spaced bins.
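The post-decision step described above amounts to a majority vote over a sliding buffer of recent decisions. A minimal sketch follows, assuming integer class labels. `RecurrenceFilter`, `update`, and `transition_seconds` are hypothetical names, `n_o` stands for the number of buffered decisions, and `w_in` for the duration in the transition-phase expression. Returning `None` during the transition phase and breaking ties in favour of the class seen first in the buffer are assumptions, since the text does not specify either.

```python
from collections import Counter, deque


class RecurrenceFilter:
    """Majority vote over the latest n_o CNN decisions (illustrative sketch)."""

    def __init__(self, n_o):
        # deque(maxlen=n_o) behaves like a circular buffer:
        # appending to a full deque drops the oldest decision.
        self.buffer = deque(maxlen=n_o)

    def update(self, decision):
        """Store one CNN decision and return the filtered class, or None."""
        self.buffer.append(decision)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # transition phase: buffer not yet full (assumed behaviour)
        # Counter plays the role of the histogram; the fullest bin wins.
        return Counter(self.buffer).most_common(1)[0][0]


def transition_seconds(w_in, n_o):
    """Length of the transition phase, w_in * (n_o - 1) seconds."""
    return w_in * (n_o - 1)


if __name__ == "__main__":
    flt = RecurrenceFilter(n_o=3)
    for raw in [0, 0, 1, 1, 1, 2]:
        print(raw, "->", flt.update(raw))
```

A single outlier decision (the final `2` in the usage loop) does not change the filtered output, which is the smoothing effect the filter is meant to provide.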

2. Results

Confusion Matrix
  • An average classification accuracy level of 89.6% is obtained.
SOTA Comparisons

  • The proposed method outperforms [12, 17] even though raw data is fed directly to the CNN, which indicates that the network succeeds in capturing the time-domain differences between the characteristics of the 5 classes considered.


