Brief Review — Classifying Heart-Sound Signals Based on CNN Trained on MelSpectrum and Log-MelSpectrum Features

Modified VGGNet Using Log-MelSpectrum As Input Features

Sik-Ho Tsang
Dec 14, 2023

Classifying Heart-Sound Signals Based on CNN Trained on MelSpectrum and Log-MelSpectrum Features
VGGNet, by Nantong University
2023 MDPI Bioengineering (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 … 2022
[CirCor Dataset] [CNN-LSTM] [DsaNet] [Modified Xception] [Improved MFCC+Modified ResNet] [Learnable Features + VGGNet/EfficientNet] [DWT + SVM] [MFCC+LSTM] [DWT+ 1D-CNN] [CNN+Attention] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • MelSpectrum and Log-MelSpectrum features of heart-sound signals combined with a mathematical model of cardiac-sound acquisition were analysed theoretically.
  • Results demonstrated that the Log-MelSpectrum features can reduce the classification difference between domains and improve the performance of CNNs.


  1. MelSpectrum & Log-MelSpectrum Feature Extraction
  2. Results

1. MelSpectrum & Log-MelSpectrum Feature Extraction

MelSpectrum & Log-MelSpectrum Feature Extraction
  1. The heart-sound signals, sampled at 2000 Hz, are band-pass filtered from 25 Hz to 950 Hz using a Butterworth filter.
  2. The signals are then passed through a Savitzky–Golay filter to improve the smoothness of the time-frequency feature graph and reduce noise interference.
  3. The filtered signals are split into frames of a selected fixed length, and each frame is windowed with a Hanning window function.
  4. Each frame is transformed into a periodogram estimate of the power spectrum using the STFT.
  5. Each periodogram estimate is mapped onto the Mel-scale using Mel filters, which consist of several triangular filters. The output of the Mel filter is called the MelSpectrum.

Logarithmic transformation is applied to the MelSpectrum features to obtain the Log-MelSpectrum.
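The steps above can be sketched in a few lines of NumPy/SciPy. This is only a minimal illustration, not the authors' implementation: the filter order, Savitzky–Golay window, frame length, hop size, and number of Mel filters below are assumptions chosen for the example (the paper's actual values are in its parameter table), and the helper names `mel_filterbank` and `log_mel_features` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, savgol_filter, stft

FS = 2000  # sampling frequency in Hz (from the text)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs, fmin, fmax):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    fft_freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_features(x, fs=FS, n_fft=256, hop=64, n_mels=32):
    # 1. Butterworth band-pass, 25-950 Hz (order 4 is an assumption).
    sos = butter(4, [25, 950], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)
    # 2. Savitzky-Golay smoothing (window length and order are assumptions).
    x = savgol_filter(x, window_length=11, polyorder=3)
    # 3-4. Hann-windowed frames -> STFT -> power spectrum estimate
    #      (up to a constant scale factor).
    _, _, Z = stft(x, fs=fs, window="hann", nperseg=n_fft, noverlap=n_fft - hop)
    power = np.abs(Z) ** 2
    # 5. Map each frame onto the Mel scale with triangular filters.
    mel = mel_filterbank(n_mels, n_fft, fs, 25.0, 950.0) @ power
    # Log transform; the small floor avoids log(0).
    log_mel = np.log(mel + 1e-10)
    return mel, log_mel

# Usage on a synthetic 2-second signal (a stand-in for a heart-sound fragment).
rng = np.random.default_rng(0)
signal = rng.standard_normal(2 * FS)
mel, log_mel = log_mel_features(signal)
print(mel.shape, log_mel.shape)  # (n_mels, n_frames) for both
```

Both outputs are 2-D maps (Mel bands × frames); resizing them to the CNN input size would be a separate step.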

Detailed Parameters for MelSpectrum & Log-MelSpectrum Feature Extraction
  • The detailed parameters are shown above.
MelSpectrum & Log-MelSpectrum Feature Visualization
  • Examples of MelSpectrum and Log-MelSpectrum feature maps from a normal heart-sound fragment are shown above.
Cardiac-sound collection model
  • Heart-sound signals are easily disturbed by additive and multiplicative noise during the acquisition process.

The stethoscope-induced multiplicative component becomes an additive term in the Log-MelSpectrum domain, because the logarithm of a product is the sum of the logarithms. Therefore, Log-MelSpectrum feature maps make it easier to improve the classification performance of the CNN.
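A small numerical check illustrates why the log helps. The sketch below assumes a simplified model in which the acquisition device scales each Mel band by a constant gain; the arrays `mel_clean` and `gain` are made-up values for illustration, not data or a model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
mel_clean = rng.uniform(0.1, 1.0, size=(32, 50))  # hypothetical MelSpectrum (bands x frames)
gain = rng.uniform(0.5, 2.0, size=(32, 1))        # hypothetical per-band device gain

mel_observed = gain * mel_clean                   # multiplicative distortion

# In the log domain, the distortion is the same additive offset in every frame.
log_shift = np.log(mel_observed) - np.log(mel_clean)
is_additive = np.allclose(log_shift, np.log(gain))
print(is_additive)
```

Under this assumption, two recordings of the same sound made with different devices differ only by a fixed per-band offset in the Log-MelSpectrum, which is a simpler discrepancy for a CNN to absorb than a scale change.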

2. Results

2.1. PhysioNet Dataset

PhysioNet Dataset

2.2. Modified VGGNet Model

Modified VGG-16
  • The input feature map size was changed to 128×128.
  • The output layer corresponds to 2 classes: normal and abnormal.
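To see what the smaller input means for the classifier head, the sketch below computes the flattened feature size after the convolutional stack. It assumes the standard VGG-16 layout (five 2×2 max-pooling stages, same-padded convolutions, 512 channels in the last block); the review does not say whether the modified network keeps this layout, so treat the numbers as illustrative. The function name `vgg16_flatten_size` is hypothetical.

```python
def vgg16_flatten_size(input_hw, n_pools=5, last_channels=512):
    """Flattened feature size after the conv stack of a standard VGG-16.

    Assumes each pooling stage halves the height and width and that
    the convolutions are same-padded (they keep the spatial size).
    """
    h, w = input_hw
    for _ in range(n_pools):
        h, w = h // 2, w // 2
    return h * w * last_channels

size_128 = vgg16_flatten_size((128, 128))  # 4 * 4 * 512
size_224 = vgg16_flatten_size((224, 224))  # 7 * 7 * 512 (original ImageNet input)
print(size_128, size_224)
```

Under these assumptions the first fully connected layer sees about a third as many inputs as in the original 224×224 configuration, and the final layer has 2 outputs instead of 1000.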

2.3. Training Hyperparameters

Training Hyperparameters

2.4. Performance

Validation Performance

The accuracies obtained with the Log-MelSpectrum and MelSpectrum time-frequency feature maps are 91.74% ± 3.72% and 87.42% ± 3.99%, respectively.

Average Se, Sp, and MAcc

The model trained on the Log-MelSpectrum feature maps has higher average sensitivity (Se), specificity (Sp), and MAcc than the model trained on the MelSpectrum feature maps.
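For reference, the three metrics can be computed from binary predictions as sketched below. The sketch assumes that "abnormal" is the positive class and that MAcc is the average of Se and Sp, as in the PhysioNet/CinC 2016 Challenge scoring; the labels in the example are made up and are not results from the paper.

```python
def se_sp_macc(y_true, y_pred):
    """Return (Se, Sp, MAcc) with 1 = abnormal (positive), 0 = normal (negative).

    MAcc is taken here as (Se + Sp) / 2, which is an assumption about the
    paper's definition.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    se = tp / (tp + fn)  # sensitivity: fraction of abnormal cases detected
    sp = tn / (tn + fp)  # specificity: fraction of normal cases kept normal
    return se, sp, (se + sp) / 2

# Made-up labels: 4 abnormal and 6 normal recordings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
se, sp, macc = se_sp_macc(y_true, y_pred)
print(f"Se={se:.3f}, Sp={sp:.3f}, MAcc={macc:.3f}")
```

Because MAcc weights both classes equally, it is less affected than plain accuracy by the imbalance between normal and abnormal recordings.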


