Brief Review — Multi-classification neural network model for detection of abnormal heartbeat audio signals

MFCC+LSTM for Heart Sound Classification

Sik-Ho Tsang
3 min read · Dec 11, 2023

Multi-classification neural network model for detection of abnormal heartbeat audio signals, by University of Management and Technology, National College of Business Administration and Economics Lahore
2022 JBEA (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 … 2022 [CirCor Dataset] [CNN-LSTM] [DsaNet] [Modified Xception] [Improved MFCC+Modified ResNet] [Learnable Features + VGGNet/EfficientNet] [DWT + SVM] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • Mel-frequency cepstral coefficients (MFCCs) are applied to extract the dominant features, and a bandpass filter is used to remove noise.
  • An RNN using LSTM is proposed for classification.
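
To make the MFCC step concrete, here is a minimal NumPy sketch of the usual pipeline: framing, power spectrum, mel filterbank, log, and DCT. It is an illustration only, not the paper's implementation; the frame length, hop size, number of mel filters, number of coefficients, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr, n_mfcc=13, n_filters=26, frame_len=256, hop=128):
    """Return an array of shape (n_frames, n_mfcc). All sizes are assumed values."""
    # Slice the signal into overlapping, Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2 / frame_len
    # Log mel-band energies (small constant avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(n_filters, frame_len, sr).T + 1e-10)
    # DCT-II over the mel bands; keep the first n_mfcc coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_mel @ dct.T
```

Each row of the result is one time step, so the output can be fed to a sequence model frame by frame.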


  1. Dataset, Preprocessing & MFCC
  2. Proposed LSTM Model
  3. Results

1. Dataset, Preprocessing & MFCC

1.1. Dataset

Different visual representations of heartbeat sound signals; (a) Normal, (b) AFib, and (c) Noisy.
PASCAL & PhysioNet Datasets

PASCAL & PhysioNet Datasets are used.

1.2. Preprocessing

Visual depiction of power spectrogram of heartbeat audio sound signal.
  • Mel-frequency cepstral coefficients (MFCCs) are extracted.
  • The power spectrogram of the first 5 s of the heartbeat signals is shown above.
  • Each heartbeat audio file is downsampled, to 20,000 Hz for the PASCAL challenge database and to a 300 Hz frame rate for the PhysioNet challenge database.
  • In addition, the heartbeat signals are normalized by removing noise with a bandpass filter, and zero-padding is then applied.
  • Using an 8 × 8 low-pass filter, the signal is resampled from a 50 kHz sampling rate to 800 Hz for the PASCAL Challenge dataset and to 300 Hz for the PhysioNet dataset.
  • The low-pass filter passes low-frequency components and has a cutoff frequency of 1.6% of the sampling rate (800 Hz at 50 kHz).
  • (Please read the paper for more details.)
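
The downsampling and zero-padding steps above can be sketched roughly as follows. This is a guess at the mechanics, not the paper's code: it assumes "8 × 8" means two decimation stages of factor 8 each (50 kHz / 64 = 781.25 Hz, close to the quoted 800 Hz, and 1/64 ≈ 1.6%), and it uses a simple Hamming-windowed sinc filter. The function names, the filter length, and the padded length are hypothetical.

```python
import numpy as np

def lowpass_fir(cutoff, num_taps=101):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling rate (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unity gain at DC

def decimate(x, q, num_taps=101):
    """Low-pass at the new Nyquist frequency, then keep every q-th sample."""
    h = lowpass_fir(0.5 / q, num_taps)
    return np.convolve(x, h, mode="same")[::q]

def zero_pad(x, target_len):
    """Right-pad with zeros (or truncate) so every clip has the same length."""
    x = x[:target_len]
    return np.pad(x, (0, target_len - len(x)))

def preprocess(x, target_len, stages=(8, 8)):
    """Assumed pipeline: two-stage decimation followed by zero-padding."""
    for q in stages:
        x = decimate(x, q)
    return zero_pad(x, target_len)

fs_in = 50_000              # sampling rate quoted for PASCAL
fs_out = fs_in / (8 * 8)    # 781.25 Hz under the two-stage assumption
```

Decimating in two smaller stages keeps each filter short; a single factor-64 stage would need a much longer filter for the same stopband.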

2. Proposed LSTM Model

Proposed LSTM

The proposed model stacks LSTM, Dropout, Dense, and Softmax layers, as shown above. Cross-entropy is used as the loss function.
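
For intuition, below is a minimal NumPy sketch of the forward pass through such a stack: one LSTM layer, a dense layer on the final hidden state, softmax, and the cross-entropy loss. The layer sizes, the number of classes, the random initialization, and all names are assumptions for illustration; the review does not give the paper's exact configuration. Dropout is left out because it is only active during training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b):
    """Run one LSTM layer over x of shape (T, D) and return the final hidden state (H,).
    W: (4H, D), U: (4H, H), b: (4H,), gates stacked as [input, forget, cell, output]."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])             # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell state
        o = sigmoid(z[3 * H:])         # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class index."""
    return -np.log(probs[label] + 1e-12)

def init_params(n_features, n_hidden, n_classes, seed=0):
    """Small random weights; sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    return {
        "W": 0.1 * rng.standard_normal((4 * n_hidden, n_features)),
        "U": 0.1 * rng.standard_normal((4 * n_hidden, n_hidden)),
        "b": np.zeros(4 * n_hidden),
        "W_out": 0.1 * rng.standard_normal((n_classes, n_hidden)),
        "b_out": np.zeros(n_classes),
    }

def classify(x, p):
    """LSTM -> Dense -> Softmax; returns class probabilities."""
    h = lstm_forward(x, p["W"], p["U"], p["b"])
    return softmax(p["W_out"] @ h + p["b_out"])
```

In practice a framework such as Keras or PyTorch would handle training; the sketch only shows how a sequence of MFCC frames becomes a probability per class.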

3. Results

3.1. ML Technique Comparisons

ML Technique Comparisons on Left: PASCAL, Right: PhysioNet

Table 4: On PASCAL, the proposed deep learning model attains a classification accuracy of 99.71%, specificity of 99.3%, sensitivity of 98.6%, and f1-score of 98.9%, outperforming the MLP as well as traditional ML classifiers.

Table 5: On the PhysioNet dataset, the proposed model attains an overall accuracy of 98.7%, specificity of 99%, sensitivity of 98.5%, and f1-score of 98.8%.
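
As a reminder of how these four metrics relate, here is a small sketch that computes them from confusion-matrix counts for one class against the rest. The counts in the usage note are made up for illustration and are not taken from the paper; how the paper averages across classes is not stated in this review.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), specificity, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }
```

For example, with the hypothetical counts tp=90, fp=20, tn=80, fn=10, the function gives an accuracy of 0.85, sensitivity of 0.90, and specificity of 0.80.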

3.2. SOTA Comparisons

SOTA Comparisons on Left: PASCAL, Right: PhysioNet

The proposed RNN (LSTM) model achieves the best results across accuracy, sensitivity, specificity, and f1-score.


