Brief Review — Phonocardiographic Sensing using Deep Learning for Abnormal Heartbeat Detection

Recurrent Neural Network (RNN) Variants are Evaluated

Sik-Ho Tsang
4 min read · Oct 27, 2023

Phonocardiographic Sensing using Deep Learning for Abnormal Heartbeat Detection (RNN Variants), by Information Technology University (ITU)-Punjab, COMSATS Institute of Information Technology, and University of Southern Queensland
2018 IEEE Sensors Journal, Over 140 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 [PASCAL] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • The problem of automatic cardiac auscultation is complicated due to the requirement of reliability and high accuracy, and due to the presence of background noise in the heartbeat sound.
  • In this paper, Recurrent Neural Networks (RNNs) are proposed to classify heart sounds as normal or abnormal.


  1. Database, Preprocessing and Segmentation
  2. Proposed Recurrent Neural Network (RNN)
  3. Results

1. Database, Preprocessing and Segmentation

  • In phonocardiography (PCG), heart sound is recorded from the chest wall using a digital stethoscope and this sound is analyzed to detect whether the heart is functioning normally or the patient should be referred to an expert for further diagnosis.

1.1. Database

The “Physionet Challenge 2016” dataset [22] is used. The Physionet dataset consists of six databases (A through F) containing a total of 3240 raw heart sound recordings.

  • These recordings were independently collected by different research teams using heterogeneous sensing equipment from different countries both in clinical and nonclinical (i.e., home visits) settings.
  • The dataset contains both clean and noisy heart sound recordings. The recordings were collected both from healthy subjects and patients with a variety of heart conditions, especially coronary artery disease and heart valve disease.
  • The subjects were from different age groups including children, adults and elderly. The length of heart sound recordings varied from 5 seconds to just over 120 seconds.
  • For the experiments, all 6 databases containing normal and abnormal heart sound recordings are used.

1.2. Preprocessing and Segmentation

Block Diagram of Proposed Approach
  • The detection of the exact locations of the first and second heart sounds (i.e., S1 and S2) within PCG is known as the segmentation process.
  • Logistic Regression-Hidden Semi-Markov Models (HSMM) [35] are used to identify the heart states. Logistic Regression-HSMM works similarly to the approach with SVM-based emission probabilities [38], and it allows greater discrimination between the different states. Logistic regression is a binary classifier.
  • The Logistic Regression-HSMM algorithm uses a combination of 4 types of features: the homomorphic envelope, Hilbert envelope, wavelet envelope, and power spectral density envelope.
4 States
  • Figure 3 shows the 4 detected states (i.e., S1, S2, systole, and diastole) over two heart cycles. Note that this step is generally called S1 and S2 detection, although it detects all 4 states: S1, S2, systole, and diastole.
Left: Normal, Right: Abnormal
  • The heart signal is segmented into sequences of heart cycles, using 3 segment lengths: 2, 5, and 8 cycles.
  • The figure above shows 5-cycle segments of a normal and an abnormal heart sound, respectively. The abnormal heart sound differs from the normal one in its temporal context: its heart cycle states have longer durations within the segment.
Using 5 Cycles is the Best
  • Using 5 cycles is found to be the best in the experiment.
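
The cycle-based segmentation described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: it assumes the S1 onsets have already been found by the segmentation step, that a heart cycle runs from one S1 onset to the next, and that segments do not overlap. The function name cycle_segments and the onset values are hypothetical.

```python
from typing import List, Tuple


def cycle_segments(s1_onsets: List[int], cycles_per_segment: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges that each span a fixed number of heart cycles.

    One cycle is assumed to run from an S1 onset to the next S1 onset,
    so N cycles need N + 1 consecutive onsets. Segments do not overlap
    (an assumption made for this sketch).
    """
    if cycles_per_segment < 1:
        raise ValueError("cycles_per_segment must be >= 1")
    segments = []
    i = 0
    while i + cycles_per_segment < len(s1_onsets):
        segments.append((s1_onsets[i], s1_onsets[i + cycles_per_segment]))
        i += cycles_per_segment
    return segments


# Hypothetical S1 onsets (in samples): 12 onsets, i.e. 11 complete cycles.
onsets = [0, 1600, 3200, 4800, 6400, 8000, 9600, 11200, 12800, 14400, 16000, 17600]

two_cycle = cycle_segments(onsets, 2)    # 5 segments
five_cycle = cycle_segments(onsets, 5)   # [(0, 8000), (8000, 16000)]
eight_cycle = cycle_segments(onsets, 8)  # [(0, 12800)]
```

Longer segments give the model more temporal context per example but yield fewer examples per recording, which is one plausible reason an intermediate length such as 5 cycles works well.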

2. Proposed Recurrent Neural Network (RNN)

2.1. Feature Extraction

  • Mel-frequency cepstral coefficients (MFCCs) are computed using a 25 ms window with a step size of 10 ms.

The first 13 MFCCs are selected for compact representation of PCG signal.
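
To get a feel for the size of the resulting feature sequence, the framing arithmetic can be sketched as below. The 25 ms window, 10 ms step, and 13 coefficients come from the text above; the 2000 Hz sampling rate, the 5-second segment length, the absence of padding, and the helper names are assumptions made for illustration.

```python
def frame_count(num_samples: int, sample_rate_hz: int,
                window_ms: int = 25, step_ms: int = 10) -> int:
    """Number of full analysis windows that fit in the signal (no padding assumed)."""
    window = sample_rate_hz * window_ms // 1000  # window length in samples
    step = sample_rate_hz * step_ms // 1000      # hop length in samples
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // step


def mfcc_shape(num_samples: int, sample_rate_hz: int, num_coeffs: int = 13):
    """Shape (frames, coefficients) of the MFCC matrix fed to the RNN."""
    return (frame_count(num_samples, sample_rate_hz), num_coeffs)


# Assumed values: a 5-second segment sampled at 2000 Hz.
sample_rate = 2000
segment_samples = 5 * sample_rate
shape = mfcc_shape(segment_samples, sample_rate)  # (498, 13)
```

Each segment thus becomes a sequence of 13-dimensional vectors, one per 10 ms step, which is the input format a recurrent model expects.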

2.2. Models

(a) LSTM, (b) GRU, (c) Bidirectional LSTM
  • The best classification results are obtained using 2 gated layers for both the LSTM and GRU models. Therefore, the LSTM and BLSTM models consist of 2 LSTM layers with the tanh function [46] as activation.
  • For each heartbeat, the outputs of LSTM or BLSTM layers were given to the dense layer and the outputs of this layer were given to the softmax layer for classification.
  • Batch normalization is used after the dense layer for normalization of learned distribution to improve the training efficiency.
  • Models were trained on 75% of the data, 15% was used for validation, and the remaining 10% was used for testing.
  • To predict the score for a full instance, the posterior probabilities of its chunks were averaged.
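
The chunk-level averaging in the last bullet can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the function names, the label order, and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def instance_posterior(chunk_posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-chunk class posteriors into one posterior per recording."""
    n = len(chunk_posteriors)
    if n == 0:
        raise ValueError("at least one chunk is required")
    num_classes = len(chunk_posteriors[0])
    return [sum(p[c] for p in chunk_posteriors) / n for c in range(num_classes)]


def predict_label(chunk_posteriors, labels=("normal", "abnormal")) -> str:
    """Pick the class with the highest averaged posterior."""
    avg = instance_posterior(chunk_posteriors)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical softmax outputs [P(normal), P(abnormal)] for 4 chunks of one recording.
chunks = [[0.75, 0.25], [0.25, 0.75], [0.25, 0.75], [0.5, 0.5]]
avg = instance_posterior(chunks)  # [0.4375, 0.5625]
label = predict_label(chunks)     # "abnormal"
```

Averaging lets a recording with many chunks receive a single decision, so one noisy chunk has limited influence on the final label.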

3. Results

3.1. Comparison with Conventional ML

Comparison with Conventional ML

RNNs significantly outperform all 3 conventional models on every performance measure.

3.2. Comparison with DL

RNNs outperform the other deep learning models used for heart sound classification by a significant margin.

3.3. Comparison of RNNs

(a) LSTM and GRU, (b) BLSTM and BiGRU

BLSTMs perform consistently well.

Despite having a simpler architecture compared to LSTM, the performance of GRU is also promising on PCG data.
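
One way to see why the GRU is the simpler architecture is to count parameters per layer: a standard LSTM layer has 4 weight blocks (3 gates plus the cell candidate), while the original GRU formulation has 3. The sketch below uses the textbook formulas; the input size of 13 matches the number of MFCCs above, but the hidden size of 64 is an assumed value, and some library implementations add extra bias terms that change the GRU count slightly.

```python
def gated_layer_params(input_dim: int, hidden_dim: int, num_blocks: int) -> int:
    """Parameters of one gated recurrent layer.

    Each block has an input weight matrix (hidden x input), a recurrent
    weight matrix (hidden x hidden), and a bias vector (hidden).
    """
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return num_blocks * per_block


def lstm_params(input_dim: int, hidden_dim: int) -> int:
    # Input, forget, and output gates plus the cell candidate: 4 blocks.
    return gated_layer_params(input_dim, hidden_dim, 4)


def gru_params(input_dim: int, hidden_dim: int) -> int:
    # Update and reset gates plus the candidate state: 3 blocks.
    return gated_layer_params(input_dim, hidden_dim, 3)


# 13 MFCC inputs (from the text); 64 hidden units is an assumed size.
lstm_count = lstm_params(13, 64)  # 19968
gru_count = gru_params(13, 64)    # 14976
```

Under these assumptions, a GRU layer has 75% of the parameters of an LSTM layer of the same width.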


