Brief Review — Analysis of heart sound anomalies using ensemble learning

Wavelet and Statistical Features, LogitBoost + Bagging; the Bagging Classifier Also Flags Noisy Signals

Sik-Ho Tsang
6 min read · Nov 16, 2023

Analysis of heart sound anomalies using ensemble learning
Ensemble Learning, by Beirut Research and Innovation Center, American University of Beirut
2020 JBSPC, Over 20 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2020 [1D-CNN] [WaveNet] [Power Features+KNN] [Improved MFCC + CRNN] 2021 [CardioXNet] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]
==== My Other Paper Readings Are Also Over Here ====

  • The PCG signals are first filtered and segmented into different parts, then analyzed by applying a feature extraction process, followed by classifying the signal as that of a healthy or unhealthy person.
  • The extracted optimal features subset includes statistical components and wavelet-based features.
  • The classification mainly relies on bagging and boosting algorithms, as well as adequately preparing the data in order to yield an enhanced ensemble classifier.


  1. PhysioNet PCG database & Prior Arts
  2. Preprocessing, Segmentation & Feature Extraction
  3. Classification
  4. Results

1. Data & Prior Arts

1.1. PhysioNet PCG Database

Fig. 1: Each heartbeat contains two major sounds called first heart sound (S1) and second heart sound (S2).

Fig. 2: Recorded PCG signals exhibit similar characteristics for normal or pathological subjects.

  • The database consists of a total of 3153 heart sound recordings from 764 subjects, with each one lasting from 5 s to over 120 s, with a varying number of unknown segments since the data is not annotated.
  • The test set also included data from 6 databases containing a total of 1277 heart-sound recordings from 308 subjects or patients, lasting between 6–104 seconds.
  • All the data is initially collected at a frequency of 2000 Hz that is afterwards resampled to 1000 Hz. Several of these records were considered noisy.
  • Table 2: The noisy samples originally belong to normal or pathological patients but were considered too noisy to be well classified.
  • Table 3: further shows the average duration and standard deviation for the signals from each source.

1.2. Prior Arts on PhysioNet PCG Database


The best result is 86%, achieved by [21].

2. Preprocessing, Segmentation & Feature Extraction

Proposed Pipeline
  • The approach starts with the input signal, which is pre-processed into 6 signals (the original plus 5 filtered sub-bands), each of which is segmented for feature extraction to obtain the feature vector.
  • The feature vector is classified using two well-prepared classification models whose outputs are combined to determine the final output which indicates either a healthy or pathological patient.
  • (In this paper, the proposed approach is described in detail. I will only describe it very briefly here.)

2.1. Preprocessing

Preprocessing by Filtering
  • The objective of preprocessing is to prepare the raw data for the segmentation phase by reducing the effect of noise, resampling the data, and correcting the mean values of all recordings so that all preprocessed recordings have a mean value of zero. In addition, 5 regular Hamming-window-based FIR filters are applied to split the signal into sub-bands:

(1) Lowpass Filter with cutoff frequency at 90 Hz. (2) Bandpass Filter between 90 Hz–160 Hz. (3) Bandpass Filter between 400 Hz–600 Hz. (4) Bandpass Filter between 600 Hz–800 Hz. (5) Highpass Filter with initial frequency at 800 Hz.
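The filter bank above can be sketched with SciPy's windowed-FIR design (which uses a Hamming window by default). One caveat: since the sub-bands extend above 500 Hz, the filtering here assumes the original 2000 Hz sampling rate (i.e. before the 1 kHz resampling); the filter order is my own assumption, as the paper's is not stated in this review.

```python
import numpy as np
from scipy.signal import firwin, lfilter

FS = 2000    # assumed rate during filtering (sub-bands above 500 Hz need it)
NTAPS = 101  # assumed filter order (odd, so the highpass design is valid)

def filter_bank(x, fs=FS, ntaps=NTAPS):
    """Split a zero-mean PCG signal into the 5 sub-band signals."""
    taps = [
        firwin(ntaps, 90, fs=fs),                           # (1) lowpass < 90 Hz
        firwin(ntaps, [90, 160], pass_zero=False, fs=fs),   # (2) 90-160 Hz
        firwin(ntaps, [400, 600], pass_zero=False, fs=fs),  # (3) 400-600 Hz
        firwin(ntaps, [600, 800], pass_zero=False, fs=fs),  # (4) 600-800 Hz
        firwin(ntaps, 800, pass_zero=False, fs=fs),         # (5) highpass > 800 Hz
    ]
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # the zero-mean correction from the preprocessing step
    return [lfilter(b, 1.0, x) for b in taps]
```

Each filtered output has the same length as the input, so the later per-signal feature extraction applies identically to all 6 signals.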

2.2. Segmentation


Fig. 6: In this work, a segmentation technique [12] based on Hidden Semi-Markov Models (HSMM) is applied. This technique delivers accurate results with noisy real-world PCG recordings.

  • (Please feel free to read the paper for details.)
  • An example is shown above.

2.3. Feature Extraction

2.3.1. Wavelet-Based Features

  • It was found that 10 beats are sufficient to represent the sound recording, and wavelet coefficients of each beat are calculated and concatenated to form the features subset.

The extracted features are the first 40 wavelet coefficients at level 3, 25 wavelet coefficients at level 4, and 25 approximation coefficients of level 4, which makes a total of 90 wavelet-based features per beat.
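As a rough sketch of this per-beat extraction, here is a multilevel discrete wavelet transform using the Haar wavelet purely for illustration (the review does not name the paper's mother wavelet), keeping exactly the 40 + 25 + 25 = 90 coefficients described above:

```python
import numpy as np

def haar_step(x):
    """One level of the Haar DWT: returns (approximation, detail)."""
    x = x[: len(x) // 2 * 2]  # drop an odd trailing sample
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def beat_wavelet_features(beat):
    """90 features per beat: first 40 detail coeffs at level 3,
    first 25 detail coeffs at level 4, first 25 approximation
    coeffs at level 4."""
    a = np.asarray(beat, dtype=float)
    details = {}
    for level in (1, 2, 3, 4):
        a, d = haar_step(a)
        details[level] = d
    return np.concatenate([details[3][:40], details[4][:25], a[:25]])
```

With 10 beats per recording, concatenating the per-beat vectors yields the 10 × 90 = 900 wavelet-based features counted later.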

2.3.2. Statistical- and Signal-Based Features

Statistical Features
  • The features extraction technique is applied to both the segmented and non-segmented data and is based on the calculation of the mean, standard deviation and other statistical features for each cardiac cycle or interval.
  • The same features are then computed and added for each of the 5 filtered signals.
  • A total of 130 + 5 × 130 = 780 time-, frequency- and statistical-based features are extracted from the signal. After adding the 900 wavelet-based features (90 per beat × 10 beats), the final number of features becomes 1680, which is considered relatively high.
  • (Please read the paper for details of each feature.)
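The 6× structure of this feature set (original signal plus 5 filtered sub-bands) can be sketched as below; the descriptors shown are representative examples only, since the review does not enumerate the paper's full 130-feature set:

```python
import numpy as np

def interval_stats(x):
    """A few representative statistical descriptors for one signal
    (or cardiac cycle / interval). The paper's actual set has many more."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    z = (x - m) / (s + 1e-12)  # standardised signal for higher moments
    return np.array([
        m, s, np.median(x), x.min(), x.max(),
        (z ** 3).mean(),  # skewness
        (z ** 4).mean(),  # kurtosis
    ])

def recording_features(original, filtered_bands):
    """Compute the same descriptors on the original signal and on each of
    the 5 filtered signals: the 6 x (per-signal count) layout mirrors the
    130 + 5 * 130 = 780 total in the paper."""
    feats = [interval_stats(original)]
    feats += [interval_stats(b) for b in filtered_bands]
    return np.concatenate(feats)
```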

3. Classification

3.1. Data Preparation

Data Balancing
  • The original data consists of 3240 recordings of pathological and normal patients [18]. This number was later reduced to 3153 recordings after eliminating samples with high noise.
  • (Fig. 11) The third technique allows for well balancing the training data based on sources and classes and enables it to better resemble the hidden testing data which could potentially lead to better accuracy.
  • (Please read the paper for details.)
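Balancing by both source and class, as described above, might look like the following sketch: group the recordings by (source, class), then resample each group to a common size so no database or class dominates training. The function and its parameters are my own illustration, not the paper's exact procedure.

```python
import random
from collections import defaultdict

def balance_by_source_and_class(records, per_group, seed=0):
    """records: list of (source_id, label, signal) tuples. Resample so every
    (source, class) pair contributes per_group recordings, sampling with
    replacement when a group is smaller than per_group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[0], rec[1])].append(rec)
    balanced = []
    for items in groups.values():
        if len(items) >= per_group:
            balanced.extend(rng.sample(items, per_group))       # subsample
        else:
            balanced.extend(rng.choices(items, k=per_group))    # oversample
    rng.shuffle(balanced)
    return balanced
```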

3.2. Combined Classifier

Combined Classifier
  • The bagging and boosting ensemble classifiers were combined to obtain an enhanced classifier.
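A toy version of the two ensembles can be set up with scikit-learn. Note the hedges: LogitBoost is not in scikit-learn, so gradient boosting with log loss serves as a closely related stand-in, and the paper's bagging model is actually 3-class (−1, 0 noisy, +1), which this binary toy omits. The data here is synthetic, standing in for the 1680-dim feature vectors.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(300, 20)                          # toy stand-in features
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)  # -1 normal, +1 pathological

# Bagging of decision trees (the paper's version also has a 0 "noisy" class)
bagging = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=0,
).fit(X, y)

# Gradient boosting with log loss as a LogitBoost-like stand-in
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)
```

Both models expose `predict` (class label) and `predict_proba` (probability score), matching the two outputs the combination step below relies on.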

Each classifier generates an output or class label (−1 and 1 for the boosting classifier; −1, 1, and 0 for the bagging classifier) in addition to a probability score (between 0 and 1). Recall that −1 corresponds to the normal class, 1 to the pathological class, and 0 to noisy samples.

The obtained outputs are then combined by checking whether they match, which provides a more confident output decision; if they mismatch, the output is set to pathological.
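The decision rule described above reduces to a few lines; this simplification ignores the probability scores, which the paper also produces:

```python
NORMAL, PATHOLOGICAL, NOISY = -1, 1, 0

def combine(boost_label, bag_label):
    """Fuse the boosting output (-1 or +1) with the bagging output
    (-1, 0, or +1). Matching labels give a confident decision; any
    mismatch (including a 'noisy' bagging vote) defaults to
    pathological, favouring sensitivity over specificity."""
    if boost_label == bag_label:
        return boost_label
    return PATHOLOGICAL
```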

4. Results

  • A total of 15 folds, i.e. 3 distinct five-fold cross-validation runs, is performed for each approach.
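This 3 × 5-fold scheme maps directly onto scikit-learn's repeated stratified K-fold splitter (assuming, as is common, that splits are stratified by class; the data here is a placeholder):

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

X = np.zeros((100, 1))             # placeholder features
y = np.array([0] * 60 + [1] * 40)  # placeholder labels

# 3 distinct 5-fold runs -> 15 train/test splits in total
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
splits = list(rskf.split(X, y))
```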

Advanced segmentation + XGBoost obtains the best combined score of 90.2%.

LogitBoost + Bagging outperforms [21] and [22].

LogitBoost + Bagging takes 2.7 s, which is the longest runtime.

  • (Please read the paper for more results.)


