Review — Automatic Segmentation and Classification of Heart Sounds Using Modified Empirical Wavelet Transform and Power Features

Heart Sound Segmentation and Classification on PASCAL Dataset

Sik-Ho Tsang
Nov 12, 2023
Normal heart sound

Automatic Segmentation and Classification of Heart Sounds Using Modified Empirical Wavelet Transform and Power Features
Power Features+KNN, by Universidad del Norte
2020 MDPI J. Appl. Sci. (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 … 2020 [1D-CNN] [WaveNet] [Power Features+KNN] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • The modified empirical wavelet transform (EWT) and the normalized Shannon average energy are used in pre-processing and automatic segmentation to identify the systolic and diastolic intervals.
  • 6 power characteristics are extracted (3 for the systole and 3 for the diastole).
  • Different machine learning models are tried (SVM, KNN, RF and MLP).

Outline

  1. Heart Sound Segmentation
  2. Heart Sound Classification
  3. Results

1. Heart Sound Segmentation

  • The frequency range of the S1 and S2 sounds is taken to be 20–200 Hz. A modified edge selection method is used to determine the number of components.
  • Initially, all the signals taken from the databases are decimated to a sampling frequency of 4 kHz and amplitude-normalized.
  • The first stage of the system decomposes the signal into different frequency bands using the modified Empirical Wavelet Transform (mEWT) method.
  • To achieve this, the first step is to identify the maximum value of the Fast Fourier Transform (FFT) of the signal, which is in the range of 20 Hz to 150 Hz. This is taken as the center frequency for a filter with a bandwidth of 40 Hz.
Decomposition of heart sound using EWT
  • In the above example, the maximum amplitude is approximately 60 Hz. The frequency band is 40–80 Hz (Green, Figure 2B).
  • The low-frequency segment is defined between 1–40 Hz (Blue, Figure 2C); in the segment from 80 Hz to 350 Hz, some kind of murmur is expected (Red, Figure 2E).
  • High-frequency noise that contaminates the recording is also expected to be observable (Cyan, Figure 2F).
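To make the band selection concrete, here is a minimal NumPy sketch of the step described above: find the FFT maximum between 20 Hz and 150 Hz and place a 40 Hz wide band around it. The function name `select_bands`, its parameters, and the choice to let the noise band run from 350 Hz up to the Nyquist frequency are my assumptions, not taken from the paper. It only computes the band edges and does not implement the empirical wavelet filters themselves.

```python
import numpy as np

def select_bands(x, fs=4000, search=(20.0, 150.0), bandwidth=40.0,
                 low_start=1.0, murmur_stop=350.0):
    """Return the dominant frequency and the band edges (in Hz) for one recording."""
    spectrum = np.abs(np.fft.rfft(x))            # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequency of each FFT bin
    in_range = (freqs >= search[0]) & (freqs <= search[1])
    # Center frequency: the largest FFT magnitude inside the search range.
    peak = freqs[in_range][np.argmax(spectrum[in_range])]
    lo, hi = peak - bandwidth / 2, peak + bandwidth / 2
    bands = {
        "low": (low_start, lo),         # low-frequency segment
        "s1_s2": (lo, hi),              # band around the dominant component
        "murmur": (hi, murmur_stop),    # where a murmur is expected
        "noise": (murmur_stop, fs / 2)  # assumption: everything up to Nyquist
    }
    return peak, bands

# Synthetic test signal: a 60 Hz tone plus a weaker 300 Hz tone, 2 s at 4 kHz.
fs = 4000
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 60 * t) + 0.2 * np.sin(2 * np.pi * 300 * t)
peak, bands = select_bands(x, fs)
print(peak, bands["s1_s2"])
```

For this synthetic signal the selected band is 40–80 Hz, matching the example in the figure.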
  • The Shannon energy equation is defined as:
  • where x(i) represents the samples of the signal and E is Shannon’s energy.
  • The normalized average Shannon energy (NASE) [47], is defined as follows:
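For reference, the forms below are the ones commonly used in the heart-sound literature for Shannon energy and NASE. I am assuming the paper's equations follow them; the symbols E_s (average Shannon energy of a frame), N (number of samples in the frame), M (mean) and S (standard deviation) are my notation.

```latex
% Shannon energy of a sample x(i)
E = -x^{2}(i)\,\log x^{2}(i)

% Average Shannon energy over a frame of N samples
E_s = -\frac{1}{N}\sum_{i=1}^{N} x^{2}(i)\,\log x^{2}(i)

% Normalized average Shannon energy of frame t,
% with M(.) the mean and S(.) the standard deviation of E_s over all frames
P(t) = \frac{E_s(t) - M\!\left(E_s(t)\right)}{S\!\left(E_s(t)\right)}
```

Because the signal is amplitude-normalized beforehand, x²(i) lies in [0, 1], so E is non-negative and emphasizes medium-intensity samples over very low and very high ones.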
Stages of Segmentation
  • Figure 3B: After NASE, negative values are set to zero and the signal is normalized.
  • Figure 3C: Then, the edges of each lobe of the signal are identified; this helps to determine the beginning and the end of the sound S1 or S2.
  • When lobes are close to each other, the lobe with less energy is eliminated. This process is repeated 3 times. Lobes of short duration and low amplitude are also eliminated.
  • The peaks in each lobe are calculated as shown in Figure 3D. Unwanted peaks are removed using the following steps:
  1. Calculate the average of the intervals between peak (i) and peak (i + 1).
  2. Eliminate the peaks that belong to an interval shorter than 0.25 × the average.
  3. Eliminate the peaks that belong to an interval shorter than 0.3 × the average.
  4. Eliminate the peaks that belong to an interval shorter than 0.4 × the average.
  5. Eliminate the peaks that belong to an interval shorter than 0.55 × the average.
  • (The authors also show a failure case in Figure 4; please read the paper for details.)
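The peak-removal steps can be sketched as below. This is a rough illustration under my own assumptions, not the authors' code: the average interval is recomputed before each pass, and of the two peaks bounding a too-short interval, the one with the lower amplitude is dropped. The summary above does not say which peak is removed or whether the average is updated. The name `prune_peaks` and the example values are hypothetical.

```python
import numpy as np

def prune_peaks(times, heights, factors=(0.25, 0.3, 0.4, 0.55)):
    """Remove peaks that bound intervals shorter than factor * mean interval."""
    times = np.asarray(times, dtype=float)
    heights = np.asarray(heights, dtype=float)
    for factor in factors:
        if len(times) < 3:
            break  # too few peaks to judge intervals
        # Assumption: the mean interval is recomputed before each pass.
        limit = factor * np.mean(np.diff(times))
        i = 0
        while i < len(times) - 1:
            if times[i + 1] - times[i] < limit:
                # Assumption: drop the weaker of the two peaks.
                drop = i if heights[i] < heights[i + 1] else i + 1
                times = np.delete(times, drop)
                heights = np.delete(heights, drop)
            else:
                i += 1
    return times, heights

# Hypothetical peak times (s) and amplitudes; the peak at 0.45 s is spurious.
peak_times = [0.10, 0.40, 0.45, 0.90, 1.20, 1.70]
peak_heights = [1.0, 0.9, 0.2, 1.0, 0.8, 1.0]
kept_times, kept_heights = prune_peaks(peak_times, peak_heights)
print(kept_times)
```

In this example only the low-amplitude peak at 0.45 s, which sits 0.05 s after its neighbor, is removed; the remaining intervals all pass the four thresholds.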

2. Heart Sound Classification

For the systolic and diastolic intervals, each interval is divided into 3 segments and the signal power in each segment is calculated, giving a total of 6 characteristics (3 in the systole and 3 in the diastole; see Figure 6).

  • The power of a signal is defined as the amount of energy consumed in a time interval:
  • For the classifier, SVM, KNN, RF, and MLP are tried.
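As a small illustration of the 6 power features, the sketch below uses the usual discrete-time average power, P = (1/N) Σ x(n)², which I am assuming is the form of the paper's equation. Splitting each interval into 3 roughly equal-length segments is also my assumption, and the function names are hypothetical.

```python
import numpy as np

def signal_power(segment):
    """Average power of a segment: mean of the squared samples."""
    segment = np.asarray(segment, dtype=float)
    return float(np.mean(segment ** 2))

def interval_powers(interval, n_segments=3):
    """Split an interval into n_segments (near-)equal parts; return each part's power."""
    return [signal_power(part) for part in np.array_split(interval, n_segments)]

def cycle_features(systole, diastole):
    """6-element feature vector: 3 systolic powers followed by 3 diastolic powers."""
    return np.array(interval_powers(systole) + interval_powers(diastole))

# Toy intervals with known power in each third.
systole = np.concatenate([np.ones(100), 2 * np.ones(100), np.zeros(100)])
diastole = 0.5 * np.ones(300)
features = cycle_features(systole, diastole)
print(features)
```

Each heart cycle then yields one 6-dimensional vector, which is what would be fed to a classifier such as KNN.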

3. Results

3.1. Dataset

PASCAL Dataset

3.2. Segmentation Results

Segmentation Results
  • The evaluation metric is the error that exists between manual segmentation labels provided by the database and those obtained by the proposed method. Total errors are measured.
  • The proposed method obtained a total error of 843,440.8 for dataset A and 17,074.1 for dataset B, the lowest among the compared state-of-the-art approaches.
  • (There are visualizations for segmentation in Figure 7, please read the paper for more details.)

3.3. Classification Results

Comparisons with Conventional Approaches

Compared with [25–27], the KNN classifier gives the best result: an accuracy of 99.25%, a specificity of 100%, a sensitivity of 98.57%, and an AUC of 91.81%.

Comparisons with Deep Learning Approaches

Deep learning approaches are also compared.

A good performance was obtained in the detection of normal heart sounds.
