Brief Review — Heart sound classification based on improved mel-frequency spectral coefficients and deep residual learning

Improved MFCC + Modified ResNet

Sik-Ho Tsang
3 min read · Nov 26, 2023

Heart sound classification based on improved mel-frequency spectral coefficients and deep residual learning (Improved MFCC + Modified ResNet), by Anhui University of Finance and Economics, and University of Science and Technology of China, 2022 J. Front. Physiol. (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 … 2021 [CardioXNet] 2022 [CirCor] [CNN-LSTM] [DsaNet] [Modified Xception] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • A new heart sound classification method is proposed, which is based on improved mel-frequency cepstrum coefficient (MFCC) features and deep residual learning (ResNet).

Outline

  1. Motivations & Contributions
  2. Proposed Approach
  3. Results

1. Motivations & Contributions

Figure 2 shows the waveform representation of S1, S2, S3, and S4 sounds in systole and diastole intervals.

3 Datasets
Motivations & Contributions
  1. Tables 1–3 & Figure 3 Left: Lack of large authoritative open heart sound datasets restricts the performance of the model. This paper incorporates 3 of the most widely used heart sound datasets.
  2. Figure 3 Middle: Most existing models are shallow structures, and the features they use are insufficient to fully express the information in heart sounds. In this paper, the improved MFCC is used as the input feature to represent the static and dynamic characteristics more comprehensively.
  3. Figure 3 Right: A residual neural network (ResNet) is used, which alleviates vanishing gradients and degradation during training.

2. Proposed Approach

Proposed Approach

2.1. Improved MFCC Features

Mel-frequency cepstral coefficients (MFCC) reflect the nonlinear relationship between the frequency of a sound and how the human ear perceives it.

  • (Please read the paper directly for MFCC features.)
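To make the "nonlinear relationship" concrete, here is a minimal sketch of one commonly used Hz-to-mel mapping. The constants 2595 and 700 come from a widely used textbook variant of the mel scale; whether the paper uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (common 2595/700 variant, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in Hz shrink on the mel scale as frequency rises:
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)      # 0 Hz -> 1000 Hz
high_step = hz_to_mel(2000.0) - hz_to_mel(1000.0)  # 1000 Hz -> 2000 Hz
print(low_step > high_step)
```

The mapping is roughly linear at low frequencies and logarithmic at high ones, which is why mel filter banks are denser at the low end of the spectrum.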

After obtaining the MFCC coefficients, which reflect the static characteristics of the heart sound signal, D(n) and D2(n), the first- and second-order differences of the MFCC, are also extracted to capture the dynamic characteristics:

  • where k=2.
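The equation itself is not reproduced in this review. For reference, a widely used regression form of the difference features, written with the same window k, is sketched below. This shows the general shape only and is not a transcription of the paper's equation, whose normalization may differ. C(n) is assumed to denote the MFCC vector of frame n.

```latex
D(n) = \frac{\sum_{i=1}^{k} i \,\bigl( C(n+i) - C(n-i) \bigr)}{2 \sum_{i=1}^{k} i^{2}},
\qquad
D_2(n) = \frac{\sum_{i=1}^{k} i \,\bigl( D(n+i) - D(n-i) \bigr)}{2 \sum_{i=1}^{k} i^{2}}
```

With k=2, the denominator in this form is 2(1² + 2²) = 10.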

Each of the three (MFCC, D(n), and D2(n)) has size (199, 13); they are concatenated to form a feature of size (199, 39), which is the input to the neural network.
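The shape bookkeeping can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the `delta` helper is hypothetical, it uses a common regression-style difference with edge padding (the paper's exact normalization and boundary handling may differ), and random numbers stand in for real MFCCs.

```python
import numpy as np

def delta(features: np.ndarray, k: int = 2) -> np.ndarray:
    """Regression-style difference along the time axis (axis 0), window k."""
    num_frames = features.shape[0]
    denom = 2 * sum(i * i for i in range(1, k + 1))
    # Repeat the first and last frames so edge frames have k neighbours on each side.
    padded = np.pad(features, ((k, k), (0, 0)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for i in range(1, k + 1):
        ahead = padded[k + i : k + i + num_frames]   # frames n + i
        behind = padded[k - i : k - i + num_frames]  # frames n - i
        out += i * (ahead - behind)
    return out / denom

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((199, 13))  # placeholder for real MFCCs: 199 frames x 13 coefficients
d1 = delta(mfcc, k=2)                  # first difference, plays the role of D(n)
d2 = delta(d1, k=2)                    # second difference, plays the role of D2(n)
improved_mfcc = np.concatenate([mfcc, d1, d2], axis=1)
print(improved_mfcc.shape)  # (199, 39)
```

Concatenating along the coefficient axis keeps the 199 time frames aligned, so each frame carries its static coefficients together with both differences.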

2.2. Modified ResNet

Modified ResNet
Proposed Modified ResNet & Other Baselines
  • 4 other models are also used for comparisons.
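The architecture details are in the figure, so they are not restated here. As a reminder of the residual idea the model builds on, here is a minimal NumPy sketch of a single residual block. It is not the paper's modified ResNet: dense layers stand in for convolutions, and the names `relu`, `residual_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Compute relu(x + F(x)), where F is two linear layers with a ReLU in between."""
    fx = relu(x @ w1) @ w2  # the residual branch F(x)
    return relu(x + fx)     # the shortcut adds the input back before the activation

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # 4 samples, 8 features (arbitrary sizes)
w1 = np.zeros((8, 8))
w2 = np.zeros((8, 8))
y = residual_block(x, w1, w2)

# With zero weights the residual branch vanishes and the block reduces to relu(x):
print(np.allclose(y, relu(x)))
```

Because the input always reaches the output through the shortcut, a block can fall back to a near-identity mapping, which is the usual explanation for why residual networks are less prone to vanishing gradients and degradation as depth grows.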

3. Results

3.1. Feature Study

Feature Study

Improved MFCC’s sensitivity, specificity, and accuracy are higher than those of the other features, but its precision is lower than that of MFCC.
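For readers less familiar with these four metrics, the sketch below computes them from binary confusion-matrix counts. The counts are made up for illustration and are not results from the paper; treating "abnormal" as the positive class is also an assumption.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the negative class
        "precision": tp / (tp + fp),    # fraction of predicted positives that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and precision share the same numerator but different denominators, so a feature set can raise sensitivity (fewer missed positives) while lowering precision (more false alarms), as in the comparison above.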

3.2. CNN vs RNN

CNN vs RNN

CNN and ResNet obtain higher accuracy than the RNN-based models.

3.3. SOTA Comparisons

SOTA Comparisons

The proposed method achieves an accuracy of 94.43% on the constructed dataset, which is higher than that of the state-of-the-art methods compared.
