Brief Review — Heart sound classification based on improved mel-frequency spectral coefficients and deep residual learning
Improved MFCC + Modified ResNet
Improved MFCC + Modified ResNet, by Anhui University of Finance and Economics, and University of Science and Technology of China,
2022 J. Front. Physiol. (Sik-Ho Tsang @ Medium)
Heart Sound Classification
2013 … 2021 [CardioXNet] 2022 [CirCor] [CNN-LSTM] [DsaNet] [Modified Xception] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]
==== My Other Paper Readings Are Also Over Here ====
- A new heart sound classification method is proposed, which is based on improved mel-frequency cepstrum coefficient (MFCC) features and deep residual learning (ResNet).
Outline
- Motivations & Contributions
- Proposed Approach
- Results
1. Motivations & Contributions
Figure 2 shows the waveform representation of S1, S2, S3, and S4 sounds in systole and diastole intervals.
- Tables 1–3 & Figure 3 Left: The lack of large, authoritative open heart sound datasets restricts model performance. This paper therefore combines 3 of the most widely used heart sound datasets.
- Figure 3 Middle: Most existing approaches are shallow structures, and the features used are insufficient to fully express the information in heart sounds. In this paper, the improved MFCC is used as the input feature to more comprehensively represent the static and dynamic characteristics.
- Figure 3 Right: A residual neural network (ResNet) is used, which alleviates gradient vanishing and degradation during training.
2. Proposed Approach
2.1. Improved MFCC Features
The mel-frequency cepstrum reflects the nonlinear relationship between the human ear's perception and the frequency of the sound heard.
- (Please read the paper directly for MFCC features.)
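- For reference, the commonly used mel scale maps a frequency f (in Hz) to Mel(f) = 2595 · log10(1 + f/700), so equal steps in mel correspond to progressively larger steps in Hz, mimicking the ear's frequency resolution.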
After obtaining the MFCC coefficients C(n), which reflect the static characteristics of the heart sound signal, the first and second differences D(n) and D2(n) are also extracted to capture the dynamic characteristics. Using the standard difference (delta) formula with window size k:
D(n) = \frac{\sum_{i=1}^{k} i\,[C(n+i) - C(n-i)]}{2\sum_{i=1}^{k} i^{2}}, \quad \text{where } k = 2,
and D2(n) is obtained by applying the same operation to D(n).
Each of the three components has size (199, 13); they are concatenated to form a feature of size (199, 39) as the input of the neural network.
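A minimal sketch of how such an improved MFCC feature could be computed is shown below (assuming librosa, 13 coefficients, and a 2,000 Hz heart-sound sampling rate; frame settings would need tuning to obtain exactly 199 frames, and these are not the authors' exact parameters):

```python
# Sketch: MFCC + first/second differences, concatenated into one feature map.
# Parameter choices (sr, n_mfcc, delta width) are illustrative assumptions.
import numpy as np
import librosa

def improved_mfcc(wav_path, sr=2000, n_mfcc=13):
    y, _ = librosa.load(wav_path, sr=sr)                       # load and resample the heart sound
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)     # static MFCC, shape (13, T)
    d1 = librosa.feature.delta(mfcc, order=1, width=5)         # ~first difference (about 2 frames each side)
    d2 = librosa.feature.delta(mfcc, order=2, width=5)         # ~second difference
    feat = np.concatenate([mfcc, d1, d2], axis=0)              # (39, T)
    return feat.T                                              # (T, 39), e.g. (199, 39) in the paper's setting
```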
2.2. Modified ResNet
- ResNet is modified as above, with batch normalization and depthwise separable convolutions (as in MobileNetV1); a rough sketch of such a block is given after this list.
- 4 other models are also used for comparisons.
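As a rough illustration, a residual block combining batch normalization with depthwise separable convolutions could look like the following PyTorch sketch (channel counts, kernel sizes, and the shortcut projection are illustrative assumptions, not the authors' exact architecture):

```python
# Sketch: residual block with MobileNetV1-style depthwise separable convolutions + BatchNorm.
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride, padding,
                                   groups=in_ch, bias=False)      # per-channel spatial convolution
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # 1x1 channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = SeparableConv2d(in_ch, out_ch, stride=stride)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = SeparableConv2d(out_ch, out_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection on the shortcut when the shape changes, identity otherwise
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))   # residual (skip) connection
```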
3. Results
3.1. Feature Study
The improved MFCC's sensitivity, specificity, and accuracy are higher than those of the other features, although its precision is lower than that of the plain MFCC.
3.2. CNN vs RNN
CNN and ResNet obtain higher accuracy than the RNN-based models.
3.3. SOTA Comparisons
The proposed method achieves an accuracy of 94.43% on the constructed dataset, which is higher than the compared state-of-the-art methods.