Brief Review — Heart sound classification based on log Mel-frequency spectral coefficients features and convolutional neural networks
MFCC + CNN
Heart sound classification based on log Mel-frequency spectral coefficients features and convolutional neural networks
MFCC+CNN, by Yunnan University, Fuwai Yunnan Cardiovascular Hospital, Kunming Medical University
2021 J. BSPC, Over 30 Citations (Sik-Ho Tsang @ Medium)
Heart Sound Classification
2013 … 2022 [CirCor Dataset] [CNN-LSTM] [DsaNet] [Modified Xception] [Improved MFCC+Modified ResNet] [Learnable Features + VGGNet/EfficientNet] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]
- Firstly, the heart sound signals were de-noised by using the wavelet algorithm. Subsequently, the improved duration-dependent hidden Markov model (DHMM) was used to segment the heart sound signal according to the heart cycle.
- Then, the dynamic frame length method was used to extract log Mel-frequency spectral coefficients (MFSC) features from the heart sound signal based on the heart cycle.
- Afterward, a convolutional neural network (CNN) was used to classify the MFSC features. Finally, a majority vote algorithm was used to obtain the final prediction.
Outline
- Dataset, Preprocessing & Segmentation
- Feature Extraction, CNN & Majority Vote
- Results
1. Dataset, Preprocessing & Segmentation
- The dataset was collected by the authors. A total of 1800 heart sound cases are used, covering four classes: patent ductus arteriosus (PDA), ventricular septal defect (VSD), atrial septal defect (ASD), and normal (N).
- The sampling frequency is 5000 Hz.
- A 5-level decomposition based on the db2 wavelet is used for denoising.
- Improved duration-dependent hidden Markov model (Improved DHMM) is used for segmentation.
- (Please read the paper directly for more details on the segmentation.)
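The db2, 5-level denoising step above can be sketched in plain Python. This is only an illustration under stated assumptions: the review does not say which thresholding rule or boundary handling the authors used, so the soft universal threshold and the periodic signal extension below are assumptions, and the function names are hypothetical. In practice a wavelet library would normally replace the hand-written transform.

```python
import math
import random
import statistics

SQ3, SQ2 = math.sqrt(3.0), math.sqrt(2.0)
# Daubechies-2 (db2) low-pass filter; the high-pass filter is its quadrature mirror.
LO = [(1 + SQ3) / (4 * SQ2), (3 + SQ3) / (4 * SQ2),
      (3 - SQ3) / (4 * SQ2), (1 - SQ3) / (4 * SQ2)]
HI = [(-1) ** k * LO[3 - k] for k in range(4)]


def dwt(x):
    """One level of the db2 transform with periodic extension (assumption)."""
    n = len(x)
    approx = [sum(LO[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(HI[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def idwt(approx, detail):
    """Inverse of dwt(): the transform is orthogonal, so this is its transpose."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i, (a, d) in enumerate(zip(approx, detail)):
        for k in range(4):
            x[(2 * i + k) % n] += LO[k] * a + HI[k] * d
    return x


def soft(v, t):
    """Soft thresholding: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def wavelet_denoise(x, levels=5):
    """5-level db2 decomposition, threshold the details, then reconstruct."""
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be a multiple of 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = dwt(approx)
        details.append(d)
    # Assumption: noise level estimated from the finest detail coefficients,
    # combined with the universal threshold sigma * sqrt(2 ln N).
    sigma = statistics.median(abs(v) for v in details[0]) / 0.6745
    thr = sigma * math.sqrt(2.0 * math.log(len(x)))
    for d in reversed(details):
        approx = idwt(approx, [soft(v, thr) for v in d])
    return approx


# Usage on a synthetic noisy tone (not real heart sound data).
random.seed(0)
noisy = [math.sin(2 * math.pi * 2 * i / 512) + random.gauss(0, 0.3) for i in range(512)]
denoised = wavelet_denoise(noisy)
```

The sketch keeps the level-5 approximation untouched and only shrinks detail coefficients, which is the usual design for removing broadband noise while preserving the slowly varying part of the signal.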
2. Feature Extraction, CNN & Majority Vote
2.1. Feature Extraction
- Framing: The signal is divided into short time frames. In order to achieve a smooth transition between frames, a 50% overlap between consecutive frames is used. The total number of frames after framing is M.
- MFSC: Mel-Frequency Spectral Coefficients (MFSC) are a variant of MFCC that omits the Discrete Cosine Transform (DCT) step, so the features are the log Mel filter-bank energies themselves.
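The framing and MFSC steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a fixed frame length rather than the paper's dynamic frame length method, and the Hamming window, the triangular filter shape, the filter count, and all function names are assumptions. A naive DFT is used to keep the example dependency-free; real code would use an FFT.

```python
import cmath
import math


def frame_signal(signal, frame_len):
    """Split a signal into frames with 50% overlap (hop = frame_len // 2)."""
    hop = frame_len // 2
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT (O(n^2), for clarity)."""
    n = len(frame)
    x = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, s in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mfsc(signal, frame_len, sample_rate, n_filters):
    """Log mel filter-bank energies per frame: MFCC without the final DCT."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    feats = []
    for frame in frame_signal(signal, frame_len):
        p = power_spectrum(frame)
        feats.append([math.log(sum(w * v for w, v in zip(row, p)) + 1e-10)
                      for row in bank])
    return feats


# Usage on a synthetic 200 Hz tone sampled at 5000 Hz (not real heart sound data).
tone = [math.sin(2 * math.pi * 200 * t / 5000) for t in range(256)]
feats = mfsc(tone, frame_len=64, sample_rate=5000, n_filters=8)
print(len(feats), len(feats[0]))  # M = 7 frames, 8 mel bands each
```

Stacking the per-frame rows gives an M x n_filters feature map, the kind of 2-D input a CNN can consume.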
2.2. CNN
- Each heart sound recording lasts 20 s, so 20 to 33 MFSC feature maps (one per heart cycle) can be obtained per volunteer after heart sound segmentation.
- The CNN has 4 convolutional layers, 3 max pooling layers, and 2 FC layers, followed by a softmax layer. ReLU is used as the activation.
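The "20 to 33 feature maps" figure follows from simple arithmetic if each heart cycle yields one feature map. The sketch below assumes a constant heart rate between 60 and 100 beats per minute; that range is an assumption used for illustration, not a number stated in the review, and the function name is hypothetical.

```python
def cycles_in_recording(duration_s, heart_rate_bpm):
    """Complete cardiac cycles in a recording at a constant heart rate."""
    return int(duration_s * heart_rate_bpm // 60)


print(cycles_in_recording(20, 60))   # 20
print(cycles_in_recording(20, 100))  # 33
```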
2.3. Majority Voting Algorithm
- The heart sound signal is quasi-periodic, so the difference between cardiac cycles is minimal.
- However, some non-pathological noise (such as breath sounds and friction sounds) is inevitably introduced during collection, which can cause the classifier to misjudge.
- The majority voting algorithm is used to determine the final classification results for individuals. The process is:
- Count the sequence length of the classification results.
- Traverse the sequence and count the number of occurrences of each element.
- Bubble sort is used to sort the elements by occurrence frequency.
- The element with the most frequent occurrence is taken as the final classification result.
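The voting steps above can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical function name: it uses `collections.Counter` in place of the bubble sort described in the paper, which yields the same winner since only the most frequent element is needed. The tie-breaking behavior (first-seen label wins) is a property of this sketch, not something the review specifies.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent per-cycle prediction for one individual."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)          # count occurrences of each label
    label, _ = counts.most_common(1)[0]    # label with the highest count
    return label


# Per-cycle CNN outputs for one volunteer (made-up labels for illustration).
per_cycle = ["VSD", "VSD", "N", "VSD", "ASD", "VSD"]
print(majority_vote(per_cycle))  # VSD
```

Because a single noisy cycle changes only one vote, the individual-level result stays stable as long as most cycles are classified correctly.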
3. Results
3.1. Dataset Split
- After segmentation, many more samples are obtained.
3.2. 2-Class Results
3.3. 4-Class Results
3.4. SOTA Comparison on 2-Class Problem
The proposed algorithm outperforms the compared algorithms in sensitivity, specificity, and accuracy.