Brief Review — A lightweight hybrid deep learning system for cardiac valvular disease classification
Augmented Sound Dataset + CNN-LSTM
CNN-LSTM, by Yarmouk University
2022 Nature Sci. Rep., Over 20 Citations (Sik-Ho Tsang @ Medium)
Heart Sound Classification
- A combined CNN and LSTM model is proposed for 5-class phonocardiogram (PCG) signal classification, evaluated on both augmented and non-augmented datasets.
Outline
- Datasets, Preprocessing & Data Augmentation
- Proposed CNN-LSTM & FFT-CNN-LSTM
- Results
1. Datasets, Preprocessing & Data Augmentation
1.1. Datasets
- The model was trained on the publicly available open heart sounds GitHub dataset, which contains 1000 recordings across 5 classes, with 200 recordings per class.
- The PhysioNet/CinC Challenge 2016 dataset was used as a second dataset to further evaluate the proposed model. It contains only two classes: normal and abnormal.
- Example recordings from both datasets are shown above.
1.2. Preprocessing
- The Fourier transform of the PCG signals was clipped to retain only the first 350 Hz of the 4000 Hz spectrum.
- Each PCG record in the first dataset is downsampled by a factor of 8, and each PCG record in the second dataset is downsampled by a factor of 2.
- Therefore, after downsampling, the highest frequency content is 500 Hz for all heart conditions, as shown above.
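The two preprocessing steps above (downsampling, then clipping the spectrum) can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the function name is invented, decimation is done by naive sample-skipping (the paper's anti-aliasing filter, if any, is not described here), and a toy sine wave stands in for a real PCG record.

```python
import numpy as np

def preprocess_pcg(signal, fs, down_factor, keep_hz=350):
    """Downsample a PCG record, then clip its magnitude spectrum (sketch)."""
    # Naive decimation: keep every down_factor-th sample.
    x = signal[::down_factor]
    fs_new = fs // down_factor
    # Magnitude spectrum, clipped to the first keep_hz of frequencies.
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_new)
    return spec[freqs <= keep_hz], fs_new

# First dataset: recordings downsampled by 8, e.g. 8 kHz -> 1 kHz (Nyquist 500 Hz).
fs = 8000
t = np.arange(fs) / fs
pcg = np.sin(2 * np.pi * 100 * t)   # toy 100 Hz tone standing in for a PCG
spec, fs_new = preprocess_pcg(pcg, fs, down_factor=8)
print(fs_new)           # 1000
print(int(spec.argmax()))  # 100 -> peak at 100 Hz, as expected
```

With a 1 Hz bin spacing (1000 samples at 1 kHz), the clipped spectrum holds bins 0–350 Hz, matching the clipping described above.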
1.3. Data Augmentation
Similar to images, there are several techniques to augment audio signals, and these techniques are usually applied to the raw audio signals.
- Time stretch: randomly slow down or speed up the sound.
- Time shift: shift audio to the left or the right by a random amount.
- Add noise: add some random values to the sound.
- Control volume: randomly increase or decrease the volume of the audio.
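The four augmentations above can be sketched directly on the raw waveform. This is an illustrative numpy-only version, not the paper's implementation: all function names and parameter ranges are assumptions, and the time stretch uses plain linear-interpolation resampling as a simple stand-in for a proper pitch-preserving stretch.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_shift(x, max_frac=0.1):
    """Shift audio left/right by a random amount (circular shift for simplicity)."""
    bound = int(len(x) * max_frac)
    return np.roll(x, rng.integers(-bound, bound + 1))

def add_noise(x, scale=0.005):
    """Add small random values to the waveform."""
    return x + rng.normal(0.0, scale, size=len(x))

def control_volume(x, low=0.8, high=1.2):
    """Randomly scale the amplitude up or down."""
    return x * rng.uniform(low, high)

def time_stretch(x, low=0.9, high=1.1):
    """Slow down / speed up by resampling (simple linear-interpolation stand-in)."""
    rate = rng.uniform(low, high)
    n_out = int(len(x) / rate)
    return np.interp(np.linspace(0, len(x) - 1, n_out), np.arange(len(x)), x)

x = np.sin(2 * np.pi * np.arange(1000) / 50)
aug = control_volume(add_noise(time_shift(x)))
print(len(aug))  # 1000: shift, noise, and volume preserve length; stretch changes it
```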
2. Proposed CNN-LSTM & FFT-CNN-LSTM
2.1. CNN-LSTM
In brief, deep feature extraction and selection from the PCG signals are handled by CNN blocks, particularly the 1D convolutional layers, the batch normalization layers, the ReLU layers, and the max-pooling layers.
Utilizing the LSTM component produces a richer and more concentrated model compared to pure CNN models, resulting in higher performance with fewer parameters.
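The hybrid idea can be sketched as a forward pass: CNN-style blocks (1D convolution, ReLU, max-pooling) extract features, and an LSTM summarizes them over time before a final fully connected layer. This numpy toy uses made-up layer sizes and random weights purely to show the data flow; it omits batch normalization and is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels):
    """Valid 1-D convolution + ReLU. x: (T, C_in), kernels: (C_out, K, C_in)."""
    c_out, k, _ = kernels.shape
    t_out = x.shape[0] - k + 1
    y = np.empty((t_out, c_out))
    for t in range(t_out):
        y[t] = np.tensordot(kernels, x[t:t + k], axes=([1, 2], [0, 1]))
    return np.maximum(y, 0.0)

def maxpool1d(x, size=2):
    """Non-overlapping max-pooling along time."""
    t = (x.shape[0] // size) * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)

def lstm_last(x, W, U, b, h_dim):
    """Run a single LSTM over (T, C) features; return the last hidden state."""
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    for xt in x:
        z = W @ xt + U @ h + b                      # gates, stacked: (4*h_dim,)
        i, f, g, o = np.split(z, 4)
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
    return h

# Toy forward pass: 1-channel, 400-sample PCG segment -> 5-class logits.
x = rng.standard_normal((400, 1))
feats = maxpool1d(conv1d_relu(x, rng.standard_normal((8, 5, 1)) * 0.1))
h = lstm_last(feats, rng.standard_normal((64, 8)) * 0.1,
              rng.standard_normal((64, 16)) * 0.1, np.zeros(64), h_dim=16)
logits = rng.standard_normal((5, 16)) @ h
print(logits.shape)  # (5,)
```

The design point is that the LSTM condenses the whole pooled feature sequence into one small hidden state, so the classifier head needs far fewer parameters than a fully convolutional stack with a large flattened output.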
2.2. FFT-CNN-LSTM
- Using the FFT input, the model becomes an FFT-CNN-LSTM model.
3. Results
3.1. Non-Augmented Data vs Augmented Data
- 10-fold cross-validation is used.
- For the non-augmented data, the accuracy was 98.5%.
- For the augmented data, the accuracy was 99.87%.
- For the binary dataset, the accuracy was 93.77%.
- (Please read the paper directly for more experimental results.)
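The 10-fold cross-validation protocol mentioned above splits the data into ten equal parts, training on nine and validating on the held-out tenth in turn. A minimal index-splitting sketch (the helper name and seed are illustrative, not from the paper):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, val

# 1000 recordings (first dataset) -> ten validation folds of 100 records each.
sizes = [len(val) for _, val in kfold_indices(1000)]
print(sizes)  # [100, 100, ..., 100]
```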
3.2. SOTA Comparisons
On the first dataset, the proposed architecture outperforms all compared models on every important performance metric. Its accuracy of 99.87% is 0.27% higher than that of the second-best model, built by Shuvo et al. in 2021.
On the binary PhysioNet/CinC 2016 dataset, the new system also outperformed the previous state-of-the-art models on all performance metrics. The obtained accuracy is 6.45% higher than the 87.31% accuracy reported by Alkhodari et al. in 2021.
3.3. Time Measurement
The timing results show that it is a lightweight model that can be deployed on embedded systems.