Brief Review — DsaNet: Imbalanced Heart Sound Signal Classification Based on Two‑Stage Trained DsaNet
DsaNet, Random Cropping, 2-Stage Training
Imbalanced Heart Sound Signal Classification Based on Two‑Stage Trained DsaNet
DsaNet, by Wuhan University of Technology, Huazhong University of Science and Technology, and North University of China
2022 J. Cognitive Computation (Sik-Ho Tsang @ Medium)
Heart Sound Classification
2013 … 2021 [CardioXNet] 2022 [CirCor] [CNN-LSTM] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]
==== My Other Paper Readings Are Also Over Here ====
- DsaNet is proposed, which uses depthwise separable convolution and the attention module (EcaNet). DsaNet can directly classify PCG signals without complicated feature engineering processes.
- To address the long-tail distribution problem in the PCG dataset, a novel imbalanced learning approach (two-stage training) is adopted for training.
- A random cropping operation is proposed to increase the amount and diversity of the data in the training stage. In the testing stage, random cropping is combined with ensemble voting to improve test accuracy.
Outline
- DsaNet
- Random Cropping & 2-Stage Training
- Results
1. DsaNet
1.1. Overall Architecture
- DsaNet is built from stacked bottleneck modules.
- Inside each bottleneck, depthwise separable convolution and the EcaNet attention module are used.
1.2. Bottleneck
- A 1D convolution is first used to expand the channel dimension.
- Then, depthwise separable convolution is applied.
- Another 1D convolution is used to reduce the channel dimension back.
- Finally, EcaNet is used as the attention mechanism.
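The steps above can be sketched in NumPy (my reading of the paper, not the authors' code): a pointwise expansion, a depthwise convolution over time, and a pointwise reduction, with a residual connection. All layer sizes and the residual are assumptions for illustration.

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 conv across channels: x (C_in, T), w (C_out, C_in) -> (C_out, T)."""
    return w @ x

def depthwise_conv(x, k):
    """Per-channel 1D conv with 'same' padding: x (C, T), k (C, K)."""
    C, T = x.shape
    K = k.shape[1]
    xp = np.pad(x, ((0, 0), (K // 2, K // 2)))
    out = np.zeros_like(x)
    for c in range(C):                    # each channel has its own kernel
        out[c] = np.convolve(xp[c], k[c], mode="valid")[:T]
    return out

def bottleneck(x, w_expand, k_dw, w_reduce):
    """Expand channels, depthwise-filter in time, reduce channels back."""
    h = np.maximum(pointwise_conv(x, w_expand), 0.0)   # 1D conv: expand + ReLU
    h = np.maximum(depthwise_conv(h, k_dw), 0.0)       # depthwise part
    h = pointwise_conv(h, w_reduce)                    # 1D conv: reduce
    return h + x                                       # residual connection

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 128))                     # (channels, time)
y = bottleneck(x,
               rng.standard_normal((64, 16)) * 0.1,    # expand 16 -> 64
               rng.standard_normal((64, 3)) * 0.1,     # depthwise kernel size 3
               rng.standard_normal((16, 64)) * 0.1)    # reduce 64 -> 16
print(y.shape)  # (16, 128)
```

Splitting a standard convolution into depthwise and pointwise parts is what keeps the parameter count low in such mobile-style blocks.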
1.3. EcaNet
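EcaNet (efficient channel attention) replaces the fully connected layers of SE-style attention with a small 1D convolution across neighbouring channels, with no dimensionality reduction. A minimal NumPy sketch, where the kernel size of 3 is an assumption:

```python
import numpy as np

def eca(x, kernel, k=3):
    """ECA-style channel attention. x: (C, T) feature map; kernel: (k,) weights."""
    C, _ = x.shape
    s = x.mean(axis=1)                       # global average pooling -> (C,)
    sp = np.pad(s, k // 2)
    # 1D conv across the channel axis: local cross-channel interaction
    a = np.array([sp[c:c + k] @ kernel for c in range(C)])
    gate = 1.0 / (1.0 + np.exp(-a))          # sigmoid attention weights
    return x * gate[:, None]                 # rescale each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 128))
y = eca(x, rng.standard_normal(3) * 0.5)
print(y.shape)  # (16, 128)
```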
2. Random Cropping & 2-Stage Training
2.1. Random Cropping
Given a PCG signal X, it is divided into subsequences S1, S2, ⋯, Sn:
Each subsequence Si yields a classification result pi:
- Here, random cropping is used to crop the subsequences Si from X.
Integrated (majority) voting is conducted on p1, p2, ⋯, pn to obtain the final classification result p of the PCG signal X:
Random cropping is applied to the entire training set.
The testing data is also cropped randomly, similar to the training data.
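The cropping-plus-voting procedure above can be sketched as follows (the crop length and ensemble size here are illustrative, not the paper's exact hyper-parameters):

```python
import numpy as np

def random_crops(x, crop_len, n_crops, rng):
    """Cut n_crops random subsequences S_i of length crop_len from signal x."""
    starts = rng.integers(0, len(x) - crop_len + 1, size=n_crops)
    return np.stack([x[s:s + crop_len] for s in starts])

def vote(predictions):
    """Majority vote over per-crop class predictions p_1..p_n."""
    return np.bincount(predictions).argmax()

rng = np.random.default_rng(0)
signal = rng.standard_normal(1500)           # e.g. 3 s at 500 Hz
crops = random_crops(signal, crop_len=500, n_crops=3, rng=rng)
print(crops.shape)                           # (3, 500)
# suppose the classifier returned these per-crop labels:
print(vote(np.array([1, 0, 1])))             # 1
```

At test time, averaging out crop-position randomness this way is what makes the ensemble more stable than a single crop.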
2.2. 2-Stage Training
- In the first stage, the original imbalanced training set is used to train the model.
- In the second stage, the learned representations are kept frozen, and a class-balanced dataset is used to train the classifier for a small number of epochs, fine-tuning the classifier's weights and biases.
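A toy sketch of this two-stage recipe on an imbalanced binary problem (NumPy, not the paper's code; the network sizes, learning rates, and undersampling choice are all assumptions): stage 1 trains the whole network on the imbalanced set, stage 2 freezes the representation and fine-tunes only the classifier on a class-balanced subset.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy imbalanced dataset: 200 negatives, 20 positives in 2-D
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (20, 2))])
y = np.concatenate([np.zeros(200), np.ones(20)])

W1 = rng.standard_normal((8, 2)) * 0.5       # representation layer
w2 = np.zeros(8)                             # classifier weights
b2 = 0.0                                     # classifier bias

def step(x, t, lr, train_repr):
    """One SGD step on the logistic loss; optionally freeze the representation."""
    global W1, w2, b2
    h = np.tanh(W1 @ x)
    p = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))
    dz2 = p - t                              # d(log-loss)/d(logit)
    if train_repr:                           # stage 1: update representation too
        dz1 = dz2 * w2 * (1 - h**2)
        W1 -= lr * np.outer(dz1, x)
    w2 -= lr * dz2 * h
    b2 -= lr * dz2

# stage 1: original imbalanced training set, full model
for _ in range(5):
    for i in rng.permutation(len(X)):
        step(X[i], y[i], lr=0.1, train_repr=True)

# stage 2: class-balanced subset (undersample the majority class),
# representation frozen, only the classifier is fine-tuned briefly
neg = rng.choice(np.flatnonzero(y == 0), size=20, replace=False)
bal = np.concatenate([neg, np.flatnonzero(y == 1)])
for _ in range(3):
    for i in rng.permutation(bal):
        step(X[i], y[i], lr=0.05, train_repr=False)

preds = [(1.0 / (1.0 + np.exp(-(w2 @ np.tanh(W1 @ x) + b2)))) > 0.5 for x in X]
acc = np.mean(np.array(preds) == y)
print(f"train accuracy: {acc:.2f}")
```

The point of stage 2 is that the balanced set corrects the classifier's decision boundary without disturbing the features learned from the full (larger but skewed) dataset.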
3. Results
3.1. Dataset
- PhysioNet dataset is used.
- A band-pass Butterworth filter with cutoff frequencies of 25 Hz and 400 Hz is used.
- To make each time series equal in length to the longest one, low-amplitude random numbers are padded at the end of each time series.
- Each PCG signal is resampled to 500 Hz, and the recording time is fixed to 3 s.
- z-normalization is applied so that each signal has a mean of 0 and a standard deviation of 1.
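The preprocessing steps can be sketched with SciPy. The raw 2000 Hz sampling rate and the filter order of 4 are assumptions; note the 25–400 Hz band must be applied before resampling, since 400 Hz exceeds the 250 Hz Nyquist limit of a 500 Hz signal.

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample

FS_RAW, FS_TARGET = 2000, 500                # assumed raw rate; target rate
DURATION = 3                                 # seconds, fixed recording length

def preprocess(pcg):
    # band-pass Butterworth filter, 25-400 Hz, applied zero-phase
    b, a = butter(4, [25, 400], btype="bandpass", fs=FS_RAW)
    filtered = filtfilt(b, a, pcg)
    down = resample(filtered, FS_TARGET * DURATION)      # resample to 500 Hz
    return (down - down.mean()) / down.std()             # z-normalization

rng = np.random.default_rng(0)
raw = rng.standard_normal(FS_RAW * DURATION)             # dummy 3 s recording
x = preprocess(raw)
print(x.shape, round(x.mean(), 6), round(x.std(), 6))
```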
3.2. SOTA Comparisons
- The two-stage training method is applied in all models in this paper to ensure fairness.
DsaNet obtained the best accuracy of 90.70% on this dataset, outperforming the suboptimal model MobileNetV3-Large by 2.04% in accuracy.
3.3. Ablation Studies
A testing ensemble size of 3 works best.
A training crop size of 5 works best.
Among the attention modules compared, EcaNet works best.
- SMOTEENN works best among the imbalanced-learning methods compared. (SMOTEENN is not mentioned as part of the proposed method in the paper.)