Brief Review — DsaNet: Imbalanced Heart Sound Signal Classification Based on Two‑Stage Trained DsaNet

DsaNet, Random Cropping, 2-Stage Training

4 min read · Nov 21, 2023

Imbalanced Heart Sound Signal Classification Based on Two‑Stage Trained DsaNet
DsaNet, by Wuhan University of Technology, Huazhong University of Science and Technology, and North University of China
2022 J. Cognitive Computation (Sik-Ho Tsang @ Medium)
Heart Sound Classification
2013 … 2021 [CardioXNet] 2022 [CirCor] [CNN-LSTM] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

DsaNet is proposed, which uses depthwise separable convolution and the attention module (EcaNet). DsaNet can directly classify PCG signals without complicated feature engineering processes.
To address the long-tail distribution problem in the PCG dataset, a novel imbalanced learning approach (two-stage training) is adopted for training.
A random cropping operation is proposed to increase the amount and diversity of the data in the training stage. Random cropping is also combined with the idea of integration to improve test accuracy in the testing stage.

Outline

DsaNet
Random Cropping & 2-Stage Training
Results

1. DsaNet

1.1. Overall Architecture

DsaNet is built from bottleneck modules.
Each bottleneck combines depthwise separable convolution with EcaNet.

1.2. Bottleneck

A 1D conv first expands the dimension.
Then a depthwise separable convolution is applied.
A second 1D conv is applied, this time to reduce the dimension.
Finally, EcaNet provides attention.
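A rough weight count shows why a depthwise separable convolution is cheaper than a standard one. The sketch below compares the two for a 1D layer. The channel counts and kernel size are illustrative assumptions, not values from the paper, and biases are ignored.

```python
def standard_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard 1D convolution (bias ignored)."""
    return c_in * c_out * k


def depthwise_separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise conv (one k-tap filter per input channel)
    followed by a 1x1 pointwise conv that mixes channels (bias ignored)."""
    depthwise = c_in * k
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical sizes, chosen only for illustration.
c_in, c_out, k = 64, 64, 9
std = standard_conv1d_params(c_in, c_out, k)
sep = depthwise_separable_conv1d_params(c_in, c_out, k)
print(std, sep)  # 36864 4672
```

With these assumed sizes the separable version needs roughly one eighth of the weights, and the saving grows with the kernel size.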

1.3. EcaNet

Unlike SENet, which squeezes and then expands the channel dimension, EcaNet applies only a 1D conv (k=5) and leaves the dimension unchanged.
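The idea can be sketched in plain Python: pool each channel to a single descriptor, run a k=5 1D conv across the channel axis, and use a sigmoid gate to rescale the channels. This is a minimal illustration, not the authors' implementation. The function name `eca_attention`, the toy input, and the kernel weights are all made up here; in a real network the kernel is learned.

```python
import math


def eca_attention(features, kernel):
    """Channel attention in the spirit of ECA.

    features: list of channels, each a list of time samples.
    kernel:   odd-length list of 1D conv weights (learned in practice).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    # Global average pooling: one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in features]

    # 1D conv across the channel axis with zero padding, so the number of
    # channels is unchanged (no squeeze/expand step as in SENet).
    half = len(kernel) // 2
    padded = [0.0] * half + pooled + [0.0] * half
    mixed = [
        sum(w * padded[c + j] for j, w in enumerate(kernel))
        for c in range(len(pooled))
    ]

    # Sigmoid gate per channel, then rescale each channel by its gate.
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Toy input: 6 channels, 4 samples each; kernel weights are hypothetical.
features = [[float(c + 1)] * 4 for c in range(6)]
kernel = [0.1, 0.2, 0.4, 0.2, 0.1]  # k = 5
scaled, gates = eca_attention(features, kernel)
```

Each gate depends only on a channel and its four nearest neighbours, which keeps the attention module at just k weights.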

2. Random Cropping & 2-Stage Training

2.1. Random Cropping

A PCG signal X is divided into subsequences S.

Each subsequence S yields its own classification result p.

**Random Cropping With Integrated Voting**

Here, random cropping is used to crop the subsequences S.

Integrated voting is conducted over the per-subsequence results p1, p2, ⋯ to obtain the final classification result for the PCG signal X.

Random cropping is applied to the entire training set.
The testing data is also randomly cropped, in the same way as the training data.
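The test-time procedure can be sketched as follows: draw several random crops from one recording, classify each crop, and combine the results by voting. The exact voting rule is not reproduced in this review, so a simple majority vote is assumed here. The helper names, crop length, number of crops, and `toy_classifier` are all hypothetical stand-ins.

```python
import random
from collections import Counter


def random_crops(signal, crop_len, n_crops, rng):
    """Return n_crops subsequences of length crop_len at random offsets."""
    if crop_len > len(signal):
        raise ValueError("crop_len exceeds signal length")
    max_start = len(signal) - crop_len
    starts = [rng.randint(0, max_start) for _ in range(n_crops)]
    return [signal[s:s + crop_len] for s in starts]


def vote(predictions):
    """Majority vote over per-crop labels (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]


def classify_with_voting(signal, classify, crop_len, n_crops, rng):
    """Classify every random crop, then combine the labels by voting."""
    crops = random_crops(signal, crop_len, n_crops, rng)
    return vote([classify(c) for c in crops])


# Stand-in classifier for illustration only: thresholds mean absolute amplitude.
def toy_classifier(crop):
    return "abnormal" if sum(abs(x) for x in crop) / len(crop) > 0.5 else "normal"


rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(1500)]
label = classify_with_voting(signal, toy_classifier, crop_len=500, n_crops=7, rng=rng)
```

During training the same `random_crops` helper would serve as augmentation, since every epoch sees different subsequences of each recording.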

2.2. 2-Stage Training

In the first stage, the original imbalanced training set is used to train the model.
In the second stage, the learned representations are kept fixed and only the classifier is fine-tuned: its weights and biases are trained for a small number of epochs on a class-balanced dataset.
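How the class-balanced dataset is built is not detailed in this review. One common option, assumed here purely for illustration, is to oversample each minority class with replacement until every class matches the largest one. The function name and toy data below are hypothetical.

```python
import random
from collections import defaultdict


def class_balanced_resample(samples, labels, rng):
    """Oversample every class (with replacement) to the size of the largest class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())

    balanced = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        balanced.extend((x, y) for x in xs + extra)
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced set: 8 "normal" recordings vs 2 "abnormal" ones.
samples = list(range(10))
labels = ["normal"] * 8 + ["abnormal"] * 2
stage2_data = class_balanced_resample(samples, labels, random.Random(0))
```

In stage two, only the classifier parameters would be updated on `stage2_data`, while the feature extractor trained in stage one stays frozen.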

3. Results

3.1. Dataset

The PhysioNet dataset is used.
A Butterworth filter with cutoff frequencies of 25 Hz and 400 Hz is applied.
To make every time series as long as the longest one, low-amplitude random numbers are padded at the end of each shorter series.
Each PCG signal is resampled to 500 Hz and the recording time is fixed to 3 s.
z-normalization is applied so that each signal has a mean of 0 and a standard deviation of 1.
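The padding and z-normalization steps can be sketched with the standard library alone. Filtering and resampling are omitted, and the steps are simplified to padding directly to the final 3 s length. The noise amplitude, the helper names, and the synthetic recording are assumptions for illustration; only the 500 Hz rate and 3 s duration come from the text above.

```python
import random
import statistics


def pad_with_noise(signal, target_len, rng, noise_amp=1e-3):
    """Append low-amplitude random values until the signal has target_len samples."""
    missing = max(0, target_len - len(signal))
    return list(signal) + [rng.uniform(-noise_amp, noise_amp) for _ in range(missing)]


def z_normalize(signal):
    """Shift and scale so the mean is 0 and the population standard deviation is 1."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    if std == 0:
        raise ValueError("constant signal cannot be z-normalized")
    return [(x - mean) / std for x in signal]


SAMPLE_RATE_HZ = 500  # from the review
DURATION_S = 3        # from the review
TARGET_LEN = SAMPLE_RATE_HZ * DURATION_S  # 1500 samples

rng = random.Random(0)
short_recording = [rng.gauss(0.0, 0.2) for _ in range(1200)]  # synthetic stand-in
x = z_normalize(pad_with_noise(short_recording, TARGET_LEN, rng))
```

At 500 Hz a 3 s recording has 1500 samples, so every input to the network has the same length after this step.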

3.2. SOTA Comparisons

To ensure a fair comparison, the two-stage training method is applied to all models in the paper.

DsaNet obtains the best accuracy on this dataset, 90.70%, outperforming the second-best model, MobileNetV3-Large, by 2.04% in accuracy.