Brief Review — DsaNet: Imbalanced Heart Sound Signal Classification Based on Two‑Stage Trained DsaNet

DsaNet, Random Cropping, 2-Stage Training

Sik-Ho Tsang
4 min read · Nov 21, 2023
(a) Normal (b) Abnormal PCG Signal

Imbalanced Heart Sound Signal Classification Based on Two‑Stage Trained DsaNet, by Wuhan University of Technology, Huazhong University of Science and Technology, and North University of China
2022 J. Cognitive Computation (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2021 [CardioXNet] 2022 [CirCor] [CNN-LSTM] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • DsaNet is proposed, which uses depthwise separable convolution and the attention module (EcaNet). DsaNet can directly classify PCG signals without complicated feature engineering processes.
  • To address the long-tail distribution problem in the PCG dataset, a novel imbalanced learning approach (two-stage training) is adopted for training.
  • A random cropping operation is proposed to increase the amount and diversity of the training data. At test time, random cropping is combined with ensembling (integrated voting over several crops) to improve accuracy.


  1. DsaNet
  2. Random Cropping & 2-Stage Training
  3. Results

1. DsaNet

1.1. Overall Architecture

Overall Architecture

1.2. Bottleneck

  • A 1D conv is first used to expand the channel dimension.
  • Then, a depthwise separable convolution is applied.
  • A 1D conv is used again, this time to reduce the dimension.
  • Finally, EcaNet is applied as the attention mechanism.
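The four steps above can be sketched as a plain NumPy forward pass. This is only an illustration of the data flow, not the paper's implementation: the channel counts, the kernel size of 5 for the depthwise conv, the random weights, and the omission of normalization, activations and any skip connection are all assumptions, and the attention step is left as a pluggable placeholder.

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 conv over channels: x is (C_in, T), w is (C_out, C_in) -> (C_out, T)."""
    return w @ x

def depthwise_conv(x, w):
    """One 1D filter per channel with 'same' zero padding.

    x is (C, T); w is (C, k) with odd k.
    """
    pad = w.shape[1] // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.empty_like(x)
    for c in range(x.shape[0]):
        # np.convolve flips its kernel, so flip it back to get the
        # cross-correlation that deep learning libraries call "conv".
        out[c] = np.convolve(xp[c], w[c][::-1], mode="valid")
    return out

def bottleneck(x, params, attention=lambda h: h):
    h = pointwise_conv(x, params["expand"])      # 1) expand channels
    h = depthwise_conv(h, params["depthwise"])   # 2a) depthwise part
    h = pointwise_conv(h, params["pointwise"])   # 2b) pointwise part
    h = pointwise_conv(h, params["reduce"])      # 3) reduce channels
    return attention(h)                          # 4) attention (placeholder)

# Hypothetical sizes: 8 -> 32 -> 16 channels, 100 time steps.
rng = np.random.default_rng(0)
params = {
    "expand": rng.standard_normal((32, 8)),
    "depthwise": rng.standard_normal((32, 5)),
    "pointwise": rng.standard_normal((32, 32)),
    "reduce": rng.standard_normal((16, 32)),
}
x = rng.standard_normal((8, 100))
y = bottleneck(x, params)
```

The depthwise step uses one small filter per channel and leaves channel mixing to the pointwise step, which is why this pattern needs far fewer weights than a full convolution of the same kernel size.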

1.3. EcaNet

  • Unlike SENet, which squeezes and then expands the channel dimension, EcaNet only applies a 1D conv (k=5) across channels, so the channel dimension is never reduced.
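As a rough sketch of this idea, channel attention in the ECA style can be written in a few lines of NumPy: global average pooling over time, a 1D conv with k=5 across the channel axis, a sigmoid, and a per-channel rescaling. The function name `eca`, the zero padding at the channel borders and the random kernel are assumptions made for illustration.

```python
import numpy as np

def eca(x, kernel):
    """ECA-style channel attention.

    x: (C, T) feature map; kernel: (k,) weights shared by all channels, k odd.
    """
    k = kernel.shape[0]
    y = x.mean(axis=1)                              # global average pool -> (C,)
    y = np.pad(y, k // 2)                           # zero-pad the channel axis
    y = np.convolve(y, kernel[::-1], mode="valid")  # 1D conv across channels -> (C,)
    gate = 1.0 / (1.0 + np.exp(-y))                 # sigmoid, one weight per channel
    return x * gate[:, None]                        # rescale each channel

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 50))
out = eca(feat, rng.standard_normal(5))
```

There are only k weights in total, and no fully connected layers that shrink and restore the channel count as in SENet.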

2. Random Cropping & 2-Stage Training

2.1. Random Cropping

Random Cropping

Given a PCG signal X, it is divided into subsequences S.

Each subsequence S yields a classification result p.

Random Cropping With Integrated Voting
  • Here, random cropping is used to crop the subsequences S.

Integrated voting is conducted on the results p1, p2, ⋯ of all cropped subsequences to obtain the final classification result for the PCG signal X.

Random cropping is applied to the entire training set.

The testing data is also cropped randomly, in the same way as the training data.
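A minimal sketch of random cropping with voting at test time is given below. The helper names, the crop length of 500 samples, and the use of a simple majority vote over hard labels are assumptions (the voting rule could equally average probabilities); `classify` is a hypothetical stand-in for the trained model.

```python
import numpy as np

def random_crops(x, crop_len, n_crops, rng):
    """Return n_crops subsequences of length crop_len at random offsets."""
    starts = rng.integers(0, len(x) - crop_len + 1, size=n_crops)
    return np.stack([x[s:s + crop_len] for s in starts])

def predict_with_voting(x, classify, crop_len, n_crops, rng):
    """Classify every crop, then return the most frequent label."""
    crops = random_crops(x, crop_len, n_crops, rng)
    votes = np.array([classify(c) for c in crops])
    return int(np.bincount(votes).argmax())

# Toy usage: a 1500-sample signal and a dummy classifier (not a real model).
rng = np.random.default_rng(0)
signal = np.ones(1500)
dummy_classify = lambda crop: int(crop.mean() > 0)
crops = random_crops(signal, crop_len=500, n_crops=3, rng=rng)
label = predict_with_voting(signal, dummy_classify, crop_len=500, n_crops=3, rng=rng)
```

During training, the same `random_crops` idea yields a different subsequence of each recording every time it is sampled, which is where the extra data diversity comes from.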

2.2. 2-Stage Training

2-Stage Training
  • In the first stage, the original imbalanced training set is used to train the model.
  • In the second stage, the learned representations are kept fixed, and only the classifier is trained for a small number of epochs on a class-balanced dataset, which fine-tunes its weights and biases.
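The two stages can be illustrated with a toy NumPy example. Everything here is a simplification under stated assumptions: the backbone is replaced by precomputed features, so stage 1 below only trains a logistic-regression head (whereas the first stage described above trains the whole model), and the class-balanced set is built by oversampling every class to the size of the largest one, which may differ from the paper's sampling scheme.

```python
import numpy as np

def class_balanced_indices(labels, rng):
    """Resample indices (with replacement) so every class is equally frequent."""
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.max()
    idx = [rng.choice(np.flatnonzero(labels == c), size=n, replace=True)
           for c in classes]
    return rng.permutation(np.concatenate(idx))

def train_classifier(features, labels, w, b, epochs, lr=0.1):
    """Gradient descent on a logistic-regression head; features stay fixed."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels
        w = w - lr * features.T @ grad / len(labels)
        b = b - lr * grad.mean()
    return w, b

def two_stage_fit(features, labels, rng, stage1_epochs=200, stage2_epochs=20):
    w, b = np.zeros(features.shape[1]), 0.0
    # Stage 1: train on the original, imbalanced data.
    w, b = train_classifier(features, labels, w, b, stage1_epochs)
    # Stage 2: a few epochs on a class-balanced resample; only w and b change.
    idx = class_balanced_indices(labels, rng)
    return train_classifier(features[idx], labels[idx], w, b, stage2_epochs)

# Toy data with a 90/10 class imbalance (synthetic, for illustration only).
rng = np.random.default_rng(0)
labels = np.array([0] * 90 + [1] * 10)
features = rng.standard_normal((100, 4)) + labels[:, None]
balanced = class_balanced_indices(labels, rng)
w, b = two_stage_fit(features, labels, rng)
```

Keeping stage 2 short and limited to the classifier means the balanced data mainly shifts the decision boundary, without disturbing the features learned from the full dataset.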

3. Results

3.1. Dataset

PhysioNet Dataset
  • The PhysioNet dataset is used.
  • A Butterworth filter with cutoff frequencies of 25 Hz and 400 Hz is applied.
  • To make each time series as long as the longest one, low-amplitude random numbers are padded at the end of each shorter time series.
  • Each PCG signal is resampled to 500 Hz and the recording time is fixed to 3 s.
  • z-normalization is applied so that each signal has a mean of 0 and a standard deviation of 1.
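Two of these preprocessing steps, the noise padding and the z-normalization, are easy to sketch in NumPy. The filtering and resampling steps are left out here. The helper names, the Gaussian noise and its scale of 1e-3 are assumptions, since the post does not say how the low-amplitude random numbers are drawn.

```python
import numpy as np

def pad_with_noise(x, target_len, rng, noise_scale=1e-3):
    """Append low-amplitude random values until x has target_len samples."""
    n_missing = target_len - len(x)
    if n_missing <= 0:
        return x[:target_len]          # already long enough: truncate
    noise = noise_scale * rng.standard_normal(n_missing)
    return np.concatenate([x, noise])

def z_normalize(x, eps=1e-8):
    """Shift and scale x to zero mean and unit standard deviation."""
    return (x - x.mean()) / (x.std() + eps)

# 3 s at 500 Hz gives 1500 samples; the 1200-sample input is synthetic.
rng = np.random.default_rng(0)
raw = rng.standard_normal(1200) * 3.0 + 5.0
padded = pad_with_noise(raw, target_len=1500, rng=rng)
normed = z_normalize(padded)
```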

3.2. SOTA Comparisons

SOTA Comparisons
  • The two-stage training method is applied in all models in this paper to ensure fairness.

DsaNet obtains the best accuracy of 90.70% on this dataset, outperforming the second-best model, MobileNetV3-Large, by 2.04% in accuracy.

3.3. Ablation Studies

Testing Ensemble Size

A testing ensemble size of 3 is the best.

Training Crop Size

A training crop size of 5 is the best.

Different Attention Modules

EcaNet is the best.

Different Balancing Methods
  • SMOTEEN is the best. (However, “SMOTEEN” is not mentioned in the proposed method in the paper.)


