Brief Review — Audio for Audio is Better? An Investigation on Transfer Learning Models for Heart Sound Classification

Pretraining Using Audio Data Instead of Image Data

Sik-Ho Tsang
3 min read · Feb 25, 2024

Audio for Audio is Better? An Investigation on Transfer Learning Models for Heart Sound Classification
Pretrained PANN, by The University of Tokyo, University of Surrey, Imperial College London, University of Augsburg
2020 EMBC, Over 50 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013
2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum + Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)]

  • A novel transfer learning (TL) model pre-trained on large-scale audio data is proposed for a heart sound classification task.
  • To the best of the authors’ knowledge, this is the first time an audio-based pre-trained TL model has been used for heart sound classification.

Outline

  1. Pretrained PANN
  2. Results

1. Pretrained PANN

PANN CNN14 for Heart Sound Classification
  • The 14-layer CNN from PANNs (CNN14), pretrained on AudioSet, was transferred and fine-tuned on several audio pattern recognition tasks, and it generalises well to many of them.
  • CNN14 has 5 blocks of 3 × 3 convolutional filters with batch normalization and ReLU, as shown in Table I.
PANN CNN14 for Heart Sound Classification
  • The whole system structure is shown in Fig. 1.
  • The loss function is binary cross-entropy, also called log loss.
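For reference, binary cross-entropy in its standard textbook form is written below. The symbols are assumptions and may differ from the paper's notation: N is the number of training samples, y_n ∈ {0, 1} is the ground-truth label of sample n, and p_n is the predicted probability of the positive class.

```latex
\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \Big[ y_n \log p_n + (1 - y_n) \log (1 - p_n) \Big]
```

The loss is zero only when every p_n equals its label y_n, and it grows without bound as a confident prediction moves toward the wrong class.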
Spectrogram vs Log mel spectrogram
  • In terms of spectral resolution, features in the higher frequency range of the spectrogram are relatively coarse compared with those in the lower frequency range.
  • The log mel spectrogram is used as the input.
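To make the resolution point concrete, here is a minimal sketch of how mel bands are spaced. It uses the common HTK-style mel formula; the number of bands (8) and the frequency range (0 to 1000 Hz) are illustrative assumptions, not the settings used in the paper, and the helper names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 band-edge frequencies in Hz, equally spaced in mel."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_mels + 2)]


def log_compress(power, eps=1e-10):
    """Log compression (dB) applied to each mel-band energy; eps avoids log(0)."""
    return 10.0 * math.log10(max(power, eps))


# Illustrative values only (assumed, not taken from the paper).
edges = mel_band_edges(n_mels=8, f_min=0.0, f_max=1000.0)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

for lo, hi in zip(edges, edges[1:]):
    print(f"{lo:7.1f} Hz to {hi:7.1f} Hz  (width {hi - lo:6.1f} Hz)")
```

The printed widths grow from band to band: equal steps on the mel scale cover wider and wider ranges in Hz, so the representation keeps fine detail at low frequencies and becomes coarser at high frequencies.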

2. Results

UAR
Specificity & Sensitivity
  • All CNNs other than PANNs take the spectrogram and the log mel spectrogram as input, whereas PANNs take the raw waveform.

The proposed PANN-based model achieves the highest UAR at 89.7%, specificity at 88.6%, and sensitivity at 96.9%.
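For readers unfamiliar with these metrics, here is a minimal sketch of how they are typically computed. UAR (unweighted average recall) is the mean of the per-class recalls. The labels below are toy data made up for illustration, and treating class 1 (abnormal) as the positive class is an assumption; none of this reproduces the paper's evaluation.

```python
def per_class_recall(y_true, y_pred):
    """Recall for each class: correctly predicted samples / samples of that class."""
    recalls = {}
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = correct / total
    return recalls


def unweighted_average_recall(y_true, y_pred):
    """UAR: plain mean of per-class recalls, regardless of class sizes."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Toy labels (hypothetical): 1 = abnormal (positive), 0 = normal (negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

recalls = per_class_recall(y_true, y_pred)
sensitivity = recalls[1]  # recall of the positive class
specificity = recalls[0]  # recall of the negative class
uar = unweighted_average_recall(y_true, y_pred)

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} UAR={uar:.3f}")
```

Because every class contributes equally to the average, UAR is not inflated by a classifier that simply favours the majority class, which is why it is often preferred over accuracy on imbalanced data.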
