Brief Review — Audio for Audio is Better? An Investigation on Transfer Learning Models for Heart Sound Classification
Pretraining Using Audio Data Instead of Image Data
3 min read · Feb 25, 2024
Audio for Audio is Better? An Investigation on Transfer Learning Models for Heart Sound Classification
Pretrained PANN, by The University of Tokyo, University of Surrey, Imperial College London, and University of Augsburg
2020 EMBC, Over 50 Citations (Sik-Ho Tsang @ Medium)
Heart Sound Classification
2013 … 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum + Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)]
==== My Other Paper Readings Are Also Over Here ====
- A novel transfer learning (TL) model pre-trained on large-scale audio data is proposed for the heart sound classification task.
- To the best of the authors’ knowledge, this is the first time an audio-based pre-trained TL model has been used for heart sound classification.
Outline
- Pretrained PANN
- Results
1. Pretrained PANN
- A 14-layer CNN was transferred and fine-tuned on several audio pattern tasks. The CNN, pretrained on AudioSet, generalises well across many audio pattern recognition tasks.
- CNN14 has 5 blocks of 3×3 convolutional filters, each with batch normalization and ReLU, as shown in Table I.
- The whole system structure is shown in Fig. 1.
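As a rough illustration of the architecture above, the sketch below traces how feature-map shapes evolve through five such conv blocks. The channel widths (doubling from 64) and the 2×2 pooling after each block are illustrative assumptions in the spirit of the PANN CNN14 design; see Table I of the paper for the exact configuration.

```python
def cnn14_shape_walk(time_steps, mel_bins, channels=(64, 128, 256, 512, 1024)):
    """Trace (channels, time, freq) shapes through successive conv blocks.

    Assumes each block's 3x3 convolutions preserve spatial size (padding 1)
    and that a 2x2 pooling after each block halves both the time and mel axes.
    """
    shapes = []
    t, f = time_steps, mel_bins
    for c in channels:
        t, f = t // 2, f // 2   # 2x2 pooling after the block
        shapes.append((c, t, f))
    return shapes

# e.g. a 64-bin log mel spectrogram with 1000 frames:
# cnn14_shape_walk(1000, 64) -> [(64, 500, 32), (128, 250, 16), ...]
```

This shows why a fairly long input clip is needed: after five halvings, a 1000-frame input is reduced to 31 time steps before the final pooling and classification layers.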
- The loss function is the binary cross-entropy, or log loss:
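A minimal NumPy sketch of the binary cross-entropy (log loss) over predicted probabilities:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels in {0, 1} and predicted probabilities."""
    p = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

# A confident correct prediction costs less than an uncertain one:
# binary_cross_entropy(np.array([1.0]), np.array([0.9]))  < 
# binary_cross_entropy(np.array([1.0]), np.array([0.5]))
```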
- In terms of spectral resolution, features in the higher frequency range of the spectrogram are relatively coarse compared with those in the lower frequency range.
- The log mel spectrogram is therefore used as the input.
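A minimal NumPy sketch of computing a log mel spectrogram from a raw waveform. The FFT size, hop length, and 64 mel bins below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, up to sr/2."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising slope of the triangle
            fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):          # falling slope of the triangle
            fb[i - 1, k] = (r - k) / (r - c)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    """Windowed power STFT -> mel filterbank -> log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # small offset avoids log(0)
```

In practice a library such as librosa provides the same pipeline directly (`librosa.feature.melspectrogram` followed by `librosa.power_to_db`); the hand-rolled version above just makes the mel compression explicit, which is what gives the lower frequencies their finer resolution.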