Brief Review — Deep Feature Learning for Medical Acoustics

Learnable Features + VGGNet, EfficientNet-B0

Sik-Ho Tsang
Dec 3, 2023

Deep Feature Learning for Medical Acoustics
by University of Milano
2022 ICANN (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2022 [CirCor Dataset] [CNN-LSTM] [DsaNet] [Modified Xception] [Improved MFCC+Modified ResNet] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • A framework is proposed to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
  • The sounds are classified using two state-of-the-art learnable frontends, LEAF and nnAudio, plus a non-learnable baseline frontend, i.e., Mel-filterbanks.
  • The computed features are then fed into two different CNN models, namely VGG16 and EfficientNet-B0.


  1. Feature Extractor Frontends
  2. VGG16 and EfficientNet-B0
  3. Results

1. Feature Extractor Frontends

  • LEAF [26] and nnAudio [5] are feature extractors that, unlike the fixed Mel-filterbank, have trainable parameters, so they can be optimized together with the classifier.
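For reference, the non-learnable baseline can be sketched as follows: a log-Mel spectrogram is a power spectrogram multiplied by a fixed bank of triangular filters, so nothing in it is updated during training. This is a minimal NumPy sketch, assuming the HTK mel formula, a Hann window, and illustrative values for `n_fft`, `hop` and `n_mels`; it is not the paper's exact configuration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Fixed (non-learnable) triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(x, sr, n_fft=512, hop=160, n_mels=64, eps=1e-6):
    """Log-Mel spectrogram of a 1D signal, shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T        # (n_frames, n_mels)
    return np.log(mel + eps)
```

LEAF and nnAudio replace parts of this fixed pipeline with parameters that receive gradients from the classifier.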

1.1. LEAF [26]

  • LEAF is a fully learnable neural network frontend.
  • It learns all operations of audio feature extraction, from filtering to pooling, compression and normalization (in the original LEAF design: Gabor filtering, Gaussian lowpass pooling, and PCEN).
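To make the LEAF pipeline concrete, below is a forward-pass-only NumPy sketch of its stages: complex Gabor filtering, squared modulus, per-channel Gaussian lowpass pooling, and PCEN. It is not the official LEAF implementation; the function names and all parameter values are hypothetical, and in LEAF the center frequencies, bandwidths, pooling widths and PCEN parameters would be trained by backpropagation rather than fixed as here.

```python
import numpy as np

def gabor_filters(center_freqs, bandwidths, size):
    """Complex Gabor impulse responses, shape (n_filters, size).

    center_freqs: normalized angular frequencies in (0, pi).
    bandwidths: Gaussian envelope widths in samples.
    """
    t = np.arange(size) - (size - 1) / 2
    envelope = np.exp(-0.5 * (t[None, :] / bandwidths[:, None]) ** 2)
    carrier = np.exp(1j * center_freqs[:, None] * t[None, :])
    return envelope * carrier / (np.sqrt(2 * np.pi) * bandwidths[:, None])

def gaussian_lowpass_pool(x, sigmas, size, stride):
    """Per-channel Gaussian lowpass filtering followed by downsampling."""
    t = np.arange(size) - (size - 1) / 2
    windows = np.exp(-0.5 * (t[None, :] / sigmas[:, None]) ** 2)
    windows /= windows.sum(axis=1, keepdims=True)
    pooled = np.stack([np.convolve(ch, w, mode="valid") for ch, w in zip(x, windows)])
    return pooled[:, ::stride]

def pcen(e, alpha, delta, r, s=0.04, eps=1e-6):
    """Per-channel energy normalization of nonnegative energies e: (n_filters, frames)."""
    m = np.empty_like(e)
    m[:, 0] = e[:, 0]
    for t in range(1, e.shape[1]):
        m[:, t] = (1 - s) * m[:, t - 1] + s * e[:, t]   # first-order IIR smoother
    a, d, rr = alpha[:, None], delta[:, None], r[:, None]
    return (e / (eps + m) ** a + d) ** rr - d ** rr

def leaf_like(x, n_filters=40, filt_size=401, pool_size=401, stride=160):
    # Illustrative fixed values (hypothetical); in LEAF these parameters are learned.
    center_freqs = np.linspace(0.05, 0.95, n_filters) * np.pi
    bandwidths = np.full(n_filters, 50.0)
    sigmas = np.full(n_filters, 100.0)
    alpha, delta, r = (np.full(n_filters, v) for v in (0.96, 2.0, 0.5))
    filters = gabor_filters(center_freqs, bandwidths, filt_size)
    y = np.stack([np.convolve(x, f, mode="same") for f in filters])   # complex filter outputs
    energy = np.abs(y) ** 2                                            # squared modulus
    pooled = gaussian_lowpass_pool(energy, sigmas, pool_size, stride)
    return pcen(pooled, alpha, delta, r)                               # (n_filters, frames)
```

For an 8000-sample input, `leaf_like` returns a (40, 48) feature map that plays the same role as the log-Mel spectrogram above.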

1.2. nnAudio [5]

  • nnAudio is a PyTorch toolbox that implements audio-to-spectrogram conversion as neural network layers.
  • It uses 1D convolutions with Fourier kernels to convert the signal from the time domain to the frequency domain, and these kernels can be trained together with any classifier.
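The key idea behind nnAudio can be illustrated with an STFT written as a strided 1D convolution whose kernels are windowed cosines and sines. The NumPy sketch below does not use nnAudio's own API; the function names, the Hann window and the sizes are assumptions. In a deep learning framework, `cos_k` and `sin_k` would be the weights of a convolution layer, which is what makes them trainable.

```python
import numpy as np

def fourier_kernels(n_fft):
    """Windowed DFT kernels, each of shape (n_fft // 2 + 1, n_fft)."""
    k = np.arange(n_fft // 2 + 1)[:, None]          # frequency bins (output channels)
    n = np.arange(n_fft)[None, :]                    # time taps
    window = np.hanning(n_fft)                       # assumed window choice
    cos_k = np.cos(2 * np.pi * k * n / n_fft) * window
    sin_k = -np.sin(2 * np.pi * k * n / n_fft) * window
    return cos_k, sin_k

def conv1d_stft(x, cos_k, sin_k, hop):
    """Magnitude STFT as a strided 1D convolution (CNN-style cross-correlation)."""
    n_fft = cos_k.shape[1]
    n_frames = 1 + (len(x) - n_fft) // hop
    # Gather each receptive field of the strided convolution as one row.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx]                                  # (n_frames, n_fft)
    real = frames @ cos_k.T                          # one output channel per bin
    imag = frames @ sin_k.T
    return np.sqrt(real ** 2 + imag ** 2)            # (n_frames, n_fft // 2 + 1)
```

With the kernels left at their Fourier initialization, the output matches `np.abs(np.fft.rfft(...))` on the same windowed frames; training would then move the kernels away from the exact DFT.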

2. VGG16 and EfficientNet-B0

  • Either VGG16 or EfficientNet-B0 is used as the classifier that takes the frontend features as input and predicts the class of the sound.

3. Results

3.1. Datasets

Respiratory Sound [3] and PhysioNet Heart Sound [4] Datasets
  • The audio is segmented into shorter files, filtered, and resampled.
  • The data is split into 75% for training, 15% for validation, and 10% for testing.
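The preprocessing and split can be sketched as below, following the stated order (segment, filter, resample). The paper's segment length, filter type and band, and target sampling rate are not given here, so every numeric default (5 s segments, a 4th-order 25 to 1900 Hz Butterworth band-pass, a 4 kHz target rate) is a hypothetical placeholder, and the split is assumed to be a simple random split over segments.

```python
import math
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly

# All numeric defaults below are hypothetical placeholders, not the paper's values.
def preprocess(x, sr, target_sr=4000, seg_seconds=5.0, band=(25.0, 1900.0)):
    """Segment -> band-pass filter -> resample; returns (n_segments, samples)."""
    seg_len = int(seg_seconds * sr)
    n_segs = len(x) // seg_len                                   # incomplete tail is dropped
    segments = x[: n_segs * seg_len].reshape(n_segs, seg_len)
    sos = butter(4, band, btype="bandpass", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, segments, axis=-1)               # zero-phase filtering
    g = math.gcd(target_sr, sr)
    return resample_poly(filtered, target_sr // g, sr // g, axis=-1)

def split_indices(n, fractions=(0.75, 0.15, 0.10), seed=0):
    """Random 75/15/10 split of n segment indices into train/val/test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```

For example, 12 s of audio at 44.1 kHz yields two 5 s segments of 20000 samples each at 4 kHz. If segments from the same patient can land in different subsets, a patient-level split would be a stricter alternative.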

3.2. Respiratory Sound Dataset Results

VGG16 on ICBHI Dataset
EfficientNet-B0 on ICBHI Dataset

Surprisingly, with VGG16, the baseline Log-Mel frontend outperforms both learnable frontends, suggesting that the long-established, well-designed Log-Mel spectrogram can still compete with newer neural network frontends.

3.3. Heart Sound Dataset Results

VGG16 on Heart Sound Dataset
EfficientNet-B0 on Heart Sound Dataset

LEAF achieves the best accuracy with both VGG16 and EfficientNet-B0.

However, with EfficientNet-B0, the best TNR (true negative rate, i.e., specificity) is achieved by nnAudio. Note that TNR is particularly important in first-screening diagnosis.
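Accuracy and TNR both come from the same confusion matrix. A minimal sketch, assuming label 1 = pathological (positive) and 0 = healthy; the function name `screening_metrics` is hypothetical.

```python
import numpy as np

def screening_metrics(y_true, y_pred):
    """Accuracy, TPR (sensitivity) and TNR (specificity) for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "TPR": tp / (tp + fn),   # fraction of pathological cases detected
        "TNR": tn / (tn + fp),   # fraction of healthy cases correctly cleared
    }
```

For `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and `y_pred = [1, 0, 1, 0, 0, 1, 0, 0]`, this gives accuracy 0.75, TPR 2/3 and TNR 0.8, showing that the two metrics can rank frontends differently.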