Brief Review — Deep Feature Learning for Medical Acoustics

Learnable Features + VGGNet, EfficientNet-B0

3 min read · Dec 3, 2023

Deep Feature Learning for Medical Acoustics
Learnable Features + VGGNet, EfficientNet, by University of Milano
2022 ICANN (Sik-Ho Tsang @ Medium)
Heart Sound Classification
2013 … 2022 [CirCor Dataset] [CNN-LSTM] [DsaNet] [Modified Xception] [Improved MFCC+Modified ResNet] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

A framework is proposed to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
This paper proposes to classify the sounds using two learnable state-of-the-art frontends, LEAF and nnAudio, plus a non-learnable baseline frontend, i.e., Mel-filterbanks.
The computed features are then fed into two different CNN models, namely VGG16 and EfficientNet-B0.

Outline

Feature Extractor Frontends
VGG16 and EfficientNet-B0
Results

1. Feature Extractor Frontends

LEAF [26] and nnAudio [5] are feature extractors that, unlike Mel-filterbanks, are completely trainable.
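For contrast with the learnable frontends, here is a minimal sketch of what a fixed Mel-filterbank frontend computes. It uses the common HTK-style Hz-to-mel formula and triangular filters. The function names and the parameter values (`n_mels=8`, `n_fft=64`, `sample_rate=4000`) are illustrative assumptions, not the paper's settings.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style conversion from Hertz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points, equally spaced on the mel scale
    edges_hz = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))   # rising slope
            elif center < f < hi:
                row.append((hi - f) / (hi - center))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel(power_frame, bank, eps=1e-10):
    """Apply the fixed filters to one power-spectrum frame and take the log."""
    return [math.log(sum(w * p for w, p in zip(row, power_frame)) + eps)
            for row in bank]

# Illustrative values only (assumptions, not the paper's configuration)
bank = mel_filterbank(n_mels=8, n_fft=64, sample_rate=4000)
features = log_mel([1.0] * 33, bank)
```

Nothing in this pipeline has parameters that gradient descent can update, which is the difference from LEAF and nnAudio.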

1.1. LEAF [26]

LEAF is a neural network.
This frontend learns all operations of audio feature extraction, from filtering to pooling, compression, and normalization.
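To make the compression and normalization stage more concrete, below is a rough sketch of per-channel energy normalization (PCEN), the operation on which LEAF's learnable compression is based according to the LEAF paper. This is not the LEAF implementation: in LEAF the parameters are learned per channel, while here `s`, `alpha`, `delta`, and `r` are fixed, illustrative values chosen as assumptions.

```python
def pcen(energies, s=0.04, alpha=0.96, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to one channel's sequence of non-negative energies.

    s     : smoothing coefficient of the running average
    alpha : strength of the gain normalization
    delta : bias added before the root compression
    r     : exponent of the root compression
    All default values are illustrative assumptions.
    """
    out = []
    smoothed = energies[0] if energies else 0.0
    for e in energies:
        # Exponential moving average of the energy over time
        smoothed = (1.0 - s) * smoothed + s * e
        # Divide by the smoothed energy (automatic gain control)
        gain_normalized = e / (eps + smoothed) ** alpha
        # Root compression, shifted so that silence maps to zero
        out.append((gain_normalized + delta) ** r - delta ** r)
    return out

quiet = pcen([1.0] * 50)
loud = pcen([100.0] * 50)
```

In this sketch, a signal that is 100 times louder produces an output that is only slightly larger, which illustrates why this kind of normalization helps with recordings made at different volumes.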

1.2. nnAudio [5]

nnAudio is also a neural network.
It uses convolutional neural networks (CNNs) to perform the conversion from time domain to frequency domain, and it can be trained together with any classifier.
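The idea of computing a spectrogram with convolutions can be sketched in plain Python: each frequency bin corresponds to a cosine kernel and a sine kernel that are slid over the waveform with a fixed hop, exactly like a strided 1-D convolution. In nnAudio such kernels are the weights of convolution layers, so they can be updated by backpropagation. The function names and sizes below are hypothetical and only serve as illustration.

```python
import math

def fourier_kernels(n_fft):
    """Build one cosine and one sine kernel per frequency bin (0 .. n_fft // 2)."""
    n_bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    sin_k = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    return cos_k, sin_k

def conv_spectrogram(signal, cos_k, sin_k, hop):
    """Slide every kernel over the signal and return magnitudes as frames[t][k]."""
    n_fft = len(cos_k[0])
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        window = signal[start:start + n_fft]
        frame = []
        for c, s in zip(cos_k, sin_k):
            real = sum(w * ci for w, ci in zip(window, c))
            imag = sum(w * si for w, si in zip(window, s))
            frame.append(math.hypot(real, imag))
        frames.append(frame)
    return frames

# A pure tone with 8 cycles per 64 samples should peak at bin 8
n_fft = 64
signal = [math.sin(2 * math.pi * 8 * n / n_fft) for n in range(256)]
cos_k, sin_k = fourier_kernels(n_fft)
spec = conv_spectrogram(signal, cos_k, sin_k, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Because the kernels are ordinary weights, a trainable frontend can start from these Fourier values and let the classifier's loss adjust them.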

2. VGG16 and EfficientNet-B0

Either VGG16 or EfficientNet-B0 is used as the model to predict the class of the sounds.

3. Results

3.1. Datasets

**Respiratory Sound [3] and PhysioNet Heart Sound [4] Datasets**

The audio is segmented into shorter files, filtered, and resampled.
75% of the dataset is used for the train set, 15% for the validation set, and 10% for the test set.
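A minimal sketch of the segmentation and the 75/15/10 split is given below. The segment length, the random seed, and the helper names are assumptions made for illustration. The review does not state whether segments from the same recording are kept in a single subset, so this sketch simply splits at the segment level.

```python
import random

def segment(samples, segment_len):
    """Cut a recording into consecutive, non-overlapping chunks; drop the short tail."""
    return [samples[i:i + segment_len]
            for i in range(0, len(samples) - segment_len + 1, segment_len)]

def split_dataset(items, train_pct=75, val_pct=15, seed=0):
    """Shuffle and split items into train/validation/test (the rest goes to test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Toy "recording" of 1003 samples cut into chunks of 5 samples
recording = list(range(1003))
chunks = segment(recording, segment_len=5)
train, val, test = split_dataset(chunks)
print(len(train), len(val), len(test))
```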

3.2. Respiratory Sound Dataset Results

Surprisingly, with VGG16 the baseline Mel-filterbank frontend outperforms the learnable frontends, which suggests that the long-established log-Mel spectrogram remains a well-designed feature compared to newer neural network frontends.

3.3. Heart Sound Dataset Results

**EfficientNet-B0 on Heartsound Dataset**

LEAF achieves the best accuracy with both VGG16 and EfficientNet-B0.
However, with EfficientNet-B0, the best true negative rate (TNR) is achieved by nnAudio. Note that TNR is particularly important in first-screening diagnosis.
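Accuracy and TNR can disagree because they count different things, as the small sketch below shows. It assumes that label 1 means "pathological" (positive) and label 0 means "healthy" (negative); the labels and predictions are made-up toy values, not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t != positive and p != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of all samples that are classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def true_negative_rate(tn, fp):
    """Fraction of negative (here: healthy) samples that are recognized as negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0

# Toy labels: 1 = pathological, 0 = healthy (assumed convention)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
tnr = true_negative_rate(tn, fp)
```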



Written by Sik-Ho Tsang
