Brief Review — Towards the classification of heart sounds based on convolutional deep neural network

AlexNet, VGG16, and VGG19 for Feature Extraction, SVM for Classification

Sik-Ho Tsang
Nov 7, 2023

Towards the classification of heart sounds based on convolutional deep neural network
{AlexNet, VGG} + SVM, by Abant Izzet Baysal University
2019 Health Inf. Sci. Syst., Over 60 Citations (Sik-Ho Tsang @ Medium)


  • The proposed method has three successive stages: spectrogram generation, deep feature extraction, and classification.
  • Deep features are extracted from three pre-trained CNN models: AlexNet, VGG16, and VGG19.
  • A support vector machine (SVM) is used as the classifier.


  1. Proposed Approach
  2. Results

1. Proposed Approach

Figure: Proposed approach.

The system is composed of three main components: image construction, deep feature extraction and concatenation, and feature classification.

1.1. Image Construction

Figure: Spectrogram images generated with the Short-Time Fourier Transform (STFT).

The Short Time Fourier Transform (STFT) is used to construct the spectrogram images.

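  • Given a discrete signal x(i), its STFT F(n, ω) is

$$F(n, \omega) = \sum_{i=-\infty}^{\infty} x(i)\, w(i - n)\, e^{-j\omega i}$$

  • where w(i) is the window function, n is the position of the window in time, and ω is the angular frequency.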

The squared magnitude of the STFT, |F(n, ω)|², is called the spectrogram.
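To make the construction concrete, here is a minimal NumPy sketch that computes the spectrogram |F(n, ω)|² of a sampled signal frame by frame. The Hann window, the 256-sample frame length, and the 128-sample hop are assumptions chosen for illustration, not settings reported in the paper, and `spectrogram` is a hypothetical helper name.

```python
import numpy as np


def spectrogram(x, frame_len=256, hop=128):
    """Squared-magnitude STFT of a 1-D signal, shape (freq bins, time frames).

    Assumptions (not from the paper): Hann window, frame_len and hop values,
    one-sided spectrum computed with rfft.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Frame k covers samples k*hop .. k*hop + frame_len - 1, multiplied by the window.
    frames = np.stack([x[k * hop:k * hop + frame_len] * window
                       for k in range(n_frames)])
    F = np.fft.rfft(frames, axis=1)   # STFT: one row per window position n
    return (np.abs(F) ** 2).T         # |F(n, w)|^2, frequency along axis 0
```

For a 500 Hz tone sampled at 4 kHz, each frame's energy peaks in bin 500 · 256 / 4000 = 32.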

1.2. Deep Feature Extraction

Three pre-trained CNN models, AlexNet, VGG16, and VGG19, are used for feature extraction.

  • Spectrogram images are the input to the feature extractors. The images are resized to 227 × 227 × 3 for AlexNet and 224 × 224 × 3 for VGG16 and VGG19.
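A sketch of how a spectrogram might be turned into a fixed-size three-channel image for the CNNs. It assumes dB scaling, min-max normalisation to 0..255, nearest-neighbour resizing, and the grey level copied into all three channels; the paper's exact colouring and interpolation are not described here, and `spectrogram_to_image` is a hypothetical helper. The target size is a parameter, so the same function can be called with whatever input size each network requires.

```python
import numpy as np


def spectrogram_to_image(S, size):
    """Map a (freq x time) spectrogram to a uint8 image of shape (size[0], size[1], 3).

    Assumptions: dB scaling, min-max normalisation, nearest-neighbour resize,
    grey level replicated into R, G and B.
    """
    h, w = size
    S_db = 10.0 * np.log10(np.asarray(S, dtype=float) + 1e-10)  # power -> dB
    lo, hi = S_db.min(), S_db.max()
    norm = (S_db - lo) / (hi - lo) if hi > lo else np.zeros_like(S_db)
    # Nearest-neighbour resize: pick one spectrogram cell per output pixel.
    rows = np.arange(h) * S_db.shape[0] // h
    cols = np.arange(w) * S_db.shape[1] // w
    resized = norm[np.ix_(rows, cols)][::-1]        # low frequencies at the bottom
    grey = np.round(255.0 * resized).astype(np.uint8)
    return np.repeat(grey[:, :, None], 3, axis=2)   # copy grey level into 3 channels
```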

The feature vectors taken from the FC6 layer of the pre-trained models are then concatenated.
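The concatenation step amounts to placing the per-model FC6 vectors side by side for each spectrogram. In this sketch the FC6 activations are assumed to have been extracted already into arrays with one row per spectrogram (the FC6 layer of AlexNet, VGG16, and VGG19 has 4096 units each), and `concat_fc6` is a hypothetical helper. A combination such as VGG16–VGG19 then gives 8192-dimensional descriptors.

```python
import numpy as np


def concat_fc6(fc6_by_model, models):
    """Concatenate the FC6 activations of the listed models, one row per spectrogram.

    fc6_by_model: dict mapping a model name to an (n_spectrograms, 4096) array.
    """
    blocks = [np.asarray(fc6_by_model[m], dtype=float) for m in models]
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("all models must describe the same number of spectrograms")
    return np.concatenate(blocks, axis=1)
```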

1.3. Classification

An SVM classifier is used: a homogeneous kernel map is applied to the features, and the model is trained with the LIBLINEAR library's L2-regularised L2-loss dual solver.
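The classification stage can be approximated with NumPy alone. This is a sketch under several assumptions: the homogeneous kernel map is taken to be the χ² map of Vedaldi and Zisserman (the one offered by VLFeat's `vl_homkermap`) with order 3 and period 0.5; the solver is a plain re-implementation of the dual coordinate descent that LIBLINEAR uses for L2-regularised L2-loss SVMs, not LIBLINEAR itself; multi-class prediction is one-vs-rest with a constant bias feature appended; and all function names are hypothetical.

```python
import numpy as np


def homkermap_chi2(X, order=3, period=0.5):
    """Approximate chi2 kernel map: each input value becomes 2*order+1 features."""
    X = np.asarray(X, dtype=float)
    ax, sgn = np.abs(X), np.sign(X)
    logx = np.log(np.where(ax > 0, ax, 1.0))  # dummy log(1) where x == 0 (amplitude is 0)

    def kappa(lam):                            # spectrum of the chi2 kernel: sech(pi*lam)
        return 1.0 / np.cosh(np.pi * lam)

    parts = [sgn * np.sqrt(ax * period * kappa(0.0))]
    for j in range(1, order + 1):
        amp = sgn * np.sqrt(2.0 * ax * period * kappa(j * period))
        parts.append(amp * np.cos(j * period * logx))
        parts.append(amp * np.sin(j * period * logx))
    return np.concatenate(parts, axis=-1)


def train_binary_svm(X, y, C=1.0, epochs=50, seed=0):
    """L2-regularised L2-loss SVM via dual coordinate descent; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    D = 1.0 / (2.0 * C)                       # diagonal term from the squared hinge loss
    Qii = np.einsum("ij,ij->i", X, X) + D     # diagonal of the dual Hessian
    for _ in range(epochs):
        for i in rng.permutation(n):
            G = y[i] * X[i].dot(w) - 1.0 + D * alpha[i]  # dual gradient for alpha_i
            if alpha[i] == 0.0 and G >= 0.0:             # projected gradient is 0: skip
                continue
            old = alpha[i]
            alpha[i] = max(old - G / Qii[i], 0.0)        # only constraint: alpha >= 0
            w += (alpha[i] - old) * y[i] * X[i]          # keep w = sum alpha_j y_j x_j
    return w


def fit_one_vs_rest(X, labels, C=1.0):
    Xb = np.hstack([X, np.ones((len(X), 1))])  # bias feature, like LIBLINEAR's -B 1
    classes = np.unique(labels)
    W = np.stack([train_binary_svm(Xb, np.where(labels == c, 1.0, -1.0), C)
                  for c in classes])
    return classes, W


def predict_one_vs_rest(model, X):
    classes, W = model
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return classes[np.argmax(Xb @ W.T, axis=1)]
```

The map is fed the concatenated deep features, and the linear SVM is trained on its output, so the linear solver effectively approximates an additive χ² kernel SVM at linear cost.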

2. Results

2.1. PASCAL Dataset

Figure: PASCAL dataset.

Datasets A and B of the PASCAL Classifying Heart Sounds Challenge are used.

2.2. Dataset A Results

Figure: Dataset A results.

VGG16 produces the highest precision scores for all categories except Normal.

Furthermore, the concatenated AlexNet–VGG16 features also produce the highest precision scores for the Normal, Extra Heart Sound, and Artifact categories.

2.3. Dataset B Results

Figure: Dataset B results.

  • The highest total precision, 2.15, is obtained by VGG16–VGG19. Total precision is the sum of the per-class precisions, so it can exceed 1.
  • The second-highest total precision, 1.70, is obtained by AlexNet–VGG16–VGG19.
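A sketch of the total-precision computation, assuming the PASCAL challenge definition (the sum of per-class precisions). The confusion matrix in the test is a made-up toy example, not data from the paper, and treating a class that is never predicted as having precision 0 is an assumption; `total_precision` is a hypothetical helper.

```python
import numpy as np


def total_precision(conf):
    """Per-class precisions and their sum from a confusion matrix.

    conf[i, j] = number of recordings with true class i predicted as class j.
    Precision of class j = conf[j, j] / (sum of column j); a class that is never
    predicted is assigned precision 0 (an assumption).
    """
    conf = np.asarray(conf, dtype=float)
    predicted = conf.sum(axis=0)
    per_class = np.divide(np.diag(conf), predicted,
                          out=np.zeros_like(predicted), where=predicted > 0)
    return per_class, per_class.sum()
```

With three classes, as in dataset B, the total can range from 0 to 3.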


