Brief Review — Recognizing Abnormal Heart Sounds Using Deep Learning

MFCC+CNN

Sik-Ho Tsang
3 min read · Dec 24, 2023

Recognizing Abnormal Heart Sounds Using Deep Learning
MFCC+CNN, arXiv’17, by Philips Research North America; PARC, A Xerox Company
2017 arXiv, Over 120 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2013 … 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum+Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP]

  • An automated heart sound classification algorithm is proposed which combines the use of time-frequency heat map representations with a deep convolutional neural network (CNN).

Outline

  1. MFCC+CNN
  2. Results

1. MFCC+CNN

1.1. Segmentation

  • A logistic regression-based hidden semi-Markov model [Springer et al., 2016] is used to predict the most likely sequence of heart sound states (S1 → Systole → S2 → Diastole) by incorporating information about expected state durations.
  • The final model uses a segment length of T = 3 seconds. The segments overlap, because overlapping led to performance improvements during initial training and validation.
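The fixed-length, overlapping segmentation can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `overlapping_segments`, the 2000 Hz sampling rate, and the 1-second hop are assumptions chosen for the example, and the segment starts could instead be aligned with heartbeat onsets found by the segmentation model.

```python
def overlapping_segments(signal, fs=2000, seg_seconds=3.0, hop_seconds=1.0):
    """Cut a 1-D sequence into fixed-length, overlapping segments.

    fs and hop_seconds are illustrative assumptions; only the 3-second
    segment length comes from the text above.
    """
    seg_len = int(round(seg_seconds * fs))
    hop = int(round(hop_seconds * fs))
    if seg_len <= 0 or hop <= 0:
        raise ValueError("segment length and hop must be positive")

    segments = []
    start = 0
    # Keep only full-length segments; a trailing partial segment is dropped.
    while start + seg_len <= len(signal):
        segments.append(signal[start:start + seg_len])
        start += hop
    return segments


# Usage: a 10-second dummy recording sampled at the assumed 2000 Hz.
recording = [0.0] * (10 * 2000)
segments = overlapping_segments(recording)
print(len(segments), len(segments[0]))
```

With a hop shorter than the segment length, consecutive segments share samples, so one recording yields several training examples.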

1.2. MFCC

MFCC
  • Each heart sound segment is then converted to Mel-frequency cepstral coefficients (MFCCs), which serve as the input features to the CNN.
  • A single-channel 6×300 MFCC heat map is the input, and a binary classification is the output.
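One way to read the 6×300 shape is as 6 coefficients over 300 time frames. The sketch below only checks that arithmetic. The 2000 Hz sampling rate, the 10 ms frame step, and the helper name `mfcc_map_shape` are assumptions for illustration; the review itself does not state the framing parameters.

```python
def mfcc_map_shape(n_samples, fs=2000, hop_ms=10, n_coeffs=6):
    """Return (coefficients, frames) for an MFCC heat map.

    Assumes one frame every hop_ms milliseconds, with the last frame
    zero-padded if needed. fs and hop_ms are illustrative assumptions.
    """
    hop = fs * hop_ms // 1000          # frame step in samples
    n_frames = -(-n_samples // hop)    # ceiling division
    return (n_coeffs, n_frames)


# Usage: one 3-second segment at the assumed 2000 Hz.
shape = mfcc_map_shape(3 * 2000)
print(shape)
```

Under these assumptions a 3-second segment produces 300 frames, which is consistent with the stated input width.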

1.3. CNN

CNN
  • The first convolutional layer learns 64 kernels of size 2×20, using same padding. It is followed by a 1×20 max-pooling filter with a horizontal stride of 5, which reduces each of the 64 feature maps to a dimension of 6×60.
  • A second convolutional layer applies 64 kernels of size 2×10 over the previous layer, once again using same padding.
  • This is again followed by a max-pooling operation, with a filter size of 1×4 and a stride of 2, further reducing each feature map to a dimension of 6×30.
  • At this stage in the architecture, a flattening operation unrolls the 64 feature maps of size 6×30 into a one-dimensional vector of size 11,520.
  • This feature vector is fed into a first fully connected layer of 1024 hidden units, followed by a second layer of 512 hidden units, and finally a binary classification output.
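The layer sizes above can be checked with a small shape trace. This is a sketch, not the authors' implementation. It assumes that the pooling layers are also same-padded, so the output width is ceil(width / stride). With unpadded pooling the first pool would give (300 − 20) / 5 + 1 = 57 columns rather than the stated 60. The helper names are hypothetical.

```python
import math


def same_out(size, stride=1):
    """Output length along one axis for a same-padded conv or pool."""
    return math.ceil(size / stride)


def trace_shapes(height=6, width=300, channels=64):
    """List (layer name, output shape) pairs for the described network."""
    shapes = [("input", (1, height, width))]

    # conv1: 64 kernels of 2x20, same padding, stride 1 keeps 6x300.
    shapes.append(("conv1", (channels, same_out(height), same_out(width))))
    # pool1: 1x20 window, horizontal stride 5 (same padding assumed).
    width = same_out(width, 5)
    shapes.append(("pool1", (channels, height, width)))

    # conv2: 64 kernels of 2x10, same padding, stride 1 keeps 6x60.
    shapes.append(("conv2", (channels, same_out(height), same_out(width))))
    # pool2: 1x4 window, horizontal stride 2 (same padding assumed).
    width = same_out(width, 2)
    shapes.append(("pool2", (channels, height, width)))

    # Flatten all feature maps into one vector, then two dense layers.
    shapes.append(("flatten", (channels * height * width,)))
    shapes.append(("fc1", (1024,)))
    shapes.append(("fc2", (512,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:8s} {shape}")
```

The trace reproduces 6×60 after the first pooling layer, 6×30 after the second, and 64 × 6 × 30 = 11,520 values after flattening.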

2. Results

Results
  • Table 2 shows a selected subset of the results for the 2016 PhysioNet Computing in Cardiology Challenge. For each selected entry, sensitivity, specificity and overall scores are shown, as well as the entry’s final ranking and a brief description of its approach. In total, 348 entries were submitted by 48 teams.

The entry described in this paper was ranked 8th, with a sensitivity of 0.7278 and a specificity of 0.9521, giving an overall score of 0.8399.
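The overall score is consistent with the mean of sensitivity and specificity. A minimal check, assuming that simple average and using the rounded values reported above (the function name is hypothetical):

```python
def overall_score(sensitivity, specificity):
    """Mean of sensitivity and specificity (assumed scoring rule)."""
    return (sensitivity + specificity) / 2.0


score = overall_score(0.7278, 0.9521)
print(round(score, 3))
```

The result is about 0.84, which agrees with the reported 0.8399 up to rounding of the published sensitivity and specificity.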

