Brief Review — Heart sound classification based on improved MFCC features and convolutional recurrent neural networks

Improved MFCC + CRNN

Sik-Ho Tsang
4 min read, Nov 14, 2023
The Structure of the human heart.

Heart sound classification based on improved MFCC features and convolutional recurrent neural networks (Improved MFCC + CRNN), by Guangdong University of Technology, Hangzhou Dianzi University, University of Alberta, University of Sydney, The People’s Hospital of Yangjiang
2020 J. NeuNet, Over 180 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2020 [1D-CNN] [WaveNet] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • Improved Mel-frequency cepstral coefficient (MFCC) features are extracted.
  • Improved MFCC features are fed into convolutional recurrent neural networks (CRNNs) for heart sound classification.


  1. Improved MFCC Feature Extraction
  2. CRNN and PRCNN Models
  3. Results

1. Improved MFCC Feature Extraction

1.1. Overall Flowchart

Overall Flowchart

The heart sound signal is first preprocessed, and then features are extracted from it.

The extracted features are then fed into a deep learning model for training and testing.

1.2. Preprocessing

A fifth-order Butterworth bandpass filter (pass-band: 25–400 Hz) is used to remove low-frequency artifacts, baseline wander and high-frequency interference.
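As a rough illustration of this preprocessing step, here is a minimal sketch using SciPy. The 2000 Hz sampling rate, the function name `bandpass_heart_sound` and the zero-phase (forward-backward) filtering are my assumptions. The review only states the filter order and the pass-band.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_heart_sound(x, fs=2000.0, low=25.0, high=400.0, order=5):
    """Band-pass a 1-D heart sound signal to the 25-400 Hz range.

    fs is an assumed sampling rate; change it to match your recordings.
    """
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering gives zero phase shift (an assumption here;
    # it also applies the magnitude response twice).
    return sosfiltfilt(sos, x)


if __name__ == "__main__":
    fs = 2000.0
    t = np.arange(0, 2.0, 1 / fs)
    # Toy signal: baseline drift (5 Hz) + in-band tone (100 Hz) + noise tone (800 Hz).
    x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 800 * t)
    y = bandpass_heart_sound(x, fs=fs)
    print(x.shape, y.shape)
```

In this toy example, the 100 Hz component should pass almost unchanged, while the 5 Hz and 800 Hz components are strongly attenuated.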

  • Segmentation is also performed but it is not the focus of this paper.

1.3. Improved MFCC Features

Improved MFCC Features

The Mel-frequency cepstrum reflects the nonlinear relationship between the frequency of a sound and how the human ear perceives it.

  • (Please read the paper directly for MFCC features.)
  • After obtaining the MFCC coefficients, which reflect the static characteristics of the heart sound signal, ΔMFCC and Δ²MFCC are also extracted. These are the first and second differences of the MFCC.
  • The first-difference coefficients (ΔMFCC) are computed over neighbouring frames with k = 2 (see the paper for the exact equation).
  • The second-difference coefficients (Δ²MFCC) are obtained in the same way from ΔMFCC.

A 39-dimensional feature vector (MFCC, ΔMFCC and Δ²MFCC combined) is obtained for further feature learning through the proposed CRNN model.
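To make the difference features concrete, here is a minimal NumPy sketch. It uses the common regression-style delta, Δc_t = Σ_{i=1..k} i·(c_{t+i} − c_{t−i}) / (2·Σ_{i=1..k} i²), which is an assumption on my side: the paper's exact normalization may differ. The 13 coefficients per frame, the edge padding and the function names `delta` and `stack_mfcc_deltas` are also assumptions, chosen so that the stacked vector has 39 dimensions.

```python
import numpy as np


def delta(features, k=2):
    """Regression-style difference along the frame axis.

    features: array of shape (n_frames, n_coeffs).
    k: number of neighbouring frames used on each side.
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Repeat the first/last frame so every frame has k neighbours (assumed padding).
    padded = np.pad(features, ((k, k), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, k + 1))
    out = np.zeros_like(features)
    for i in range(1, k + 1):
        ahead = padded[k + i : k + i + n_frames]    # c_{t+i}
        behind = padded[k - i : k - i + n_frames]   # c_{t-i}
        out += i * (ahead - behind)
    return out / denom


def stack_mfcc_deltas(mfcc, k=2):
    """Concatenate MFCC, delta and delta-delta per frame."""
    d1 = delta(mfcc, k)
    d2 = delta(d1, k)
    return np.hstack([mfcc, d1, d2])


if __name__ == "__main__":
    mfcc = np.random.randn(100, 13)   # placeholder MFCC matrix, 13 coefficients per frame
    feats = stack_mfcc_deltas(mfcc)
    print(feats.shape)                # (100, 39)
```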

2. CRNN and PRCNN Models

2.1. CRNN

  • Three convolutional layers are used, with ReLU activations and max-pooling.

The authors also mention that they use 3 residual blocks (in the figure, each is drawn as a simple conv layer) for the corresponding convolutional and pooling layers. Within a residual block, a batch normalization (BN) layer and a Dropout layer are followed by a max-pooling layer.

  • (I guess the above first 3 convs might be the 3 residual blocks?)

At the end, a Long Short-Term Memory (LSTM) layer is applied to learn the temporal features among the obtained feature maps, and a fully connected (FC) layer with 64 neurons is applied to learn the global features.

  • (It is rare to see an LSTM used in the classification head.)

Finally, a softmax layer is adopted to derive the probability distribution across two classes corresponding to normal and abnormal heart sounds.
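The CRNN described above can be sketched in PyTorch as follows. This is only my reading of the description, not the authors' code. The channel widths, 3×3 kernels, LSTM hidden size, input layout and the class name `CRNNSketch` are all assumptions, and plain conv blocks are used instead of true residual blocks.

```python
import torch
import torch.nn as nn


class CRNNSketch(nn.Module):
    """Three conv blocks -> LSTM -> FC(64) -> 2-class output."""

    def __init__(self, n_features=39, n_classes=2, dropout=0.5):
        super().__init__()
        blocks, in_ch = [], 1
        for out_ch in (16, 32, 64):            # channel widths are assumed
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.Dropout(dropout),
                nn.MaxPool2d(2),
            ]
            in_ch = out_ch
        self.conv = nn.Sequential(*blocks)
        freq_out = n_features // 8             # three 2x poolings: 39 -> 19 -> 9 -> 4
        self.lstm = nn.LSTM(in_ch * freq_out, 64, batch_first=True)
        self.fc = nn.Linear(64, 64)            # the 64-neuron FC layer
        self.out = nn.Linear(64, n_classes)

    def forward(self, x):
        # x: (batch, 1, n_features, n_frames)
        z = self.conv(x)                                   # (batch, C, F', T')
        b, c, f, t = z.shape
        z = z.permute(0, 3, 1, 2).reshape(b, t, c * f)     # one vector per time step
        _, (h, _) = self.lstm(z)                           # h: (1, batch, 64)
        z = torch.relu(self.fc(h[-1]))
        return self.out(z)                                 # logits; apply softmax for probabilities


if __name__ == "__main__":
    model = CRNNSketch().eval()
    x = torch.randn(2, 1, 39, 64)              # 2 clips, 39 features, 64 frames (made-up sizes)
    probs = torch.softmax(model(x), dim=1)
    print(probs.shape)                         # torch.Size([2, 2])
```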

2.2. PRCNN

  • Top Branch: The Conv path, which is the same as the one in CRNN.

  • Bottom Branch: In the RNN path, a max-pooling layer is adopted to perform dimensionality reduction and an LSTM layer is used for temporal feature learning on the heart sound signal.

  • Head: The outputs of the CNN block and the RNN block are concatenated into one feature vector, which is passed through a fully connected layer and a softmax.
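The parallel design can be sketched in PyTorch as below. Again, this is a hedged sketch rather than the authors' implementation. The layer sizes, the global average pooling used to flatten the Conv path, the pooling factor of 4 in the RNN path and the class name `PRCNNSketch` are all assumptions.

```python
import torch
import torch.nn as nn


class PRCNNSketch(nn.Module):
    """Parallel Conv path and RNN path, concatenated before the classifier."""

    def __init__(self, n_features=39, n_classes=2, dropout=0.5):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch in (16, 32, 64):            # channel widths are assumed
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(),
                nn.Dropout(dropout),
                nn.MaxPool2d(2),
            ]
            in_ch = out_ch
        # Global average pooling to get one vector per clip (an assumption).
        self.conv = nn.Sequential(*layers, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.pool = nn.MaxPool1d(4)            # shortens the frame axis before the LSTM
        self.lstm = nn.LSTM(n_features, 64, batch_first=True)
        self.fc = nn.Linear(64 + 64, 64)
        self.out = nn.Linear(64, n_classes)

    def forward(self, x):
        # x: (batch, 1, n_features, n_frames)
        conv_vec = self.conv(x)                          # top branch: (batch, 64)
        seq = self.pool(x.squeeze(1))                    # (batch, n_features, n_frames // 4)
        _, (h, _) = self.lstm(seq.transpose(1, 2))       # bottom branch: h[-1] is (batch, 64)
        merged = torch.cat([conv_vec, h[-1]], dim=1)     # head: concatenate both branches
        return self.out(torch.relu(self.fc(merged)))     # logits; apply softmax for probabilities


if __name__ == "__main__":
    model = PRCNNSketch().eval()
    x = torch.randn(3, 1, 39, 64)              # made-up input sizes
    print(torch.softmax(model(x), dim=1).shape)
```

Unlike the CRNN, the LSTM here sees the input features directly instead of the conv feature maps, so the two branches can learn complementary representations.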

2.3. Baselines

Two baseline models, a CNN and an RNN, are also developed for comparison.

3. Results

3.1. Dataset

3.2. Model Variants

Model Variants
  • CRNN and PRCNN variants are shown above.
  • One version uses LSTM and the other uses GRU.

3.3. Ablation Study

Different Dropout Rates

The best performance is obtained with a Dropout rate of 0.5.

3.4. Model Variant Comparisons

Comparisons of Model Variants

CRNN-a and PRCNN-a achieve the best results in terms of accuracy and F1.

  • The CRNN-a model attains an accuracy of 98.34%, a recall of 98.66%, a precision of 98.01% and F1 score of 98.34%.
  • The PRCNN-a model attains an accuracy of 97.34% and an F1 score of 97.33%.
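As a quick sanity check on how these metrics relate, here is a small sketch using the standard definitions (I assume the paper uses the same ones). The function names are mine. Plugging the reported CRNN-a precision and recall into the F1 formula gives a value that agrees with the reported F1 to within rounding.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported CRNN-a precision and recall, in percent.
f1_check = f1_from_precision_recall(98.01, 98.66)
print(round(f1_check, 2))  # about 98.33, versus the reported 98.34
```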

3.5. SOTA Comparisons

SOTA Comparisons

The proposed CRNN-a and PRCNN-a perform best among the compared methods.


