Brief Review — Classification of Heart Sounds Using Machine Learning

RF Classifier Using MFCC and Patient Information

Sik-Ho Tsang
May 4, 2024

Classification of Heart Sounds Using Machine Learning
MFCC + Patient Features + RF, by Sheridan College
2023 ICDH (Sik-Ho Tsang @ Medium)


  • Mel-frequency cepstral coefficients (MFCCs) and patient information are used as inputs to a random forest (RF) classifier for heart sound classification.


  1. MFCC + Patient Features + RF
  2. Results

1. MFCC + Patient Features + RF

  • The PhysioNet dataset is used.
  • An open-source MFCC implementation is used to convert the audio files into CSV format.
  • The dataset contains 394 entries, each consisting of information about a patient and their heart sound.
  • Fifteen supervised classifiers were evaluated on the dataset: Random Forest, Decision Tree, Extra Trees, AdaBoost, Ridge, Logistic Regression, Naive Bayes, K-Nearest Neighbors, SVM (linear kernel), Dummy (baseline), Gradient Boosting, Light Gradient Boosting, Extreme Gradient Boosting, Quadratic Discriminant Analysis and Linear Discriminant Analysis.
  • 10-fold cross-validation is used, together with an 80/20 training/testing split that gives 314 rows for training and 79 rows for testing.
  • The dataset contains 26 features. Five of them (Challenge record name, Database, Original record name, Diagnosis, Subject ID) were hidden, leaving 20 features.
  • The first test used the 20 remaining features. The second test additionally ignored the Age (years), Gender, Weight (kg) and Height (m) features.
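The MFCC-to-CSV conversion step can be sketched as below. The review does not name the open-source MFCC tool, so this is a minimal NumPy-only MFCC written for illustration, not the authors' pipeline. The 2 kHz sample rate, frame length (256 samples), hop (128), 26 mel filters, 13 coefficients, the synthetic tone used as input, and the helper names `mfcc` and `mfcc_to_csv` are all assumptions.

```python
import csv
import io

import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sr, frame_len=256, hop=128, n_mels=26, n_mfcc=13):
    """Hypothetical helper: (n_frames, n_mfcc) MFCC matrix for a 1-D signal."""
    # 1. Overlapping frames, each multiplied by a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)

    # 2. Periodogram power spectrum per frame.
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2 / frame_len

    # 3. Triangular filters equally spaced on the mel scale, 0 Hz to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, frame_len // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    # 4. Log filterbank energies (epsilon guards against log(0)).
    log_mel = np.log(power @ fbank.T + 1e-10)

    # 5. DCT-II along the filter axis; keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T


def mfcc_to_csv(coeffs):
    """Hypothetical helper: one CSV row per frame, header mfcc_0 .. mfcc_{n-1}."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f"mfcc_{i}" for i in range(coeffs.shape[1])])
    writer.writerows(np.round(coeffs, 4).tolist())
    return buf.getvalue()


# Synthetic 2-second signal at an assumed 2 kHz: a 50 Hz tone plus noise.
sr = 2000
t = np.arange(2 * sr) / sr
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size)

coeffs = mfcc(x, sr)
csv_text = mfcc_to_csv(coeffs)
print(coeffs.shape)  # (30, 13)
```

Each CSV row in this sketch is one frame; how frame-level coefficients were summarised into one row per patient is not described in the review.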
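The evaluation protocol in the bullets above (80/20 split, 10-fold cross-validation, and two feature sets with and without the patient features) can be sketched with scikit-learn. This is a hedged illustration on synthetic data from `make_classification`, not the authors' code: it uses only a subset of the 15 classifiers (LightGBM and XGBoost live in separate packages and are omitted), treats columns 0 to 3 as hypothetical stand-ins for Age, Gender, Weight and Height, and assumes cross-validation runs on the training portion with binary F1 as the score.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 394-row table (NOT the PhysioNet data).
X, y = make_classification(n_samples=394, n_features=20, n_informative=8, random_state=0)
PATIENT_COLS = [0, 1, 2, 3]  # hypothetical Age, Gender, Weight, Height columns

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# A subset of the classifiers named in the review, using scikit-learn built-ins.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "K Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Linear Discriminant": LinearDiscriminantAnalysis(),
    "Dummy": DummyClassifier(strategy="stratified", random_state=0),
}


def compare(X_tr, y_tr):
    """Mean F1 over 10 stratified folds for every model."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return {name: cross_val_score(model, X_tr, y_tr, cv=cv, scoring="f1").mean()
            for name, model in models.items()}


keep = [c for c in range(X.shape[1]) if c not in PATIENT_COLS]
test1 = compare(X_train, y_train)           # test 1: all 20 features
test2 = compare(X_train[:, keep], y_train)  # test 2: patient columns dropped

for name in models:
    print(f"{name:20s} test1 F1={test1[name]:.3f}  test2 F1={test2[name]:.3f}")

# Final check of one model on the untouched 20% hold-out.
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
holdout_f1 = f1_score(y_test, rf.predict(X_test))
print(f"Random Forest hold-out F1 (test 1): {holdout_f1:.3f}")
```

scikit-learn rounds the test share up, so an 80/20 split of 394 rows gives 315 training and 79 test rows in this sketch.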

2. Results

Feature Importance
  • The first test, which used all 20 features, achieved an average F1 score of 0.8773. As Figure 1 shows, Age and Gender were consistently important features.
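The two results in this bullet, a fold-averaged F1 score and a feature-importance ranking, might be computed roughly as follows. The review does not say how importance was measured; this sketch assumes scikit-learn's impurity-based `feature_importances_` from a random forest, reads "average F1" as the mean over 10 folds, and uses synthetic data whose first four columns are given the hypothetical names Age, Gender, Weight and Height. Those columns are informative by construction, so their high ranking here reflects the synthetic setup, not the paper's finding.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical column names: 4 patient features followed by 16 MFCC-like features.
feature_names = ["Age", "Gender", "Weight", "Height"] + [f"mfcc_{i}" for i in range(16)]

# Synthetic data only; with shuffle=False the first 4 columns are the informative ones.
X, y = make_classification(n_samples=394, n_features=20, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)

# "Average F1" read as the mean of the per-fold F1 scores over 10 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_f1 = cross_val_score(rf, X, y, cv=cv, scoring="f1")
print(f"mean F1 over {len(fold_f1)} folds: {fold_f1.mean():.4f}")

# Impurity-based importances from a model fitted on all rows, highest first.
rf.fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]
for i in order[:5]:
    print(f"{feature_names[i]:8s} {rf.feature_importances_[i]:.3f}")
```

Impurity-based importance is known to favour continuous, high-cardinality features; permutation importance is a common alternative when that bias matters.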
Confusion matrix
  • The confusion matrix is also shown.
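A confusion matrix like the one shown can be produced with scikit-learn's `confusion_matrix`. The sketch below is hedged: it trains a random forest on synthetic data, assumes a binary encoding where 0 means normal and 1 means abnormal (the review does not state the labels), and derives sensitivity and specificity from the four cells, which the review does not report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in; label 0 = "normal", 1 = "abnormal" is an assumed encoding.
X, y = make_classification(n_samples=394, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = rf.predict(X_te)

# Rows are true classes, columns are predicted classes (scikit-learn convention).
cm = confusion_matrix(y_te, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"sensitivity={tp / (tp + fn):.3f}  specificity={tn / (tn + fp):.3f}  "
      f"F1={f1_score(y_te, y_pred):.3f}")

# Tiny hand-checkable case: one error in each class out of six predictions.
toy = confusion_matrix([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
print(toy.tolist())  # [[2, 1], [1, 2]]
```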


