Brief Review — Heart murmur detection from phonocardiogram recordings: The George B. Moody PhysioNet Challenge 2022

PhysioNet Challenge 2022

Sik-Ho Tsang
4 min read · 6 days ago

Heart murmur detection from phonocardiogram recordings: The George B. Moody PhysioNet Challenge 2022
PhysioNet Challenge 2022, by Emory University, University of the Basque Country UPV/EHU, Universidade Portucalense, Universidade do Porto, ResMed, Harvard Medical School, Real Hospital Português, and Emory University and the Georgia Institute of Technology
2023 J. PLOS Digital Health, Over 100 Citations (Sik-Ho Tsang @ Medium)

Phonocardiogram (PCG)/Heart Sound Classification
2013 …
2024 [MWRS-BFSC + CNN2D] [ML & DL Model Study on HSS] [Audio Data Analysis Tool]

  • The George B. Moody PhysioNet Challenge 2022 invited teams to develop algorithmic approaches for detecting heart murmurs and abnormal cardiac function from phonocardiogram (PCG) recordings of heart sounds.
  • The Challenge received 779 algorithm submissions from 87 teams, resulting in 53 working codebases.

Outline

  1. Dataset
  2. Challenges
  3. Results

1. Dataset

Auscultation locations for the CirCor DigiScope dataset

The CirCor DigiScope dataset was used for the George B. Moody PhysioNet Challenge 2022. This dataset consists of 5268 PCG recordings from one or more auscultation locations during 1568 patient encounters with 1452 distinct patients.

  • Recordings were taken at up to four auscultation locations on the body, as shown above:
  • Aortic valve: second intercostal space, right sternal border;
  • Pulmonic valve: second intercostal space, left sternal border;
  • Tricuspid valve: lower left sternal border; and
  • Mitral valve: fifth intercostal space, midclavicular line (cardiac apex).

60% of the recordings are released as a public training set, while 10% are retained as a private validation set and 30% as a private test set.

Demographic, murmur, and clinical outcome information in the Challenge training, validation, and/or test sets
  • Demographic, murmur, and clinical outcome information is provided with the training set, as shown above.

2. Challenges

Screening and diagnosis pipeline for the Challenge

There are 2 tasks: Heart murmur detection and clinical outcome identification.

  • The murmurs were directly observable from the PCG recordings, but the clinical outcomes were determined by a more comprehensive diagnostic screening, including the interpretation of an echocardiogram. Despite these differences, the teams were asked to perform both tasks using only the PCGs and routine demographic data.

2.1. Weighted Accuracy Metric

Confusion matrix M for murmur detection with three classes: Murmur present, murmur unknown, and murmur absent
  • For heart murmur detection, there are 3 classes, Present, Unknown, and Absent.
  • A weighted accuracy metric is used, which assigns more weight to patients who had or potentially had murmurs than to patients who did not have murmurs.
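
As a minimal sketch of how such a weighted accuracy can be computed from the 3×3 confusion matrix M: each entry is weighted by the expert-labeled class, and the weighted diagonal is divided by the weighted total. The weights 5, 3, and 1 for Present, Unknown, and Absent are what I recall from the Challenge's public scoring code, so please treat them as an assumption and check them against the paper. The function name and the example counts are hypothetical.

```python
# Assumed class order and weights (Present=5, Unknown=3, Absent=1);
# these values are recalled from the public scoring code, not from this review.
CLASSES = ("Present", "Unknown", "Absent")
WEIGHTS = (5, 3, 1)


def weighted_accuracy(m):
    """Weighted accuracy for a 3x3 confusion matrix.

    m[i][j] is the number of patients that the algorithm labeled
    CLASSES[i] and the expert labeled CLASSES[j].
    """
    n = len(CLASSES)
    # Correct decisions sit on the diagonal; weight them by the true class.
    correct = sum(WEIGHTS[j] * m[j][j] for j in range(n))
    # Every patient contributes to the denominator with the weight of
    # their true (expert-labeled) class.
    total = sum(WEIGHTS[j] * m[i][j] for i in range(n) for j in range(n))
    return correct / total if total else float("nan")


# Hypothetical counts, for illustration only.
example = [
    [10, 2, 5],   # algorithm says Present
    [1, 3, 4],    # algorithm says Unknown
    [2, 1, 50],   # algorithm says Absent
]
print(round(weighted_accuracy(example), 4))
```

With these weights, missing a patient whose murmur is Present costs five times as much as mislabeling a patient whose murmur is Absent, which is why a model that mostly predicts Absent scores poorly even when its plain accuracy looks high.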

2.2. Cost-based Evaluation Metric

Confusion matrix N for clinical outcome detection with two classes: Clinical outcome abnormal and clinical outcome normal

If the algorithm inferred normal cardiac function, then it would not refer the patient to an expert, and the patient would not receive treatment, even if the patient had abnormal cardiac function that would have been detected by a human expert.

The total cost of diagnosis and treatment with algorithmic pre-screening is defined as the sum of four terms: the cost of algorithmic pre-screening, the cost of expert screening for the patients that the algorithm refers, the cost of timely treatment, and the cost of missed or late treatment.

  • Here, n_patients is the total number of patients.
  • The costs for algorithmic pre-screening, timely treatments, and missed or late treatments are linear in the number s of pre-screenings, treatments, or missed or delayed treatments, respectively.
  • To better capture the potential benefits of algorithmic pre-screening, the cost for expert screening is non-linear.
  • To more easily compare costs across databases with different numbers of patients, e.g., the training, validation, and test sets, the mean per-patient cost of diagnosis and treatment with algorithmic pre-screening is used, i.e., the total cost divided by n_patients.

The lower the mean per-patient cost, the better the algorithm.
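
The cost terms described above can be sketched in a few lines of Python. The dollar constants and the polynomial for expert screening are what I recall from the Challenge's public scoring code, so please treat them as assumptions and verify them against the paper before relying on them. The function names and the example counts are hypothetical.

```python
def cost_algorithm(s):
    # Linear: assumed 10 per algorithmic pre-screening.
    return 10 * s


def cost_treatment(s):
    # Linear: assumed 10,000 per timely treatment.
    return 10_000 * s


def cost_error(s):
    # Linear: assumed 50,000 per missed or delayed treatment.
    return 50_000 * s


def cost_expert(s, t):
    # Non-linear in the fraction s/t of patients referred to an expert
    # (assumed polynomial); s = referred patients, t = all patients.
    r = s / t
    return (25 + 397 * r - 1718 * r**2 + 11296 * r**4) * t


def mean_outcome_cost(n_tp, n_fp, n_fn, n_tn):
    """Mean per-patient cost from the 2x2 outcome confusion matrix counts."""
    n_patients = n_tp + n_fp + n_fn + n_tn
    if n_patients == 0:
        return float("nan")
    total = (
        cost_algorithm(n_patients)                # everyone is pre-screened
        + cost_expert(n_tp + n_fp, n_patients)    # referred = predicted abnormal
        + cost_treatment(n_tp)                    # abnormal and caught in time
        + cost_error(n_fn)                        # abnormal but missed
    )
    return total / n_patients


# Hypothetical counts, for illustration only.
print(mean_outcome_cost(n_tp=30, n_fp=20, n_fn=10, n_tn=40))
```

Under these assumed constants, a false negative is far more expensive than a false positive, so the metric pushes algorithms toward referring borderline patients, while the non-linear expert term penalizes referring nearly everyone.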

3. Results

Scores of the officially ranked methods on the test set for the murmur detection task

Of the 53 working entries, a total of 40 teams were officially ranked.

  • Additionally, common gradient-boosting tree (GBT) and random forest (RF) models were trained on the discrete model outputs from different teams.
  • The GBT and RF voting algorithms performed slightly better than the highest-ranked entry for the murmur detection task.
Scores of the officially ranked methods on the test set for the clinical outcome identification task
  • Yet, the GBT and RF voting algorithms performed slightly worse than the highest-ranked entries for the clinical outcome identification task.
  • (Hope I can read some of them in the near future.)
