Brief Review — HeartWave: A Multiclass Dataset of Heart Sounds for Cardiovascular Diseases Detection

HeartWave, 1353 Recordings, 9 Classes

Sik-Ho Tsang
4 min read · Jun 8, 2024
Heart auscultation collection positions.

HeartWave: A Multiclass Dataset of Heart Sounds for Cardiovascular Diseases Detection, by King Abdulaziz University
2023 ACCESS (Sik-Ho Tsang @ Medium)


  • The HeartWave dataset is proposed: a comprehensive heart sound dataset with recordings from 9 distinct classes, covering the most common heart sounds across the classes and subclasses of cardiovascular diseases. It is documented, well labelled, of good quality, and has enough samples, with a focus on cases that are hard to diagnose.
  • The dataset includes a total of 1353 recordings of heart sounds. Notably, this dataset includes extremely rare and difficult-to-diagnose classes.
  • The average signal-to-noise ratio (SNR) of HeartWave surpasses that of the widely known PhysioNet/CinC 2016 public dataset.


  1. HeartWave Dataset
  2. Dataset Analysis

1. HeartWave Dataset

1.1. Limitations of Other Datasets

Limitations of other datasets (rightmost column).

Existing datasets typically lack many heart disease classes, whereas HeartWave has 9 classes, as below:

Characteristics of heart sounds in diagnostic classes.

1.2. HeartWave Data Collection

  • The dataset was gathered from 3 prominent hospitals that offer specialized cardiovascular healthcare services: the National Heart Institute in Cairo, Egypt; King Abdulaziz Specialist Hospital-Taif, KSA; and King Faisal Medical Complex-Taif, KSA. Data collection took place from September 9, 2022, to January 30, 2023.

Participants come from diverse adult age groups, ethnic backgrounds, and geographical areas.

1.3. Instrument


The authors developed an in-house digital stethoscope: a 3M Littmann Classic III stethoscope with an electret condenser microphone embedded inside the rubber tube. The microphone was connected to a suitable amplifier, which was linked to the iRig HD-2 audio interface. Finally, the audio interface was connected to an Apple iPad.

  • This offers a cost-effective alternative for recording heart sounds, eliminating the need for costly licenses associated with integrating electronic stethoscopes with mobile applications.

1.4. Data Collection App

Data Collection App

The application features a user-friendly interface that enables physicians to seamlessly gather and label data, as depicted above.

  • (a): Main page.
  • (b): Before recording, the corresponding area must be chosen within the application.
  • (c): Start recording the heart sounds by pressing the recording icon. Once the recording is complete, choose the specific diagnosis from the dropdown list. Save the heart sound as a WAV format file along with its corresponding label.
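
The save step in (c) could look roughly like the sketch below. It is not the authors' app code: the function name `save_recording`, the `labels.csv` index file, the 16-bit mono format, and the 4000 Hz default sample rate are all assumptions made for illustration, since the review does not describe how the app stores labels.

```python
import csv
import struct
import wave
from pathlib import Path


def save_recording(samples, label, area, out_dir, name, sample_rate=4000):
    """Write 16-bit mono PCM samples to <name>.wav and log the label.

    Hypothetical helper: the file layout and the sample rate are
    assumptions, not details taken from the HeartWave paper.
    """
    samples = list(samples)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Store the audio as a little-endian 16-bit mono WAV file.
    wav_path = out_dir / f"{name}.wav"
    with wave.open(str(wav_path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))

    # Append one row per recording to a CSV index (assumed format).
    index_path = out_dir / "labels.csv"
    is_new = not index_path.exists()
    with index_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["file", "area", "label"])
        writer.writerow([wav_path.name, area, label])
    return wav_path
```

Keeping the chest area and the diagnosis next to the file name in one index makes it easy to filter recordings later without opening any audio.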

1.5. Data Labeling

A team of cardiologists diligently examined each patient’s echocardiogram report. They managed and supervised the whole labeling and annotation process.

They assigned labels to the heart sounds, indicating the presence or absence of specific diseases or pathological conditions.

  • If a patient had two or three diseases, the authors did not collect any samples from them, because the aim was to capture only the sound of a single disease.
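
As a minimal sketch of this exclusion rule, assuming each subject is a dictionary with a hypothetical `diagnoses` list (the authors' actual record format is not given in the review):

```python
def eligible_subjects(subjects):
    """Keep subjects with at most one diagnosed disease.

    Healthy subjects have an empty diagnosis list; subjects with two
    or more diseases are excluded. The 'diagnoses' field is assumed.
    """
    return [s for s in subjects if len(s["diagnoses"]) <= 1]
```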

1.6. Dataset Description

Dataset Description

The HeartWave dataset comprises 1353 records. The dataset offers label annotations at the record level.

  • The annotations indicate the specific chest area from which each recording was obtained.
  • In terms of patient distribution, the dataset consists of 401 recordings from healthy individuals and 952 recordings from patients.
  • Furthermore, an important feature of the HeartWave dataset is the inclusion of murmur grades ranging from 1 to 6. These grades reflect the varying severity and characteristics of murmurs found in real-world scenarios.
  • On average, the record duration is 21.57 seconds, and all sound records are stored in wave (.wav) format.
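
The figures above are easy to check on a local copy of the data. The sketch below uses only Python's standard `wave` module; it assumes the `.wav` files sit in a single folder, which is a guess about the layout and not something stated in the review.

```python
import wave
from pathlib import Path

# Counts quoted in the review: healthy + patient recordings = total.
HEALTHY, PATIENTS, TOTAL = 401, 952, 1353


def wav_duration_seconds(path):
    """Duration of one WAV file: number of frames / frame rate."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def mean_duration(folder):
    """Average duration over all .wav files directly inside folder."""
    durations = [wav_duration_seconds(p) for p in sorted(Path(folder).glob("*.wav"))]
    if not durations:
        raise ValueError(f"no .wav files found in {folder}")
    return sum(durations) / len(durations)
```

On the full dataset, `mean_duration` should land near the reported 21.57 seconds if the folder holds all 1353 records.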

2. Dataset Analysis

SNR Comparison With PhysioNet

In HeartWave, the majority of SNR values are above zero, contrasting with the PhysioNet/CinC 2016 dataset, where the majority of values fall below zero.
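
For intuition on the sign, SNR in decibels is 10·log10(P_signal / P_noise), so a value below zero means the noise power exceeds the signal power. The review does not say how the authors estimate the noise component, so the sketch below simply assumes that separate signal and noise sample sequences are already available; both function names are illustrative.

```python
import math


def mean_power(samples):
    """Average of squared sample values."""
    samples = list(samples)
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """SNR in dB from separate signal and noise estimates (assumed given)."""
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise power is zero; SNR is undefined")
    return 10.0 * math.log10(mean_power(signal) / p_noise)
```

With this definition, a noise estimate ten times smaller in amplitude than the signal gives +20 dB, and swapping the two gives -20 dB.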

  • HeartWave also comes with streamlined software that significantly simplifies the management and evaluation of the dataset.

The dataset stands out not merely because of its volume but also due to its comprehensive annotation and noise analysis approach.


