Brief Review — The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification

CirCor, Heart Sound Dataset, Over 5000 Recordings

Sik-Ho Tsang
6 min read · Nov 18, 2023
The DigiScope Collector technology

The CirCor DigiScope Dataset: From Murmur Detection to Murmur Classification
, by Instituto Superior de Engenharia do Porto, Faculdade de Ciências da Universidade do Porto, INESC TEC, Emory University School of Medicine, University of the Basque Country, Georgia Institute of Technology and Emory University, Círculo do Coração de Pernambuco
2022 JBHI, Over 80 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2020 [1D-CNN] [WaveNet] 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net]

  • A total of 5282 recordings have been collected from the four main auscultation locations of 1568 patients; in the process, 215,780 heart sounds have been manually annotated.
  • Furthermore, and for the first time, each cardiac murmur has been manually annotated by an expert annotator according to its timing, shape, pitch, grading and quality.
  • In addition, the auscultation locations where the murmur is present were identified as well as the auscultation location where the murmur is detected more intensively.


  1. Preliminaries
  2. CirCor Dataset Collection
  3. CirCor Data Labeling
  4. Discussions

1. Preliminaries

Cardiac auscultation points
  • There are 4 auscultation points:
  1. Aortic valve (1): second intercostal space, right sternal border;
  2. Pulmonary valve (2): second intercostal space, left sternal border;
  3. Tricuspid valve (3): left lower sternal border;
  4. Mitral valve (4): fifth intercostal space, midclavicular line (cardiac apex).
An example of a normalized heart sound recording
  • The first heart sound (S1) is produced by vibrations of the mitral and tricuspid valves as they close at the beginning of systole.
  • The second heart sound (S2) is produced by the closure of the aortic and pulmonary valve at the beginning of the diastole.

2. CirCor Dataset Collection

2.1. Summary of Datasets

Summary of Datasets
  • (Please read the paper directly for the prior dataset descriptions.)

CirCor has the highest number of recordings.

2.2. Dataset Collection

CC2014 & CC2015 Campaigns

The presented dataset was collected as part of two mass screening campaigns, referred to as “Caravana do Coração” (Caravan of the Heart) campaigns, conducted in Paraíba state, Brazil between July and August 2014 (CC2014) and June and July 2015 (CC2015).

  • A total of 2,061 participants attended the 2014 and 2015 campaigns, of whom 493 were excluded for not meeting the eligibility criteria. Furthermore, 116 patients attended both screening campaigns.

2.3. Demographic and Clinical Information

Left: Gender, Age Group, Child’s Race, Mother’s Race, Right: Age Statistics
  • The collected dataset includes 1568 participants, of whom 787 (50.2%) were male and 781 (49.8%) were female.
  • 988 are children (63.0%), 311 infants (19.8%), 127 adolescents (8.1%), 9 young adults (0.6%), 11 neonates (0.7%) and 110 pregnant women (7.0%); in 12 patients no age data is provided (0.8%).
  • Finally, with regards to ethnicity, 1298 participants are mixed race (82.8%), 249 (15.9%) white, and 1.4% of other ethnic backgrounds.
  • The mean age (standard deviation) of the participants is 73.4±0.1 months, ranging from 0.1 to 356.1 months.
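
The counts and percentages quoted above can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the numbers are copied from this post, and the helper name `share` is my own, not something from the paper.

```python
# Participant counts as quoted in this post (Sections 2.2 and 2.3).
TOTAL_ATTENDED = 2061   # attended the 2014 and 2015 campaigns
EXCLUDED = 493          # did not meet the eligibility criteria
eligible = TOTAL_ATTENDED - EXCLUDED  # participants kept in the dataset


def share(count, total=eligible):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# A subset of the groups listed above: (label, count).
groups = [
    ("male", 787),
    ("female", 781),
    ("children", 988),
    ("infants", 311),
    ("adolescents", 127),
    ("mixed race", 1298),
    ("white", 249),
]

for label, count in groups:
    print(f"{label:12s} {count:5d}  {share(count):5.1f}%")
```

Running it reproduces the one-decimal percentages listed for these groups, which is a quick way to spot a mistyped figure.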
Indication, Diagnosis and Plan
  • 444 participants (27.0%) had no formal indication for screening, while 305 (18.5%) presented for follow-up of a previously diagnosed cardiopathy. Moreover, 223 participants (13.5%) attended the screening campaign to investigate the evolution of previously identified murmurs.
  • A total of 647 single or multiple diagnoses were confirmed, the most frequent being simple congenital cardiopathy (30.2%) and acquired cardiopathy (3.3%), with 65 (3.9%) diagnoses of complex congenital cardiopathy.
  • A total of 834 (53.2%) participants were referred for follow-up, 27 (1.2%) were referred for additional testing, and 35 (2.2%) had indication for surgery/intervention. 575 participants (36.7%) were discharged after screening.
  • Regarding clinical presentation, 1401 participants (89.4%) presented a good general condition, with the majority of patients being eupneic (90.2%) and having normal perfusion (90.7%).
Duration, Heart Beats
  • There is also information about weight, height, oxygen saturation, tympanic temperature, blood pressure, etc.

2.4. Heart Sounds and Annotations

  • In the CC2014 screening campaign, 540 recordings were collected from the Aortic point, 497 from the Pulmonary point, 603 from the Mitral point, 461 from the Tricuspid point and 5 from an unreported point.
  • In the CC2015 screening campaign, 817 recordings were collected from the Aortic point, 793 from the Pulmonary point, 812 from the Mitral point, 754 from the Tricuspid point, and 1 extra sound from an unreported point.

Overall, between 1 and 4 recordings exist per patient, with an average of 3.5 recordings per patient.

  • Heart sounds were collected with a Littmann 3200 stethoscope embedded with the DigiScope Collector, sampled at 4 kHz with 16-bit resolution.
  • The recordings are normalized to the [-1, 1] range.
  • The PCG files from the CC2014 and CC2015 campaigns had an average duration of 28.7 seconds and 19.0 seconds, respectively.
  • Murmurs were present in 305 patients within the collected dataset. Out of these, 294 patients had only a systolic murmur, 1 patient had only a diastolic murmur, and 9 patients had both systolic and diastolic murmurs.
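
To make the recording format concrete, here is a minimal sketch of reading a 16-bit mono WAV file, scaling it to [-1, 1], and computing its duration. It uses only the Python standard library. Note the assumptions: the post does not say how the normalization was done, so peak normalization (dividing by the largest absolute sample) is my own choice, and the function names are hypothetical.

```python
import sys
import wave
from array import array

SAMPLE_RATE_HZ = 4000  # sampling rate stated above


def read_pcm16(path):
    """Read a mono 16-bit PCM WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a mono 16-bit PCM file")
        samples = array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        return list(samples), wav.getframerate()


def normalize_peak(samples):
    """Scale samples into [-1, 1] by the peak absolute value (assumed method)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return [0.0 for _ in samples]
    return [s / peak for s in samples]


def duration_seconds(n_samples, sample_rate=SAMPLE_RATE_HZ):
    """Duration of a recording given its sample count."""
    return n_samples / sample_rate
```

At 4 kHz, a recording of average CC2014 length (28.7 seconds) holds 114,800 samples, so these files are small enough to process entirely in memory.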

3. CirCor Data Labeling


3.1. Segmentation

  • The acquired audio samples were automatically segmented using 3 algorithms. 2 independent cardiac physiologists inspected the algorithms’ outputs. Each expert marked the automated annotations with which they agreed, or re-annotated the misdetections.
  • The cardiac physiologists marked high-quality representative segments. The remainder of the signal may include both low- and high-quality data. In this way, the user is free to use (or not) the suggested time window.
  • In case of agreement with at least one algorithm’s segmentation output, the audio file and its corresponding annotation file were directly saved.
  • In case of disagreement with all the algorithms’ outputs, manual annotation was required, with at least five heart cycles being annotated.
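
The accept-or-reannotate rule described above can be written down as a tiny decision function. This is a simplified sketch for one expert reviewing one recording; the function names and the boolean-per-algorithm input are my own illustration, not the authors' tooling.

```python
MIN_MANUAL_CYCLES = 5  # minimum number of heart cycles for a manual annotation


def review_outcome(agrees_with):
    """Decide what happens to a recording after expert review.

    agrees_with: one boolean per segmentation algorithm, True if the
    expert agrees with that algorithm's output.
    """
    if any(agrees_with):
        return "save"    # at least one automatic segmentation was accepted
    return "manual"      # every output rejected: annotate by hand


def manual_annotation_complete(n_cycles):
    """A manual annotation needs at least five annotated heart cycles."""
    return n_cycles >= MIN_MANUAL_CYCLES
```

For example, `review_outcome([False, True, False])` returns `"save"`, while `review_outcome([False, False, False])` returns `"manual"`.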

3.2. Audio & Segmentation Files

  • Audio files are named in the ABCDE_XY.wav format, where ABCDE is a numeric patient identifier and XY is one of the auscultation positions.
  • If more than one recording exists per auscultation location, an integer index n is added: ABCDE_nXY.wav.
  • The segmentation annotation file is composed of 3 distinct columns: the first column corresponds to the time instant where the wave was detected for the first time; the second column corresponds to the time instant where the wave was detected for the last time; the third column corresponds to an identifier.
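
As an illustration, the naming scheme and the three-column annotation layout could be parsed as below. Treat this as a sketch under assumptions: the post does not give the position codes (the `AV` used in the usage note is a made-up example), it does not state the column delimiter (whitespace is assumed here), and the identifier is kept as a string because its meaning is not described. All names in the code are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# ABCDE_XY.wav or ABCDE_nXY.wav: digits, underscore, optional index, letters.
FILENAME_RE = re.compile(r"^(?P<patient>\d+)_(?P<index>\d*)(?P<location>[A-Za-z]+)\.wav$")


class RecordingName(NamedTuple):
    patient_id: str
    location: str
    index: Optional[int]  # None when there is a single recording


class Segment(NamedTuple):
    start: float      # first column: where the wave is first detected
    end: float        # second column: where the wave is last detected
    identifier: str   # third column, kept as text


def parse_recording_name(name):
    """Split a file name into patient id, location code and optional index."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    index = int(match["index"]) if match["index"] else None
    return RecordingName(match["patient"], match["location"], index)


def parse_segmentation(text):
    """Parse whitespace-separated rows of (start, end, identifier)."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got: {line!r}")
        segments.append(Segment(float(fields[0]), float(fields[1]), fields[2]))
    return segments
```

With these helpers, `parse_recording_name("12345_2AV.wav")` yields patient `"12345"`, location `"AV"` and index `2`, and each annotation row becomes a `Segment` whose length is simply `end - start`.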

3.3. Indicators

  • All the collected heart sound records were also screened for presence of murmurs at each auscultation location. Each murmur was classified according to its timing (early-, mid-, and late- systolic/diastolic) [35], shape (crescendo, decrescendo, diamond, plateau), pitch (high, medium, low), quality (blowing, harsh, musical) [35], and grade (according to Levine’s scale [36]).
  • The sounds were recorded in an ambulatory environment. Different noise sources have been observed in the dataset, from stethoscope rubbing noise to crying or laughing, which makes automatic analysis of the dataset a hard task.
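
The murmur descriptors listed above map naturally onto a small record type. Below is a minimal sketch using a Python dataclass that rejects values outside the categories named in this post. The class and field names are my own, and storing the grade as an integer from 1 to 6 is an assumption based on Levine's scale having six grades; the dataset's actual file encoding may differ.

```python
from dataclasses import dataclass

PHASES = {"systolic", "diastolic"}
TIMINGS = {"early", "mid", "late"}
SHAPES = {"crescendo", "decrescendo", "diamond", "plateau"}
PITCHES = {"high", "medium", "low"}
QUALITIES = {"blowing", "harsh", "musical"}
GRADES = range(1, 7)  # assumed integer encoding of Levine's six grades


@dataclass(frozen=True)
class MurmurAnnotation:
    phase: str
    timing: str
    shape: str
    pitch: str
    quality: str
    grade: int

    def __post_init__(self):
        # Check every field against the categories listed in the post.
        checks = [
            ("phase", self.phase, PHASES),
            ("timing", self.timing, TIMINGS),
            ("shape", self.shape, SHAPES),
            ("pitch", self.pitch, PITCHES),
            ("quality", self.quality, QUALITIES),
            ("grade", self.grade, GRADES),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"invalid {name}: {value!r}")
```

For instance, `MurmurAnnotation("systolic", "mid", "diamond", "low", "harsh", 3)` is accepted, while an unknown shape such as `"square"` raises a `ValueError`.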

4. Discussions

The pitch quality also varies across different stethoscopes. Technically, this is due to the difference between the transfer functions of different stethoscopes (and the preprocessing filters in the digital stethoscope front-end).

The location where the murmur is detected with the highest intensity is also an important feature to analyze. A murmur caused by aortic stenosis is often best heard at the upper sternal border, and usually on the right side [22]. In this dataset, most of the murmurs are detected with the highest intensity at the pulmonary point.

The pulmonary and tricuspid points are the best auscultation locations to detect cardiac murmurs in the proposed dataset.

  • The authors note that ages are also homogeneously distributed in the proposed dataset, thus potentially paving the way to the design of robust decision support systems for different target populations, from neonates to adults.
  • (There are many discussion points in the paper, please feel free to read the paper directly.)


