Brief Review — Multi-Task Prediction of Murmur and Outcome from Heart Sound Recordings
PRNA Team, 2nd Place for Clinical Outcome Prediction Task
Multi-Task Prediction of Murmur and Outcome from Heart Sound Recordings
PRNA, by Philips Research North America, Banner Health, and University of Arizona College of Medicine
2022 CinC (Sik-Ho Tsang @ Medium)
- This paper uses multi-task learning for heart sound classification.
- Each heart sound recording segment is transformed into a time-domain embedding vector by a convolutional neural network (CNN). In parallel, the Mel-frequency cepstral coefficient (MFCC) representation of the segment is transformed into a frequency-domain embedding vector by a second CNN.
- These embedding vectors and the demographic variables are concatenated and then used as input to two separate networks built to jointly predict the presence of heart murmurs and clinical outcomes respectively using multi-task learning.
- It ranked 2nd out of 39 teams on the clinical outcome prediction task on the hidden test set.
Outline
- Dataset & Preprocessing
- Multi-Task Learning Model
- Results
1. Dataset & Preprocessing
1.1. Dataset
- The public training data consists of 3163 heart sound recordings collected from 942 pediatric subjects.
- Most patients have multiple recordings from multiple auscultation locations (for example, the mitral, aortic, pulmonary, or tricuspid valves).
- The murmur label (Present, Unknown, Absent) is assigned to each recording and a subject is labeled as murmur present if any recording of the subject contains murmur. The clinical outcome label (Abnormal, Normal) is assigned to each subject.
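The subject-level murmur rule above can be sketched in a few lines of Python. The function name `subject_murmur_label` is hypothetical. The text only states the "Present" rule; how "Unknown" and "Absent" are resolved when no recording is "Present" is an assumption made here for illustration.

```python
def subject_murmur_label(recording_labels):
    """Aggregate per-recording murmur labels into one subject-level label."""
    # Stated in the text: any recording with a murmur makes the subject "Present".
    if "Present" in recording_labels:
        return "Present"
    # Assumption (not stated in the text): "Unknown" takes precedence over "Absent".
    if "Unknown" in recording_labels:
        return "Unknown"
    return "Absent"


# Example: one recording per auscultation location for a single subject.
labels = ["Absent", "Present", "Absent", "Unknown"]
subject_label = subject_murmur_label(labels)
```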
1.2. Preprocessing
- Each recording was downsampled from 4000 Hz to 1000 Hz.
- To remove noise, an order-2 Butterworth bandpass filter [6] with a passband of 25 Hz to 400 Hz was applied.
- Z-normalization (zero mean, unit variance) was then applied.
- Each recording was divided into multiple consecutive non-overlapping 3-second segments and each segment was labeled using the murmur label of the recording and outcome label of the subject.
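The four preprocessing steps above might look roughly like the following with NumPy and SciPy. This is a minimal sketch, not the authors' code: the function name `preprocess`, the use of polyphase resampling, zero-phase filtering with `sosfiltfilt`, per-recording normalization, and dropping the trailing partial segment are all assumptions, since the text only gives the sampling rates, the filter order and passband, and the segment length.

```python
import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt

FS_IN = 4000     # original sampling rate in Hz (from the text)
FS_OUT = 1000    # target sampling rate in Hz (from the text)
SEGMENT_SEC = 3  # segment length in seconds (from the text)


def preprocess(recording):
    """Return an array of shape (n_segments, 3000) for one raw recording."""
    # 1. Downsample 4000 Hz -> 1000 Hz (polyphase resampling is an assumption).
    x = resample_poly(np.asarray(recording, dtype=np.float64),
                      up=1, down=FS_IN // FS_OUT)

    # 2. Order-2 Butterworth bandpass, 25-400 Hz.
    #    Zero-phase filtering via sosfiltfilt is an assumption.
    sos = butter(2, [25, 400], btype="bandpass", fs=FS_OUT, output="sos")
    x = sosfiltfilt(sos, x)

    # 3. Z-normalization over the whole recording (assumed scope);
    #    the small epsilon guards against division by zero.
    x = (x - x.mean()) / (x.std() + 1e-8)

    # 4. Consecutive non-overlapping 3-second segments; a trailing
    #    remainder shorter than 3 seconds is dropped (assumption).
    seg_len = SEGMENT_SEC * FS_OUT
    n_segments = len(x) // seg_len
    return x[: n_segments * seg_len].reshape(n_segments, seg_len)


# Example: 10 seconds of synthetic noise sampled at 4000 Hz.
rng = np.random.default_rng(0)
segments = preprocess(rng.standard_normal(10 * FS_IN))
```

Each row of `segments` would then inherit the murmur label of its recording and the outcome label of its subject, as described above.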
2. Multi-Task Learning Model
- The 3-second segment and MFCC features were used as inputs.
- The 3-second segment and MFCC features were transformed into two embedding vectors using two separate CNN networks.
- The embedding vectors were then concatenated with the demographic features and used as input to two separate prediction layers.
- The model is trained on the sum of the cross-entropy losses for murmur prediction and clinical outcome prediction.
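One way to write the training objective described above, assuming an unweighted sum and standard categorical cross entropy. The symbols are introduced here for illustration and are not taken from the paper: \(y\) denotes a one-hot label, \(\hat{p}\) a predicted class probability, and the superscripts \(m\) and \(o\) mark the murmur and outcome heads.

```latex
\mathcal{L} = \mathcal{L}_{\text{murmur}} + \mathcal{L}_{\text{outcome}},
\qquad
\mathcal{L}_{\text{murmur}} = -\sum_{c \in \{\text{Present},\,\text{Unknown},\,\text{Absent}\}} y^{m}_{c} \log \hat{p}^{m}_{c},
\qquad
\mathcal{L}_{\text{outcome}} = -\sum_{c \in \{\text{Abnormal},\,\text{Normal}\}} y^{o}_{c} \log \hat{p}^{o}_{c}
```

Because both terms share the two CNN encoders, gradients from each task update the same embedding layers, which is what makes the training multi-task.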
3. Results
3.1. Challenge Results
Relative to the other teams, the model ranked much higher on outcome prediction (2nd place) than on murmur prediction (20th place).
3.2. Ablation Studies
After removing the MFCC features from the model input, the performance of both murmur prediction and outcome prediction decreases.
- Compared with multi-task training, the independently trained murmur prediction model performs worse.
- In contrast, the independently trained outcome prediction model performs better (cost decreases from 12302.800 to 12065.997; lower is better).
Multi-task training therefore improves murmur prediction at the expense of outcome prediction.
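To put the outcome cost difference in perspective, the short calculation below works out the absolute and relative change from the two cost values quoted above. The variable names are illustrative only.

```python
# Outcome cost values quoted in the ablation study (lower is better).
cost_multi_task = 12302.800
cost_independent = 12065.997

# Absolute and relative reduction when the outcome model is trained alone.
absolute_reduction = cost_multi_task - cost_independent
relative_reduction = absolute_reduction / cost_multi_task

print(f"absolute: {absolute_reduction:.3f}")
print(f"relative: {relative_reduction:.2%}")
```

The reduction is roughly 237 cost units, or about 1.9% of the multi-task cost, so the outcome penalty from multi-task training is modest.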