Brief Review — Ensemble Transformer-Based Neural Networks Detect Heart Murmur In Phonocardiogram Recordings
Care4MyHeart, 6th in Murmur Detection
Ensemble Transformer-Based Neural Networks Detect Heart Murmur In Phonocardiogram Recordings, by Khalifa University, ADNOC H.Q., and Aristotle University of Thessaloniki
Care4MyHeart, 2022 CinC (Sik-Ho Tsang @ Medium)
Phonocardiogram (PCG) / Heart Sound Classification
2016 … 2024 [MWRS-BFSC + CNN2D] [ML & DL Model Study on HSS] [Audio Data Analysis Tool]
Summary: My Healthcare and Medical Related Paper Readings and Tutorials
==== My Other Paper Readings Are Also Over Here ====
- The George B. Moody PhysioNet 2022 Challenge participating team, Care4MyHeart, developed an approach that transforms raw PCG recordings into wavelet power feature signals for use within the proposed deep learning Transformer models for heart sound classification.
Outline
- Data Preparation, Feature Extraction
- Transformer-based neural network
- Results
1. Data Preparation, Feature Extraction
1.1. Data Preparation
The challenge data included 1568 patients from the pediatric population, out of which 942 were released as training.
- Each patient had one or more PCG recordings from four auscultation locations. If a patient had fewer than four channels, the previous channel was duplicated once or as many times as needed.
- Then, the first 40 seconds are selected from each recording. Many signals were recorded for a shorter duration; therefore, the signal is padded by repeating itself until it reaches 40 seconds.
- Finally, z-score normalization is applied to the four-channel recording.
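The preparation steps above can be sketched as follows. The 4 kHz sampling rate is an assumption (consistent with 5000 non-overlapping 32-sample segments per 40-second recording), and the z-score axis is not specified in the paper, so per-channel normalization is used here.

```python
import numpy as np

def prepare_recording(channels, fs=4000, duration=40):
    """Sketch of the described preparation: duplicate channels up to
    four, tile each signal to 40 s, then z-score normalize.
    fs=4000 Hz is an assumption, not stated explicitly in the paper."""
    n = fs * duration                        # 160000 target samples
    chans = [np.asarray(c, dtype=float) for c in channels]
    while len(chans) < 4:                    # duplicate the previous channel
        chans.append(chans[-1].copy())
    out = np.zeros((4, n))
    for i, x in enumerate(chans[:4]):
        reps = int(np.ceil(n / len(x)))      # pad by repeating the signal
        out[i] = np.tile(x, reps)[:n]
    # z-score normalization (per channel here; the paper does not
    # specify the normalization axis)
    mu = out.mean(axis=1, keepdims=True)
    sd = out.std(axis=1, keepdims=True)
    return (out - mu) / sd
```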
1.2. Feature Extraction
Each PCG signal is transformed into wavelet transform-based power features. For each signal, a 32-sample-wide sliding window with a 32-sample shift (i.e., non-overlapping) splits the signal into 5000 segments.
- For each 32-sample segment, wavelet transform decomposition using the Symlets 8 (Sym8) wavelet is applied to decompose each segment into 6 levels. Then, the decomposed signal is converted into 105 concatenated approximation coefficients.
- Using these coefficients, the power is calculated by taking the square of the absolute value of each coefficient vector.
- Lastly, five features are calculated from each level's power vector, namely the energy (summation of values), variance, standard deviation, waveform length, and Shannon entropy, yielding a total of 30 features per 32-sample segment.
Since there are four-channel PCG recordings, the features of each channel are concatenated to form a wavelet power transformation of the raw PCG recording with 30 features (6 levels × 5 features) and an overall length of 20000 (4 channels × 5000 32-sample segments), forming the deep learning input.
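A sketch of the per-segment feature extraction, using PyWavelets for the Sym8 decomposition. Taking successive single-level DWTs of a 32-sample segment yields approximation coefficient vectors of lengths 23 + 19 + 17 + 16 + 15 + 15 = 105, matching the 105 concatenated coefficients cited; the exact feature definitions (e.g., the entropy base) are assumptions here.

```python
import numpy as np
import pywt  # PyWavelets

def segment_features(segment, wavelet="sym8", levels=6):
    """Wavelet power features for one 32-sample segment (a sketch;
    the paper's exact feature formulas may differ)."""
    feats = []
    a = np.asarray(segment, dtype=float)
    for _ in range(levels):
        a, _detail = pywt.dwt(a, wavelet)     # keep the approximation
        p = np.abs(a) ** 2                    # power of the coefficients
        q = p / p.sum()                       # normalized for entropy
        feats += [
            p.sum(),                          # energy
            p.var(),                          # variance
            p.std(),                          # standard deviation
            np.abs(np.diff(p)).sum(),         # waveform length
            -(q * np.log2(q + 1e-12)).sum(),  # Shannon entropy
        ]
    return np.array(feats)                    # 6 levels x 5 = 30 features
```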
2. Transformer-based neural network
2.1. Model Architecture
- The proposed model comprises four main elements, namely the feature encoder, positional encoder, Transformer unit, and decoder.
- The network starts by encoding the features using two sets of one-dimensional (1D) convolutions followed by batch normalization, Gaussian error linear unit (GeLU), and max pooling layers. These sets encode the input data (30x20000) by extracting important features and reducing its dimension to 30x1250.
- Next, a positional encoder assigns a unique representation to each sequence position using the regular structure of sequential sine and cosine functions before entering the Transformer unit.
- Then, the Transformer stage uses a single Transformer unit with two attention heads and a 20% Dropout rate.
- The extracted attention vector obtained from the Transformer passes through the decoder, which includes a set of two fully connected layers followed by two Dropout layers of 5% rate and a rectified linear unit (ReLU).
- Lastly, a global average pooling layer is used to pool over the temporal sequence and obtain a single value per vector.
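The four elements above can be sketched in PyTorch as follows. The kernel sizes, pooling factors, and hidden width are assumptions (only the 30×20000 input, 30×1250 encoded shape, head count, and dropout rates are given in the paper):

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    # Standard sinusoidal positional encoding.
    def __init__(self, d_model, max_len=2000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                     # x: (batch, seq, d_model)
        return x + self.pe[: x.size(1)]

class PCGTransformer(nn.Module):
    # Sketch of the described architecture, not the paper's exact model.
    def __init__(self, n_features=30, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(         # two conv/BN/GELU/pool sets
            nn.Conv1d(n_features, n_features, kernel_size=7, padding=3),
            nn.BatchNorm1d(n_features), nn.GELU(), nn.MaxPool1d(4),
            nn.Conv1d(n_features, n_features, kernel_size=7, padding=3),
            nn.BatchNorm1d(n_features), nn.GELU(), nn.MaxPool1d(4),
        )                                     # 30 x 20000 -> 30 x 1250
        self.pos = PositionalEncoding(n_features)
        layer = nn.TransformerEncoderLayer(
            d_model=n_features, nhead=2, dropout=0.2, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.decoder = nn.Sequential(         # two FC layers, ReLU, Dropouts
            nn.Linear(n_features, 64), nn.Dropout(0.05), nn.ReLU(),
            nn.Linear(64, n_classes), nn.Dropout(0.05),
        )

    def forward(self, x):                     # x: (batch, 30, 20000)
        z = self.encoder(x).transpose(1, 2)   # (batch, 1250, 30)
        z = self.transformer(self.pos(z))
        z = self.decoder(z)                   # per-step class scores
        return z.mean(dim=1)                  # global average pooling
```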
2.2. Training Losses
- For the detection of heart murmur, the training data had a severe imbalance between the absent (73.8%), present (19.0%), and unknown (7.2%) classes, so a weighted cross-entropy loss is used for training the model.
- For the identification of clinical outcome, the model is trained a second time to predict the clinical outcome of patients (abnormal or normal). With 48.4% abnormal and 51.6% normal, the same imbalance-handling method is used.
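A weighted cross-entropy loss for the three murmur classes might look like the following; inverse-frequency weights are one common choice, and the paper's exact weights are not stated:

```python
import torch
import torch.nn as nn

# Class frequencies from the training data: absent, present, unknown.
freq = torch.tensor([0.738, 0.190, 0.072])
weights = 1.0 / freq                  # inverse-frequency weighting (assumed)
weights = weights / weights.sum()     # normalize for readability
loss_fn = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)            # dummy model outputs
labels = torch.randint(0, 3, (8,))    # dummy murmur labels
loss = loss_fn(logits, labels)        # rare classes contribute more
```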
2.3. Additional Neural Network
- As an additional approach to handle the imbalance in the training dataset, the synthetic minority over-sampling technique (SMOTE) is applied to oversample the small classes in the murmur detection task (present: 1500, unknown: 1500) and both classes in the clinical outcome task (abnormal: 3500, normal: 3500) using a safe-level mechanism.
- Next, a simple neural network is trained with the over-sampled classes using the same Transformer network’s training settings.
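For illustration, a minimal plain-SMOTE sketch in NumPy (the paper uses a safe-level variant, which additionally screens neighbours by how "safe" the interpolation region is; that screening is omitted here):

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Synthesize n_new samples by interpolating each chosen minority
    sample toward one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]     # k nearest neighbours
    out = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(n)                 # a random minority sample
        j = nbrs[i, rng.integers(k)]        # one of its neighbours
        lam = rng.random()                  # interpolation factor in [0, 1)
        out[t] = X_min[i] + lam * (X_min[j] - X_min[i])
    return out
```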
2.4. Ensemble classification
- Four ensemble scenarios combine the Transformer-based and SMOTE-based neural networks as follows:
- Scenario 1: 1 Transformer + 1 SMOTE network
- Scenario 2: 3 Transformers + 3 SMOTE networks
- Scenario 3: 5 Transformers + 5 SMOTE networks
- Scenario 4: 10 Transformers + 10 SMOTE networks
- In each scenario, the scores from all networks are averaged to generate the final score for every class for every subject.
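The ensemble step reduces to a simple score average; the network and subject counts below are placeholders:

```python
import numpy as np

def ensemble_scores(per_network_scores):
    """Average class scores over all networks in a scenario.
    per_network_scores: list of (n_subjects, n_classes) arrays."""
    return np.mean(per_network_scores, axis=0)

# Scenario 2 sketch: 3 Transformer-based + 3 SMOTE-based networks.
rng = np.random.default_rng(0)
scores = [rng.random((10, 3)) for _ in range(6)]   # dummy per-network scores
final = ensemble_scores(scores)                    # (10 subjects, 3 classes)
pred = final.argmax(axis=1)                        # predicted class per subject
```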
3. Results
- The accuracy in murmur detection increased steadily from Scenario 1 to Scenario 4, i.e., as more trained networks (voters) were added to the ensemble.
The highest accuracy achieved in murmur detection was 0.855 on the training dataset and 0.761 on the validation dataset.
On the other hand, the lowest costs in clinical outcome prediction were 9980 and 11490 on the two datasets, respectively.
- The lowest clinical outcome prediction score on the validation dataset, 9737, was achieved by one of the entries after reducing the input features from 30 to 10 in the Scenario 1 models, with the reduction based on the chi-squared (χ2) test. However, that entry had a low murmur detection accuracy of 0.730.
- In the final testing phase (Table 2), Care4MyHeart achieved a murmur detection score of 0.757 and a clinical outcome prediction cost of 14410, ranking 6th/40 and 29th/40 in the two tasks, respectively.