Brief Review — An End-to-End Deep Learning Framework for Real-Time Denoising of Heart Sounds for Cardiac Disease Detection in Unseen Noise

LSTM U-Net (LU-Net)

4 min read · Jan 27, 2024

**A graphical overview of the end-to-end denoising workflow.**

An End-to-End Deep Learning Framework for Real-Time Denoising of Heart Sounds for Cardiac Disease Detection in Unseen Noise
LSTM U-Net (LU-Net), by Bangladesh University of Engineering and Technology (BUET), National Heart Foundation Hospital and Research Institute, Qatar University, Johns Hopkins University
2023 ACCESS (Sik-Ho Tsang @ Medium)
Heart Sound Classification
2013 … 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum+Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP]

A novel deep encoder-decoder-based denoising architecture, LSTM U-Net (LU-Net), is proposed to suppress ambient noise and internal lung sound noise.
Training is done using a large benchmark PCG dataset mixed with physiological noise, i.e., breathing sounds.
Two different noisy datasets were prepared for experimental evaluation by mixing unseen lung sounds and hospital ambient noises with the clean heart sound recordings.
The authors also used the inherently noisy portion of the PASCAL heart sound dataset for evaluation.

Outline

LSTM U-Net (LU-Net)
Dataset Preparation
Results

1. LSTM U-Net (LU-Net)

1.1. Problem Formulation

(Familiarity with Bi-LSTM and U-Net is assumed.)

A noise-free PCG signal x is corrupted by irrelevant components n coming from the environment or the system, forming a noisy PCG signal y:

$$y = x + n$$

LU-Net, F(·), is used to denoise y and obtain x̂, which should be close to x:

$$\hat{x} = F(y)$$

Thus, Mean Square Error (MSE) is used to train the network.
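
To make the training objective concrete, here is a minimal sketch of the MSE between a clean signal and a candidate denoised output. It assumes additive noise, and the toy sample values and the helper name `mse` are illustrative only; they are not taken from the paper.

```python
def mse(clean, denoised):
    """Mean squared error between two equal-length sequences."""
    if len(clean) != len(denoised):
        raise ValueError("sequences must have the same length")
    return sum((c - d) ** 2 for c, d in zip(clean, denoised)) / len(clean)


# Toy signals (hypothetical values, not data from the paper).
x = [0.0, 1.0, 0.0, -1.0]              # clean PCG samples
n = [0.5, -0.5, 0.5, -0.5]             # additive noise samples
y = [xi + ni for xi, ni in zip(x, n)]  # noisy observation, assuming y = x + n

loss_identity = mse(x, y)  # loss if the "denoiser" returned y unchanged
loss_perfect = mse(x, x)   # loss of a perfect denoiser
```

A denoiser that does nothing is penalized by exactly the noise power, while a perfect one reaches zero, which is why minimizing MSE pushes the network output toward x.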

With denoised heart sound, classification performance should also be improved.

1.2. Model Architecture

**LSTM U-Net (LU-Net) Model Architecture**

The proposed network is a convolutional encoder-decoder-based architecture with bi-directional long short term memory (Bi-LSTM) modules in the skip connections.

Encoder Path: 1D convolution layers with ReLU are used. The convolution layers in Encoder_i (i = 2–5) have a stride of 2, so these blocks successively create lower-dimensional representations.
Decoder Path: Each Decoder_i consists of a 1D convolution layer followed by a ReLU activation and an UpSampling1D layer.
Finally, the output of Decoder_1 is passed through a convolution layer with C_out = 1, which produces the corresponding denoised output sequence, ŷ_t.
In the skip connections, a Bi-LSTM module is used because it internally concatenates the forward and backward vectors into a single vector, allowing it to learn long-term dependencies with fewer parameters.
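
As a rough illustration of how the strided encoders shrink the temporal dimension, the sketch below computes the sequence length after each encoder block. It assumes "same" padding (output length = ceil(input length / stride)), a stride of 1 for Encoder_1, and an input length of 800 samples; none of these values are stated in this review, and the function name is hypothetical.

```python
import math


def encoder_lengths(input_len, strides=(1, 2, 2, 2, 2)):
    """Temporal length after each encoder block, assuming 'same' padding."""
    lengths = []
    current = input_len
    for stride in strides:
        # With 'same' padding, a strided 1D convolution keeps ceil(L / stride) steps.
        current = math.ceil(current / stride)
        lengths.append(current)
    return lengths


# Hypothetical input length; Encoder_1 is assumed to use stride 1.
lengths = encoder_lengths(800)
total_downsampling = 800 // lengths[-1]
```

Under these assumptions the four stride-2 blocks reduce the sequence by a factor of 16, so the decoder path has to upsample by the same overall factor for the output to match the input length.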

2. Dataset Preparation

(Please read the paper directly for the detailed dataset preparation and experimental setup. It covers many pages for this section.)

2.1. PhysioNet

This dataset provides signals contaminated by several kinds of noise (e.g., breathing, stethoscope movement, intestinal activity, peripheral talking).

2.2. PASCAL

The training set of Dataset-B contains sub-directories of noisy recordings: normal (120) and murmur (29).

2.3. Open-Access Heart Sound (OAHS) Dataset (Yaseen GitHub Dataset)

A publicly available, noise-free PCG dataset containing a total of 1000 recordings.

2.4. ICBHI 2017 Dataset

The largest publicly available respiratory sound database [48].

2.5. Hospital Ambient Noise (HAN) Dataset

A non-copyrighted, 68-minute YouTube video whose audio was recorded at different locations (corridor, waiting room, etc.) of a busy hospital.

2.6. Training Data Preparation

The PhysioNet dataset is used.
Lung sounds from the ICBHI 2017 dataset are used as the noise source to create synthetic noisy PCG recordings.
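
One common way to create such synthetic mixtures is to scale the noise so that the mixture reaches a chosen signal-to-noise ratio (SNR) before adding it to the clean signal. The sketch below shows that generic technique; the SNR value, the toy samples, and the helper names are assumptions for illustration, and the paper's exact mixing procedure may differ.

```python
import math


def power(signal):
    """Average power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean after scaling the noise to a target SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power and cannot be scaled")
    # SNR_dB = 10 * log10(P_clean / P_scaled_noise), solved for the noise gain.
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [c + gain * n for c, n in zip(clean, noise)]


# Toy samples (hypothetical): a heart sound segment and a lung sound segment.
clean = [1.0, -1.0, 1.0, -1.0]
lung = [0.5, 0.5, -0.5, -0.5]
noisy = mix_at_snr(clean, lung, snr_db=0.0)  # equal signal and noise power
```

In practice the clean and noise recordings would first be resampled to a common rate and cropped to the same length, which this sketch takes for granted.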

2.7. Test Data Preparation

The relatively clean OAHS dataset recordings are mixed with lung sound and hospital ambient noise to generate two synthetic noisy test sets, OAHS-LS and OAHS-HAN.
To represent the real-world test scenario, the noisy recordings of the PASCAL dataset are used.
For classification, the OAHS dataset is split into 3 distinct sets (training, validation, and test) with a ratio of 70 : 10 : 20. The test portion is then mixed with lung sound and hospital ambient noise to generate the OAHS-LS and OAHS-HAN test sets, respectively.
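
A 70 : 10 : 20 split of this kind can be sketched as follows. This is a plain random split over recording identifiers with a fixed seed; whether the authors stratified by class or used a particular seed is not stated here, so both choices are assumptions, as is the function name.

```python
import random


def split_70_10_20(items, seed=0):
    """Shuffle items and split them into train/validation/test at 70:10:20."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for the 1000 OAHS recordings.
train_ids, val_ids, test_ids = split_70_10_20(range(1000))
```

Only the test portion would then be mixed with lung sound or hospital ambient noise, so the noisy test recordings never overlap with the recordings used for training or validation.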