Brief Review — An End-to-End Deep Learning Framework for Real-Time Denoising of Heart Sounds for Cardiac Disease Detection in Unseen Noise

LSTM U-Net (LU-Net)

Sik-Ho Tsang
4 min read · Jan 27, 2024
A graphical overview of the end-to-end denoising workflow.

LSTM U-Net (LU-Net), by Bangladesh University of Engineering and Technology (BUET), National Heart Foundation Hospital and Research Institute, Qatar University, and Johns Hopkins University
2023 ACCESS (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum+Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP]
==== My Other Paper Readings Are Also Over Here ====

  • A novel deep encoder-decoder-based denoising architecture (LSTM U-Net, LU-Net) to suppress ambient and internal lung sound noises.
  • Training is done using a large benchmark PCG dataset mixed with physiological noise, i.e., breathing sounds.
  • Two different noisy datasets were prepared for experimental evaluation by mixing unseen lung sounds and hospital ambient noises with the clean heart sound recordings.
  • Authors also used the inherently noisy portion of the PASCAL heart sound dataset for evaluation.


  1. LSTM U-Net (LU-Net)
  2. Dataset Preparation
  3. Results

1. LSTM U-Net (LU-Net)

1.1. Problem Formulation

A noise-free PCG signal x is corrupted by several irrelevant components n, coming from the environment or the recording system, to form a noisy PCG signal y:

y = x + n

  • LU-Net, F(·), is used to denoise y, producing an estimate x̂ = F(y) that should be close to x.
  • Thus, the Mean Squared Error (MSE) between x̂ and x is used to train the network.

With denoised heart sounds, downstream classification performance should also improve.
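As a toy illustration of this objective (not the authors' code), the additive noise model and the MSE loss can be sketched in a few lines; the signal and noise below are synthetic stand-ins for a clean PCG recording and a lung sound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: x is the "clean" PCG, n the additive noise (e.g., a lung sound).
x = np.sin(np.linspace(0, 8 * np.pi, 2000))
n = 0.3 * rng.standard_normal(x.shape)
y = x + n                                  # noisy PCG: y = x + n

def mse(x_hat, x):
    """Mean Squared Error, the training objective for the denoiser F."""
    return float(np.mean((x_hat - x) ** 2))

# A perfect denoiser F with F(y) = x reaches MSE 0; doing nothing leaves the noise power.
print(mse(x, x))   # 0.0
print(mse(y, x))   # roughly the noise variance (~0.09 here)
```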

1.2. Model Architecture

LSTM U-Net (LU-Net) Model Architecture

The proposed network is a convolutional encoder-decoder architecture with bi-directional long short-term memory (Bi-LSTM) modules in the skip connections.

  • Encoder Path: Each encoder block is a 1D convolution layer with a ReLU activation.
  • Encoder_i (i = 2–5) use a stride of 2, successively creating lower-dimensional representations.
  • Decoder Path: Each Decoder_i consists of a 1D convolution layer followed by a ReLU activation and an UpSampling1D layer.
  • Finally, the output of Decoder_1 is passed through a convolution layer with Cout = 1, which produces the denoised output sequence x̂.
  • At each skip connection, a Bi-LSTM module is used, as it internally concatenates the forward and backward vectors into a single vector, learning long-term dependencies with fewer parameters.
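A shape-level sketch of this encoder-decoder flow is below, in plain NumPy rather than a deep learning framework. Here `down`, `up`, and `bilstm_skip` are crude stand-ins (not the actual trained layers), used only to show that four stride-2 stages plus four upsampling stages with skip connections restore the input length:

```python
import numpy as np

def down(h):
    """Stand-in for a stride-2 Conv1D + ReLU: halves the time axis."""
    return np.maximum(h[::2], 0.0)

def up(h):
    """Stand-in for UpSampling1D: doubles the time axis."""
    return np.repeat(h, 2, axis=0)

def bilstm_skip(h):
    """Stand-in for the Bi-LSTM skip module: a shape-preserving transform."""
    return np.tanh(h)

def lu_net_shape_flow(y):
    skips, h = [], y
    for _ in range(4):            # Encoder_2..Encoder_5 in the paper: stride-2 stages
        skips.append(bilstm_skip(h))
        h = down(h)
    for _ in range(4):            # Decoder path: upsample and merge the skip
        h = up(h) + skips.pop()
    return h                      # a final Conv1D with Cout = 1 would follow

y = np.random.default_rng(1).standard_normal(1024)
assert lu_net_shape_flow(y).shape == y.shape   # input length restored
```

The skip lengths (1024 → 512 → 256 → 128) pop off in reverse order, which is exactly why each upsampled tensor lines up with its mirror-image encoder stage.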

2. Dataset Preparation

  • (Please read the paper directly for the detailed dataset preparation and experimental setup. It covers many pages for this section.)

2.1. PhysioNet

  • This dataset provides signals recorded in the presence of several noise sources (e.g., breathing, stethoscope movement, intestinal activity, peripheral talking).

2.2. PASCAL Dataset

  • In the training set of Dataset-B, there are sub-directories containing noisy recordings of normal (120) and murmur (29) classes.

2.3. Open-Access Heart Sound (OAHS) Dataset (Yaseen GitHub Dataset)

  • It is a publicly available, noise-free PCG dataset containing a total of 1,000 recordings.

2.4. ICBHI 2017 Dataset

  • The largest publicly available respiratory sound database [48].

2.5. Hospital Ambient Noise (HAN) Dataset

  • Audio taken from a non-copyrighted, 68-minute YouTube video, recorded at different places (corridor, waiting room, etc.) of a busy hospital.

2.6. Training Data Preparation

  • The PhysioNet dataset is used as the source of heart sound recordings.
  • Lung sounds from the ICBHI 2017 dataset are used as the noise source to create synthetic noisy PCG recordings.

2.7. Test Data Preparation

  • The relatively clean OAHS dataset recordings are mixed with lung sound and hospital ambient noise to generate two synthetic noisy test sets, OAHS-LS and OAHS-HAN.
  • To represent the real-world test scenario, the noisy recordings of the PASCAL dataset are used.
  • For classification, the OAHS dataset is split into 3 distinct sets, training, validation, and test, with a ratio of 70 : 10 : 20. The test portion is mixed with lung sound and hospital ambient noise to generate the OAHS-LS and OAHS-HAN test sets, respectively.
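One common way to build such synthetic noisy test sets is to scale the noise segment to a target SNR before adding it to the clean recording. A minimal sketch of that mixing step (the paper's exact mixing SNR levels are given in its experimental setup; the signals below are synthetic stand-ins):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` after scaling it so the mixture has the target SNR (dB)."""
    noise = noise[:len(clean)]                      # trim/align the noise segment
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

rng = np.random.default_rng(2)
clean = np.sin(np.linspace(0, 20 * np.pi, 8000))    # stand-in for an OAHS recording
noise = rng.standard_normal(8000)                   # stand-in for lung sound / HAN noise
noisy = mix_at_snr(clean, noise, snr_db=0.0)        # 0 dB: equal signal and noise power

# Verify the achieved SNR matches the target.
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 6))  # ~0.0
```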

3. Results

3.1. Denoising Performance

Denoising Performance

LU-Net consistently outperforms FCN and U-Net across all evaluated metrics.

3.2. Classification Performance

Classification Performance

The proposed LU-Net improves the estimated SNR by 6.517 dB, which is 26.175% and 2.725% better than U-Net and FCN, respectively.

3.3. Visualization


The superiority of LU-Net over the baselines can be visually observed in Fig. 7 and 8 above.


