Brief Review — Towards Domain Invariant Heart Sound Abnormality Detection using Learnable Filterbanks


Sik-Ho Tsang
4 min read · Mar 9, 2024

Towards Domain Invariant Heart Sound Abnormality Detection using Learnable Filterbanks
by Bangladesh University of Engineering and Technology (BUET), and Robert Bosch Research and Technology Center (RTC)
2020 JBHI, Over 50 Citations (Sik-Ho Tsang @ Medium)

Heart Sound Classification
2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum + Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)]
==== My Other Paper Readings Are Also Over Here ====

  • A novel Convolutional Neural Network (CNN) layer is proposed, consisting of time-convolutional (tConv) units, that emulate Finite Impulse Response (FIR) filters, for heart sound classification.
  • The filter coefficients can be updated via backpropagation and be stacked in the front-end of the network as a learnable filterbank.
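The core idea above can be sketched numerically: a tConv unit's forward pass is just FIR filtering, with the kernel taps playing the role of the filter coefficients h(n). A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def tconv_forward(x, h):
    """Forward pass of a tConv unit: a 1-D convolution, i.e. FIR
    filtering y(n) = sum_k h(k) x(n - k)."""
    return np.convolve(x, h, mode="same")

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)        # e.g. a 1 s PCG frame at 1 kHz
h = np.array([0.25, 0.5, 0.25])      # a simple low-pass FIR kernel
y = tconv_forward(x, h)              # same length as the input
```

In the paper these taps are learnable: the same coefficients that a DSP engineer would design by hand are instead updated by backpropagation.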


  1. tConv Variants
  2. tConv-CNN Model Architecture
  3. Results

1. tConv Variants

  • In brief, the authors argue that CNN kernels are analogous to FIR filters. They also note the symmetry condition that a causal generalized linear-phase FIR filter must satisfy.
  • Thus, the proposed tConv units in the front-end enable the pre-processing steps, e.g. spectral decomposition or filterbank analysis, to be supplemented by the first layer of an end-to-end CNN.

1.1. Linear Phase tConv (LP-tConv)

Linear Phase tConv
  • Linear Phase tConv is proposed, which comes in the 4 types shown above.

The tConv kernels share weights between the coefficients on the two sides of the point of symmetry, so only half of the kernel weights need to be learned.
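The weight sharing can be sketched as follows: only half of the taps are free parameters, and the full kernel is built by mirroring them, which guarantees a symmetric (linear-phase) impulse response. A minimal sketch with illustrative names:

```python
import numpy as np

def make_symmetric_kernel(half, odd_length=True):
    """Build a linear-phase (symmetric) FIR kernel from its learnable
    half, mirroring the taps about the point of symmetry."""
    if odd_length:
        # e.g. half = [a, b, c] -> kernel = [a, b, c, b, a] (Type I)
        return np.concatenate([half, half[-2::-1]])
    # e.g. half = [a, b, c] -> kernel = [a, b, c, c, b, a] (Type II)
    return np.concatenate([half, half[::-1]])

h = make_symmetric_kernel(np.array([1.0, 2.0, 3.0]))
# h = [1, 2, 3, 2, 1]; only 3 of the 5 taps are free parameters
```

During training, gradients for the mirrored taps accumulate onto the shared half, halving the number of parameters per kernel.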

1.2. Zero Phase tConv (ZP-tConv)

  • A zero phase tConv layer is proposed that has no phase effect on the input signal. If x(n) is the input signal, h(n) is the impulse response of the kernel, and y(n) is the output, we have in the frequency domain: Y(ω) = H(ω)H*(ω)X(ω) = |H(ω)|²X(ω), which has zero phase response.
  • The flip operation in the time domain is equivalent to taking the complex conjugate in the frequency domain.

In the implementation of the ZP-tConv unit, two consecutive convolution operations with the same kernel are performed; during the second convolution, the kernel is flipped to equalize the phase response of the first convolution.
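This forward-then-flipped convolution can be sketched directly (it is the same idea as forward-backward filtering, cf. scipy.signal.filtfilt); the trimming below is an implementation detail of this sketch, not from the paper:

```python
import numpy as np

def zp_tconv(x, h):
    """Zero-phase tConv sketch: convolve with h, then with the
    time-flipped kernel. In the frequency domain this multiplies X
    by H * conj(H) = |H|^2, so the net phase response is zero."""
    y = np.convolve(x, h, mode="full")         # first pass: phase of H
    y = np.convolve(y, h[::-1], mode="full")   # flipped kernel cancels it
    start = len(h) - 1                         # trim full-mode padding
    return y[start:start + len(x)]

x = np.zeros(101)
x[50] = 1.0                          # unit impulse at the center
h = np.array([0.5, 0.3, 0.2])        # an asymmetric (non-linear-phase) kernel
y = zp_tconv(x, h)                   # impulse response stays centered
```

Because the two passes cancel each other's phase, the filtered impulse is not delayed: its peak stays at the original sample index.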

1.3. Gammatone tConv

  • The gammatone auditory filterbank is implemented in practice as a series of parallel band-pass filters. It models the tuning frequency at different points of the human basilar membrane [28]. A novel tConv unit is proposed that approximates a gammatone function.
  • The gammatone impulse response is given by: g(n) = α n^(η−1) e^(−2πβn) cos(2πfn + φ)
  • where g(n), α, η, β, f and φ denote the n-th gammatone coefficient, amplitude, filter order, bandwidth, center frequency and phase of the gammatone wavelet (in radians), respectively.
  • With φ set to 0, a gammatone tConv has only 4 learnable parameters (α, η, β, f), which can be learnt by backpropagation.
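Generating the kernel from these 4 scalars can be sketched as below (standard gammatone form with φ fixed to 0; the sampling rate and kernel length are illustrative assumptions):

```python
import numpy as np

def gammatone_kernel(alpha, eta, beta, f, length, fs=1000):
    """Generate a gammatone tConv kernel from only 4 parameters:
    amplitude alpha, filter order eta, bandwidth beta (Hz), and
    center frequency f (Hz). Phase phi is fixed to 0."""
    n = np.arange(1, length + 1) / fs           # time axis in seconds
    return (alpha * n ** (eta - 1)
            * np.exp(-2 * np.pi * beta * n)     # exponential decay
            * np.cos(2 * np.pi * f * n))        # carrier at f, phi = 0

# Illustrative parameter values, not taken from the paper
h = gammatone_kernel(alpha=1.0, eta=4, beta=50.0, f=150.0, length=64)
```

Because the whole kernel is a deterministic function of (α, η, β, f), backpropagation only needs gradients with respect to those 4 scalars rather than every tap.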

2. tConv-CNN Model Architecture

2.1. Model Architecture

tConv-CNN Model Architecture
  • The frontend of the model is a learnable filterbank, built with four tConv units.
  • Each of the spectral bands decomposed by the learnable filterbank is passed through a separate branch of our CNN architecture.
  • Each branch has two convolutional layers of kernel size 5, each followed by a Rectified Linear Unit (ReLU) activation and a max-pooling of size 2. Activations are normalized for each training mini-batch prior to ReLU, and Dropout is applied with a probability of 0.5.
  • The outputs of the four branches are fed to an MLP network after being concatenated along the channels and flattened.
  • Cardiac cycles are extracted from each PCG resampled to 1kHz using the method presented in [29] and zero-padded to be 2.5s in length.
  • The posterior predictions for all of the cardiac cycles are fused for each recording.
  • Cross-entropy loss is optimized.
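The recording-level decision can be sketched as below. The paper states that per-cycle posteriors are fused for each recording; averaging them and thresholding, as done here, is one plausible fusion rule and should be read as an assumption:

```python
import numpy as np

def fuse_cycle_posteriors(p_cycles, threshold=0.5):
    """Fuse per-cycle softmax outputs into one recording-level label.
    p_cycles: array of shape (num_cycles, 2), columns = (normal, abnormal).
    Averaging over cycles is an assumed fusion rule."""
    p_rec = p_cycles.mean(axis=0)         # average posterior over cycles
    return int(p_rec[1] > threshold)      # 1 = abnormal, 0 = normal

# Three cardiac cycles from one recording (illustrative numbers)
p = np.array([[0.8, 0.2],
              [0.3, 0.7],
              [0.1, 0.9]])
label = fuse_cycle_posteriors(p)          # mean abnormal prob = 0.6
```

Fusing over all cycles makes the prediction robust to a few noisy or mis-segmented cycles within a recording.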

2.2. Domain Balanced Training

  • Each mini-batch of size B is balanced to contain an equal number of examples per class from each PHSDB (PhysioNet) subset, so that no single recording domain dominates the gradient updates.

3. Results

3.1. PHSDB (PhysioNet) Dataset

PHSDB (PhysioNet) Dataset

The proposed methods show superior performance on all of the metrics compared to the baselines, with a significant improvement in average subset-wise accuracy and Macc.

  • The proposed CNN with a learnable filterbank front-end of linear phase Type IV tConvs achieved relative improvements of 8% and 11.84% in Macc compared to the Potes-CNN and Gabor-BoAW-SVC (Upsamp.) baselines, respectively.

3.2. HSS Dataset

HSS Dataset

The proposed Gammatone tConv-CNN and Type IV tConv-CNN models provide the best performances in terms of Macc and F1 metrics, respectively.


