Brief Review — Heart sound classification based on bispectrum features and Vision Transformer mode

Bispectrum + ViT

Sik-Ho Tsang
3 min read · Apr 7, 2024

Heart sound classification based on bispectrum features and Vision Transformer mode
Bispectrum + ViT, by Chinese Academy of Medical Sciences & Peking Union Medical College, National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College
2023 Elsevier AEJ (Sik-Ho Tsang @ Medium)

Phonocardiogram (PCG)/Heart Sound Classification
2013 … 2023 [2LSTM+3FC, 3CONV+2FC] [NRC-Net] [Log-MelSpectrum+Modified VGGNet] [CNN+BiGRU] [CWT+MFCC+DWT+CNN+MLP] [LSTM U-Net (LU-Net)] [DL Overview] [MFCC + k-NN / RF / ANN / SVM + Grid Search] [Long-Short Term Features (LSTF)] [WST+1D-CNN and CST+2D-CNN Ensemble]

  • A bispectrum-inspired feature is extracted and used as the input to a Vision Transformer (ViT) model for heart sound classification.


  1. Bispectrum + ViT
  2. Results

1. Bispectrum + ViT

1.1. Preprocessing

  • The PhysioNet Challenge 2022 dataset is used. It is split at random into a training set and a testing set at a ratio of 8:2, giving 2530 and 633 samples respectively.
  • A second-order Butterworth "median-value" filter, presumably a band-pass filter, with a passband of 25 to 400 Hz is used for audio filtering.
  • The signal is downsampled to 1000 Hz.
  • Then, the audio signal is normalized to the range [-1, 1].
  • After automatic algorithmic segmentation and annotation by cardiac physiologists, high-quality and representative heart sound recording segments have been tagged to form segmentation labels. In this study, the tags suggested by the dataset are used to cut the heart sound recordings so that each sample is one heart sound cycle.
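The preprocessing steps above can be sketched roughly as follows. This is only a minimal illustration, not the authors' code: it assumes the filter is a band-pass, that recordings arrive at 4000 Hz, and that cycle boundaries are given in seconds. The function names `preprocess` and `cut_cycles` are hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def preprocess(pcg, fs_in=4000, fs_out=1000, band=(25.0, 400.0)):
    """Band-pass filter, downsample, and peak-normalize one PCG recording.

    fs_in=4000 is an assumption about the raw sampling rate.
    """
    # Butterworth band-pass designed from a 2nd-order prototype
    # (the resulting band-pass has twice that order), applied
    # forward and backward so that no phase shift is introduced.
    sos = butter(2, band, btype="bandpass", fs=fs_in, output="sos")
    filtered = sosfiltfilt(sos, np.asarray(pcg, dtype=float))

    # Polyphase resampling from fs_in down to fs_out.
    g = gcd(fs_in, fs_out)
    resampled = resample_poly(filtered, fs_out // g, fs_in // g)

    # Scale so the largest absolute sample is 1, i.e. values lie in [-1, 1].
    peak = np.max(np.abs(resampled))
    return resampled / peak if peak > 0 else resampled


def cut_cycles(signal, fs, cycle_bounds_sec):
    """Cut one segment per (start, end) pair, with bounds given in seconds."""
    return [
        signal[int(round(start * fs)):int(round(end * fs))]
        for start, end in cycle_bounds_sec
    ]
```

For example, `cut_cycles(preprocess(raw), 1000, [(0.0, 0.8), (0.8, 1.6)])` would return two one-cycle segments of 800 samples each, where the boundary times stand in for the segmentation labels supplied with the dataset.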

1.2. Bispectrum


Bispectral analysis is a higher-order spectrum analysis for signals and is often used to extract nonlinear information in nonimpairment signals, including heart sound signals and EEG signals [10].

  • It can capture the phase relations of quadratic phase coupling (QPC) in heart sound signals and reveal the nonlinear structure of the time series.

The bispectrum can represent the amplitude distribution of the heart sound signals while suppressing Gaussian noise and retaining as many effective features as possible.

  • The bispectra of all beats are computed, and the resulting images are stored as an image database.
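To make the idea concrete, here is a minimal sketch of a standard direct (FFT-based) bispectrum estimate, B(f1, f2) = E[X(f1) · X(f2) · conj(X(f1 + f2))], averaged over overlapping windowed segments. The paper describes its feature as bispectrum-inspired, so its exact computation may differ. The function name, `nfft`, the Hann window, and the overlap are all assumptions made for illustration.

```python
import numpy as np


def bispectrum(x, nfft=128, overlap=0.5):
    """Direct bispectrum estimate averaged over windowed segments.

    Returns an (nfft, nfft) complex array with zero frequency at the center.
    """
    x = np.asarray(x, dtype=float)
    step = max(1, int(nfft * (1 - overlap)))
    window = np.hanning(nfft)

    # sum_idx[i, j] is the FFT bin of f_i + f_j (wrapping around modulo nfft).
    idx = np.arange(nfft)
    sum_idx = (idx[:, None] + idx[None, :]) % nfft

    acc = np.zeros((nfft, nfft), dtype=complex)
    count = 0
    for start in range(0, len(x) - nfft + 1, step):
        seg = x[start:start + nfft]
        X = np.fft.fft((seg - seg.mean()) * window)
        # Triple product X(f1) * X(f2) * conj(X(f1 + f2)) for all bin pairs.
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx])
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than nfft")
    return np.fft.fftshift(acc / count)
```

Taking `np.abs(bispectrum(beat))` gives a 2-D magnitude map that can be rendered as an image. A signal with tones at f1, f2, and f1 + f2 whose phases are coupled produces a clear peak at (f1, f2), which is the QPC signature mentioned above.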

1.3. ViT


A Vision Transformer (ViT) model is used to classify the bispectrum images as normal or abnormal.

Overall framework for 4 location data classification
  • Each subject in the database has recordings from at most four auscultation locations (PV, TV, AV, MV), each with at least one heartbeat, so the available data are inconsistent across locations.

To predict the label of a subject, the data from each available location are fed to the model to obtain soft labels, and the average of these soft labels is taken as the final result.
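The averaging step can be illustrated with a small sketch. It assumes each soft label is the model's predicted probability of the abnormal class and that a 0.5 threshold turns the average into a hard label. Neither detail is stated in the review, and the function name `predict_subject` is hypothetical.

```python
from statistics import mean


def predict_subject(location_probs, threshold=0.5):
    """Average per-location soft labels into one subject-level prediction.

    location_probs maps a location name (e.g. "AV") to a soft label, taken
    here to be the predicted probability of the abnormal class. Locations
    without a recording are simply absent from the dict.
    """
    if not location_probs:
        raise ValueError("at least one location is required")
    avg = mean(location_probs.values())
    label = "abnormal" if avg >= threshold else "normal"
    return avg, label


# A subject with recordings at only three of the four locations.
avg, label = predict_subject({"AV": 0.9, "MV": 0.4, "PV": 0.8})
print(round(avg, 3), label)
```

Because the average is taken only over the locations that exist, a subject with one recording and a subject with four are handled by the same rule.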

2. Results

Confusion Matrices and ROC Curves

The proposed approach exhibits an accuracy of 0.91 and AUC of 0.98 for the integrated data of all patients in the test set.

  • A reader test is also conducted, in which 5 cardiologists evaluate a different test set with images from 135 randomly selected patients.

The overall performance of the model is comparable to that of the cardiologists.


