Brief Review — PCTMF-Net: heart sound classification with parallel CNNs-transformer and second-order spectral analysis

CNN + Transformers Using Second-Order Features

Sik-Ho Tsang
4 min read · May 25, 2024

PCTMF-Net: heart sound classification with parallel CNNs-transformer and second-order spectral analysis
PCTMF-Net, by Macao Polytechnic University and Shanghai Jiao Tong University
2023 Springer J. Visual Computer (Sik-Ho Tsang @ Medium)

Phonocardiogram (PCG)/Heart Sound Classification
2013 …
2023 … [CTENN] [Bispectrum + ViT] 2024 [MWRS-BFSC + CNN2D]

  • Second-order spectral features are extracted and fed into a new parallel CNNs-Transformer network with multi-scale feature context aggregation (PCTMF-Net) for heart sound classification.

Outline

  1. PCTMF-Net
  2. Results

1. PCTMF-Net

1.1. Pre-Processing

Overlap Segmentation
  • A second-order 25–400 Hz Butterworth median filter is used for filtering.
  • The signal is then downsampled to 1000 Hz.
  • Each audio file is cut into 2-second segments with a 50% overlap between consecutive segments.
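
The three steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the filter is a plain Butterworth band-pass applied with zero phase (any median step is omitted), and the function name `preprocess` and its parameters are hypothetical.

```python
from math import gcd

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def preprocess(x, fs, target_fs=1000, band=(25.0, 400.0), seg_sec=2.0, overlap=0.5):
    """Band-pass filter, downsample, and cut a recording into overlapping segments."""
    # Butterworth band-pass design with N=2 (assumption: zero-phase filtering).
    sos = butter(2, band, btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, np.asarray(x, dtype=float))

    # Rational resampling to the target rate (applies its own anti-aliasing filter).
    g = gcd(int(fs), int(target_fs))
    x = resample_poly(x, int(target_fs) // g, int(fs) // g)

    # Fixed-length windows; the hop is (1 - overlap) times the window length.
    seg_len = int(round(seg_sec * target_fs))
    hop = int(round(seg_len * (1.0 - overlap)))
    if len(x) < seg_len:
        return np.empty((0, seg_len))
    starts = range(0, len(x) - seg_len + 1, hop)
    return np.stack([x[s:s + seg_len] for s in starts])


# Usage: a synthetic 5-second recording sampled at 2000 Hz (hypothetical input).
fs = 2000
t = np.arange(5 * fs) / fs
recording = np.sin(2 * np.pi * 50 * t)
segments = preprocess(recording, fs)
print(segments.shape)  # (4, 2000)
```

With 2-second windows and a 1-second hop, a 5-second recording yields four segments starting at 0, 1, 2, and 3 seconds.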

1.2. Second-Order Spectral Analysis

  • The authors argue that MFCC and wavelet transforms are low-order feature extraction methods. Second-order spectral analysis, as used in Bispectrum + ViT, shows better performance, so it is adopted in this paper as well.
  • The second-order Fourier transform is defined as the two-dimensional Fourier transform of the third-order cumulant of the heart sound signal.
  • A two-dimensional feature matrix (256×256×1) can be generated using second-order spectral analysis, and we can also visualize the extracted feature matrix as a contour map (256 × 256 × 3) and a heat map (256 × 256 × 3).
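
As a rough illustration of how such a 256 × 256 × 1 matrix could be produced, the sketch below uses the standard direct bispectrum estimator, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over windowed frames. The frame length (`nfft = 256`), hop size, Hann window, and log compression are assumptions chosen to match the stated output size; the paper's exact estimator and normalization may differ.

```python
import numpy as np


def bispectrum(segment, nfft=256, hop=128):
    """Direct (FFT-based) bispectrum estimate, averaged over windowed frames."""
    segment = np.asarray(segment, dtype=float)
    window = np.hanning(nfft)
    k = np.arange(nfft)
    sum_idx = (k[:, None] + k[None, :]) % nfft  # bin index of f1 + f2, wrapped

    acc = np.zeros((nfft, nfft), dtype=complex)
    n_frames = 0
    for start in range(0, len(segment) - nfft + 1, hop):
        frame = segment[start:start + nfft]
        X = np.fft.fft((frame - frame.mean()) * window)
        # X(f1) * X(f2) * conj(X(f1 + f2)) for every frequency pair
        acc += X[:, None] * X[None, :] * np.conj(X[sum_idx])
        n_frames += 1
    if n_frames == 0:
        raise ValueError("segment is shorter than nfft")

    # Centre the zero frequency and compress the dynamic range (assumed choices).
    mag = np.log1p(np.abs(np.fft.fftshift(acc / n_frames)))
    return mag[..., None]  # shape (nfft, nfft, 1)


# Usage: a random 2-second segment at 1000 Hz stands in for a real heart sound.
rng = np.random.default_rng(0)
features = bispectrum(rng.standard_normal(2000))
print(features.shape)  # (256, 256, 1)
```

The resulting matrix is symmetric about the f1 = f2 diagonal, which is a quick sanity check for any bispectrum implementation.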

The feature maps obtained by second-order spectral analysis are clearly distinguishable from one another.

1.3. PCTMF-Net

PCTMF-Net
  • Top 2 branches: A two-way parallel CNNs module is used.
  • Bottom branch: A Multi-Head Transformer Encoder with 4 heads (MHTE-4) module is used. To reduce computation, the heart sound feature maps are globally pooled and the features are then sampled before being sent to the MHTE-4, which runs in parallel with the CNN branches.
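
To make the bottom branch more concrete, here is a small NumPy sketch of pooling a feature map and passing the result through 4-head self-attention. It only illustrates the general mechanism: the pooling factor, the choice of rows as tokens, the random weights, and all function names are hypothetical, and the CNN branches and the rest of the encoder (feed-forward layers, normalization) are left out.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool2d(fmap, k):
    """Non-overlapping k x k average pooling of a 2-D feature map."""
    h, w = fmap.shape
    cropped = fmap[:h - h % k, :w - w % k]
    return cropped.reshape(h // k, k, w // k, k).mean(axis=(1, 3))


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, n_heads=4):
    """Scaled dot-product self-attention split across n_heads heads."""
    n, d = tokens.shape
    d_head = d // n_heads

    def split(x):
        # (n, d) -> (heads, n, d_head)
        return x.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, n, n)
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return merged @ w_o, attn


# Usage: a random 256 x 256 map stands in for a second-order spectral feature map.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((256, 256))
tokens = avg_pool2d(fmap, 4)  # 64 tokens of dimension 64 (assumed layout)
d = tokens.shape[1]
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
out, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, n_heads=4)
print(out.shape, attn.shape)  # (64, 64) (4, 64, 64)
```

Pooling by a factor of 4 shrinks each attention matrix from 256 × 256 to 64 × 64 per head, which is the kind of saving the pooling step is meant to provide.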

Combining CNNs with a Transformer is an effective direction to explore.

CNNs will extract the most expressive local feature representation at a low computational cost, while the Transformer is used to encode and fuse information from the context, ultimately focusing on the global feature structure.

2. Results

2.1. SOTA Comparisons

MFCC as Input
  • Using MFCC, the proposed PCTMF-Net achieves the best results in terms of accuracy, recall, and F1-score.
Second-Order Spectral Features as Inputs
  • Using the second-order spectral analysis, the proposed PCTMF-Net achieves the best results in terms of accuracy, precision, recall, and F1-score.

2.2. Confusion Matrices

4-Class & 2-Class Datasets
  • PCTMF-Net correctly classifies the largest number of samples in both the 4-class and 2-class tasks.
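
For reference, accuracy, precision, recall, and F1-score can all be derived from a confusion matrix. The sketch below assumes rows are true classes, columns are predicted classes, and macro averaging across classes; the paper's averaging scheme is not stated here, and the matrix values are made up purely for illustration (they are not results from the paper).

```python
import numpy as np


def metrics_from_confusion(cm):
    """Accuracy and macro-averaged precision, recall, F1 (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # samples predicted as each class
    actual = cm.sum(axis=1)     # samples that truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Toy 2-class matrix with invented counts, for illustration only.
toy_cm = [[8, 2],
          [1, 9]]
m = metrics_from_confusion(toy_cm)
print(m)
```

In this toy case 17 of 20 samples lie on the diagonal, so the accuracy is 0.85.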

2.3. t-SNE

Visualization Using t-SNE
  • The class separation shows that second-order spectral feature extraction combined with PCTMF-Net has good classification ability for heart sound classification.

