Brief Review — PCTMF-Net: heart sound classification with parallel CNNs-transformer and second-order spectral analysis
CNN + Transformers Using Second-Order Features
PCTMF-Net: heart sound classification with parallel CNNs-transformer and second-order spectral analysis
PCTMF-Net, by Macao Polytechnic University and Shanghai Jiao Tong University
2023 Springer J. Visual Computer (Sik-Ho Tsang @ Medium)
Phonocardiogram (PCG)/Heart Sound Classification
2013 … 2023 … [CTENN] [Bispectrum + ViT] 2024 [MWRS-BFSC + CNN2D]
- Second-order spectral features are extracted and fed into a new parallel CNNs-Transformer network with multi-scale feature context aggregation (PCTMF-Net) for heart sound classification.
Outline
- PCTMF-Net
- Results
1. PCTMF-Net
1.1. Pre-Processing
- A second-order Butterworth band-pass filter (25–400 Hz) is used for filtering.
- The signals are then downsampled to 1000 Hz.
- Each audio file is cut into 2-second segments with 50% overlap.
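The three pre-processing steps can be sketched with SciPy as below. This is a minimal sketch under stated assumptions, not the authors' code: the filter is applied zero-phase with `sosfiltfilt`, "second-order" is read as `N=2` in `scipy.signal.butter` (which yields a 4-pole band-pass), the input rate is assumed to be an integer such as 2000 Hz, and the helper names (`bandpass_filter`, `downsample`, `segment`, `preprocess_pcg`) are hypothetical.

```python
import numpy as np
from math import gcd
from scipy.signal import butter, sosfiltfilt, resample_poly


def bandpass_filter(x, fs, low=25.0, high=400.0, order=2):
    """Zero-phase Butterworth band-pass filter (25-400 Hz by default)."""
    # Assumption: "second-order" is passed as N=2; a band-pass design doubles the pole count.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering, so the heart sounds are not shifted in time.
    return sosfiltfilt(sos, x)


def downsample(x, fs_in, fs_out=1000):
    """Polyphase resampling from fs_in to fs_out (integer rates assumed)."""
    g = gcd(int(fs_in), int(fs_out))
    return resample_poly(x, int(fs_out) // g, int(fs_in) // g)


def segment(x, fs, seg_seconds=2.0, overlap=0.5):
    """Cut x into fixed-length windows; a tail shorter than one window is dropped."""
    win = int(round(seg_seconds * fs))
    hop = int(round(win * (1.0 - overlap)))  # 50% overlap -> hop of 1 s
    if len(x) < win:
        return np.empty((0, win))
    n = 1 + (len(x) - win) // hop
    return np.stack([x[i * hop : i * hop + win] for i in range(n)])


def preprocess_pcg(x, fs_in, fs_out=1000):
    """Filter at the original rate, downsample to fs_out, then segment."""
    x = bandpass_filter(np.asarray(x, dtype=float), fs_in)
    x = downsample(x, fs_in, fs_out)
    return segment(x, fs_out)
```

For a 10 s recording sampled at 2000 Hz, this yields 10000 samples after downsampling and therefore nine 2000-sample segments. Filtering before downsampling keeps the 400 Hz upper edge safely below the 500 Hz Nyquist limit of the 1000 Hz output.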
1.2. Second-Order Spectral Analysis
- The authors argue that MFCC and wavelet features are low-order feature extraction methods. Second-order spectral analysis was used in Bispectrum + ViT, where it showed better performance, and it is also used in this paper.
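- The second-order spectrum (bispectrum) is the two-dimensional Fourier transform of the third-order cumulant:
$$B(\omega_1,\omega_2)=\sum_{\tau_1=-\infty}^{\infty}\sum_{\tau_2=-\infty}^{\infty}c_3(\tau_1,\tau_2)\,e^{-j(\omega_1\tau_1+\omega_2\tau_2)}$$
- where, for a zero-mean signal x(n):
$$c_3(\tau_1,\tau_2)=E\left[x(n)\,x(n+\tau_1)\,x(n+\tau_2)\right]$$
- The second equation is the third-order cumulant of x(n).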
- A two-dimensional feature matrix (256×256×1) can be generated using second-order spectral analysis, and the extracted feature matrix can also be visualized as a contour map (256×256×3) and a heat map (256×256×3).
The feature maps obtained by second-order spectral analysis are visually distinguishable across classes.
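A 256×256 bispectrum feature matrix for one 2-second segment can be sketched in NumPy with the direct (FFT-based) estimator, which averages the triple product X(k1)·X(k2)·X*(k1+k2) over windowed frames instead of computing the third-order cumulant explicitly. This is an assumed estimator for illustration: the frame length (`nfft=256`, chosen to match the 256×256 size), Hann window, 50% frame overlap, and the function name `bispectrum_direct` are not taken from the paper.

```python
import numpy as np


def bispectrum_direct(x, nfft=256, hop=None):
    """Direct bispectrum estimate |B(k1, k2)| averaged over windowed frames.

    Returns an (nfft, nfft) magnitude array with zero frequency centred.
    """
    x = np.asarray(x, dtype=float)
    hop = nfft // 2 if hop is None else hop
    window = np.hanning(nfft)
    k = np.arange(nfft)
    # Index of k1 + k2 (mod nfft) for every (k1, k2) pair.
    k_sum = (k[:, None] + k[None, :]) % nfft
    acc = np.zeros((nfft, nfft), dtype=complex)
    n_frames = 0
    for start in range(0, len(x) - nfft + 1, hop):
        frame = x[start : start + nfft]
        # Remove the frame mean; the cumulant definition assumes a zero-mean signal.
        X = np.fft.fft((frame - frame.mean()) * window)
        # Triple product X(k1) X(k2) X*(k1 + k2), accumulated over frames.
        acc += X[:, None] * X[None, :] * np.conj(X[k_sum])
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return np.abs(np.fft.fftshift(acc / n_frames))


# Example: one 2000-sample segment (2 s at 1000 Hz) -> 256x256 feature matrix.
rng = np.random.default_rng(1)
features = bispectrum_direct(rng.standard_normal(2000))
```

The matrix is symmetric in (k1, k2), and phase-coupled components (frequencies f1, f2 and f1 + f2 with locked phases) add coherently across frames, which is what makes the bispectrum sensitive to nonlinear structure that a power spectrum does not show.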
1.3. PCTMF-Net
- Top two branches: a two-way parallel CNNs module is used.
- Bottom branch: a Multi-Head Transformer Encoder with 4 heads (MHTE-4) module is used. To reduce the computational effort, global pooling is applied to the heart sound feature maps, and the pooled features are sampled before being fed to MHTE-4 in parallel.
Combining CNNs and a Transformer is an effective approach worth exploring.
CNNs extract the most expressive local feature representations at a low computational cost, while the Transformer encodes and fuses contextual information, ultimately focusing on the global feature structure.
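The bottom branch can be sketched in NumPy as a pooling step followed by a Transformer encoder layer with four attention heads. This is a hypothetical sketch, not the paper's MHTE-4: the review does not give the pooling or sampling details, token layout, model width, or normalization placement, so the block-average pooling in `pool_to_tokens`, the row-as-token layout, post-norm residuals, bias-free layers, and the random weights from `init_params` are all assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the max for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Per-token normalisation (learnable gain and bias omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def multi_head_self_attention(x, wq, wk, wv, wo, n_heads=4):
    """Scaled dot-product self-attention over x of shape (tokens, d_model)."""
    t, d = x.shape
    if d % n_heads:
        raise ValueError("d_model must be divisible by n_heads")
    dh = d // n_heads

    def split(m):
        # (tokens, d_model) -> (heads, tokens, head_dim)
        return m.reshape(t, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, tokens, tokens)
    heads = attn @ v                                          # (heads, tokens, head_dim)
    concat = heads.transpose(1, 0, 2).reshape(t, d)           # concatenate the 4 heads
    return concat @ wo


def encoder_layer(x, params, n_heads=4):
    """Post-norm encoder layer: self-attention and a ReLU feed-forward, each with a residual."""
    x = layer_norm(x + multi_head_self_attention(x, *params["attn"], n_heads=n_heads))
    w1, w2 = params["ffn"]
    return layer_norm(x + np.maximum(x @ w1, 0.0) @ w2)


def init_params(d_model, d_ff, rng):
    """Hypothetical random weights for Wq, Wk, Wv, Wo and the feed-forward (no biases)."""
    attn = [rng.normal(0.0, d_model ** -0.5, (d_model, d_model)) for _ in range(4)]
    ffn = [rng.normal(0.0, d_model ** -0.5, (d_model, d_ff)),
           rng.normal(0.0, d_ff ** -0.5, (d_ff, d_model))]
    return {"attn": attn, "ffn": ffn}


def pool_to_tokens(feature_map, factor=8):
    """Hypothetical pooling: block-average a 2-D map, then treat each pooled row as a token."""
    h, w = feature_map.shape
    h, w = h - h % factor, w - w % factor
    fm = feature_map[:h, :w]
    return fm.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


# Example: a 256x256 feature map -> 32 tokens of width 32 -> 4-head encoder.
rng = np.random.default_rng(0)
tokens = pool_to_tokens(rng.random((256, 256)))
encoded = encoder_layer(tokens, init_params(32, 64, rng))
```

Pooling first shrinks the sequence from 256 rows to 32 tokens, and self-attention cost grows with the square of the token count, which illustrates why pooling before the encoder reduces computation.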
2. Results
2.1. SOTA Comparisons
- Using MFCC, the proposed PCTMF-Net achieves the best results in terms of accuracy, recall, and F1-score.
- Using the second-order spectral analysis, the proposed PCTMF-Net achieves the best results in terms of accuracy, precision, recall, and F1-score.
2.2. Confusion Matrices
- PCTMF-Net correctly predicts the largest number of samples in the two tasks.
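The accuracy, precision, recall, and F1-score used in the comparisons can all be read off a confusion matrix. The NumPy sketch below assumes macro averaging over classes (the review does not state which averaging the paper uses), and the function name `metrics_from_confusion` is hypothetical.

```python
import numpy as np


def _safe_div(num, den):
    # Returns 0 where the denominator is 0 (e.g. a class that is never predicted).
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def metrics_from_confusion(cm):
    """Accuracy plus macro-averaged precision, recall and F1.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = _safe_div(tp, cm.sum(axis=0))  # column sums = predicted counts
    recall = _safe_div(tp, cm.sum(axis=1))     # row sums = true counts
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


# Illustrative 2-class matrix (made-up counts, not results from the paper).
m = metrics_from_confusion([[8, 2], [1, 9]])
```

For the made-up matrix above, accuracy is 17/20 = 0.85; the diagonal holds the correctly predicted samples that the confusion-matrix comparison counts.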
2.3. t-SNE
- The clear class separation shows that second-order spectral analysis features combined with PCTMF-Net have good discriminative ability for heart sound classification.