Brief Review — Automatic classification of excitation location of snoring sounds

ZCR + MFCC + PCA + SVM on MPSSC

Sik-Ho Tsang
Sep 18, 2024
Happy Mid-Autumn Festival (Image from Pexels: Nataliya Vaitkevich)

Automatic classification of excitation location of snoring sounds
ZCR + MFCC + PCA + SVM, by Chinese Academy of Sciences, University of Chinese Academy of Sciences, Harvard Medical School, and Nanjing University of Science and Technology
2021 JCSM (Sik-Ho Tsang @ Medium)

Snore Sound Classification
2017 [INTERSPEECH 2017 Challenges: Addressee, Cold & Snoring] 2018 [MPSSC] [AlexNet & VGG-19 for Snore Sound Classification] 2019 [CNN for Snore] 2020 [Snore-GAN]

  • For each snore episode, an acoustic and a physiological feature were extracted and concatenated, forming a 59-dimensional fusion feature.
  • Principal component analysis (PCA) and a support vector machine (SVM) were used for dimensionality reduction and snore classification, respectively.

Outline

  1. ZCR + MFCC + PCA + SVM
  2. Results

1. ZCR + MFCC + PCA + SVM

Diagram of the VOTE scheme in the upper airway.
  • To classify snoring sounds into the four VOTE excitation locations (Velum, Oropharyngeal lateral walls, Tongue base, Epiglottis) with an SVM, features are first extracted from the raw signal.
Zero-Crossing Rate (ZCR) and Mel-Frequency Cepstral Coefficients (MFCC)

1.1. Zero-Crossing Rate (ZCR) with Multi-Scale Entropy (MSE)

Zero-Crossing Rate (ZCR)
  • ZCR is defined as the number of time-domain zero-crossings within a defined region of a signal, divided by the number of samples of that region.
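  • It reflects the smoothness of the signal and serves as a rough indicator of its frequency content: the more high-frequency energy, the higher the ZCR.
  • Multi-Scale Entropy (MSE) applies approximate entropy or sample entropy to the same process at multiple time scales, typically obtained by coarse-graining (averaging) the signal.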
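As a minimal sketch of the ZCR definition above, the snippet below computes a frame-wise ZCR sequence with NumPy: in each frame, the number of sign changes is divided by the number of samples. The frame length and hop size (`frame_len`, `hop`) and the function name are illustrative assumptions; the review does not state the values used in the paper, nor exactly how the "ZCR-transformed signal" is formed.

```python
import numpy as np

def frame_zcr(signal, frame_len=256, hop=128):
    """Zero-crossing rate of each frame: number of crossings / samples in the frame.

    frame_len and hop are hypothetical defaults, not values from the paper.
    """
    x = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len]
        # A zero crossing occurs wherever two consecutive samples differ in sign.
        signs = np.signbit(frame)
        crossings = np.count_nonzero(signs[1:] != signs[:-1])
        zcr[i] = crossings / len(frame)
    return zcr
```

For a 1 kHz sine sampled at 16 kHz there are two crossings per 16-sample period, so the frame-wise ZCR sits close to 2 × 1000 / 16000 = 0.125, while white noise gives values near 0.5.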

Entropies of the ZCR-transformed signal at 20 scales (20 features) are computed as a measure of its complexity.
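The multi-scale entropy step can be sketched as follows, assuming the common Costa et al. formulation: coarse-grain the sequence by non-overlapping averaging at scales 1 to 20, then compute sample entropy with embedding dimension `m = 2` and a tolerance `r` fixed at 0.15 times the standard deviation of the original sequence. These parameters and the helper names are hypothetical choices, not values confirmed by the paper; in this pipeline the input would be the ZCR sequence of one snore.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.15):
    """SampEn(m, r) of a 1-D sequence, with r an absolute tolerance."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def count_matches(dim):
        # Use the same n - m templates for dim = m and dim = m + 1.
        templates = np.array([x[i:i + dim] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to every later template (self-matches excluded).
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.count_nonzero(d <= r)
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return np.nan  # undefined: no template matches at this tolerance
    return -np.log(a / b)

def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return np.asarray(x[: n * scale], dtype=float).reshape(n, scale).mean(axis=1)

def multiscale_entropy(x, max_scale=20, m=2, r_factor=0.15):
    """Sample entropy at scales 1..max_scale (20 features by default)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)  # tolerance fixed from the original sequence
    return np.array([sample_entropy(coarse_grain(x, s), m, r)
                     for s in range(1, max_scale + 1)])
```

At large scales a short ZCR sequence leaves only a few points, so sample entropy can become undefined (returned here as NaN); longer sequences or a smaller hop size avoid this.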

1.2. Mel-Frequency Cepstral Coefficients (MFCC)

  • The first 13 coefficients are used, because the energy-compaction property of the discrete cosine transform concentrates most of the signal information in the first few coefficients.
  • In addition, delta (first-order difference) and acceleration (second-order difference) features are added to capture how the snore characteristics change over time.

Finally, a total of 39 features (13 MFCC, 13 Δ, and 13 acceleration) were extracted from each snoring sound.
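Assuming a 13 × n_frames MFCC matrix has already been computed (for example with `librosa.feature.mfcc`), the delta and acceleration features and the 39-dimensional vector could be assembled as below. The HTK-style regression formula with a window of N = 2 frames is one common choice, and averaging over frames to obtain one vector per snore is an assumption; the review does not say how the frames are pooled. Function names are hypothetical.

```python
import numpy as np

def deltas(feat, N=2):
    """HTK-style regression deltas along time (axis 1), with edge padding."""
    feat = np.asarray(feat, dtype=float)
    T = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (N, N)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feat)
    for n in range(1, N + 1):
        # n * (c[t + n] - c[t - n]) for every frame t at once.
        out += n * (padded[:, N + n : N + n + T] - padded[:, N - n : N - n + T])
    return out / denom

def mfcc_39(mfcc):
    """Stack 13 MFCC + 13 delta + 13 acceleration rows, then pool over frames."""
    d1 = deltas(mfcc)   # first-order (delta)
    d2 = deltas(d1)     # second-order (acceleration)
    stacked = np.vstack([mfcc, d1, d2])  # shape (39, n_frames)
    return stacked.mean(axis=1)          # assumption: mean pooling over frames
```

On a linear ramp the interior deltas equal the slope exactly and the interior accelerations are zero, which is a quick sanity check of the formula.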

1.3. Principal Component Analysis (PCA)

Feature Space After PCA
  • PCA is used for dimensionality reduction: it extracts the most significant components of the features and reduces the computational cost.
  • Reducing the 20 + 39 = 59 features to a 3-dimensional subspace is found to be sufficient.
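A minimal NumPy sketch of this step: each row of `X` is one snore, i.e. the 20 entropy features concatenated with the 39 MFCC-based features. Each feature is z-scored first (an assumption; the review does not mention standardization), and the data are projected onto the top three principal axes obtained from the SVD of the centred matrix. The function name is hypothetical.

```python
import numpy as np

def pca_fit_transform(X, n_components=3):
    """Project rows of X (n_samples, n_features) onto the top principal axes."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0             # avoid division by zero for constant features
    Z = (X - mean) / std            # assumption: z-score each feature first
    # The right singular vectors of the centred data are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = S ** 2 / np.sum(S ** 2)
    return Z @ components.T, explained_ratio[:n_components]
```

The projected coordinates are mutually uncorrelated, and `explained_ratio` shows how much of the total variance the three retained components keep.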

1.4. Support Vector Machine (SVM)

  • SVM is used for the classification of snoring sounds.
  • 80% of the MPSSC dataset was randomly selected for training, and the remaining 20% was used for testing.
  • This process was repeated 10 times, and the average accuracy was calculated as the final accuracy.
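The 10× repeated 80/20 hold-out protocol could be sketched with scikit-learn as below. The RBF kernel with default `C`/`gamma`, stratified splitting, and fitting the scaler and PCA on the training split only (to avoid leaking test data) are all assumptions; the review does not give the SVM settings. Besides accuracy, the sketch also reports the unweighted average recall (macro-averaged recall) used in the results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def repeated_holdout(X, y, n_repeats=10, test_size=0.2, seed=0):
    """Mean accuracy and mean UAR over repeated random train/test splits."""
    accs, uars = [], []
    for rep in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed + rep, stratify=y)
        # Scaler and PCA are fitted on the training split only (assumption).
        model = make_pipeline(StandardScaler(), PCA(n_components=3),
                              SVC(kernel="rbf"))
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        accs.append(accuracy_score(y_te, pred))
        uars.append(recall_score(y_te, pred, average="macro"))
    return float(np.mean(accs)), float(np.mean(uars))
```

Wrapping the scaler, PCA, and SVM in one pipeline keeps every preprocessing step inside each split, so the averaged scores are not inflated by test data.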

2. Results

Statistical Results
Confusion Matrix
  • Statistical results and the confusion matrix are shown above.

The unweighted average sensitivity of the proposed approach is 86.36%, higher than the 67% [18], 66.5% [19], and 74.19% [20] reported by prior methods.
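Unweighted average sensitivity (also called unweighted average recall, UAR) is the plain mean of the per-class recalls, so each VOTE class counts equally regardless of how many snores it contains. The toy confusion matrix below is hypothetical, chosen only to show how UAR can differ from plain accuracy when classes are imbalanced; it is not taken from the paper.

```python
import numpy as np

def unweighted_average_recall(cm):
    """Mean of per-class recall; rows are true classes, columns are predictions."""
    cm = np.asarray(cm, dtype=float)
    per_class_recall = np.diag(cm) / cm.sum(axis=1)
    return per_class_recall.mean(), per_class_recall

# Hypothetical 4-class (V, O, T, E) confusion matrix, NOT the paper's results.
cm = np.array([
    [90, 10, 0, 0],
    [5, 15, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 0, 10],
])
uar, recalls = unweighted_average_recall(cm)
accuracy = np.trace(cm) / cm.sum()
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")  # UAR = 0.7875, accuracy = 0.8571
```

Here the large, well-classified first class pushes accuracy up, while the poorly recognized third class pulls UAR down, which is why UAR is the fairer summary on an imbalanced dataset.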
