Brief Review — Convolutional neural network for screening of obstructive sleep apnea using snoring sounds

Modified VG + Inception for OSAHS

Sik-Ho Tsang
5 min read · Oct 27, 2024

Convolutional neural network for screening of obstructive sleep apnea using snoring sounds
Modified VG + Inception, by Hangzhou Dianzi University
2023 Elsevier J. BSPC (Sik-Ho Tsang @ Medium)

Snore Sound Classification
2017 [INTERSPEECH 2017 Challenges: Addressee, Cold & Snoring] 2018 [MPSSC] [AlexNet & VGG-19 for Snore Sound Classification] 2019 [CNN for Snore] 2020 [Snore-GAN] 2021 [ZCR + MFCC + PCA + SVM] [DWT + LDOP + RFINCA + kNN]
==== My Healthcare and Medical Related Paper Readings ====
==== My Other Paper Readings Are Also Over Here ====

  • A database is built including more than 80 thousand snoring sound episodes from 124 subjects. These sounds are recorded by a non-contact microphone in each subject's private room and labeled by trained sleep clinicians.
  • A modified visibility graph (VG) method is then proposed to encode each snoring time series into an image. These images are fed into a convolutional neural network (CNN) for classification.

Outline

  1. Data Collection
  2. Modified VG + Inception
  3. Results

1. Data Collection

Through the ongoing study, a total of 124 individuals between 12 and 81 years old (32 females and 92 males) were enrolled from the Affiliated Hospital of Hangzhou Normal University (Zhejiang, China).

  • A high-fidelity acquisition device is designed for the recording, analysis, and transmission of nocturnal sleep respiratory sounds, as shown above. This device is placed on the bedside table at a distance of 20–150 cm. Audio is recorded as WAV files at a sampling frequency of 16 kHz.
  • Synchronously, various physiological signals, including oxygen saturation, pulse, and respiratory effort, were recorded by portable PSG at a 10 Hz sampling frequency and monitored with a portable monitor.
Statistics

The subjects were finally diagnosed by a trained sleep specialist according to all measures recorded by PSG and their clinical symptoms. Accordingly, the recorded breathing sounds were labeled as normal, mild, moderate, or severe OSAHS.

  • The average duration of a recorded respiratory sound was 7 h and 57 min.

2. Modified VG + Inception

2.1. Preprocessing

  • A spectral subtraction method is adopted to reduce interference from additive noise.
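The post does not give the paper's exact spectral subtraction settings, so the following is only a minimal sketch of magnitude spectral subtraction with SciPy; the frame length, spectral floor, and the assumption that the noise spectrum can be estimated from an initial snore-free segment are all my own choices, not the paper's.

```python
# Minimal magnitude spectral subtraction sketch (assumed settings, not the paper's).
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs=16000, noise_seconds=0.5, nperseg=512, floor=0.02):
    """Suppress additive noise by subtracting a per-bin noise magnitude estimate."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)           # complex spectrogram
    mag, phase = np.abs(Z), np.angle(Z)

    # Assumption: the first `noise_seconds` of the recording are snore-free,
    # so they can be used to estimate the noise magnitude per frequency bin.
    hop = nperseg // 2
    noise_frames = max(1, int(noise_seconds * fs / hop))
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)

    # Subtract the estimate and apply a spectral floor to avoid negative magnitudes.
    clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)

    _, x_clean = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return x_clean
```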

Then, an automatic and unsupervised method [48] combining a vertical box algorithm, frequency analysis, and fuzzy C-means clustering is used to segment snoring episodes (SEs). The overall accuracy of this algorithm was found to be 93.1% for snoring sound detection, and the detected results were manually recalibrated to ensure precision.
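The details of [48] are not reproduced in this post; purely to illustrate the fuzzy C-means component, here is a minimal NumPy implementation of plain FCM applied to a hypothetical frame-level feature (short-time log-energy). The vertical box algorithm and frequency-analysis steps are not sketched.

```python
# Plain fuzzy C-means (FCM) sketch -- NOT the segmentation method of [48],
# only an illustration of the clustering step on a hypothetical feature.
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Return the membership matrix U (n_samples, n_clusters) and cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]             # fuzzy-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                           # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

def frame_log_energy(x, frame=1024, hop=512):
    """Hypothetical frame feature: short-time log-energy, shape (n_frames, 1)."""
    frames = [x[i:i + frame] for i in range(0, len(x) - frame, hop)]
    return np.log10(np.array([np.sum(f ** 2) + 1e-10 for f in frames]))[:, None]
```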

  • A typical SE is 0.3 to 5 s in duration. SEs longer than 3 s were cut to a fixed length.

2.2. Modified VG

  • (b) The visibility algorithm [49] is used to convert a time series into a graph under a geometric principle of visibility. The time series is encoded into connections between nodes, representing the time series as a geometric object. However, the converted image is diagonally symmetric and sparse, which wastes computing resources.

(c) Modified VG: An improved VG method is proposed in this study to achieve higher computational efficiency and reduce storage space. The element value at each relative position is defined as shown above.

The resulting maps were converted into grayscale, zero-padded to a fixed size of 100 × 12000, and resized to the same resolution of 256 × 256.
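The modified element values are defined only in a figure of the paper, so as a reference point here is a minimal sketch of the standard natural visibility graph [49], keeping only the upper triangle to avoid the diagonal symmetry noted above, followed by a resize to 256 × 256. This is the baseline idea, not the paper's modified encoding.

```python
# Minimal natural visibility graph (VG) sketch in the spirit of [49]; the
# paper's modified VG further redefines the element values, so this is only
# the baseline construction.
import numpy as np
from PIL import Image

def visibility_adjacency(y):
    """Samples (a, y[a]) and (b, y[b]) are connected if every sample between
    them lies strictly below the straight line joining them."""
    n = len(y)
    A = np.zeros((n, n), dtype=np.uint8)
    for a in range(n):
        for b in range(a + 1, n):
            ts = np.arange(a + 1, b)
            line = y[b] + (y[a] - y[b]) * (b - ts) / (b - a)   # visibility line
            if ts.size == 0 or np.all(y[ts] < line):
                A[a, b] = 1          # upper triangle only, avoiding the symmetry
    return A

# Toy example: the naive double loop is impractical for full 16 kHz SEs, which
# is exactly the efficiency problem the modified VG targets.
y = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.randn(200)
img = Image.fromarray(visibility_adjacency(y) * 255).resize((256, 256))
```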

  • All preprocessing and analysis were performed using MATLAB R2019b.
  • (Please read the paper directly for the details of VG if interested.)

2.3. Inception

Inception Module

The Inception module, shown in Fig. 6, contains three parallel convolution layers and a max-pooling layer with different kernel sizes; the outputs of all these layers are concatenated into a single output vector that forms the input of the next stage.
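The exact kernel sizes and channel widths are given only in the paper's Fig. 6; a minimal PyTorch sketch of a three-branch-plus-pooling Inception-style module, with assumed 1×1 / 3×3 / 5×5 kernels and an assumed branch width, might look like this:

```python
# Minimal Inception-style module sketch in PyTorch (kernel sizes and channel
# widths are assumptions, not the paper's exact configuration).
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, out_ch_per_branch=16):
        super().__init__()
        c = out_ch_per_branch
        self.branch1 = nn.Conv2d(in_ch, c, kernel_size=1)                # 1x1 conv
        self.branch3 = nn.Conv2d(in_ch, c, kernel_size=3, padding=1)     # 3x3 conv
        self.branch5 = nn.Conv2d(in_ch, c, kernel_size=5, padding=2)     # 5x5 conv
        self.pool    = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)  # parallel pooling

    def forward(self, x):
        # Concatenate the parallel outputs along the channel axis.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.pool(x)], dim=1)
```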

CNN Built Upon Inception Module

A CNN is then built upon the Inception module, as shown above.
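The full architecture is shown only as a figure in the paper; below is a rough sketch of how such modules could be stacked into a binary screening classifier for the 256 × 256 grayscale maps. The depth, widths, and two-class head are assumptions, and the block reuses the InceptionBlock class from the sketch above.

```python
# Rough sketch of stacking Inception-style blocks into a classifier for the
# 256x256 grayscale maps (depth and widths are assumptions, not the paper's).
import torch.nn as nn  # InceptionBlock is taken from the previous sketch

class SnoreCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            InceptionBlock(16, 16), nn.ReLU(), nn.MaxPool2d(2),   # 3*16 + 16 = 64 channels
            InceptionBlock(64, 32), nn.ReLU(), nn.MaxPool2d(2),   # 3*32 + 64 = 160 channels
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(160, n_classes)
        )

    def forward(self, x):                       # x: (batch, 1, 256, 256)
        return self.classifier(self.features(x))
```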

3. Results

3.1. Train/Val/Test Split

  • More than 50 thousand mapping images were obtained in the training set, including 6,087 in the normal group, 7,838 in the mild group, 7,893 in the moderate group, and 28,478 in the severe group.
  • All images in the normal group and 6,200 images from the other three groups were selected for training the model. During training, they were divided into a training set and a validation set at a 4:1 ratio (see the sketch after this list).
  • In the test set, there were 9,378 SEs for model evaluation.
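A minimal sketch of such a stratified 4:1 split with scikit-learn; the file names and labels below are placeholders standing in for the actual mapping images, not the paper's data.

```python
# Minimal sketch of the 4:1 train/validation split (placeholder data).
from sklearn.model_selection import train_test_split

# Placeholders: 6,087 normal images (label 0) and 6,200 OSAHS-related images (label 1).
image_paths = [f"se_{i:05d}.png" for i in range(6087 + 6200)]
labels = [0] * 6087 + [1] * 6200

train_paths, val_paths, y_train, y_val = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=0)
```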

3.2. Performance

Evolution of loss and accuracy of training data and validation data

A best accuracy of 92.1% is obtained on the validation set for the screening of OSAHS patients after 100 iterations.

Test Set

An accuracy of 91.5%, a sensitivity of 90.4%, a specificity of 92.1%, and an AUC of 0.954 are obtained on the test set.
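As a usage note, these episode-level metrics can be computed with scikit-learn as sketched below; the labels and scores here are random placeholders, not the paper's predictions.

```python
# Minimal sketch of the episode-level metrics (placeholder data).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.random.randint(0, 2, 9378)     # placeholder labels for the 9,378 test SEs
y_score = np.random.rand(9378)              # placeholder OSAHS probabilities
y_pred  = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)                # true positive rate
specificity = tn / (tn + fp)                # true negative rate
auc         = roc_auc_score(y_true, y_score)
```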

  • The predictions for the snoring sounds of 6 subjects in the test data are shown above, where 3 subjects were clinically diagnosed as simple snorers and the others as OSAHS patients. The ratio of OSAHS-related SEs for each subject, listed in the last column, is regarded as a new index for the diagnosis of OSAHS.

When 0.5 is chosen as the threshold on this ratio, an accuracy of 92.5%, a sensitivity of 93.9%, and a specificity of 91.2% are obtained for 12 normal subjects and 28 OSAHS patients.
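A minimal sketch of this subject-level rule, i.e., thresholding the ratio of OSAHS-related SEs at 0.5; the episode predictions below are a toy placeholder.

```python
# Subject-level screening: compare the fraction of a subject's SEs predicted
# as OSAHS-related against a threshold of 0.5 (toy placeholder data).
import numpy as np

def screen_subject(episode_preds, threshold=0.5):
    """episode_preds: array of 0/1 episode predictions for one subject."""
    ratio = np.mean(episode_preds)              # ratio of OSAHS-related SEs
    return ("OSAHS" if ratio >= threshold else "normal"), ratio

label, ratio = screen_subject(np.array([1, 0, 1, 1, 0, 1]))
```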

3.3. Indirect Comparisons

Indirect SOTA Comparisons

Some methods for automatic screening of OSAHS patients using snoring sounds from the past decade are summarized above, which allows the performance of this work to be evaluated objectively to some extent.


Written by Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.