Brief Review — AHI estimation of OSAHS patients based on snoring classification and fusion model
XGBoost + CNN + ResNet
AHI estimation of OSAHS patients based on snoring classification and fusion model,
XGBoost + CNN + ResNet, by South China University of Technology and Guangzhou Medical University
2023 AMJOTO (Sik-Ho Tsang @ Medium)
Snore Sound Classification
2017 … 2020 [Snore-GAN] 2021 [ZCR + MFCC + PCA + SVM] [DWT + LDOP + RFINCA + kNN]
==== My Healthcare and Medical Related Paper Readings ====
==== My Other Paper Readings Are Also Over Here ====
- Three models were used: acoustic features combined with XGBoost, Mel-spectrum combined with a convolutional neural network (CNN), and Mel-spectrum combined with a residual neural network (ResNet).
- The three models were then fused by soft voting to detect the two types of snoring sounds (abnormal and simple). The subject's apnea-hypopnea index (AHI) was estimated from the recognized snoring sounds.
Outline
- Dataset
- XGBoost + CNN + ResNet
- Results
1. Dataset
- A total of 40 subjects from the First Affiliated Hospital of Guangzhou Medical University and from Korompili’s study [10] were used in this study.
- Audio was captured at 44.1 kHz with 16-bit resolution. Sound segments were extracted using the adaptive threshold approach [40].
- After ear-nose-throat (ENT) experts selected the snoring sound segments, they labeled them as abnormal snoring sounds or simple snoring sounds.
- 30 subjects were used to train and validate the proposed system; the remaining 10 were held out for testing.
- There were 9728 abnormal snoring sounds and 39039 simple snoring sounds obtained from these 30 subjects.
- During the training process, 10,000 simple snoring sound episodes were randomly selected from the original dataset to obtain a relatively balanced dataset. The down-sampled dataset was further divided into training, validation, and test sets at a ratio of 3:1:1 (a sketch of this preparation step follows this list).
- The validation set included 1982 simple snoring sounds and 1896 abnormal snoring sounds.
- The test set consisted of 1975 simple snoring sounds and 1806 abnormal snoring sounds.
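As a rough illustration of the down-sampling and 3:1:1 split described above (a minimal sketch; the file names and random seed are hypothetical, not from the paper):

```python
import random

random.seed(0)  # hypothetical seed, for reproducibility only

# Hypothetical lists of segment file paths from the 30 training subjects
abnormal = [f"abnormal_{i}.wav" for i in range(9728)]
simple = [f"simple_{i}.wav" for i in range(39039)]

# Down-sample the majority class to 10,000 episodes for rough balance
simple_down = random.sample(simple, 10000)

# Pool, shuffle, and split 3:1:1 into train / validation / test
data = [(p, 1) for p in abnormal] + [(p, 0) for p in simple_down]
random.shuffle(data)
n = len(data)
train = data[: 3 * n // 5]
val = data[3 * n // 5 : 4 * n // 5]
test = data[4 * n // 5 :]
```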
2. XGBoost + CNN + ResNet
- Three classification models built from different features are fused to classify simple and abnormal snores: Mel-spectrum combined with a CNN, Mel-spectrum combined with a pre-trained ResNet18, and acoustic features combined with an XGBoost classifier. A sketch of the soft-voting fusion is given below.
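Soft voting here means averaging the models’ predicted probabilities and thresholding the average (the results below report a decision threshold of 0.6). A minimal sketch, assuming each model exposes a per-segment probability for the abnormal class and equal weights:

```python
import numpy as np

def soft_vote(p_xgb, p_cnn, p_resnet, threshold=0.6):
    """Fuse three models' abnormal-class probabilities by soft voting.

    Each input is an array of P(abnormal) per segment; equal weights
    are assumed here (the post does not state otherwise).
    """
    avg = (np.asarray(p_xgb) + np.asarray(p_cnn) + np.asarray(p_resnet)) / 3.0
    return (avg >= threshold).astype(int)  # 1 = abnormal snore, 0 = simple snore
```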
2.1. XGBoost
- A set of acoustic features is collected and fed into XGBoost for training.
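A minimal training sketch using the xgboost Python package (the feature matrix and hyperparameters are placeholders; the paper’s exact acoustic features and settings are not restated in this post):

```python
import numpy as np
import xgboost as xgb

# Placeholder acoustic-feature matrix: (n_segments, n_features);
# y: 1 = abnormal snore, 0 = simple snore. Random data stands in here.
X = np.random.rand(200, 40)
y = np.random.randint(0, 2, 200)

clf = xgb.XGBClassifier(
    n_estimators=100,   # placeholder hyperparameters
    max_depth=4,
    learning_rate=0.1,
    eval_metric="logloss",
)
clf.fit(X, y)
p_abnormal = clf.predict_proba(X)[:, 1]  # P(abnormal), reusable for soft voting
```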
2.2. CNN & ResNet18
- Librosa (a Python library) was used to convert the audio segments into Mel spectrograms, which are fed into the CNN and ResNet18. The number of Mel filters was 128, the frame length was 20 ms, and the frame shift was 50%. The spectrograms were sized to 224 × 224 (a sketch of this preprocessing follows this list).
- The CNN is composed of four convolutional layers with a kernel size of 3 × 3.
- A pre-trained ResNet18 is also used (both models are sketched below).
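A minimal sketch of the Mel-spectrogram preprocessing with standard librosa calls (the file name and the resize method are assumptions; the post only states the final 224 × 224 size):

```python
import librosa
import numpy as np
import cv2

sr = 44100
y, _ = librosa.load("segment.wav", sr=sr)  # hypothetical segment file

n_fft = int(0.020 * sr)   # 20 ms frame length = 882 samples at 44.1 kHz
hop_length = n_fft // 2   # 50% frame shift

mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=128
)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log-Mel spectrogram
img = cv2.resize(mel_db, (224, 224))           # assumed resize to network input
```

And a sketch of the two networks in PyTorch: the four 3 × 3 convolutional layers match the description above, while channel widths, pooling, and the classifier head are assumptions:

```python
import torch.nn as nn
from torchvision import models

class SnoreCNN(nn.Module):
    """Four 3x3 conv layers as described; other details are assumed."""
    def __init__(self, n_classes=2):
        super().__init__()
        chs = [1, 16, 32, 64, 128]  # assumed channel progression
        blocks = []
        for cin, cout in zip(chs[:-1], chs[1:]):
            blocks += [nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]
        self.features = nn.Sequential(*blocks)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(chs[-1], n_classes))

    def forward(self, x):  # x: (B, 1, 224, 224) Mel spectrogram
        return self.head(self.features(x))

# Pre-trained ResNet18 (torchvision >= 0.13 API) with a 2-class head;
# the single-channel spectrogram is assumed to be repeated to 3 channels.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = nn.Linear(resnet.fc.in_features, 2)
```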
2.3. AHI Estimate Method
- When fed snoring sound segments in chronological order, the classification model produces a 0/1 sequence: detected abnormal snoring sounds are labeled 1 and simple snoring sounds 0.
- Since an apnea-hypopnea event might occur when the model outputs 1, the AHI estimation approach should concentrate on this condition.
- When the model predicts an abnormal snoring sound (labeled as 1), the maximum and minimum possible durations of the abnormal snoring sound, tmax and tmin, are calculated:
- If tmax < 10s, the segments predicted to be abnormal snoring sounds are considered false positives and are ignored;
- Else if tmin < tthr, then there exists an apnea-hypopnea event during the predicted abnormal snoring sound segments;
- Else, one or more apnea-hypopnea events may have occurred in those parts. In this case (tmin ≥ tthr), the number of apnea-hypopnea events is calculated from tmin and tthr (the equation is given in the paper and not reproduced in this post; see the sketch after this list).
- The AHI of the subject is then obtained from the total number of detected apnea-hypopnea events, numtotal, and the whole-night recording time, Trecord; by definition, the AHI is the number of events per hour of recording.
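A minimal sketch of this counting logic. The event count for the tmin ≥ tthr case is assumed here to be floor(tmin / tthr), since the paper’s exact equation is not reproduced in this post, and the AHI is computed as numtotal / Trecord with Trecord in hours:

```python
def estimate_ahi(runs, t_thr, t_record_hours):
    """Estimate AHI from detected abnormal-snore runs.

    runs: list of (t_min, t_max) duration bounds in seconds, one per
    contiguous run of segments predicted abnormal (label 1).
    t_thr: duration threshold from the paper (its value is not restated here).
    """
    num_total = 0
    for t_min, t_max in runs:
        if t_max < 10.0:
            continue           # shorter than any apnea-hypopnea event: false positive
        elif t_min < t_thr:
            num_total += 1     # a single apnea-hypopnea event
        else:
            # ASSUMPTION: multiple events approximated as floor(t_min / t_thr);
            # the original equation is an image not shown in the post.
            num_total += int(t_min // t_thr)
    return num_total / t_record_hours  # AHI = events per hour

# Example: three runs over an 8-hour recording, with a hypothetical t_thr
print(estimate_ahi([(8.0, 9.5), (12.0, 20.0), (65.0, 80.0)], t_thr=30.0, t_record_hours=8.0))
```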
3. Results
- Among the single models, the CNN achieved the highest accuracy at 81.83%, ResNet18 the lowest at 80.32%, while the hand-crafted acoustic features combined with the XGBoost classifier reached 81.67%.
- Soft voting with a threshold of 0.6 yields the highest precision and specificity among all fused models, at 81.05% and 81.77% respectively.
- In this experiment, the snoring sounds of the remaining 10 subjects, who did not participate in training, were used. The sorted snoring sound segments were first fed into the proposed fusion model to obtain abnormal and simple snoring sounds.
- The proposed model correctly predicted the severity of all but one subject in the test set (the standard severity cut-offs are sketched below).
- Although these results cannot be directly compared, the table above shows the comparison with prior approaches.
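For reference, severity here presumably follows the commonly used clinical AHI cut-offs (an assumption; the post does not restate the paper’s exact grouping):

```python
def severity(ahi):
    """Map AHI (events/hour) to OSAHS severity using the common
    clinical cut-offs of 5, 15, and 30 (assumed, not from the post)."""
    if ahi < 5:
        return "normal"
    elif ahi < 15:
        return "mild"
    elif ahi < 30:
        return "moderate"
    return "severe"
```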