Brief Review — AIOSA: An approach to the automatic identification of obstructive sleep apnea events based on deep learning
AIOSA, A CNN Model on OSASUD Dataset
AIOSA: An approach to the automatic identification of obstructive sleep apnea events based on deep learning
AIOSA, by Udine University Hospital and University of Udine
2021 Elsevier J. ARTMED, Over 50 Citations (Sik-Ho Tsang @ Medium)
OSA
2022 [OSASUD]
==== My Healthcare and Medical Related Paper Readings ====
==== My Other Paper Readings Are Also Over Here ====
- The gold standard test for diagnosing OSAS is polysomnography (PSG). However, it requires expert knowledge and is time-consuming.
- In this paper, a novel convolutional deep learning architecture is proposed to effectively reduce the temporal resolution of raw waveform data, such as physiological signals, extracting key features to identify OSAS cases among acute stroke patients.
Outline
- Datasets
- AIOSA Model
- Results
1. Datasets
1.1. Two Datasets
- The first dataset considered is PhysioNet’s Apnea-ECG Database. It consists of 70 recordings, each lasting roughly 7 to 10 hours. The recordings are segmented into 1-minute intervals, each tagged as either normal or apnea by a human scorer.
- The second one is the OSASUD dataset. Data from 30 patients were collected. Patients underwent simultaneous overnight vital signs and PSG recording.
- Vital signs were collected by a Mindray iMec15 monitor connected to a Mindray Benevision CMS II central monitoring system; among them, the authors considered the ECG waveform (lead II, 80 Hz) and the photoplethysmography-derived SpO2 blood oxygen saturation (1 Hz).
- PSG was performed with an Embletta MPR polysomnograph, recording the following channels: thoracic movements, abdominal movements, nasal airflow, blood oxygen saturation, snoring, body position, and movement activity.
- Recordings were analyzed with Embla RemLogic software by trained sleep medicine physicians in accordance with the American Academy of Sleep Medicine sleep scoring rules [3], and tagged for the presence of central/obstructive/mixed apnea and hypopnea events (referred to as anomalies), each identified by its specific time interval. Tags have a 1-second granularity.
1.2. Pre-processing
- As for the Apnea-ECG Database, a Butterworth bandpass filter of order 2, with a 5 Hz highpass frequency and a 35 Hz lowpass frequency, is applied to the ECG waveform signals (a short pre-processing sketch follows this list).
- Then, 180 seconds’ worth of ECG data, that is, the 60-second interval corresponding to the label plus the 60-second intervals immediately preceding and following it, was associated with each binary label (general presence or absence of apnea).
- As for the OSASUD dataset, the same Butterworth bandpass filter of order 2, with a 5 Hz highpass frequency and a 35 Hz lowpass frequency, is applied to the ECG waveform signals.
- Likewise, for each labeled 60-second interval, 180 seconds’ worth of ECG and SpO2 data is used (i.e., the interval itself plus the 60-second intervals preceding and following it).
- As a result, each instance is characterized by 14,400 (180×80) ECG values, 180 SpO2 values, and 60 binary labels, all one-dimensional.
- The data contain null values, which might be due to malfunctions or sensor disconnections caused by patients’ movements during the night. It was decided to keep only the instances with at least 50% non-null SpO2 or ECG values.
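Below is a minimal pre-processing sketch of the steps above (bandpass filtering and 180-second windowing), assuming NumPy arrays `ecg` (80 Hz, already filtered before windowing), `spo2` (1 Hz), and per-second `labels`; the array names, the helper functions, and the exact handling of the 50% non-null rule are illustrative assumptions, not the authors’ code.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_ECG = 80  # OSASUD ECG sampling rate (Hz); Apnea-ECG uses 100 Hz instead

def bandpass_ecg(ecg, fs=FS_ECG, low=5.0, high=35.0, order=2):
    """Order-2 Butterworth bandpass filter (5-35 Hz), as described above."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, ecg)

def make_instance(ecg_filt, spo2, labels, start_s):
    """Build one 180 s instance around the labelled 60 s interval starting at
    second `start_s`: the 60 s preceding it, the labelled 60 s, and the 60 s
    following it. `ecg_filt` is the already-filtered whole-night ECG."""
    ecg_win = ecg_filt[(start_s - 60) * FS_ECG:(start_s + 120) * FS_ECG]  # 14,400 values
    spo2_win = spo2[start_s - 60:start_s + 120]                           # 180 values
    y = labels[start_s:start_s + 60]                                      # 60 binary labels
    # Illustrative reading of the 50% rule: drop instances where more than half
    # of the ECG or SpO2 samples are null (NaN).
    if np.isnan(ecg_win).mean() > 0.5 or np.isnan(spo2_win).mean() > 0.5:
        return None
    return ecg_win, spo2_win, y
```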
1.3. Settings
- Two sets of experiments are carried out.
- The first set supported the development of the models. In this case, grid search-based hyperparameter tuning is performed through 10-fold cross-validation on the training instances.
- The second set of experiments aimed at establishing the performance of the proposed models on the real-world Stroke Unit dataset. To perform hyperparameter tuning, the 30 patients were randomly partitioned into 2 disjoint subsets, making sure not to fragment the data belonging to each individual: 23 patients in the training set and 7 in the validation (tuning) set. The models were ultimately evaluated with leave-one-out cross-validation (each test fold corresponding to one of the 30 patients), as sketched below.
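A minimal sketch of the patient-level leave-one-out protocol, using scikit-learn’s `LeaveOneGroupOut`; the toy arrays and instance counts are placeholders, and only the grouping-by-patient logic reflects the setup described above.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_instances = 3000                                   # toy stand-in, not the real count
X = rng.normal(size=(n_instances, 14400))            # 180 s of 80 Hz ECG per instance
y = rng.integers(0, 2, size=(n_instances, 60))       # 60 per-second labels per instance
patients = rng.integers(0, 30, size=n_instances)     # patient id (0..29) of each instance

logo = LeaveOneGroupOut()
for fold, (train_idx, test_idx) in enumerate(logo.split(X, y, groups=patients)):
    # Each test fold holds out every instance of exactly one patient, so no
    # patient's data is ever split across the training and test sets.
    held_out = np.unique(patients[test_idx])
    print(f"fold {fold:2d}: held-out patient {held_out}, "
          f"{len(train_idx)} training instances")
```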
2. AIOSA Model
2.1. Backbone
- A 1-D convolutional neural network, which exploits depth-wise separable convolutions with dilation, is used.
- The backbone consists of a set of stacked convolutional blocks, each characterized by a fixed series of operations repeated four times: (i) depth-wise separable convolution with dilation; (ii) batch normalization; (iii) ReLU activation; and (iv) spatial dropout (a sketch of one block is given after this list).
- Between a series and the next one, a skip connection is employed.
- An average pooling operation is applied to reduce the size of the data.
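A rough PyTorch sketch of one backbone block, assuming 1-D depth-wise separable convolutions; kernel sizes, dilation rates, channel counts, and dropout probability are placeholders rather than the paper’s tuned hyperparameters, and the skip-connection placement follows the description above.

```python
import torch
import torch.nn as nn

class SeparableConv1d(nn.Module):
    """Depth-wise convolution (with dilation) followed by a point-wise 1x1 convolution."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2          # keep the sequence length
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class ConvBlock(nn.Module):
    """One block: (separable conv -> batch norm -> ReLU -> spatial dropout) x 4,
    a skip connection around the whole series, then average pooling."""
    def __init__(self, channels, pool_size, dilation=1, p_drop=0.1):
        super().__init__()
        ops = []
        for _ in range(4):
            ops += [SeparableConv1d(channels, dilation=dilation),
                    nn.BatchNorm1d(channels),
                    nn.ReLU(),
                    nn.Dropout1d(p_drop)]                # spatial (channel-wise) dropout
        self.series = nn.Sequential(*ops)
        self.pool = nn.AvgPool1d(pool_size)

    def forward(self, x):                                # x: (batch, channels, length)
        return self.pool(x + self.series(x))             # skip connection, then pooling
```

Stacking such blocks with pooling sizes 4, 4, and 5 reduces the 14,400-sample ECG input to 180 per-second positions, consistent with the numbers reported later for the Stroke Unit dataset.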
2.2. Head
- After the convolutional blocks, data can be transformed in different ways according to the desired neural network architecture.
- As an example, if dense layers are to be put after the convolutional blocks, data can pass through a 1 × 1 convolution (Fig. 6b). In this case, starting from the 14,400 input ECG values of the Stroke Unit dataset, we end up with 180 values (due to the chosen pooling sizes), which intuitively encode the condensed 1-second granularity representation of the original information.
- Another possibility (Fig. 6c) is that of stacking an LSTM directly on the convolutional output, which, in this case, can be seen as a multivariate, 180-second long time series, and then considering the last output of the sequence.
- In both variants, the depicted neural networks generate, as the final outcome, 60 values, each representing a nonthresholded score related to the likelihood of having an apnea in each considered second.
- It is worth pointing out that there is no activation function in the output layer, because of the choice of a weighted squared hinge loss function: an anomaly is considered present when the output is greater than 0, and absent otherwise (a sketch of the head and loss follows this list). This is the setting employed for the OSASUD dataset.
- In the case of Apnea-ECG, just a single score is returned, due to the 1-minute granularity classification task.
- To train the model, Adam optimizer, one-cycle learning rate scheduler, and gradient clipping are exploited.
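A hedged PyTorch sketch of the LSTM head and a weighted squared hinge loss, assuming the backbone outputs a 180-step, 16-feature sequence (as stated later for the LSTM variant); the hidden size, the per-class weight, and the head layout are assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class LSTMHead(nn.Module):
    """LSTM stacked on the convolutional output (Fig. 6c style), returning 60 raw scores."""
    def __init__(self, in_features=16, hidden=16, out_seconds=60):
        super().__init__()
        self.lstm = nn.LSTM(in_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, out_seconds)          # no output activation

    def forward(self, x):                                 # x: (batch, 180, 16)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])                        # last step -> (batch, 60) scores

def weighted_squared_hinge(scores, targets, pos_weight=2.0):
    """Squared hinge loss with a higher (assumed) weight on apnea seconds.
    `targets` are encoded as -1/+1; a second is predicted as apnea when score > 0."""
    weights = torch.where(targets > 0,
                          torch.full_like(scores, pos_weight),
                          torch.ones_like(scores))
    return (weights * torch.clamp(1.0 - targets * scores, min=0.0) ** 2).mean()
```

With this loss, the raw scores play the role of margins, which is why no activation is needed in the output layer; thresholding at 0 matches the rule stated above.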
3. Results
3.1. Apnea-ECG Dataset
- Two architectural variants are considered that differ in their final components: the first one is based on 1 × 1 convolution layers followed by dense layers (Fig. 6b); the other one makes use of a 16-feature LSTM (Fig. 6c), as the size of the output of the convolutional blocks is 180 × 16. Of course, in this case, there is a single output neuron, instead of 60.
- Another difference with respect to the depicted architectures is that the pooling sizes are 4–5–5 instead of 4–4–5, since in this dataset the ECG is sampled at 100 Hz instead of 80 Hz (a quick arithmetic check is given at the end of this subsection).
- The model was developed taking into account the 35 training set instances, and then evaluated on the official 35 test set instances.
- Both the proposed models vastly outperform previous proposals, with the overall best results provided by the LSTM-based variant, considering both per-segment and per-patient results.
- More precisely, as for per-segment results, an accuracy improvement of 8.3% and 4.7% was obtained with respect to, respectively, the average and best performance of the considered state-of-the-art approaches.
- As for the OSA/non-OSA patient detection, the model provides a perfect classification result.
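- As a quick arithmetic check of the pooling sizes (just restating the numbers above): 180 s × 100 Hz = 18,000 ECG samples, and 18,000 / (4·5·5) = 180 one-per-second values; on the OSASUD dataset, 180 s × 80 Hz = 14,400 samples, and 14,400 / (4·4·5) = 180.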
3.2. Stroke Unit Dataset (OSASUD)
- The evaluation on the Stroke Unit dataset is based on leave-one-out cross-validation, where each test fold corresponds to one of the 30 patients.
- Two variants are considered. The first one, depicted in Fig. 6c, makes use of just ECG data; the second one, shown in Fig. 6d, also relies on the SpO2 signal.
- For the sake of comparison, a classic 1-D ResNet was tested on ECG data, and a vanilla bidirectional LSTM was applied to SpO2 (both were tuned according to the same training/validation split as the proposed models).
- Overall, the best figures were provided by the proposed CNN + LSTM architecture, which excels when combined with SpO2, achieving improvements of 57.7%, 17.2%, and 10% over, respectively, the ResNet, the LSTM, and the CNN + LSTM itself applied to ECG data only.
- Although patient 15 has an F1 score of 0.446 and is classified as moderate instead of severe, the model is still able to identify clusters of anomalies that may be relevant for the clinical decision-making process.
- Patient 19, despite showing an even lower (0.249) F1 score, is correctly classified, with a good approximation of his/her AHI and anomalies distribution.
- Finally, Patient 26, who is correctly classified as well, has a high (0.701) F1 score.
- A law of diminishing returns seems to apply: beyond an F1 score of about 0.8, further improvements bring limited additional benefit.