Brief Review — Battling with the low‑resource condition for snore sound recognition: introducing a meta‑learning strategy

Model-Agnostic Meta-Learning (MAML) on Snore Sound Classification

Sik-Ho Tsang
Oct 12, 2024
A diagram of the upper airway showing the location where VOTE snoring is triggered

Battling with the low‑resource condition for snore sound recognition: introducing a meta‑learning strategy
Snore Sound MAML
, by Beijing Institute of Technology (Ministry of Education), The University of Tokyo, Imperial College London, and University of Augsburg
2023 EURASIP Journal on Audio, Speech, and Music Processing (Sik-Ho Tsang @ Medium)

Snore Sound Classification
2017
[INTERSPEECH 2017 Challenges: Addressee, Cold & Snoring] 2018 [MPSSC] [AlexNet & VGG-19 for Snore Sound Classification] 2019 [CNN for Snore] 2020 [Snore-GAN]
==== My Healthcare and Medical Related Paper Readings ====
==== My Other Paper Readings Are Also Over Here ====

  • MPSSC is a small, class-imbalanced snore sound dataset.
  • In view of this, Model-Agnostic Meta-Learning (MAML), a few-shot method based on meta-learning, is used to classify snore signals under low-resource conditions.
  • During meta-training, the ESC-50 dataset is used while MPSSC is not. MPSSC is only used during the meta-testing phase.

Outline

  1. Model-Agnostic Meta-Learning (MAML)
  2. Results

1. Model-Agnostic Meta-Learning (MAML)

1.1. Dataset

Training Data for MAML Setting
  • In MAML, only 36 snoring samples from the development portion of MPSSC are used as the fine-tuning support data during meta-testing, while the remaining 529 snoring samples (from the training and development portions) are not used.
  • The test portion of the MPSSC dataset, with its original split, is used to test the model in this work.
  • ESC-50 encompasses 2,000 environmental sound recordings labelled with corresponding tags. Each recording has a duration of 5 s and belongs to one of 50 distinct semantic classes, with 40 instances per class [34].
  • For comparison, MiniImageNet is also used.

1.2. Feature Extraction

Mel spectrogram for VOTE
  • Mel spectrograms of size 84×84×3 are extracted as input features for the CNN, with the upper segment of the spectrogram image beyond the 10,000 Hz threshold removed.
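As a rough illustration of this front end, the sketch below builds a log-spectrogram image with NumPy, drops the frequency bins above 10 kHz, and resizes to 84×84×3. The sample rate, FFT size, and hop length are assumptions (the exact settings are not given here), and a real pipeline would use mel filter banks (e.g., via librosa.feature.melspectrogram) rather than this plain spectrogram.

```python
import numpy as np

def spectrogram_image(signal, sr=32000, n_fft=512, hop=128,
                      fmax=10000, out_size=84):
    # Frame the signal, window each frame, and take the magnitude FFT
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)
    # Drop frequency bins above 10 kHz, as described in the paper
    freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
    spec = np.log1p(spec[freqs <= fmax])
    # Crude nearest-neighbour resize to out_size x out_size
    ri = np.arange(out_size) * spec.shape[0] // out_size
    ci = np.arange(out_size) * spec.shape[1] // out_size
    img = spec[np.ix_(ri, ci)]
    # Replicate to 3 channels to match the 84x84x3 CNN input
    return np.repeat(img[..., None], 3, axis=2)

tone = np.sin(2 * np.pi * 440 * np.arange(32000) / 32000)  # 1 s, 440 Hz test tone
feat = spectrogram_image(tone)
print(feat.shape)  # (84, 84, 3)
```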

1.3. CNN

CNN
  • A 4-layer CNN is used, as shown above.
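The summary does not spell out the layer widths, so the sketch below assumes the common few-shot backbone of four 3×3 convolution blocks with 64 filters each, ReLU, and 2×2 max-pooling (batch normalization omitted), written in plain NumPy for illustration:

```python
import numpy as np

def conv_relu_pool(x, w):
    # x: (C_in, H, W); w: (C_out, C_in, 3, 3); 3x3 conv with 'same' padding
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    win = np.lib.stride_tricks.sliding_window_view(xp, (3, 3), axis=(1, 2))
    y = np.maximum(np.einsum('ihwkl,oikl->ohw', win, w), 0)   # conv + ReLU
    H, W = y.shape[1] // 2 * 2, y.shape[2] // 2 * 2           # crop odd edge
    return y[:, :H, :W].reshape(y.shape[0], H // 2, 2, W // 2, 2).max(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 84, 84))     # one 84x84x3 spectrogram, channels first
for cin, cout in zip([3, 64, 64, 64], [64, 64, 64, 64]):
    x = conv_relu_pool(x, 0.1 * rng.standard_normal((cout, cin, 3, 3)))
# Flatten and project to the 4 VOTE classes
logits = x.reshape(-1) @ (0.1 * rng.standard_normal((x.size, 4)))
print(x.shape, logits.shape)  # (64, 5, 5) (4,)
```

The spatial size shrinks 84 → 42 → 21 → 10 → 5 across the four blocks, leaving a 64×5×5 feature map before the classifier head.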

1.4. Model‑Agnostic Meta‑Learning (MAML)

MAML Framework

MAML divides the training set and test set into N-way, K-shot, Q-query problems. This means that N categories are randomly selected from the dataset each time, and K + Q samples are selected for each category to form one task. Each task therefore contains N × (K + Q) sampled data points [25].
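A minimal episode sampler along these lines might look as follows; the function name and the toy label pool are hypothetical:

```python
import numpy as np

def sample_episode(labels, n_way=4, k_shot=5, q_query=9, rng=None):
    # labels: 1-D array of class ids for the whole sample pool
    rng = rng or np.random.default_rng()
    # Pick N classes, then K support + Q query indices per class
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))[:k_shot + q_query]
        support.append(idx[:k_shot])
        query.append(idx[k_shot:])
    return np.concatenate(support), np.concatenate(query)

labels = np.repeat(np.arange(10), 50)   # toy pool: 10 classes, 50 samples each
s, q = sample_episode(labels, rng=np.random.default_rng(0))
print(len(s), len(q))  # 20 36
```

With N = 4, K = 5, Q = 9 this yields 20 support and 36 query indices per task, matching the paper's episode size of N × (K + Q) = 56.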

  • For each task i, the parameters are updated using the K support images, where L_{Ti} denotes the loss obtained on the support set of task i.
  • Then, the model is tested on the Q query images, and the loss for task i is obtained. The losses over the batch of tasks are summed up to obtain the meta-loss.
  • The above process is repeated until completion.
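The update rules referenced in the bullets above appear as figures in the original post; in standard MAML notation (Finn et al.) the inner and outer updates are:

```latex
% Inner update: adapt to task i on its K support samples
\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{T_i}(f_{\theta})

% Outer update: sum the query-set losses of the adapted models over the task batch
\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{i} \mathcal{L}_{T_i}(f_{\theta_i'})
```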

By training and adjusting model parameters on one task distribution within a given dataset, the MAML algorithm enables the resultant model to quickly adapt to new tasks through one or a few updates on the support set. This also means that the MAML algorithm can adapt to different new learning tasks with greater universality and robustness.
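To make the two-level update concrete, here is a toy first-order MAML sketch on a synthetic one-parameter regression family (y = a·x). Everything here — the task family, learning rates, and loop sizes — is illustrative, and the first-order approximation skips the second-order term of full MAML:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.0                  # single weight of a toy linear model y_hat = theta * x
alpha, beta = 0.1, 0.01      # inner- and outer-loop learning rates

def grad(w, x, y):
    # Gradient of the MSE loss for y_hat = w * x
    return 2.0 * np.mean((w * x - y) * x)

for step in range(500):                       # meta-training
    meta_grad = 0.0
    for _ in range(4):                        # a batch of 4 tasks
        a = rng.uniform(-2, 2)                # task: y = a * x
        xs = rng.normal(size=5)               # K = 5 support points
        xq = rng.normal(size=9)               # Q = 9 query points
        theta_i = theta - alpha * grad(theta, xs, a * xs)   # inner update
        meta_grad += grad(theta_i, xq, a * xq)              # first-order meta-gradient
    theta -= beta * meta_grad / 4             # outer update

# A single inner step on a new task's support set reduces the query error
a_new = 1.5
xs, xq = rng.normal(size=5), rng.normal(size=9)
before = np.mean((theta * xq - a_new * xq) ** 2)
theta_adapted = theta - alpha * grad(theta, xs, a_new * xs)
after = np.mean((theta_adapted * xq - a_new * xq) ** 2)
print(before, after)
```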

  • On MPSSC, N = 4 as it has 4 classes. K = 5 and Q = 9 are chosen by taking into consideration the number of samples in the T (Tongue) category.
  • Therefore, for each training task, 14 images are randomly selected from each of the 4 selected categories, 56 images in total within each task distribution.
  • During the meta-testing phase, a total of 36 snoring samples (4 categories × 9 samples each) are used as the support set to fine-tune the meta-trained model.
  • 64 classes from MiniImageNet that are unrelated to snore sound are used as the meta-training data, with 16 classes as meta-validation.
  • For ESC-50 dataset, 35 classes are used as meta-training, and 15 classes are used as meta-validation.
  • MPSSC test data is used for meta-testing.

During meta-testing, the 36 snoring samples from the development set of the original MPSSC dataset serve as the support set of the new snoring classification task, and the entire test set of the MPSSC dataset's original split is used as the query set for prediction.

2. Results

Confusion matrix Using ESC-50
UAR for Different Meta-Training Dataset

Using the ESC-50 sound dataset's mel spectrograms for meta-training achieved a UAR of 60.2% on the test set, surpassing the MPSSC baseline while using only 36 instances of non-test snoring data.

  • MAML still learns some useful features from the MiniImageNet dataset and achieves 41.2% UAR.

