Brief Review — Snore-GANs: Improving Automatic Snore Sound Classification with Synthesized Data

Snore-GANs to Synthesize More Diverse Data

Sik-Ho Tsang
Sep 14, 2024
Snore-GAN to Synthesize More Data for Model Training

Snore-GANs: Improving Automatic Snore Sound Classification with Synthesized Data
Snore-GAN, by Imperial College London, University of Augsburg, The University of Tokyo, Technical University of Munich, Lanzhou University, and audEERING GmbH
2020 JBHI, Over 50 Citations (Sik-Ho Tsang @ Medium)

Snore Sound Classification
2017 [InterSpeech 2017 Challenges: Addressee, Cold & Snoring] 2018 [MPSSC] [AlexNet & VGG-19 for Snore Sound Classification] 2019 [CNN for Snore]
==== My Healthcare and Medical Related Paper Readings ====
==== My Other Paper Readings Are Also Over Here ====

  • A novel GAN-based data augmentation approach is proposed based on semi-supervised conditional Generative Adversarial Networks (scGANs).
  • An ensemble strategy is introduced to enhance the diversity of the generated data.

Outline

  1. Snore-GAN
  2. Dataset & Model
  3. Results

1. Snore-GAN

1.1. scGAN

semi-supervised conditional Generative Adversarial Network (scGAN)
  • [38] proposed the scGAN, where the label is used as a condition c, i.e., auxiliary information that controls the output of the generator, similar to cGAN.
  • The discriminator D classifies the input into K+1 classes, where K is the number of classes in the classification task. Real samples are supposed to be classified into the first K classes and generated samples into the (K+1)-th class (i.e., fake).
  • Therefore, the generator G aims to generate data realistic enough to be classified by D into any of the first K classes.
  • The generator G aims to maximize the log-likelihood that it assigns to the correct classes:
  • whilst the discriminator D aims to maximize the following log-likelihood:
  • where k is one of the first K classes, x̂ = G_θg(z|c), and p_c(z) denotes the latent random distribution associated with the conditional information c.
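In LaTeX form, the two objectives described above can be sketched as follows (a reconstruction following the standard semi-supervised GAN formulation; the paper's exact notation may differ slightly):

```latex
% Generator objective: maximize the likelihood that a generated sample
% \hat{x} = G_{\theta_g}(z|c) is assigned to its conditional class c
% among the first K (real) classes.
\mathcal{L}_G = \mathbb{E}_{z \sim p_c(z)}
  \left[ \log p\!\left(y = c \mid \hat{x} = G_{\theta_g}(z|c),\, y \le K\right) \right]

% Discriminator objective: maximize the likelihood of assigning real samples
% to their true class k (k <= K) and generated samples to the (K+1)-th (fake) class.
\mathcal{L}_D = \mathbb{E}_{x \sim p_{\text{data}}(x)}
  \left[ \log p\!\left(y = k \mid x,\, k \le K\right) \right]
  + \mathbb{E}_{z \sim p_c(z)}
  \left[ \log p\!\left(y = K+1 \mid \hat{x}\right) \right]
```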

1.2. Proposed Dynamic Alternation

  • The adversarial training process, however, suffers from two major issues: training instability and mode collapse.

Dynamic alternation is proposed: the numbers of training epochs for the generator G and the discriminator D are alternated dynamically, in contrast to conventional approaches that fix the training epochs for both (fixed alternation), so as to avoid training instability.

  • Mathematically, a loss threshold function is defined for G and D respectively:
  • where Λ, b, and c are hyper-parameters that jointly control the threshold at the i-th training iteration.
  • They are set to 0.95, 0, and 0.7 for G, and to 0.95, 1.0, and 1.0 for D.

Once the training loss of G falls below a pre-defined threshold LTG, training switches to D. Similarly, once the training loss of D falls below another pre-defined threshold LTD, training switches back to G.
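To make this control flow concrete, here is a minimal Python sketch of dynamic alternation. The threshold function uses an assumed exponential-decay form built from the Λ, b, and c hyper-parameters mentioned above (the paper defines its own threshold function), and `train_g_epoch` / `train_d_epoch` are hypothetical stand-ins for the actual per-epoch training routines:

```python
import random

# Assumed threshold form: lt(i) = Lambda ** (c * i) + b.
# This is a placeholder; the paper defines its own threshold function
# parameterized by Lambda, b, and c.
def make_threshold(Lambda, b, c):
    return lambda i: Lambda ** (c * i) + b

lt_g = make_threshold(0.95, 0.0, 0.7)   # threshold schedule for the generator
lt_d = make_threshold(0.95, 1.0, 1.0)   # threshold schedule for the discriminator

# Hypothetical stand-ins: each would train the corresponding network for one
# epoch and return its current training loss.
def train_g_epoch():
    return random.uniform(0.0, 2.0)

def train_d_epoch():
    return random.uniform(0.0, 2.0)

training_g = True
for i in range(1, 101):                  # adversarial training iterations
    if training_g:
        loss_g = train_g_epoch()
        if loss_g < lt_g(i):             # G reached its threshold -> train D next
            training_g = False
    else:
        loss_d = train_d_epoch()
        if loss_d < lt_d(i):             # D reached its threshold -> train G next
            training_g = True
```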

1.3. Proposed Ensemble of scGAN

Ensemble of scGAN
  • Another issue is mode collapse, meaning that the generated samples collapse into a small subset of similar samples (partial collapse), or even a single sample (complete collapse). In this case, G exhibits very limited diversity among the generated samples, reducing the usefulness of GANs.
  • A standard ensemble approach in [42] is used.

A set of scGANs is trained.

  • These scGANs have different network structures (i.e., a different number of hidden nodes per layer in the experiments) in order to maximally explore their differences, and they are trained independently.

When conducting data augmentation, the generated data from all scGANs are aggregated into a pool, samples are randomly selected from this pool, and they are then merged into the original training set. This is expected to expand the data diversity.
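A minimal sketch of this ensemble-based augmentation step, assuming each trained scGAN generator exposes a hypothetical `sample(n, label)` method and that features are NumPy arrays:

```python
import numpy as np

def augment_with_ensemble(generators, label, n_extra, seed=0):
    """Pool synthetic samples of one class from all scGANs, then pick n_extra at random."""
    rng = np.random.default_rng(seed)
    pool = np.concatenate([g.sample(n_extra, label) for g in generators], axis=0)
    idx = rng.choice(len(pool), size=n_extra, replace=False)
    return pool[idx]

# Usage (hypothetical): merge the selected samples into the original training set.
# X_extra = augment_with_ensemble(scgans, label=k, n_extra=250)
# X_train = np.concatenate([X_train, X_extra])
# y_train = np.concatenate([y_train, np.full(len(X_extra), k)])
```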

1.4. Proposed Sequence Generation

Sequence Generation
  • Most available GANs were designed to generate standalone samples (e.g., images).
  • A novel approach is proposed to generate sequential samples by means of GANs equipped with Recurrent Neural Networks (RNNs) with Gated Recurrent Units (GRUs). The generation process is inspired by Seq2Seq modeling.
  • As to the discriminator D, another GRU-RNN is used to distinguish the generated sequences from the real ones (a sketch is given below).
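Below is a minimal PyTorch sketch of such a GRU-based sequence generator and (K+1)-class discriminator. The latent size, feature dimension, sequence length, and class count (four, for the V/O/T/E classes) are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SeqGenerator(nn.Module):
    """Generate a feature sequence from a latent sequence z conditioned on class c."""
    def __init__(self, latent_dim=100, num_classes=4, hidden_dim=60, feat_dim=128):
        super().__init__()
        self.gru = nn.GRU(latent_dim + num_classes, hidden_dim,
                          num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, z, c_onehot):
        # z: (batch, seq_len, latent_dim); c_onehot: (batch, num_classes)
        c = c_onehot.unsqueeze(1).expand(-1, z.size(1), -1)
        h, _ = self.gru(torch.cat([z, c], dim=-1))
        return self.out(h)                        # (batch, seq_len, feat_dim)

class SeqDiscriminator(nn.Module):
    """Classify a feature sequence into K real classes plus one fake class."""
    def __init__(self, feat_dim=128, hidden_dim=60, num_classes=4):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.cls = nn.Linear(hidden_dim, num_classes + 1)

    def forward(self, x):
        _, h = self.gru(x)                        # h: (num_layers, batch, hidden_dim)
        return self.cls(h[-1])                    # logits over K+1 classes

# Quick shape check with random inputs.
g, d = SeqGenerator(), SeqDiscriminator()
z = torch.randn(8, 30, 100)                                       # 8 sequences of 30 frames
c = nn.functional.one_hot(torch.randint(0, 4, (8,)), 4).float()   # random class conditions
print(d(g(z, c)).shape)                                           # torch.Size([8, 5])
```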

1.5. Implementation

  • The generator and discriminator used the same network structure, with two hidden layers and N=60 nodes per hidden layer.
  • For the discriminator, an additional dense layer with a softmax activation function is appended for pattern classification.

2. Dataset & Model

2.1. Dataset

MPSSC (Munich-Passau Snore Sound Corpus), with four snore classes: V, O, T, and E.

2.2. Features

  • Three different kinds of acoustic feature sets are chosen at either the frame level (i.e., low-level descriptors, LLDs) or the segment level (i.e., functionals or Bag-of-Audio-Words, BoAW). A feature-extraction sketch is given below.
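Frame-level LLDs and segment-level functionals of this kind are typically extracted with the openSMILE toolkit; the sketch below assumes the opensmile Python package and the ComParE 2016 feature set (the paper's exact configuration may differ):

```python
import opensmile

# Segment-level functionals: one fixed-length feature vector per audio file.
functionals = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Frame-level low-level descriptors: one feature vector per frame.
llds = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)

wav = "snore_sample.wav"                     # placeholder path to a snore recording
x_func = functionals.process_file(wav)       # DataFrame: (1, n_functionals)
x_lld = llds.process_file(wav)               # DataFrame: (n_frames, n_llds)
```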

2.3. Model

  • Three systems are built: i) functional-based features with SVMs (functionals + SVMs), ii) BoAW-based features with SVMs (BoAWs + SVMs), and iii) sequential LLDs with GRU-RNNs (LLDs + GRU-RNNs). A sketch of the first system follows below.
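A minimal sketch of the 'functionals + SVMs' system using scikit-learn, with random placeholder data standing in for the real functional feature matrices; UAR is computed as the unweighted (macro-averaged) recall:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import recall_score

# Placeholder data: functional feature matrices and V/O/T/E labels (0..3).
# In practice, synthesized samples would be appended to X_train / y_train here.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 6373)), rng.integers(0, 4, 200)
X_dev, y_dev = rng.normal(size=(100, 6373)), rng.integers(0, 4, 100)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1e-3, max_iter=10000))
clf.fit(X_train, y_train)

uar = recall_score(y_dev, clf.predict(X_dev), average="macro")  # unweighted average recall
print(f"UAR: {uar:.3f}")
```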

3. Results

Dynamic Alternation

The curves are shown to be much smoother when using the dynamic alternation training strategy.

UAR Against Number of Synthesized Data
  • In the case of the ‘functionals + SVMs’ system, the obtained UAR improves notably from 45.3% to 47.8% when adding 50 synthesized samples per class, and further to 52.7% when adding 250 synthesized samples per class.
  • A notable gain can also be observed for the ‘BoAWs + SVMs’ system (i.e., from 41.4% to 45.9% UAR).
  • For the ‘LLDs + GRU-RNNs’ system, a moderate improvement can be found (i.e., from 65.7% to 66.7% UAR).

This shows that the scGAN-based data augmentation approach can indeed improve the performance of the systems when dealing with sparse data.

  • Also, the ensemble of scGANs (i. e., ensemble) outperforms the mono-scGAN (i. e., net-60; dotted green curves) for data augmentation.
LSTM vs GRU

GRU-RNNs are competitive to the LSTM-RNNs, but with fewer parameters to be trained.

Comparisons with SMOTE and Baseline
  • The SMOTE approach is merely competitive with the baseline.

The results indicate that the scGAN-based data augmentation improves over the baseline systems without any data augmentation in most scenarios.

SOTA Comparisons

The best results achieved by Snore-GAN are competitive with, or even superior to, most of the other state-of-the-art systems.

t-SNE Visualization

Compared with the mono-scGAN, the ensemble of scGANs is capable of generating more diverse data that better reflect the original data distribution in all V, O, T, and E cases.
