Brief Review — Adaptive Mixtures of Local Experts

MoE, by Hinton's Group in 1991, Later Extended to NLP in 2017

Sik-Ho Tsang
3 min read · Sep 25, 2022

Adaptive Mixtures of Local Experts
MoE, by MIT and University of Toronto,
1991 JNEUCOM, Over 5000 Citations (Sik-Ho Tsang @ Medium)
Mixture of Experts, MoE, Sequence Model, Vowel Recognition

  • By using a gating network, different experts are switched on or off based on the input signal.
  • This is a paper from Prof. Hinton’s research group. The idea was later extended to NLP in the 2017 MoE.

Outline

  1. Prior Art
  2. Proposed MoE
  3. Results

1. Prior Art

  • A prior approach before this work proposes a linear combination of the local experts (error function recalled below this list).
  • Such a linear combination may not be a good way to solve a complex problem.
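Writing, for a training case c, the target as d^c, the output of expert i as o_i^c, and the mixing proportion from the gating network as p_i^c, the cooperative error of this prior approach (as given in the original paper) is:

```latex
E^c = \left\| \mathbf{d}^c - \sum_i p_i^c \, \mathbf{o}_i^c \right\|^2
```

Because the target is compared with the blended output, each expert is pushed to cancel the residual error left by all the other experts, so the experts become strongly coupled instead of specializing.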

2. Proposed MoE

Adaptive Mixtures of Local Experts
  • Instead of using a linear combination of experts, a gating function p is used:
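Using the same notation as above, the gating proportions come from a softmax over the gating network’s outputs x_i^c, and each expert is now judged on its own error, weighted by its proportion:

```latex
p_i^c = \frac{e^{x_i^c}}{\sum_j e^{x_j^c}}, \qquad
E^c = \sum_i p_i^c \left\| \mathbf{d}^c - \mathbf{o}_i^c \right\|^2
```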

Depending on the input signal, different experts are switched on or off by the gating function. Thus, each expert focuses on its own particular signal pattern.
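To make the mechanism concrete, here is a minimal NumPy sketch of one forward pass with softmax gating and the per-expert weighted error. The layer sizes, linear experts, random parameters, and dummy data are purely illustrative and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Softmax gating: turns gating scores into proportions p_i that sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative sizes: 2-D input (e.g. two formant frequencies, as in the vowel task),
# 4 linear experts, each producing a 1-D output.
n_experts, d_in, d_out = 4, 2, 1

# Randomly initialised parameters, purely for demonstration.
W_experts = rng.normal(size=(n_experts, d_out, d_in))  # one weight matrix per expert
W_gate = rng.normal(size=(n_experts, d_in))            # gating network weights

def moe_forward(x, d):
    """One training case (x, d): expert outputs, gating proportions,
    competitive error E = sum_i p_i * ||d - o_i||^2, and the mixture output."""
    o = W_experts @ x                       # (n_experts, d_out): each expert's output
    p = softmax(W_gate @ x)                 # (n_experts,): gating proportions
    per_expert_err = ((d - o) ** 2).sum(axis=1)
    E = (p * per_expert_err).sum()          # each expert judged on its own error
    y = (p[:, None] * o).sum(axis=0)        # blended output used at prediction time
    return y, p, E

x = rng.normal(size=d_in)   # dummy input
d = rng.normal(size=d_out)  # dummy target
y, p, E = moe_forward(x, d)
print("gating proportions:", p, "weighted error:", E)
```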

  • A variant of the MoE error function is also proposed, which obtains better performance:
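My reading of the paper is that this variant replaces the weighted sum of squared errors by the negative log-likelihood of a Gaussian mixture:

```latex
E^c = -\log \sum_i p_i^c \, e^{-\tfrac{1}{2}\left\| \mathbf{d}^c - \mathbf{o}_i^c \right\|^2}
```

Its gradient weights each expert by the posterior probability that it produced the target, which speeds up specialization of the experts.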

3. Results

  • A multi-speaker vowel recognition task is used for evaluation.
Data of the vowel discrimination problem, with the experts’ and the gating network’s decision lines

Different experts learn to concentrate on one pair of classes or the other.

Performance on the vowel discrimination task

The proposed MoE methods (4 Experts and 8 Experts) need far fewer epochs to reach the same accuracy.

This MoE concept was further developed into the sparsely-gated MoE in 2017. Inspired by MoE, Vision MoE (V-MoE) appeared in 2021 for image classification.

Reference

[1991 JNEUCOM] [MoE]
Adaptive Mixtures of Local Experts

4.1. Language Model / Sequence Model

(It is not strictly an NLP paper, but I just want to centralize/group these papers here.)

1991 [MoE] … 2020 [ALBERT] [GPT-3] [T5] [Pre-LN Transformer] [MobileBERT] [TinyBERT]

My Other Previous Paper Readings
