Brief Review — Adaptive Mixtures of Local Experts

MoE, proposed by Prof. Hinton's research group in 1991, and later extended to NLP in 2017

  • By using a gating network, different experts are switched on or off depending on the input signal.
  • This is a paper by Prof. Hinton's research group; they later extended this idea to NLP with the 2017 MoE.

Outline

  1. Prior Art
  2. Proposed MoE
  3. Results

1. Prior Art

  • Prior work before this paper proposes a linear combination of the local experts; a reconstruction of this formulation is sketched below.
  • Such a linear combination may not be a valid solution for a complex problem, since the blended error couples all the experts together and discourages them from specializing.
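As a rough reconstruction of the equation shown as an image in the original post (notation may differ slightly from the paper), the prior formulation blends the expert outputs before computing the error on a training case c:

E^c = || d^c - Σ_i p_i^c o_i^c ||²

where o_i^c is the output of expert i, p_i^c is its mixing proportion from the gating network, and d^c is the desired output. Because the error is measured on the blended output, every expert's weights are pushed toward the same residual, so the experts end up cooperating on every case instead of specializing.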

2. Proposed MoE

[Figure] Adaptive Mixtures of Local Experts
  • Instead of using a linear combination of experts, a gating function p is used to weight the experts; see the sketch after this list.
  • A variant of the MoE objective is also proposed, which obtains better performance; it is included in the sketch below as well.
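As a hedged reconstruction of the equations shown as images in the original post (notation may differ slightly from the paper), the gating network produces mixing proportions with a softmax, and the error is computed per expert rather than on a blended output:

p_i^c = exp(x_i^c) / Σ_j exp(x_j^c)

E^c = Σ_i p_i^c || d^c - o_i^c ||²

The variant replaces this with the negative log of a Gaussian-mixture likelihood, which more strongly encourages the gating network to assign each case to a single responsible expert:

E^c = -log Σ_i p_i^c exp( -½ || d^c - o_i^c ||² )

The following is a minimal NumPy sketch of these objectives, not the authors' code; the choice of linear experts, a linear-plus-softmax gating network, and all names and sizes are assumptions for illustration only.

import numpy as np

# Minimal illustrative sketch, not the authors' implementation.
# Linear experts and a linear-plus-softmax gating network are assumptions.
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MixtureOfExperts:
    def __init__(self, in_dim, out_dim, num_experts):
        self.experts = [rng.normal(0.0, 0.1, (in_dim, out_dim)) for _ in range(num_experts)]
        self.gate = rng.normal(0.0, 0.1, (in_dim, num_experts))

    def forward(self, x):
        # x: (batch, in_dim)
        outputs = np.stack([x @ w for w in self.experts], axis=1)  # (batch, experts, out_dim)
        p = softmax(x @ self.gate)                                 # (batch, experts)
        return outputs, p

def moe_error(outputs, p, d):
    # E = sum_i p_i * ||d - o_i||^2 : each expert is compared to the whole target.
    sq = ((d[:, None, :] - outputs) ** 2).sum(axis=-1)  # (batch, experts)
    return (p * sq).sum(axis=-1).mean()

def moe_error_variant(outputs, p, d):
    # E = -log sum_i p_i * exp(-0.5 * ||d - o_i||^2) : mixture-of-Gaussians form.
    sq = ((d[:, None, :] - outputs) ** 2).sum(axis=-1)
    return -np.log((p * np.exp(-0.5 * sq)).sum(axis=-1) + 1e-12).mean()

# Toy usage with random data.
x = rng.normal(size=(8, 4))
d = rng.normal(size=(8, 2))
moe = MixtureOfExperts(in_dim=4, out_dim=2, num_experts=3)
outputs, p = moe.forward(x)
print(moe_error(outputs, p, d), moe_error_variant(outputs, p, d))

Compared with the prior blended-output error, the per-expert error means that, for a given case, lowering the loss mostly requires one well-gated expert to fit the target on its own, which is what drives the experts to specialize on different regions of the input space.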

3. Results

  • A multi-speaker vowel recognition task is used for evaluation.
[Figure] Data of the vowel discrimination problem, with the experts' and the gating network's decision lines.
[Figure] Performance on the vowel discrimination task.

Reference

[1991 JNEUCOM] [MoE] Adaptive Mixtures of Local Experts

4.1. Language Model / Sequence Model

(This paper is not related to NLP, but I just want to group these readings here.)

My Other Previous Paper Readings
