Review — Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (GNMT)

GNMT, Wordpiece Model, Using Deep LSTM With Residual Connections


1. GNMT Network Architecture

GNMT: Network Architecture

1.1. Encoder

The structure of bi-directional connections in the first layer of the encoder

1.2. Attention

1.3. Decoder

1.4. Residual Connections

Left: Normal Stacked LSTM, Right: Stacked LSTM with Residual Connections
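
To make the contrast in the caption above concrete, here is a minimal sketch of the two stacking schemes at a single time step. It is only an illustration, not GNMT's implementation: `plain_stack`, `residual_stack`, and the stand-in layer `halve` are hypothetical names, and a real layer would be an LSTM cell with its own state rather than a simple scaling function.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def plain_stack(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Normal stacking: each layer's output becomes the next layer's input."""
    for layer in layers:
        x = layer(x)
    return x


def residual_stack(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Residual stacking: each layer's output is added element-wise to its
    own input, and the sum becomes the next layer's input."""
    for layer in layers:
        out = layer(x)
        x = [o + xi for o, xi in zip(out, x)]
    return x


def halve(v: Vector) -> Vector:
    # Hypothetical stand-in for one LSTM layer at a single time step.
    return [0.5 * a for a in v]


layers = [halve, halve]
plain = plain_stack([1.0, 2.0], layers)
residual = residual_stack([1.0, 2.0], layers)
print(plain)     # [0.25, 0.5]
print(residual)  # [2.25, 4.5]
```

In the residual version the input is carried forward through an identity path at every layer, which is the property that makes deep stacks easier to train.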

2. Model Parallelism

3. Wordpiece Model or Mixed Word/Character Model

4. Model Training

5. Quantizable Model and Quantized Inference

Log perplexity vs. steps
Model inference on CPU, GPU and TPU

6. Experimental Results

6.1. ML Training Models

Single model results on WMT En > Fr (newstest2014)
Single model results on WMT En > De (newstest2014)

6.2. RL Training Models

Single model test BLEU scores, averaged over 8 runs

6.3. Model Ensemble and Human Evaluation

Model ensemble results on WMT En > Fr (newstest2014)
Model ensemble results on WMT En > De (newstest2014)
Human side-by-side evaluation scores of WMT En > Fr models

6.4. Results on Production Data

Histogram of side-by-side scores on 500 sampled sentences from Wikipedia and news websites for a typical language pair, here English > Spanish (PBMT blue, GNMT red, Human orange)
Mean of side-by-side scores on production data


