Sik-Ho Tsang

Oct 16, 2021

Review — Neural Machine Translation by Jointly Learning to Align and Translate

Using Attention Decoder, Automatically Search for Part of Source Sentence at Encoder for Machine Translation

Attention Decoder/RNNSearch (Figure from


1. Proposed Architecture Using Attention Decoder

Proposed Architecture Using Attention Decoder (Top: Decoder, Bottom: Encoder)

2. Encoder: Bidirectional RNN (BiRNN)

Encoder: Bidirectional RNN (BiRNN)
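To make the encoder figure concrete, here is a minimal sketch in plain Python of how a BiRNN can build one annotation per source word: a forward RNN reads the sentence left to right, a backward RNN reads it right to left, and the two hidden states at each position are concatenated. This is an illustration under stated assumptions, not the paper's implementation: it uses a plain tanh recurrence instead of the gated hidden units used in the paper, and the function names (`rnn_step`, `run_rnn`, `birnn_annotations`) and the toy weights are hypothetical.

```python
import math

def rnn_step(x, h, W, U):
    """One plain tanh RNN step: h' = tanh(W x + U h).
    (Assumption: simple recurrence, standing in for a gated unit.)"""
    return [
        math.tanh(sum(w * xi for w, xi in zip(W_row, x)) +
                  sum(u * hi for u, hi in zip(U_row, h)))
        for W_row, U_row in zip(W, U)
    ]

def run_rnn(inputs, W, U, hidden_size):
    """Run the RNN over a sequence and return the hidden state at every step."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = rnn_step(x, h, W, U)
        states.append(h)
    return states

def birnn_annotations(inputs, W_f, U_f, W_b, U_b, hidden_size):
    """Annotation h_j = [forward state at j ; backward state at j]."""
    forward = run_rnn(inputs, W_f, U_f, hidden_size)
    # Read the sentence in reverse, then flip the states back into source order.
    backward = run_rnn(inputs[::-1], W_b, U_b, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Toy input: 3 source positions, 1-dimensional "embeddings", hidden size 2.
inputs = [[1.0], [0.0], [-1.0]]
W_f, U_f = [[0.5], [-0.5]], [[0.1, 0.0], [0.0, 0.1]]
W_b, U_b = [[0.3], [0.7]], [[0.2, 0.0], [0.0, 0.2]]

annotations = birnn_annotations(inputs, W_f, U_f, W_b, U_b, hidden_size=2)
for j, h_j in enumerate(annotations):
    print(j, [round(v, 3) for v in h_j])
```

Each annotation has twice the hidden size, and because it combines both directions it summarizes the words before and after position j.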

3. Decoder: Attention Decoder

Decoder: Attention Decoder
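The attention step in the decoder can be sketched in a few lines of plain Python. For one decoding step, the previous decoder state s is scored against every encoder annotation h_j with the additive alignment model e_j = v_a · tanh(W_a s + U_a h_j), the scores are normalized with a softmax, and the context vector is the weighted sum of the annotations. This is only a sketch: the helper names and the toy matrices below are hypothetical, and the rest of the decoder (the recurrent update and the output layer) is left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def additive_attention(s_prev, annotations, W_a, U_a, v_a):
    """Return (weights, context) for one decoding step."""
    Ws = matvec(W_a, s_prev)  # shared across all source positions
    scores = []
    for h in annotations:
        Uh = matvec(U_a, h)
        hidden = [math.tanh(a + b) for a, b in zip(Ws, Uh)]
        scores.append(sum(v * t for v, t in zip(v_a, hidden)))
    weights = softmax(scores)  # alignment weights, one per source position
    dim = len(annotations[0])
    context = [sum(w * h[k] for w, h in zip(weights, annotations))
               for k in range(dim)]
    return weights, context

# Toy values (hypothetical): 3 source positions, all vectors of size 2.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_prev = [0.5, -0.5]
W_a = [[1.0, 0.0], [0.0, 1.0]]
U_a = [[1.0, 0.0], [0.0, 1.0]]
v_a = [1.0, 1.0]

weights, context = additive_attention(s_prev, annotations, W_a, U_a, v_a)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

Because the weights are recomputed at every output step, the decoder can focus on a different part of the source sentence for each target word instead of relying on a single fixed-length vector.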

4. Experimental Results

BLEU scores of the trained models computed on the test set (RNNsearch-50* was trained much longer)

More importantly, RNNsearch performs as well as the conventional phrase-based translation system (Moses) when only sentences consisting of known words are considered. This is a significant achievement, considering that Moses uses a separate monolingual corpus (418M words) in addition to the parallel corpora, whereas RNNsearch and RNNencdec use a much smaller corpus.

The BLEU scores of the generated translations on the test set with respect to the lengths of the sentences

RNNsearch-50 in particular shows no performance deterioration even on sentences of length 50 or more.

Four sample alignments found by RNNsearch-50: (a) an arbitrary sentence; (b–d) three randomly selected samples among the test-set sentences without any unknown words and of length between 10 and 20 words.

Long input sentences and the corresponding outputs of RNNencdec-50 and RNNsearch-50