Review: Neural Machine Translation in Linear Time (ByteNet)

Character-Level Machine Translation Using a CNN: Outperforms Character-Level GNMT, Slightly Worse Than Wordpiece-Level GNMT


1. ByteNet Architecture

ByteNet Architecture (All s and t are characters)
Dynamic unfolding in the ByteNet architecture.
Lengths of sentences in characters and their correlation coefficient for the English-to-German WMT NewsTest-2013 validation data
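
Dynamic unfolding can be sketched roughly as follows. The source representation is sized by a linear estimate of the target length, and the decoder keeps generating until it emits an end-of-sequence symbol, even if that means stopping before or running past the estimate. The coefficients a = 1.20 and b = 0 are, as far as I can tell, the values the paper uses for English-to-German, so treat them as an assumption. The names `estimate_target_length`, `dynamic_unfold`, and `decode_step` are hypothetical and only for illustration, not the authors' code.

```python
def estimate_target_length(source_len, a=1.20, b=0.0):
    """Linear estimate of the target length in characters: a * |s| + b.

    a and b are assumed values; German sentences tend to be somewhat
    longer than their English counterparts, hence a > 1.
    """
    return int(round(a * source_len + b))


def dynamic_unfold(encoded, decode_step, eos, max_steps):
    """Unroll a decoder over the encoder output until it emits `eos`.

    encoded:     list of per-position source representations, assumed to be
                 already padded to the estimated target length.
    decode_step: hypothetical callable (previous_outputs, conditioning) -> symbol.
                 The conditioning is None once we step past the encoder output.
    """
    outputs = []
    for i in range(max_steps):
        # Condition on the source representation while it is available.
        cond = encoded[i] if i < len(encoded) else None
        symbol = decode_step(outputs, cond)
        outputs.append(symbol)
        if symbol == eos:
            break
    return outputs


# Toy usage: a "decoder" that copies its conditioning, then stops.
toy_step = lambda prev, cond: cond if cond is not None else "<eos>"
result = dynamic_unfold(["a", "b"], toy_step, eos="<eos>", max_steps=10)
```

The high correlation between source and target lengths in the plot above is what makes a simple linear estimate like this reasonable.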

2. Some Details of CNN

Left: Residual block with ReLUs (He et al., 2016) adapted for decoders. Middle & Right: Residual Multiplicative Block adapted for decoders and corresponding expansion of the MU
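
To make "adapted for decoders" more concrete, here is a minimal NumPy sketch of a masked (causal) dilated 1D convolution, the kind of layer these decoder blocks are built around. It is an illustration under my own assumptions, not the authors' implementation: the function names, the shapes, and the kernel size of 3 with dilation rates doubling from 1 to 16 are chosen for the example.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation):
    """Masked dilated 1D convolution.

    x: array of shape (T, C_in), one row per time step.
    w: array of shape (K, C_in, C_out), one matrix per kernel tap.
    Returns an array of shape (T, C_out) where output[t] depends only on
    x[t], x[t - dilation], ..., x[t - (K - 1) * dilation], never on the future.
    """
    T, c_in = x.shape
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    # Left-pad with zeros so the output has the same length as the input.
    x_padded = np.concatenate([np.zeros((pad, c_in)), x], axis=0)
    y = np.zeros((T, c_out))
    for k in range(K):
        # Tap k looks back (K - 1 - k) * dilation steps; tap K - 1 is "now".
        start = k * dilation
        y += x_padded[start:start + T] @ w[k]
    return y


def receptive_field(kernel_size, dilations):
    """Number of past-and-present positions seen by a stack of causal layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# With kernel size 3 and dilations 1, 2, 4, 8, 16 (assumed for illustration),
# one stack of five layers sees 63 characters.
field = receptive_field(3, [1, 2, 4, 8, 16])
```

Because the dilation doubles at each layer, the receptive field grows exponentially with depth while the cost per layer stays linear in the sequence length.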

3. Character-Level Machine Translation Results

BLEU scores on En-De WMT NewsTest 2014 and 2015 test sets

On NewsTest 2014, ByteNet achieves the highest performance among character-level and subword-level neural machine translation systems.

Compared to the word-level systems it is second only to the version of GNMT that uses word-pieces.

On NewsTest 2015, ByteNet achieves the best published results at the time of the paper.

Raw output translations generated from the ByteNet
Magnitude of gradients of the predicted outputs with respect to the source and target inputs


