Review — The OpenNMT Neural Machine Translation Toolkit: 2020 Edition

Sik-Ho Tsang
Jun 18, 2022
OpenNMT: open source ecosystem for neural machine translation and neural sequence learning

The OpenNMT Neural Machine Translation Toolkit: 2020 Edition,
OpenNMT: Neural Machine Translation Toolkit,
OpenNMT: Open-Source Toolkit for Neural Machine Translation,
by SYSTRAN, Ubiqus, and Harvard SEAS
2020 AMTA, 2018 AMTA, 2017 ACL, with over 20, 90, and 1600 citations respectively (Sik-Ho Tsang @ Medium)
Natural Language Processing, NLP, Neural Machine Translation, NMT

  • OpenNMT is a multi-year open-source ecosystem for neural machine translation (NMT) and natural language generation (NLG).
  • OpenNMT has been used in several production MT systems.
  • This paper introduces the OpenNMT toolkit rather than proposing a new NMT method.


  1. OpenNMT
  2. Experimental Results

1. OpenNMT
Features implemented by OpenNMT-py (column py) and OpenNMT-tf (column tf)
  • It supports a wide range of model architectures (ConvS2S, GPT-2, Transformer, etc.) and training procedures for neural machine translation as well as related tasks such as natural language generation and language modeling.
  1. OpenNMT-py: A user-friendly and multimodal implementation benefiting from PyTorch's ease of use and versatility.
  2. OpenNMT-tf: A modular and stable implementation powered by the TensorFlow 2 ecosystem.
  • OpenNMT was first released in late 2016 as a Torch7 implementation. The original demonstration paper in 2017 was awarded “Best Demonstration Paper Runner-Up” at ACL 2017.
  • After the release of PyTorch, the Torch7 implementation began to be sunset.
  • After more than 3 years (2017 to 2020) of active development, OpenNMT projects have been starred by over 7,400 users. A community forum is also home to 970 users and more than 9,800 posts about NMT research and how to use OpenNMT effectively.
Live demo of the OpenNMT system
  • A live demo has also been developed.
  • Research: OpenNMT has been used for other tasks related to neural machine translation, such as summarization, data-to-text generation, image-to-text, automatic speech recognition, and semantic parsing.
  • Production: OpenNMT has also proved to be widespread in industry. Companies such as SYSTRAN or Ubiqus are known to deploy OpenNMT models in production.
  • Framework: OpenNMT has also been integrated by organizations such as SwissPost and BNP Paribas, while NVIDIA used OpenNMT as a benchmark for the release of TensorRT 6.

2. Experimental Results

2.1. 2020 AMTA Results

Model size and translation speed (target tokens per second) for a base English-German Transformer
  • Dataset: English to German WMT19 task, with the addition of ParaCrawl v5 instead of v3.
  • Tokenization: 40,000 BPE merge operations, learned and applied with the OpenNMT Tokenizer.
  • Model: Transformer Medium (12 heads, d_model = 768, d_ff = 3072).
  • Training: Trained with OpenNMT-py on 6 RTX 2080 Ti, using mixed precision. Initial batch size is around 50,000 tokens, final batch size around 200,000 tokens.
  • Inference: Scores shown are obtained with a beam size of 5 and the average length penalty.
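The average length penalty used at inference time can be illustrated with a short sketch. This is a simplified, hypothetical re-implementation for illustration (not OpenNMT's actual code): each finished hypothesis is scored by its total log-probability divided by its length, so beam search does not systematically prefer shorter outputs.

```python
def average_length_penalty_score(token_logprobs):
    """Mean per-token log-probability of a finished hypothesis."""
    return sum(token_logprobs) / len(token_logprobs)

def pick_best(hypotheses):
    """hypotheses: list of (tokens, per-token log-probs) pairs from the beam."""
    return max(hypotheses, key=lambda h: average_length_penalty_score(h[1]))

# A short hypothesis with one mediocre token vs. a longer one whose tokens
# are individually more likely: the raw log-probability sum (-0.9 vs. -1.2)
# favors the short output, while the average length penalty (-0.9 vs. -0.4)
# selects the longer, per-token-better translation.
short = (["kurz"], [-0.9])
longer = (["eine", "längere", "Übersetzung"], [-0.4, -0.35, -0.45])
best = pick_best([short, longer])  # selects `longer`
```

In OpenNMT this is a decoding option; the snippet only shows the scoring idea, not the beam search itself.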
OpenNMT system vs. some commercial systems
  • During the WMT19 campaign, the best BLEU score for English to German was 44.9, but the best human-evaluated system scored only 42.7 with an ensemble of Big Transformers.
  • OpenNMT tools make it possible to reach superior performance.
OpenNMT English to French model performance on test sets of various domains

2.2. 2018 AMTA Results

Comparison with GNMT on EN→DE. ONMT used a 2-layer bi-RNN with hidden size 1024, embedding size 512, dropout 0.1, and max length 100
  • OpenNMT is also compared with GNMT, and achieves performance similar to GNMT's.
  • (There are more results in these three papers; please feel free to read them directly if interested. OpenNMT was still receiving updates in 2021.)
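The RNN configuration in the caption above lends itself to a back-of-the-envelope parameter count in plain Python. This is a sketch under explicit assumptions the post does not state: a 32,000-token source vocabulary, each LSTM direction keeping the full 1024 hidden units, a single bias vector per gate, and layer 2 consuming the concatenated 2×1024 outputs of layer 1.

```python
def lstm_params(input_size, hidden_size):
    # An LSTM cell has 4 gates; each gate has an input weight matrix,
    # a recurrent weight matrix, and (here, as an assumption) one bias vector.
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + hidden_size)

def bi_rnn_encoder_params(vocab=32000, emb=512, hidden=1024, layers=2):
    # Hypothetical count for the 2-layer bi-RNN encoder described above.
    total = vocab * emb          # embedding table
    in_size = emb
    for _ in range(layers):
        total += 2 * lstm_params(in_size, hidden)  # both directions
        in_size = 2 * hidden     # next layer sees concatenated outputs
    return total

print(bi_rnn_encoder_params())  # ~54M encoder parameters under these assumptions
```

Real implementations differ in details (e.g. PyTorch's LSTM keeps two bias vectors per gate), so this only gives the order of magnitude behind the model sizes reported in the table.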


