Review — context2vec: Learning Generic Context Embedding with Bidirectional LSTM

Using Bidirectional LSTM Instead of Averaging in Word2Vec

A 2D illustration of context2vec’s embedded space and similarity metrics. Triangles and circles denote sentential context embeddings and target word embeddings, respectively


  1. CBOW in Word2Vec
  2. Bidirectional LSTM in context2vec
  3. Experimental Results

1. CBOW in Word2Vec

CBOW in Word2Vec
  • The context window can be made larger, e.g. [-5, 5] covers the 5 previous words and the 5 following words.
  • Plain averaging is not good enough: it gives every context word the same weight, whereas the weighting should depend on the context around the target word.
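To make the limitation of averaging concrete, here is a minimal sketch of a CBOW-style context vector. The function name `cbow_context_vector` and the 2-dimensional toy embeddings are made up for illustration and do not come from Word2Vec itself. The sketch shows that reordering the context words leaves the averaged vector unchanged.

```python
from typing import Dict, List

Vector = List[float]


def cbow_context_vector(
    tokens: List[str],
    target_index: int,
    embeddings: Dict[str, Vector],
    window: int = 5,
) -> Vector:
    """Average the embeddings of the words within `window` of the target."""
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    context = [tokens[j] for j in range(start, end) if j != target_index]
    if not context:
        raise ValueError("the target word has no context words")
    dim = len(embeddings[context[0]])
    total = [0.0] * dim
    for word in context:
        for k, value in enumerate(embeddings[word]):
            total[k] += value
    # Every context word contributes with the same weight 1 / len(context).
    return [value / len(context) for value in total]


# Hypothetical toy embeddings, chosen only for illustration.
toy_embeddings = {
    "john": [1.0, 0.0],
    "submitted": [0.0, 1.0],
    "a": [0.5, 0.5],
    "paper": [0.0, 2.0],
}

original = cbow_context_vector(["john", "submitted", "a", "paper"], 1, toy_embeddings)
shuffled = cbow_context_vector(["paper", "submitted", "a", "john"], 1, toy_embeddings)
print(original == shuffled)  # the average cannot tell the two word orders apart
```

Because the average is order-invariant, "John submitted a paper" and "paper submitted a John" yield the same context representation for "submitted", which is the kind of information loss a sequence model can avoid.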

2. Bidirectional LSTM in context2vec

context2vec architecture
  • context2vec uses two LSTM networks, one reading the sentence left-to-right and the other reading it right-to-left. The parameters of these two networks are completely separate, including two separate sets of left-to-right and right-to-left context word embeddings.
  • The LSTM output vector representing the target word's left-to-right context (“John”) is concatenated with the one representing its right-to-left context (“a paper”). With this, the relevant information in the sentential context can be captured.
  • Some details in the network:
context2vec hyperparameters
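The two-LSTM encoding described above can be sketched in plain Python. This is only an illustrative sketch under several assumptions: the class `TinyLSTM`, the helper `encode_context`, the tiny dimensions, and the random untrained weights are all invented here and are not the authors' implementation or hyperparameters. Any further layers applied on top of the concatenated vector are omitted.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
EMBED_DIM = 3   # illustrative sizes, not the paper's hyperparameters
HIDDEN_DIM = 4


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """A plain LSTM with randomly initialised (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random) -> None:
        self.hidden_dim = hidden_dim
        width = input_dim + hidden_dim
        # One weight matrix and one bias vector per gate.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(hidden_dim)]
            for gate in ("input", "forget", "output", "candidate")
        }
        self.biases = {gate: [0.0] * hidden_dim for gate in self.weights}

    def _affine(self, gate: str, z: Vector) -> Vector:
        return [
            sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(self.weights[gate], self.biases[gate])
        ]

    def run(self, inputs: List[Vector]) -> Vector:
        """Feed the inputs in order and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in inputs:
            z = x + h  # concatenate the current input with the previous hidden state
            i = [sigmoid(a) for a in self._affine("input", z)]
            f = [sigmoid(a) for a in self._affine("forget", z)]
            o = [sigmoid(a) for a in self._affine("output", z)]
            g = [math.tanh(a) for a in self._affine("candidate", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def random_embeddings(vocab: List[str], rng: random.Random) -> Dict[str, Vector]:
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for w in vocab}


def encode_context(
    tokens: List[str],
    target_index: int,
    left_emb: Dict[str, Vector],
    right_emb: Dict[str, Vector],
    left_lstm: TinyLSTM,
    right_lstm: TinyLSTM,
) -> Vector:
    """Concatenate the left-to-right and right-to-left summaries around the target."""
    left_inputs = [left_emb[w] for w in tokens[:target_index]]
    # The right-to-left network starts at the sentence end and reads toward the target.
    right_inputs = [right_emb[w] for w in reversed(tokens[target_index + 1:])]
    return left_lstm.run(left_inputs) + right_lstm.run(right_inputs)


rng = random.Random(0)
sentence = ["john", "submitted", "a", "paper"]
# Two separate embedding tables and two separate LSTMs, one per direction.
left_emb = random_embeddings(sentence, rng)
right_emb = random_embeddings(sentence, rng)
left_lstm = TinyLSTM(EMBED_DIM, HIDDEN_DIM, rng)
right_lstm = TinyLSTM(EMBED_DIM, HIDDEN_DIM, rng)

context_for_submitted = encode_context(sentence, 1, left_emb, right_emb, left_lstm, right_lstm)
context_for_a = encode_context(sentence, 2, left_emb, right_emb, left_lstm, right_lstm)
print(len(context_for_submitted))  # 2 * HIDDEN_DIM
```

Note that the target word itself is never fed to either network: for "submitted", the left network sees only "john" and the right network sees "paper" then "a". In this sketch an empty side (a target at the sentence boundary) simply yields a zero vector, which is a simplification.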

3. Experimental Results

3.1. MSCC Corpus Development Set

Development set results (iters+ denotes the best model found when running more training iterations with α = 0.75)
  • When the proposed models are trained for more iterations, some further improvement is observed with 3 iterations over the ukWaC corpus and 10 iterations over the MSCC corpus.
Test set results (c2v is context2vec)
  • S-1/S-2 stand for the best/second-best prior results reported for each benchmark. context2vec either surpasses or almost reaches the state of the art on all benchmarks.

3.2. Others

Top-5 closest target words to a few given target words
Closest target words to various sentential contexts
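Nearest-neighbour lists like the ones above can be produced by ranking target word embeddings by their similarity to a query vector, which may be either a target word embedding or a sentential context embedding. The sketch below assumes cosine similarity as the metric. The helper names and the 2-dimensional toy vectors are hypothetical and are not taken from the paper's results.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_targets(query: Vector, target_embeddings: Dict[str, Vector], k: int = 5) -> List[str]:
    """Return the k target words whose embeddings are most similar to the query."""
    ranked = sorted(
        target_embeddings,
        key=lambda word: cosine(query, target_embeddings[word]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy vectors, chosen only for illustration.
toy_targets = {
    "wrote": [0.9, 0.1],
    "submitted": [0.8, 0.3],
    "ate": [-0.2, 0.9],
}
toy_context = [1.0, 0.2]  # stands in for an embedded sentential context

top = closest_targets(toy_context, toy_targets, k=2)
print(top)
```

Since context embeddings and target word embeddings live in the same space, the same function serves both kinds of query.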


[2016 CoNLL] [context2vec]
context2vec: Learning Generic Context Embedding with Bidirectional LSTM

Natural Language Processing (NLP)

Language Model: 2007 [Bengio TNN’07] 2013 [Word2Vec] [NCE] [Negative Sampling] 2014 [GloVe] [GRU] [Doc2Vec] 2015 [Skip-Thought] 2016 [GCNN/GLU] [context2vec]
Machine Translation: 2014 [Seq2Seq] [RNN Encoder-Decoder] 2015 [Attention Decoder/RNNSearch] 2016 [GNMT] [ByteNet] [Deep-ED & Deep-Att] 2017 [ConvS2S] [Transformer]
Image Captioning: 2015 [m-RNN] [R-CNN+BRNN] [Show and Tell/NIC] [Show, Attend and Tell]



