Review — GloVe: Global Vectors for Word Representation

Using global corpus statistics to learn word representations, GloVe outperforms Word2Vec's CBOW on the word analogy task.

  • Instead of only local context information, global corpus statistics are utilized for learning word representations.

Outline

  1. The Statistics of Word Occurrences in a Corpus
  2. GloVe: Global Vectors
  3. Word Analogy Task Results

1. The Statistics of Word Occurrences in a Corpus

  • Let the matrix of word-word co-occurrence counts be denoted by X, whose entries Xij tabulate the number of times word j occurs in the context of word i.
  • Let Xi = Σk Xik be the number of times any word appears in the context of word i.
  • Finally, let Pij = P(j|i) = Xij/Xi be the probability that word j appears in the context of word i.
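The counting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the toy corpus and the symmetric window of 2 are arbitrary choices, and the paper additionally down-weights a pair of words d positions apart by 1/d, which is omitted here.

```python
from collections import defaultdict

# Toy corpus and a symmetric context window (both illustrative choices).
corpus = ("ice is solid and cold steam is gas and hot "
          "water relates to both ice and steam").split()
window = 2

# X[(i, j)]: number of times word j occurs in the context of word i.
X = defaultdict(float)
for pos, word_i in enumerate(corpus):
    for ctx in range(max(0, pos - window), min(len(corpus), pos + window + 1)):
        if ctx != pos:
            X[(word_i, corpus[ctx])] += 1.0

# Xi = Σk Xik: total count of any word in the context of word i.
X_i = defaultdict(float)
for (i, _), count in X.items():
    X_i[i] += count

def P(j, i):
    """Pij = P(j|i) = Xij / Xi."""
    return X[(i, j)] / X_i[i] if X_i[i] else 0.0

print(P("solid", "ice"))  # relative frequency of "solid" near "ice"
```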
Table: co-occurrence probabilities for target words ice and steam with selected context words, from a 6-billion-token corpus.
  • Let i=ice, and j=steam.
  • For words k related to ice but not steam, say k=solid, we expect the ratio Pik/Pjk to be large.
  • Similarly, for words k related to steam but not ice, say k=gas, the ratio should be small.
  • For words k like water or fashion, that are either related to both ice and steam, or to neither, the ratio should be close to one.
  • Noting that the ratio Pik/Pjk depends on three words i, j, and k, the most general model takes the form:
  • F(wi, wj, ~wk) = Pik/Pjk
  • where w are word vectors and ~w are separate context word vectors.
  • One possible design for F would be a neural network, but the paper argues this would obscure the linear structure the word vectors are meant to capture.
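To make the ratio argument concrete, here is a small sketch that evaluates Pik/Pjk for i=ice, j=steam; the probability values are the ones reported in Table 1 of the GloVe paper (6-billion-token corpus), rounded to two significant digits.

```python
# Co-occurrence probabilities P(k|i) reported in Table 1 of the GloVe paper.
P = {
    ("ice", "solid"): 1.9e-4,   ("steam", "solid"): 2.2e-5,
    ("ice", "gas"): 6.6e-5,     ("steam", "gas"): 7.8e-4,
    ("ice", "water"): 3.0e-3,   ("steam", "water"): 2.2e-3,
    ("ice", "fashion"): 1.7e-5, ("steam", "fashion"): 1.8e-5,
}

for k in ("solid", "gas", "water", "fashion"):
    ratio = P[("ice", k)] / P[("steam", k)]
    print(f"P({k}|ice) / P({k}|steam) = {ratio:.2g}")
# Large for solid, small for gas, and close to 1 for water and fashion,
# exactly the pattern the model F is designed to capture.
```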

2. GloVe: Global Vectors

  • The cost function to be minimized in order to find w starts from a least-squares fit of the log counts:
  • J = Σi,j=1..V (wi^T ~wj − log Xij)^2
  • where V is the size of the vocabulary.
  • If Xij=0, log Xij is undefined, so each term is weighted by f(Xij), chosen such that f(Xij)=0 when Xij=0:
  • f(x) = (x/xmax)^α if x < xmax, and 1 otherwise (the paper uses xmax=100 and α=3/4).
  • Further, bias terms are added for completeness, giving the final objective:
  • J = Σi,j=1..V f(Xij) (wi^T ~wj + bi + ~bj − log Xij)^2
  • GloVe is essentially a log-bilinear model with a weighted least-squares objective.
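The final objective above is straightforward to write down in numpy. The sketch below only evaluates the cost on random toy data; xmax=100 and α=3/4 are the paper's values, while the vocabulary size, dimension, and counts are illustrative.

```python
import numpy as np

V, d = 50, 10                            # toy vocabulary size and dimension
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(V, V)).astype(float)  # toy co-occurrence counts

W  = 0.1 * rng.standard_normal((V, d))   # word vectors wi
Wc = 0.1 * rng.standard_normal((V, d))   # context vectors ~wj
b, bc = np.zeros(V), np.zeros(V)         # biases bi, ~bj

def f(x, x_max=100.0, alpha=0.75):
    """Weighting function: (x/xmax)^α below xmax, 1 above; f(0) = 0."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_cost(W, Wc, b, bc, X):
    mask = X > 0                           # terms with Xij = 0 are dropped
    logX = np.log(np.where(mask, X, 1.0))  # placeholder 1 avoids log(0)
    resid = W @ Wc.T + b[:, None] + bc[None, :] - logX
    return np.sum(f(X) * mask * resid ** 2)

print(glove_cost(W, Wc, b, bc, X))
```

In the paper, this cost is minimized with AdaGrad, stochastically sampling the nonzero entries of X.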

3. Word Analogy Task Results

Table: accuracy (%) on the word analogy task.
  • The word analogy task consists of questions like, “a is to b as c is to ?” The dataset contains 19,544 such questions, divided into a semantic subset and a syntactic subset.
  • The question “a is to b as c is to ?” is answered by finding the word d whose representation wd is closest to wb − wa + wc according to cosine similarity, as in the sketch below.
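A minimal sketch of that lookup, assuming a small dictionary of hand-picked, unit-normalized vectors (purely illustrative; real GloVe vectors are trained as in Section 2):

```python
import numpy as np

# Hypothetical embeddings, hand-picked for illustration only.
raw = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king":  np.array([0.2, 0.0, 1.0]),
    "queen": np.array([0.2, 1.0, 1.0]),
    "apple": np.array([0.0, 0.1, 0.0]),
}
emb = {w: v / np.linalg.norm(v) for w, v in raw.items()}

def analogy(a, b, c, emb):
    """Answer 'a is to b as c is to ?' by cosine similarity to wb - wa + wc."""
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    # The query words themselves are excluded, as is standard for this task.
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: candidates[w] @ target)

print(analogy("man", "woman", "king", emb))  # expected: queen
```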
Figure: visualized examples of linear substructure for the word pairs man-woman, company-CEO, city-zip code, and comparative-superlative.

