Brief Review — Universal Sentence Encoder for English

Universal Sentence Encoder (USE) is Proposed

Sik-Ho Tsang
3 min read · Aug 3, 2024
USE Sentence Embedding (Image from HuggingFace)

Universal Sentence Encoder for English
USE, by Google AI and Google
2018 EMNLP, Over 1200 Citations (Sik-Ho Tsang @ Medium)

Sentence Embedding / Dense Text Retrieval
2019 [Sentence-BERT (SBERT)] 2020 [Retrieval-Augmented Generation (RAG)] [Dense Passage Retriever (DPR)] 2021 [Fusion-in-Decoder] [Augmented SBERT (AugSBERT)]
==== My Other Paper Readings Are Also Over Here ====

  • Easy-to-use TensorFlow Hub sentence embedding models are presented (a minimal usage sketch is shown below).
  • After pretraining the Universal Sentence Encoder (USE), transfer learning can be performed for even better performance.
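As a quick illustration of the first point, here is a minimal usage sketch in Python, assuming the publicly released TF2 Hub module (the module URL, version, and 512-d output below refer to that release and may differ from the exact models evaluated in the paper):

```python
import tensorflow_hub as hub

# Load a pretrained Universal Sentence Encoder from TensorFlow Hub.
# (Module URL/version is an assumption; DAN- and Transformer-based
# variants are published as separate modules.)
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding.",
]

# Each sentence is mapped to a fixed-length 512-dimensional vector.
embeddings = embed(sentences)
print(embeddings.shape)  # (2, 512)
```

Downstream tasks then treat these fixed-length vectors as features, which is exactly the transfer-learning setup in the second point.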

Outline

  1. Universal Sentence Encoder (USE)
  2. Results

1. Universal Sentence Encoder (USE)

  • Two sentence encoding models are provided: Transformer and Deep Average Network (DAN).

1.1. Transformer

  • The encoder uses attention to compute context-aware representations of words in a sentence that take into account both the ordering and identity of the other words.
  • These context-aware word representations are averaged together to obtain a fixed-length sentence-level embedding (a pooling sketch follows this list).
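A toy sketch of this pooling step, assuming the Transformer encoder output is already available as a matrix of context-aware word vectors (the shapes and the plain averaging below are illustrative, not the paper's exact configuration):

```python
import numpy as np

def transformer_sentence_embedding(context_vectors: np.ndarray) -> np.ndarray:
    # context_vectors: (seq_len, dim) output of a Transformer encoder,
    # where each row already reflects the ordering and identity of the
    # other words via self-attention (the encoder itself is assumed).
    # Element-wise averaging over word positions yields a fixed-length
    # sentence embedding regardless of sentence length.
    return context_vectors.mean(axis=0)

# Toy usage: 7 word positions, 512-dimensional representations.
fake_encoder_output = np.random.randn(7, 512)
print(transformer_sentence_embedding(fake_encoder_output).shape)  # (512,)
```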

1.2. Deep Averaging Network (DAN)

  • The DAN sentence encoding model begins by averaging together word and bi-gram level embeddings (a minimal sketch follows this list).
  • Sentence embeddings are then obtained by passing the averaged representation through a feedforward deep neural network (DNN).
  • The DAN encoder is trained similarly to the Transformer encoder.
  • The DAN encoder’s compute time is linear in the length of the input sentence, in contrast to the Transformer encoder, whose self-attention cost grows quadratically with sentence length.
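A minimal sketch of the DAN idea under illustrative assumptions (embedding dimensionality, number of layers, and the tanh activation are placeholders; the released Hub module handles tokenization and bi-gram features internally):

```python
import numpy as np

def dan_sentence_embedding(word_vecs, bigram_vecs, layers):
    # 1) Average word- and bi-gram-level embeddings into a single vector.
    avg = np.concatenate([word_vecs, bigram_vecs], axis=0).mean(axis=0)
    # 2) Pass the average through a small feedforward DNN.
    h = avg
    for W, b in layers:
        h = np.tanh(W @ h + b)
    return h  # fixed-length sentence embedding

# Toy usage: 5 words and 4 bi-grams with 256-d embeddings, two hidden layers.
rng = np.random.default_rng(0)
words = rng.standard_normal((5, 256))
bigrams = rng.standard_normal((4, 256))
layers = [(rng.standard_normal((256, 256)), np.zeros(256)) for _ in range(2)]
print(dan_sentence_embedding(words, bigrams, layers).shape)  # (256,)
```

The averaging step costs one pass over the tokens and the DNN cost is fixed, which is where the linear-time behaviour noted above comes from.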

1.3. Encoder Training Data

  • The sources are Wikipedia, web news, web question-answer pages and discussion forums.
  • Unsupervised learning is augmented with training on supervised data from the Stanford Natural Language Inference (SNLI) corpus.

1.4. Transfer Tasks

USE Sentence Embedding for Classification (Image from HuggingFace)
Transfer Tasks
  • Data used for the transfer learning experiments and word embedding association tests (WEAT):
  1. MR: Movie review sentiment on a five star scale;
  2. CR: Sentiment of customer reviews;
  3. SUBJ: Subjectivity of movie reviews and plot summaries;
  4. MPQA: Phrase opinion polarity from news data;
  5. TREC: Fine grained question classification sourced from TREC;
  6. SST: Binary phrase sentiment classification;
  7. STS Benchmark: Semantic textual similarity (STS) between sentence pairs, scored by Pearson r against human judgments (a scoring sketch follows this list);
  8. WEAT: Word pairs from the psychology literature on implicit association tests (IAT).
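For the STS Benchmark, each pair is scored from its two sentence embeddings and the scores are correlated with the human ratings. A hedged sketch using plain cosine similarity and SciPy's Pearson correlation (the paper reports converting cosine similarity to an arccos-based angular distance, omitted here; the gold scores below are made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

def cosine_sim(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine similarity between two sentence embeddings.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy example: model similarity scores vs. human judgments for 4 pairs.
rng = np.random.default_rng(0)
pairs = [(rng.standard_normal(512), rng.standard_normal(512)) for _ in range(4)]
model_scores = [cosine_sim(u, v) for u, v in pairs]
human_scores = [4.2, 1.0, 3.5, 2.8]  # illustrative gold STS ratings (0–5 scale)

# STS Benchmark quality is reported as Pearson r against human ratings.
r, _ = pearsonr(model_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```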

2. Results

Classification Tasks
  • Additional baseline CNN and DAN models are trained without using any pretrained word or sentence embeddings.
  • The authors also explore combining the sentence- and word-level transfer models by concatenating their representations prior to the classification layers (a concatenation sketch is shown below).
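A toy sketch of that concatenation (the 512-d and 300-d dimensionalities and the pooled word-level features are assumptions for illustration; the classifier layers are omitted):

```python
import numpy as np

def combined_features(sentence_emb: np.ndarray, word_level_emb: np.ndarray) -> np.ndarray:
    # Concatenate the sentence-level transfer representation with the
    # word-level transfer representation before the task-specific
    # classification layers.
    return np.concatenate([sentence_emb, word_level_emb], axis=-1)

# Toy usage: 512-d USE sentence embedding + 300-d pooled word-level features.
use_vec = np.random.randn(512)
word_vec = np.random.randn(300)
print(combined_features(use_vec, word_vec).shape)  # (812,) -> classifier input
```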

Using Transformer sentence-level embeddings alone outperforms InferSent on MR, SUBJ, and TREC.

The Transformer sentence encoder also strictly outperforms the DAN encoder.

STS
  • Transformer embeddings outperform the sentence representations produced by InferSent.
STS With Varying Training Data

With only small quantities of task-specific training data, sentence-level transfer already achieves surprisingly good performance, showing that transfer learning matters most when training data is limited.
