Brief Review — Universal Sentence Encoder for English

Universal Sentence Encoder (USE) is Proposed

Sik-Ho Tsang
3 min read · Aug 3, 2024
USE Sentence Embedding (Image from HuggingFace)

Universal Sentence Encoder for English
USE, by Google AI and Google
2018 EMNLP, Over 1200 Citations (Sik-Ho Tsang @ Medium)

Sentence Embedding / Dense Text Retrieval
2019 [Sentence-BERT (SBERT)] 2020 [Retrieval-Augmented Generation (RAG)] [Dense Passage Retriever (DPR)] 2021 [Fusion-in-Decoder] [Augmented SBERT (AugSBERT)]
==== My Other Paper Readings Are Also Over Here ====

  • Easy-to-use TensorFlow Hub sentence embedding models are presented (a minimal usage sketch is shown below).
  • After pretraining the Universal Sentence Encoder (USE), transfer learning can be performed for even better performance.
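As a quick illustration of the first point, here is a minimal usage sketch in Python, assuming the publicly released TF2 Hub module (the module URL, version, and 512-d output below refer to that release and may differ from the exact models evaluated in the paper):

```python
import tensorflow_hub as hub

# Load a pretrained Universal Sentence Encoder from TensorFlow Hub.
# (Module URL/version is an assumption; DAN- and Transformer-based
# variants are published as separate modules.)
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding.",
]

# Each sentence is mapped to a fixed-length 512-dimensional vector.
embeddings = embed(sentences)
print(embeddings.shape)  # (2, 512)
```

Downstream tasks then treat these fixed-length vectors as features, which is exactly the transfer-learning setup in the second point.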

Outline

  1. Universal Sentence Encoder (USE)
  2. Results

1. Universal Sentence Encoder (USE)

  • Two sentence encoding models are provided: Transformer and Deep Average Network (DAN).

1.1. Transformer

  • The encoder uses attention to compute context-aware representations of words in a sentence that take into account both the ordering and identity of the other words.
  • These context-aware word representations are averaged together to obtain a fixed-length sentence-level embedding (a pooling sketch follows this list).
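A toy sketch of this pooling step, assuming the Transformer encoder output is already available as a matrix of context-aware word vectors (the shapes and the plain averaging below are illustrative, not the paper's exact configuration):

```python
import numpy as np

def transformer_sentence_embedding(context_vectors: np.ndarray) -> np.ndarray:
    # context_vectors: (seq_len, dim) output of a Transformer encoder,
    # where each row already reflects the ordering and identity of the
    # other words via self-attention (the encoder itself is assumed).
    # Element-wise averaging over word positions yields a fixed-length
    # sentence embedding regardless of sentence length.
    return context_vectors.mean(axis=0)

# Toy usage: 7 word positions, 512-dimensional representations.
fake_encoder_output = np.random.randn(7, 512)
print(transformer_sentence_embedding(fake_encoder_output).shape)  # (512,)
```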

1.2. Deep Averaging Network (DAN)

  • The DAN sentence encoding model begins by averaging together word and bi-gram level embeddings (a minimal sketch follows this list).
  • Sentence embeddings are then obtained by passing the averaged representation through a feedforward deep neural network (DNN).
  • The DAN encoder is trained similarly to the Transformer encoder.
  • The DAN encoder’s compute time is linear in the length of the input sentence, in contrast to the Transformer encoder, whose self-attention cost grows quadratically with sentence length.
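A minimal sketch of the DAN idea under illustrative assumptions (embedding dimensionality, number of layers, and the tanh activation are placeholders; the released Hub module handles tokenization and bi-gram features internally):

```python
import numpy as np

def dan_sentence_embedding(word_vecs, bigram_vecs, layers):
    # 1) Average word- and bi-gram-level embeddings into a single vector.
    avg = np.concatenate([word_vecs, bigram_vecs], axis=0).mean(axis=0)
    # 2) Pass the average through a small feedforward DNN.
    h = avg
    for W, b in layers:
        h = np.tanh(W @ h + b)
    return h  # fixed-length sentence embedding

# Toy usage: 5 words and 4 bi-grams with 256-d embeddings, two hidden layers.
rng = np.random.default_rng(0)
words = rng.standard_normal((5, 256))
bigrams = rng.standard_normal((4, 256))
layers = [(rng.standard_normal((256, 256)), np.zeros(256)) for _ in range(2)]
print(dan_sentence_embedding(words, bigrams, layers).shape)  # (256,)
```

The averaging step costs one pass over the tokens and the DNN cost is fixed, which is where the linear-time behaviour noted above comes from.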

1.3. Encoder Training Data

  • The sources are Wikipedia, web news, web question-answer pages and discussion forums.
  • Unsupervised learning is augmented with training on supervised data from the Stanford Natural Language Inference (SNLI) corpus.

1.4. Transfer Tasks

USE Sentence Embedding for Classification (Image from HuggingFace)
Transfer Tasks
  • Data used for the transfer learning experiments and word embedding association tests (WEAT):
  1. MR: Movie review sentiment on a five star scale;
  2. CR: Sentiment of customer reviews;
  3. SUBJ: Subjectivity of movie reviews and plot summaries;
  4. MPQA: Phrase opinion polarity from news data;
  5. TREC: Fine grained question classification sourced from TREC;
  6. SST: Binary phrase sentiment classification;
  7. STS Benchmark: Semantic textual similarity (STS) between sentence pairs, scored by Pearson r against human judgments (a scoring sketch follows this list);
  8. WEAT: Word pairs from the psychology literature on implicit association tests (IAT).
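For the STS Benchmark, each pair is scored from its two sentence embeddings and the scores are correlated with the human ratings. A hedged sketch using plain cosine similarity and SciPy's Pearson correlation (the paper reports converting cosine similarity to an arccos-based angular distance, omitted here; the gold scores below are made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

def cosine_sim(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine similarity between two sentence embeddings.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy example: model similarity scores vs. human judgments for 4 pairs.
rng = np.random.default_rng(0)
pairs = [(rng.standard_normal(512), rng.standard_normal(512)) for _ in range(4)]
model_scores = [cosine_sim(u, v) for u, v in pairs]
human_scores = [4.2, 1.0, 3.5, 2.8]  # illustrative gold STS ratings (0–5 scale)

# STS Benchmark quality is reported as Pearson r against human ratings.
r, _ = pearsonr(model_scores, human_scores)
print(f"Pearson r = {r:.3f}")
```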

2. Results

Classification Tasks
  • Additional baseline CNN and DAN models are trained without using any pretrained word or sentence embeddings.
  • The authors also explore combining the sentence- and word-level transfer models by concatenating their representations prior to the classification layers (a concatenation sketch is shown below).
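A toy sketch of that concatenation (the 512-d and 300-d dimensionalities and the pooled word-level features are assumptions for illustration; the classifier layers are omitted):

```python
import numpy as np

def combined_features(sentence_emb: np.ndarray, word_level_emb: np.ndarray) -> np.ndarray:
    # Concatenate the sentence-level transfer representation with the
    # word-level transfer representation before the task-specific
    # classification layers.
    return np.concatenate([sentence_emb, word_level_emb], axis=-1)

# Toy usage: 512-d USE sentence embedding + 300-d pooled word-level features.
use_vec = np.random.randn(512)
word_vec = np.random.randn(300)
print(combined_features(use_vec, word_vec).shape)  # (812,) -> classifier input
```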

Using Transformer sentence-level embeddings alone outperforms InferSent on MR, SUBJ, and TREC.

The Transformer sentence encoder also strictly outperforms the DAN encoder.

STS
  • Transformer embeddings outperform the sentence representations produced by InferSent.
STS With Varying Training Data

With only small quantities of task-specific training data, sentence-level transfer already achieves surprisingly good performance, showing that transfer learning matters most when training data is limited.
