Brief Review — Universal Sentence Encoder for English
Universal Sentence Encoder (USE) is Proposed
Universal Sentence Encoder for English
USE, by Google AI & Google
2018 EMNLP, Over 1200 Citations (Sik-Ho Tsang @ Medium)
Sentence Embedding / Dense Text Retrieval
2019 [Sentence-BERT (SBERT)] 2020 [Retrieval-Augmented Generation (RAG)] [Dense Passage Retriever (DPR)] 2021 [Fusion-in-Decoder] [Augmented SBERT (AugSBERT)]
==== My Other Paper Readings Are Also Over Here ====
- Easy-to-use TensorFlow Hub sentence embedding models are presented (see the usage sketch below).
- After pretraining the Universal Sentence Encoder (USE), transfer learning can be performed for even better performance.
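A minimal loading sketch, assuming TensorFlow 2.x and the public TF Hub module URL below (the specific module version is my assumption, not something stated in the paper):

```python
# Minimal sketch: load a pretrained USE model from TensorFlow Hub and embed
# sentences. The module URL/version is assumed for illustration.
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Universal Sentence Encoder embeds whole sentences.",
]
embeddings = embed(sentences)  # tensor of shape (2, 512)
print(embeddings.shape)
```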
Outline
- Universal Sentence Encoder (USE)
- Results
1. Universal Sentence Encoder (USE)
- Two sentence encoding models are provided: Transformer and Deep Average Network (DAN).
1.1. Transformer
- The encoder uses attention to compute context-aware representations of the words in a sentence that take into account both the ordering and the identity of the other words.
- The context-aware word representations are averaged together to obtain a sentence-level embedding.
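As a rough illustration of the pooling step (not the authors' code), the sketch below averages hypothetical context-aware token vectors into a single sentence vector; the shapes are illustrative:

```python
# Illustrative pooling sketch: average context-aware token representations
# (stand-ins for the Transformer encoder output) into a sentence embedding.
import numpy as np

seq_len, dim = 7, 512                        # hypothetical sentence length and width
token_reprs = np.random.randn(seq_len, dim)  # stand-in for attention-based word representations

sentence_embedding = token_reprs.mean(axis=0)  # element-wise average over positions
print(sentence_embedding.shape)                # (512,)
```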
1.2. Deep Averaging Network (DAN)
- The DAN sentence encoding model begins by averaging together word and bi-gram level embeddings.
- Sentence embeddings are then obtained by passing the averaged representation through a feedforward deep neural network (DNN).
- The DAN encoder is trained similarly to the Transformer encoder.
- The DAN encoder's compute time is linear in the length of the input sentence.
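A minimal DAN-style sketch, with illustrative shapes and randomly initialized layers (the released model's exact sizes and weights are not assumed):

```python
# Illustrative DAN sketch: average word and bigram embeddings, then pass the
# result through a small feedforward network. All weights are random stand-ins.
import numpy as np

dim = 512
word_embs   = np.random.randn(6, dim)   # stand-in word embeddings (6 tokens)
bigram_embs = np.random.randn(5, dim)   # stand-in bigram embeddings (5 bigrams)

# Average word- and bigram-level embeddings into one input vector.
avg = np.concatenate([word_embs, bigram_embs], axis=0).mean(axis=0)

# Hypothetical two-layer feedforward DNN with a ReLU hidden layer.
W1, b1 = np.random.randn(dim, dim) * 0.01, np.zeros(dim)
W2, b2 = np.random.randn(dim, dim) * 0.01, np.zeros(dim)
hidden = np.maximum(0.0, avg @ W1 + b1)
sentence_embedding = hidden @ W2 + b2
print(sentence_embedding.shape)  # (512,)
```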
1.3. Encoder Training Data
- The sources are Wikipedia, web news, web question-answer pages and discussion forums.
- Unsupervised learning is augmented with training on supervised data from the Stanford Natural Language Inference (SNLI) corpus.
1.4. Transfer Tasks
- Data used for the transfer learning experiments and word embedding association tests (WEAT):
- MR: Movie review sentiment on a five-star scale;
- CR: Sentiment of customer reviews;
- SUBJ: Subjectivity of movie reviews and plot summaries;
- MPQA: Phrase opinion polarity from news data;
- TREC: Fine-grained question classification sourced from TREC;
- SST: Binary phrase sentiment classification;
- STS Benchmark: Semantic textual similarity (STS) between sentence pairs, scored by Pearson r against human judgments (see the similarity sketch after this list);
- WEAT: Word pairs from the psychology literature on implicit association tests (IAT).
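For STS, pair similarity is computed from the two sentence embeddings; the paper converts cosine similarity into an angular similarity via arccos. A minimal sketch of that scoring (vector sizes are illustrative):

```python
# Sketch of scoring an STS pair: cosine similarity between the two sentence
# embeddings is converted into an angular similarity via arccos.
import numpy as np

def angular_similarity(u, v):
    """Angular similarity in [0, 1]; higher means more similar."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    cos = np.clip(cos, -1.0, 1.0)  # guard against floating-point drift
    return 1.0 - np.arccos(cos) / np.pi

u = np.random.randn(512)  # stand-in embedding of sentence 1
v = np.random.randn(512)  # stand-in embedding of sentence 2
print(angular_similarity(u, v))
```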
2. Results
- Additional baseline CNN and DAN models are trained without using any pretrained word or sentence embeddings.
- Authors also explore combining the sentence and word-level transfer models by concatenating their representations prior to the classification layers.
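A minimal sketch of that combination, with hypothetical shapes and a simple mean-pooled word-level representation standing in for the paper's word-level transfer model:

```python
# Illustrative sketch: concatenate a sentence-level embedding with a pooled
# word-level representation before a softmax classification layer.
# Shapes, pooling, and the classifier are simplifications, not the paper's setup.
import numpy as np

sentence_emb = np.random.randn(512)      # stand-in USE sentence embedding
word_embs    = np.random.randn(9, 300)   # stand-in pretrained word embeddings (9 tokens)
word_level   = word_embs.mean(axis=0)    # crude pooling over word embeddings

features = np.concatenate([sentence_emb, word_level])  # combined 812-dim feature vector

num_classes = 2  # e.g. binary sentiment
W, b = np.random.randn(features.shape[0], num_classes) * 0.01, np.zeros(num_classes)
logits = features @ W + b
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
print(probs)
```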
- Using Transformer sentence-level embeddings alone outperforms InferSent on MR, SUBJ, and TREC.
- The Transformer sentence encoder also strictly outperforms the DAN encoder.
- Overall, the Transformer embeddings outperform the sentence representations produced by InferSent.
- With small quantities of training data, sentence-level transfer achieves surprisingly good performance; transfer learning is especially important when training data is limited.