# Brief Review — The Power of Scale for Parameter-Efficient Prompt Tuning

## T5+LM, Using Prompt Tuning, Further Used in T0

---

The Power of Scale for Parameter-Efficient Prompt Tuning
Prompt Tuning, T5+LM, by Google Research
2021 EMNLP, Over 1100 Citations (Sik-Ho Tsang @ Medium)


• Prompt Tuning, a simple yet effective mechanism, is proposed for learning “soft prompts” to condition frozen language models to perform specific downstream tasks.
• Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signals from any number of labeled examples.

# Outline

1. Prompt Tuning
2. Results

# 1. Prompt Tuning

• Model tuning fine-tunes the entire model, which is expensive and impractical, especially when the model is a large language model.

## 1.1. Prompt Tuning

• Suppose classification is cast as conditional generation Prθ(Y|X), where X is a series of tokens and Y is a single class label. θ denotes the model parameters.
• Normally, prompting is done by prepending a series of tokens, P, to the input X, so that the model maximizes the likelihood of the correct Y, Prθ(Y|[P; X]).

Prompt tuning can be thought of as using a fixed prompt of special tokens, where only the embeddings of these prompt tokens, parameterized by θP, are updated. The new conditional generation is Prθ;θP(Y|[P; X]).
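The prepending step can be sketched at the token-ID level (the vocabulary and IDs here are made up for illustration; a real prompt P would be a fixed sequence of reserved special tokens):

```python
# Hypothetical token IDs for illustration only.
PROMPT_IDS = [11, 42, 7]        # P: fixed special prompt tokens
INPUT_IDS = [305, 19, 88, 4]    # X: the task input tokens

def prepend_prompt(prompt_ids, input_ids):
    """Form the conditioned input [P; X] that the model scores as Pr(Y|[P; X])."""
    return prompt_ids + input_ids

model_input = prepend_prompt(PROMPT_IDS, INPUT_IDS)
print(model_input)  # [11, 42, 7, 305, 19, 88, 4]
```

In prompt tuning, the IDs of P never change; only the embedding vectors those IDs map to (θP) receive gradient updates.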

## 1.2. Soft-Prompts

• Soft-prompts are represented as a parameter Pe. The prompt is concatenated to the embedded input, forming a single matrix [Pe; Xe], which then flows through the encoder-decoder as normal.

The models are trained to maximize the probability of Y, but only the prompt parameters Pe are updated. The pretrained model parameters, θ, are frozen.
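A minimal forward-pass sketch of the soft prompt, using numpy and toy dimensions (the sizes below are assumptions for illustration, not the paper's actual dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes for illustration.
vocab_size, d_model, prompt_len = 100, 16, 5

# Frozen pretrained embedding table (part of theta; never updated).
embedding_table = rng.normal(size=(vocab_size, d_model))

# Trainable soft-prompt parameters P_e (theta_P; the only weights updated).
P_e = rng.normal(size=(prompt_len, d_model))

def build_encoder_input(input_ids):
    """Embed the input X and prepend the soft prompt, forming [P_e; X_e]."""
    X_e = embedding_table[input_ids]           # frozen embedding lookup
    return np.concatenate([P_e, X_e], axis=0)  # flows through the encoder as normal

x = np.array([7, 23, 56])                       # a 3-token input X
encoder_input = build_encoder_input(x)
print(encoder_input.shape)                      # (8, 16): prompt_len + 3 rows, d_model cols
```

During training, gradients flow back through the whole network but are applied only to `P_e`; `embedding_table` and all other model parameters stay frozen.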

• T5 is pre-trained on a span corruption objective, in which the target output text consists of all the masked content, separated by sentinel tokens, plus a final sentinel. A model that has only ever produced sentinel-laden targets is a poor starting point for generating natural text.
• With LM Adaptation, T5’s self-supervised training is continued for a small number of additional steps, up to 100K, but using the autoregressive “LM” objective, so that the model outputs natural text without sentinels.
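The difference between the two objectives can be shown with a toy example (the sentence is the illustrative one from the T5 paper; sentinel names follow T5's `<extra_id_*>` convention):

```python
original = "Thank you for inviting me to your party last week"

# Span corruption: masked spans become sentinels in the input, and the
# target is the masked content separated by sentinels plus a final sentinel.
corrupted_input = "Thank you <extra_id_0> me to your party <extra_id_1> week"
corruption_target = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"

# LM objective: given a natural-text prefix, predict the continuation.
# No sentinels appear anywhere in the target.
lm_input = "Thank you for inviting"
lm_target = "me to your party last week"

assert "<extra_id" not in lm_target  # LM adaptation removes sentinels from outputs
```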

# 2. Results

Prompt tuning becomes more competitive with model tuning as scale increases. At the XXL size (11 billion parameters), prompt tuning matches even the stronger multi-task model tuning baseline, despite having over 20,000 times fewer task-specific parameters.
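The "over 20,000 times fewer" figure can be sanity-checked with rough numbers, assuming a prompt length of 100 tokens (the longest used in the paper) and T5 XXL's embedding dimension of 4096:

```python
# Rough sanity check of task-specific parameter counts.
model_params = 11_000_000_000   # T5 XXL: ~11 billion parameters, all frozen
prompt_len = 100                # longest prompt length used in the paper
d_model = 4096                  # T5 XXL embedding dimension

prompt_params = prompt_len * d_model      # trainable parameters per task
ratio = model_params / prompt_params

print(prompt_params)   # 409600
print(ratio > 20_000)  # True: over 20,000x fewer task-specific parameters
```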