Brief Review — Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix-Tuning

Sik-Ho Tsang
3 min read · May 5, 2024
Prefix-Tuning

Prefix-Tuning: Optimizing Continuous Prompts for Generation
Prefix-Tuning, by Stanford University
2021 ACL IJCNLP, Over 2600 Citations (Sik-Ho Tsang @ Medium)

Language Model (LM)
2007 … 2022
[GLM] [Switch Transformers] [WideNet] [MoEBERT] [X-MoE] [sMLP] [LinkBERT, BioLinkBERT] [AlphaCode] 2023 [ERNIE-Code]
==== My Other Paper Readings Are Also Over Here ====

  • Prefix-tuning is proposed, which is a lightweight alternative to fine-tuning for natural language generation tasks.
  • Prefix-tuning keeps the language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, called the prefix.

Outline

  1. Prefix-Tuning
  2. Results

1. Prefix-Tuning

Prefix-Tuning
  • Top: Fine-tuning a full language model (LM) is expensive.
  • Bottom: In contrast, intuitively, if we want the LM to generate a word (e.g., Obama), we can prepend its common collocation as context (e.g., Barack).
  • If we keep the LM parameters frozen, the number of parameters that need to be tuned is very small compared with full fine-tuning.

Prefix-tuning prepends a prefix to an autoregressive LM to obtain z = [PREFIX; x; y], or prepends prefixes to both the encoder and decoder to obtain z = [PREFIX; x; PREFIX′; y]:

The language model parameters φ are fixed and the prefix parameters θ are the only trainable parameters.
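
To make this concrete, below is a minimal PyTorch sketch, assuming a GPT-2-style HuggingFace model. The names (`prefix_len`, `prefix_kv`, `forward_with_prefix`) are illustrative and not from the paper's released code; the sketch feeds trainable prefix key/value activations to every layer via `past_key_values` (recent `transformers` versions may prefer a Cache object for this argument) while the LM weights stay frozen.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

lm = GPT2LMHeadModel.from_pretrained("gpt2")
for p in lm.parameters():                      # freeze all LM parameters (phi)
    p.requires_grad = False

cfg = lm.config
prefix_len = 10                                # number of prefix positions (illustrative)
head_dim = cfg.n_embd // cfg.n_head

# Trainable prefix activations theta: one key and one value tensor per layer.
prefix_kv = nn.Parameter(
    torch.randn(cfg.n_layer, 2, cfg.n_head, prefix_len, head_dim) * 0.02
)

def forward_with_prefix(input_ids, labels=None):
    b = input_ids.size(0)
    # Expand the prefix over the batch and hand it to the LM as past key/values,
    # so every real token attends to the prefix positions on its left.
    past = tuple(
        (layer[0].unsqueeze(0).expand(b, -1, -1, -1),
         layer[1].unsqueeze(0).expand(b, -1, -1, -1))
        for layer in prefix_kv
    )
    # The attention mask must also cover the prefix positions.
    attn = torch.ones(b, prefix_len + input_ids.size(1), dtype=torch.long)
    return lm(input_ids=input_ids, past_key_values=past,
              attention_mask=attn, labels=labels)

# Only theta (the prefix) receives gradients during training.
optimizer = torch.optim.AdamW([prefix_kv], lr=5e-5)
```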

  • The prefix activations are always in the left context and will therefore affect any activations to the right.
  • Direct optimization of θ is unstable. The prefix matrix is therefore reparametrized during training through a smaller matrix passed through an MLP, as sketched below. Once training is complete, these reparametrization parameters can be dropped, and only the prefix (P) needs to be saved.
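
A minimal sketch of that reparametrization, with placeholder sizes (the hidden width and output dimension below are assumptions, not the paper's exact settings):

```python
import torch
import torch.nn as nn

prefix_len = 10
hidden = 512                    # width of the smaller matrix P'_theta (illustrative)
out_dim = 2 * 12 * 768          # per-position key/value activations across layers (illustrative)

# Smaller trainable matrix P'_theta plus an MLP that expands it to P_theta.
prefix_small = nn.Parameter(torch.randn(prefix_len, hidden))
reparam_mlp = nn.Sequential(
    nn.Linear(hidden, hidden),
    nn.Tanh(),
    nn.Linear(hidden, out_dim),
)

def prefix():
    # During training, each row of P_theta is computed as MLP(P'_theta[i, :]).
    return reparam_mlp(prefix_small)

# After training, compute P_theta once, save it, and drop P'_theta and the MLP.
P_theta = prefix().detach()
torch.save(P_theta, "prefix.pt")
```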

2. Results

Prefix-tuning is significantly better than ADAPTER (0.1%), attaining a 4.1 BLEU improvement per dataset on average.

  • Left: 8 examples generated by both prefix-tuning and fine-tuning models trained on different data levels.

  • Right: Prefix-tuning outperforms fine-tuning in low-data regimes by 2.9 BLEU on average, while also requiring far fewer parameters, but the gap narrows as the dataset size increases.


Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.