Brief Review — Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix-Tuning

Sik-Ho Tsang
3 min read · May 5, 2024
Prefix-Tuning

Prefix-Tuning: Optimizing Continuous Prompts for Generation
Prefix-Tuning, by Stanford University
2021 ACL IJCNLP, Over 2600 Citations (Sik-Ho Tsang @ Medium)

Language Model (LM)
2007 … 2022
[GLM] [Switch Transformers] [WideNet] [MoEBERT] [X-MoE] [sMLP] [LinkBERT, BioLinkBERT] [AlphaCode] 2023 [ERNIE-Code]
==== My Other Paper Readings Are Also Over Here ====

  • Prefix-tuning is proposed, which is a lightweight alternative to fine-tuning for natural language generation tasks.
  • Prefix-tuning keeps the language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, called the prefix.

Outline

  1. Prefix-Tuning
  2. Results

1. Prefix-Tuning

Prefix-Tuning
  • Top: Fine-tuning a full language model (LM) is expensive.
  • Bottom: In contrast, intuitively, if we want the LM to generate a word (e.g., Obama), we can prepend its common collocation as context (e.g., Barack).
  • If we keep the LM parameters frozen, the number of parameters that need to be tuned is very small compared with full fine-tuning.

Prefix-tuning prepends a prefix to an autoregressive LM to obtain z = [PREFIX; x; y], or prepends prefixes to both the encoder and decoder to obtain z = [PREFIX; x; PREFIX′; y]:

The language model parameters φ are fixed and the prefix parameters θ are the only trainable parameters.
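
To make this concrete, below is a minimal PyTorch sketch, assuming a GPT-2-style HuggingFace model. The names (`prefix_len`, `prefix_kv`, `forward_with_prefix`) are illustrative and not from the paper's released code; the sketch feeds trainable prefix key/value activations to every layer via `past_key_values` (recent `transformers` versions may prefer a Cache object for this argument) while the LM weights stay frozen.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

lm = GPT2LMHeadModel.from_pretrained("gpt2")
for p in lm.parameters():                      # freeze all LM parameters (phi)
    p.requires_grad = False

cfg = lm.config
prefix_len = 10                                # number of prefix positions (illustrative)
head_dim = cfg.n_embd // cfg.n_head

# Trainable prefix activations theta: one key and one value tensor per layer.
prefix_kv = nn.Parameter(
    torch.randn(cfg.n_layer, 2, cfg.n_head, prefix_len, head_dim) * 0.02
)

def forward_with_prefix(input_ids, labels=None):
    b = input_ids.size(0)
    # Expand the prefix over the batch and hand it to the LM as past key/values,
    # so every real token attends to the prefix positions on its left.
    past = tuple(
        (layer[0].unsqueeze(0).expand(b, -1, -1, -1),
         layer[1].unsqueeze(0).expand(b, -1, -1, -1))
        for layer in prefix_kv
    )
    # The attention mask must also cover the prefix positions.
    attn = torch.ones(b, prefix_len + input_ids.size(1), dtype=torch.long)
    return lm(input_ids=input_ids, past_key_values=past,
              attention_mask=attn, labels=labels)

# Only theta (the prefix) receives gradients during training.
optimizer = torch.optim.AdamW([prefix_kv], lr=5e-5)
```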

  • The prefix activations are always in the left context and will therefore affect any activations to the right.
  • Direct optimization of θ is unstable. The prefix matrix is therefore reparametrized during training through a smaller matrix passed through an MLP, as sketched below. Once training is complete, these reparametrization parameters can be dropped, and only the prefix (P) needs to be saved.
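
A minimal sketch of that reparametrization, with placeholder sizes (the hidden width and output dimension below are assumptions, not the paper's exact settings):

```python
import torch
import torch.nn as nn

prefix_len = 10
hidden = 512                    # width of the smaller matrix P'_theta (illustrative)
out_dim = 2 * 12 * 768          # per-position key/value activations across layers (illustrative)

# Smaller trainable matrix P'_theta plus an MLP that expands it to P_theta.
prefix_small = nn.Parameter(torch.randn(prefix_len, hidden))
reparam_mlp = nn.Sequential(
    nn.Linear(hidden, hidden),
    nn.Tanh(),
    nn.Linear(hidden, out_dim),
)

def prefix():
    # During training, each row of P_theta is computed as MLP(P'_theta[i, :]).
    return reparam_mlp(prefix_small)

# After training, compute P_theta once, save it, and drop P'_theta and the MLP.
P_theta = prefix().detach()
torch.save(P_theta, "prefix.pt")
```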

2. Results

Prefix-tuning is significantly better than ADAPTER (0.1%), attaining a 4.1 BLEU improvement per dataset on average.

  • Left: 8 examples generated by both prefix-tuning and fine-tuning models trained on different data levels.

  • Right: Prefix-tuning outperforms fine-tuning in low-data regimes by 2.9 BLEU on average, while also requiring far fewer parameters, but the gap narrows as the dataset size increases.


Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.