Brief Review — Alpaca: A Strong, Replicable Instruction-Following Model

Stanford Alpaca 7B & 13B

3 min read · May 21, 2024

Alpaca: A Strong, Replicable Instruction-Following Model
Alpaca, by Stanford University
2023 Stanford Web Site (Sik-Ho Tsang @ Medium)
Large Language Model (LLM)
2020 … 2023 [GPT-4] [LLaMA] [Koala] [BloombergGPT] [GLM-130B] [UL2] [PaLM 2] [Llama 2] [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [Flan 2022, Flan-T5] [AlphaCode 2] [Mistral 7B]
==== My Other Paper Readings Are Also Over Here ====

Alpaca 7B, an instruction-following model, is proposed by fine-tuning LLaMA.
Alpaca 13B is also built in their GitHub repository. The authors state that they tried LoRA for fine-tuning as well.
Later, Alpaca is further fine-tuned as MedAlpaca using medical data.
(Alpaca is one of the well-known LLMs, yet it was released as a web post rather than as a paper or arXiv tech report.)

Outline

1. Alpaca 7B
2. Preliminary Results

1. Alpaca 7B

1.1. Data

For the data, instruction-following demonstrations are generated by building upon the self-instruct method. The authors started with the 175 human-written instruction-output pairs from the self-instruct seed set.
Then text-davinci-003 is prompted to generate more instructions, using the seed set as in-context examples.
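
To make the in-context prompting step concrete, here is a minimal sketch of how a few seed pairs could be assembled into a few-shot prompt. The prompt wording, the `build_prompt` helper, and the two seed tasks are hypothetical illustrations, not the actual Alpaca prompt or seed data, and no API call is made.

```python
import random


def build_prompt(seed_tasks, num_examples=3, rng=None):
    """Assemble a few-shot prompt from seed instruction-output pairs.

    The wording below is a hypothetical illustration, not the prompt
    used by the Alpaca authors.
    """
    rng = rng or random.Random(0)
    # Sample a handful of seed tasks to serve as in-context examples.
    examples = rng.sample(seed_tasks, k=min(num_examples, len(seed_tasks)))
    lines = ["Come up with a new task in the same style as the examples below.", ""]
    for i, task in enumerate(examples, start=1):
        lines.append(f"{i}. Instruction: {task['instruction']}")
        lines.append(f"{i}. Output: {task['output']}")
    # Leave the final slot open so the model completes a new instruction.
    lines.append(f"{len(examples) + 1}. Instruction:")
    return "\n".join(lines)


# Hypothetical seed tasks standing in for the 175 human-written pairs.
seed_tasks = [
    {"instruction": "Translate 'good morning' into French.", "output": "Bonjour."},
    {"instruction": "Give one synonym for 'happy'.", "output": "Joyful."},
]

prompt = build_prompt(seed_tasks, num_examples=2)
print(prompt)
```

The resulting string would then be sent to the generation model, and its completions parsed into new instruction-output pairs.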

The self-instruct method is improved by simplifying the generation pipeline, which significantly reduced the cost.

This data generation process results in 52K unique instructions and the corresponding outputs, which cost less than $500 using the OpenAI API.

1.2. Model

With the data, LLaMA models are fine-tuned using Hugging Face’s training framework, taking advantage of techniques like Fully Sharded Data Parallel and mixed precision training.
For the initial run, fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which costs less than $100 on most cloud compute providers.
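
The budget figures quoted above can be unpacked with some quick arithmetic. This is a rough sketch that treats the stated "less than" amounts as upper bounds; the variable names are my own, and the implied per-GPU-hour rate is derived from those bounds rather than taken from any provider's price list.

```python
# Figures quoted in the text, treated as upper bounds.
DATA_COST_USD = 500      # data generation via the OpenAI API
NUM_EXAMPLES = 52_000    # generated instruction-output pairs
TRAIN_COST_USD = 100     # fine-tuning the 7B model
TRAIN_HOURS = 3
NUM_GPUS = 8             # 80GB A100s

gpu_hours = TRAIN_HOURS * NUM_GPUS                  # total GPU time used
cost_per_example = DATA_COST_USD / NUM_EXAMPLES     # upper bound per pair
implied_gpu_hour_rate = TRAIN_COST_USD / gpu_hours  # upper bound per GPU-hour
total_budget = DATA_COST_USD + TRAIN_COST_USD       # overall upper bound

print(f"GPU-hours: {gpu_hours}")
print(f"Data cost per example: under ${cost_per_example:.4f}")
print(f"Implied rate: under ${implied_gpu_hour_rate:.2f} per GPU-hour")
print(f"Total budget: under ${total_budget}")
```

In other words, each generated example costs under one cent, and the whole pipeline fits within roughly $600.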

2. Preliminary Results

Human evaluation (by the 5 student authors) is conducted on the inputs from the self-instruct evaluation set, which covers a diverse list of user-oriented instructions including email writing, social media, and productivity tools.

Alpaca wins 90 comparisons versus 89 for text-davinci-003.
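
To put the 90 versus 89 result in perspective, here is a small sketch that computes the win rate and a two-sided exact sign test. The test is my own illustration, not something the authors report, and it assumes the 179 comparisons are independent and contain no ties.

```python
from math import comb

alpaca_wins = 90
davinci_wins = 89
n = alpaca_wins + davinci_wins  # assumes no ties were counted

win_rate = alpaca_wins / n


def binom_tail_ge(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Two-sided exact sign test under the null that both models are equally
# likely to win a comparison. By symmetry, P(X <= 90) equals P(X >= 89).
p_value = min(1.0, 2 * min(binom_tail_ge(alpaca_wins, n),
                           binom_tail_ge(davinci_wins, n)))

print(f"Alpaca win rate: {win_rate:.4f}")
print(f"Sign test p-value: {p_value:.3f}")
```

A win rate this close to 50% is best read as the two models being roughly on par on this evaluation set, not as Alpaca being better.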

The authors also tested the Alpaca model interactively and found that it often behaves similarly to text-davinci-003 on a diverse set of inputs. However, they note that the evaluation may be limited in scale and diversity.

Like other language models, Alpaca also exhibits hallucination and misinformation.

Written by Sik-Ho Tsang