Brief Review — Alpaca: A Strong, Replicable Instruction-Following Model
Stanford Alpaca 7B & 13B
Alpaca: A Strong, Replicable Instruction-Following Model
Alpaca, by Stanford University
2023 Stanford Web Site (Sik-Ho Tsang @ Medium)
Large Language Model (LLM)
2020 … 2023 [GPT-4] [LLaMA] [Koala] [BloombergGPT] [GLM-130B] [UL2] [PaLM 2] [Llama 2] [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [Flan 2022, Flan-T5] [AlphaCode 2] [Mistral 7B]
==== My Other Paper Readings Are Also Over Here ====
- Alpaca 7B is an instruction-following model obtained by fine-tuning LLaMA.
- In their GitHub repository, an Alpaca 13B model is also constructed, and the authors claim that fine-tuning with LoRA was tried as well (a minimal sketch follows this list).
- Later, Alpaca is further fine-tuned on medical data to form MedAlpaca.
- (Alpaca is one of the well-known LLMs, yet it is documented in a web post rather than a paper or arXiv tech report.)
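Since LoRA fine-tuning is only mentioned in passing, here is a minimal sketch of what it could look like with Hugging Face's peft library; the checkpoint name and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# A minimal LoRA sketch, assuming the Hugging Face `transformers` and `peft`
# libraries. Checkpoint name and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-7b"  # hypothetical checkpoint name for illustration
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA injects small trainable low-rank matrices into the attention
# projections, so only a tiny fraction of parameters is updated.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank updates
    lora_alpha=16,                        # scaling factor for the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```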
Outline
- Alpaca 7B
- Preliminary Results
1. Alpaca 7B
1.1. Data
For the data, instruction-following demonstrations are generated by building upon the self-instruct method. The authors started with the 175 human-written instruction-output pairs from the self-instruct seed set.
Then, text-davinci-003 is prompted to generate more instructions, using the seed set as in-context examples.
- The self-instruct method is improved by simplifying the generation pipeline, which significantly reduces the cost.
This data generation process results in 52K unique instructions with corresponding outputs, costing less than $500 via the OpenAI API.
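As a concrete illustration of this generation step, below is a minimal sketch assuming the legacy openai Python client; the seed_tasks list and prompt wording are simplified stand-ins, not the authors' actual template.

```python
# A minimal sketch of self-instruct-style data generation, assuming the
# legacy (pre-1.0) `openai` Python client. `seed_tasks` is a hypothetical
# stand-in for the 175 human-written instruction-output pairs.
import openai

openai.api_key = "sk-..."  # placeholder; set your own API key

seed_tasks = [
    {"instruction": "Give three tips for staying healthy.",
     "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well."},
    # ... more human-written seed pairs ...
]

def build_prompt(examples, num_new=5):
    """Format a few seed pairs as in-context examples and ask for new tasks."""
    prompt = "Come up with new instruction-output pairs in the same style:\n\n"
    for i, ex in enumerate(examples, 1):
        prompt += f"{i}. Instruction: {ex['instruction']}\n   Output: {ex['output']}\n"
    prompt += f"\nNow write {num_new} new, diverse instruction-output pairs:\n"
    return prompt

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=build_prompt(seed_tasks[:3]),
    max_tokens=1024,
    temperature=1.0,  # high temperature encourages diverse instructions
)
print(response.choices[0].text)  # new pairs are then parsed and deduplicated
```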
1.2. Model
With the data, LLaMA models are fine-tuned using Hugging Face’s training framework, taking advantage of techniques like Fully Sharded Data Parallel and mixed precision training.
For the initial run, fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which cost less than $100 on most cloud compute providers.
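Below is a minimal sketch of such a fine-tuning setup with Hugging Face's Trainer, enabling FSDP and bf16 mixed precision; the checkpoint name, toy dataset, and hyperparameters are illustrative assumptions, not the authors' exact recipe.

```python
# A minimal fine-tuning sketch, assuming the Hugging Face `transformers` and
# `datasets` libraries. Checkpoint, dataset, and hyperparameters are
# illustrative only. Launch with e.g. `torchrun --nproc_per_node=8` for FSDP.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "huggyllama/llama-7b"  # hypothetical checkpoint for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default

# Toy stand-in for the 52K instruction-output pairs, formatted as sequences.
texts = ["Instruction: Give a synonym for 'fast'.\nResponse: quick"]
enc = tokenizer(texts, truncation=True, padding="max_length", max_length=128)
enc["labels"] = enc["input_ids"].copy()  # causal LM: labels mirror the inputs
train_dataset = Dataset.from_dict(dict(enc))

args = TrainingArguments(
    output_dir="alpaca-7b-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-5,
    bf16=True,                    # mixed precision training
    fsdp="full_shard auto_wrap",  # Fully Sharded Data Parallel across GPUs
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```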
2. Preliminary Results
- Human evaluation (by the 5 student authors) is conducted on the inputs from the self-instruct evaluation set, which covers a diverse list of user-oriented instructions including email writing, social media, and productivity tools.
In a blind pairwise comparison, Alpaca wins 90 of the comparisons versus 89 for text-davinci-003, i.e., the two models perform very similarly.
- The authors also tested the Alpaca model interactively and found that it often behaves similarly to text-davinci-003 on a diverse set of inputs, though they note that this evaluation may be limited in scale and diversity.
- However, like other LLMs, Alpaca also exhibits hallucination and can produce misinformation.