Brief Review — Inflection-1

Pi.AI, Empowered by Inflection-1

Sik-Ho Tsang
Inflection.AI (Image from https://voicebot.ai/2022/03/11/deepmind-and-linkedin-co-founders-unveil-new-conversational-ai-startup-inflection-ai/)

Inflection-1, by Inflection AI
2023 Technical Memo (@ Medium)

Large Language Model (LLM)
2020 … 2023
[GPT-4] [LLaMA] [Koala] [BloombergGPT] [GLM-130B] [UL2] [PaLM 2] [Llama 2] [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [Flan 2022, Flan-T5] [AlphaCode 2] [Mistral 7B]
==== My Other Paper Readings Are Also Over Here ====

  • Last week, one of my colleagues introduced me to Pi.AI, which led me to read this Inflection-1 LLM technical memo.
  • In 2022, DeepMind and LinkedIn co-founders unveiled the new conversational AI startup Inflection AI, with the mission of creating a personal AI for everyone. In 2023, Inflection AI developed the Inflection-1 LLM and released Pi.AI, a freely usable assistant powered by Inflection-1.
  • Since then, DeepMind co-founder Mustafa Suleyman has recently become the CEO of Microsoft AI.

Outline

  1. Inflection-1
  2. Results

1. Inflection-1 Benchmarking Results

  • To offer a fair comparison amongst models of varying sizes and training methods, foundation models are segmented into those pretrained using at most the FLOPs of Google’s PaLM-540B (approximately 10x GPT-3) and those that used more.
  • First compute class: Models in the former category are usually faster to serve and can be deployed more widely.
  • Second compute class: Models in the latter category tend to have the highest performance.
  • GPT-3.5 belongs to the first compute class and GPT-4 to the second.

Inflection-1 was trained on a large dataset using thousands of NVIDIA H100 GPUs, and is a model within the first compute class.

  • For Inflection-1, results without instruction tuning or RLHF are reported.
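To get a feel for the compute-class cut-off, here is a minimal sketch using the standard ~6 × parameters × tokens approximation for dense-transformer training FLOPs (Kaplan et al.); the parameter and token counts below are published figures for PaLM and GPT-3, used only for illustration, and the exact budgets Inflection used are not disclosed in the memo.

```python
# Rough training-compute estimate: FLOPs ~ 6 * params * tokens
# (a common approximation, not Inflection's own accounting).

def train_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * params * tokens

palm_flops = train_flops(540e9, 780e9)   # PaLM-540B: 540B params, 780B tokens
gpt3_flops = train_flops(175e9, 300e9)   # GPT-3: 175B params, 300B tokens

print(f"PaLM-540B ~ {palm_flops:.2e} FLOPs")
print(f"GPT-3     ~ {gpt3_flops:.2e} FLOPs")
print(f"ratio     ~ {palm_flops / gpt3_flops:.1f}x")  # broadly consistent with the memo's ~10x
```

Under this approximation, any model trained with at most the PaLM-540B budget falls in the first compute class.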

2. Results

2.1. Overview

Overview of Inflection-1’s performance relative to LLaMA and GPT-3.5

Inflection-1 outperforms GPT-3.5 and LLaMA-65B on the above 5 benchmarks.

2.2. Multitask Language Understanding (MMLU)

Multitask Language Understanding (MMLU)
  • The proposed model outperforms all models in the first compute class, including both GPT-3.5 and LLaMA.
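Multiple-choice benchmarks like MMLU are typically scored by appending each answer option to the prompt, asking the model for a likelihood of that continuation, and picking the highest-scoring option. The sketch below illustrates this loop; `loglik` is a toy word-overlap stand-in (an assumption, not Inflection's actual scoring API) so the example runs without a real model.

```python
# Toy sketch of multiple-choice scoring as used for MMLU-style benchmarks.

def loglik(prompt: str, continuation: str) -> float:
    # Placeholder scorer: fraction of continuation words found in the prompt.
    # A real evaluation would use the LLM's log-likelihood instead.
    prompt_words = set(prompt.lower().split())
    words = continuation.lower().split()
    return sum(w in prompt_words for w in words) / max(len(words), 1)

def predict(question: str, options: dict[str, str]) -> str:
    # Score every option and return the label with the highest likelihood.
    scores = {label: loglik(question, text) for label, text in options.items()}
    return max(scores, key=scores.get)

question = "The capital of France is"
options = {
    "A": "Berlin",
    "B": "Paris is the capital of France",
    "C": "Madrid",
    "D": "Rome",
}
print(predict(question, options))  # -> B with this toy scorer
```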

2.3. Closed Book Question Answering

Closed Book Question Answering

Inflection-1 is significantly better at answering trivia questions.
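Closed-book QA benchmarks of this kind are commonly scored with a normalized exact-match metric: the model's answer is lowercased, stripped of punctuation and articles, and compared against a set of gold aliases. A minimal sketch of that metric (a common convention, not necessarily the memo's exact implementation):

```python
import re
import string

# Normalized exact-match scoring, as commonly used for closed-book QA.

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction matches any normalized gold alias."""
    return normalize(prediction) in {normalize(g) for g in gold_answers}

print(exact_match("The Eiffel Tower.", ["Eiffel Tower", "Tour Eiffel"]))  # True
```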

2.4. Others

  • The above results are shown on Inflection’s website, while the results below appear only in the technical memo.
0-shot results on common sense benchmarks.
  • Similar to OpenAI’s GPT-4, the authors do not disclose the model size, as can be seen in all tables.
Common sense benchmarks with comparison to GPT-4 and PaLM 2.
BIG-Bench hard with Chain of Thought prompting.
Reading comprehension benchmark RACE along with LAMBADA.
Mathematical reasoning datasets.
Code generation tasks.
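The BIG-Bench Hard results use Chain-of-Thought (CoT) prompting, where few-shot exemplars show worked reasoning before the final answer, encouraging the model to reason step by step. A minimal sketch of how such a prompt is assembled; the exemplar text is illustrative, not taken from the memo.

```python
# Sketch of Chain-of-Thought prompt construction: each exemplar contains
# intermediate reasoning followed by "The answer is ...".

EXEMPLAR = (
    "Q: A shop has 3 boxes with 4 pens each. How many pens in total?\n"
    "A: Each box has 4 pens and there are 3 boxes, so 3 * 4 = 12. "
    "The answer is 12.\n\n"
)

def cot_prompt(question: str, exemplars: str = EXEMPLAR) -> str:
    """Prepend worked exemplars, then leave 'A:' open for the model to continue."""
    return f"{exemplars}Q: {question}\nA:"

print(cot_prompt("If a train has 5 cars with 20 seats each, how many seats?"))
```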

Inflection-1 outperforms or performs on par with first-compute-class LLMs on most tasks, and only underperforms second-compute-class LLMs such as GPT-4 and PaLM 2-L.


Sik-Ho Tsang

PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.