BloombergGPT: A Large Language Model for Finance

BloombergGPT, a 50B Model Trained on Curated Financial Data, Outperforms Other LLMs Trained on General Data

Sik-Ho Tsang
Jun 10, 2023
BloombergGPT (Image from Bloomberg)

BloombergGPT: A Large Language Model for Finance,
BloombergGPT, by Bloomberg, and Johns Hopkins University,
2023 arXiv v1 (Sik-Ho Tsang @ Medium)

Language Model
1991 … 2022
[GPT-NeoX-20B] [GPT-3.5, InstructGPT] [GLM] [MT-NLG 530B] [Chinchilla] [PaLM] [AlexaTM] [BLOOM] [AlexaTM 20B] [OPT] [Switch Transformers] [LaMDA] [LoRA] [Galactica] 2023 [GPT-4] [LLaMA] [LIMA] [Koala]
==== My Other Paper Readings Are Also Over Here ====

  • BloombergGPT, a 50 billion parameter language model, is proposed that is trained on a wide range of financial data.
  • A 363-billion-token dataset is constructed from Bloomberg’s extensive data sources, perhaps the largest domain-specific dataset yet, and is augmented with 345 billion tokens from general-purpose datasets.

Outline

  1. BloombergGPT Dataset
  2. BloombergGPT Model
  3. Results

1. BloombergGPT Dataset

Full Training Set

To train BloombergGPT, “FinPile” is constructed, which is a comprehensive dataset consisting of a range of English financial documents including news, filings, press releases, web-scraped financial documents, and social media drawn from the Bloomberg archives.

  • A general-purpose dataset of the kind used by many LLMs is also included, so the training data is roughly half domain-specific text (51.27%) and half general-purpose text (48.73%); a quick check of this split is sketched just below.
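
The percentages above follow directly from the two token counts mentioned earlier (363B financial + 345B general-purpose). This is illustrative arithmetic, not code from the paper:

```python
# Rough check of the FinPile / general-purpose split reported above.
finpile_tokens = 363e9          # financial (FinPile) tokens
general_tokens = 345e9          # general-purpose tokens
total = finpile_tokens + general_tokens

print(f"Total training tokens: {total / 1e9:.0f}B")                   # 708B
print(f"Financial share:       {100 * finpile_tokens / total:.2f}%")  # ~51.27%
print(f"General share:         {100 * general_tokens / total:.2f}%")  # ~48.73%
```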

2. BloombergGPT Model

  • The model is a decoder-only causal language model based on BLOOM.
  • The model contains L=70 layers of Transformer decoder blocks. It has an additional layer normalization after token embeddings.
  • Following the Chinchilla scaling laws, the model uses 40 attention heads, each with a dimension of 192, giving a total hidden dimension of D=7680 and a total of 50.6B parameters (a back-of-the-envelope check is sketched after this list).
  • Training runs for 139,200 steps (~53 days) on a total of 512 40GB A100 GPUs.
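
As a sanity check, the 50.6B figure can be recovered from the stated shape. The sketch below assumes the standard decoder-block estimate of ~12·L·D² parameters (attention plus a 4× feed-forward expansion), adds a V·D token-embedding matrix using the vocabulary size of 131,072 reported in the paper, and ignores layer norms and biases:

```python
# Back-of-the-envelope parameter count for the reported model shape.
# Assumes ~4*D^2 for attention and ~8*D^2 for a 4x FFN per layer,
# plus the token-embedding matrix; layer norms and biases are ignored.
L = 70                  # Transformer decoder layers
heads = 40              # attention heads
head_dim = 192          # dimension per head
D = heads * head_dim    # hidden dimension = 7680
V = 131_072             # vocabulary size reported in the BloombergGPT paper

block_params = 12 * L * D ** 2      # attention + feed-forward, all layers
embed_params = V * D                # token embeddings
total = block_params + embed_params

print(f"D = {D}")                          # 7680
print(f"~{total / 1e9:.1f}B parameters")   # ~50.6B
```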

3. Results

3.1. Bits Per Byte

Bits Per Byte (Table from Bloomberg)

In terms of bits per byte on held-out financial data (lower is better), BloombergGPT consistently outperforms GPT-NeoX, OPT-66B, and BLOOM-176B.
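
Bits per byte measures how well a model compresses the raw text: the per-token loss is converted to bits and normalized by the number of bytes, so models with different tokenizers are directly comparable. The sketch below shows that conversion, assuming the average per-token loss (in nats) and the token/byte counts of the evaluation text are known; the function name and example numbers are illustrative, not from the paper:

```python
import math

def bits_per_byte(avg_nll_per_token: float, n_tokens: int, n_bytes: int) -> float:
    """Convert average per-token negative log-likelihood (nats) to bits per byte.

    Total nats = avg_nll_per_token * n_tokens; dividing by ln(2) gives bits,
    then normalize by the number of UTF-8 bytes in the evaluated text.
    Lower is better.
    """
    total_bits = avg_nll_per_token * n_tokens / math.log(2)
    return total_bits / n_bytes

# Illustrative numbers only: a loss of 2.0 nats/token on text averaging
# 4 bytes per token gives ~0.72 bits per byte.
print(f"{bits_per_byte(2.0, n_tokens=1_000, n_bytes=4_000):.2f}")
```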

3.2. Finance-Specific and General-Purpose

Finance-Specific and General-Purpose (Table from Bloomberg)

BloombergGPT obtains the best results on finance-specific tasks.

On general-purpose tasks, BloombergGPT also obtains the best results among models of similar size, and only underperforms GPT-3, which at 175B parameters is much larger.
