Brief Review — ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge

ChatDoctor, LLaMA for Medical Domain Knowledge

Sik-Ho Tsang
4 min read · Feb 27, 2024

ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
ChatDoctor, by University of Texas Southwestern Medical Center, University of Illinois at Urbana-Champaign, The Ohio State University, and Hangzhou Dianzi University
2023 Cureus, Over 60 Citations (Sik-Ho Tsang @ Medium)

Medical/Clinical/Healthcare NLP/LLM
2017 [LiveQA] … 2023 [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [GPT-4 in Radiology] [ChatGPT & GPT‑4 on USMLE] [Regulatory Oversight of LLM] [ExBEHRT]
==== My Other Paper Readings Are Also Over Here ====

  • ChatDoctor is proposed, in which the Large Language Model Meta-AI (LLaMA) is adapted and refined using a large dataset of 100,000 patient-doctor dialogues sourced from a widely used online medical consultation platform.

Outline

  1. ChatDoctor Datasets
  2. ChatDoctor Methodology
  3. Results

1. ChatDoctor Datasets

1.1. Collection and Preparation of Patient-Physician Conversation Dataset

  • Authentic patient-doctor conversations are gathered: around 100k such interactions from the online medical consultation website HealthCareMagic.

1.2. Creation of External Knowledge Database

Figure 3: Samples of External Knowledge Database
  • The accuracy of these models could be significantly improved if they could generate or assess responses based on a reliable knowledge database.

Consequently, a database (sample shown in the above Figure 3) is curated, encompassing diseases, their symptoms, relevant medical tests/treatment procedures, and potential medications. This database serves as an external and offline knowledge brain for ChatDoctor.

  • Continually updatable without requiring model retraining, this database can be tailored to specific diseases or medical specialties. MedlinePlus is utilized to construct this disease database, but other reliable sources can also be used.

Additionally, online information sources like Wikipedia can supplement the knowledge base.
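
To make this concrete, below is a minimal sketch of what such an offline knowledge brain could look like as a keyword-searchable structure. The field layout follows the paper's description (disease, symptoms, tests/treatments, medications), but the sample entries and the `lookup` helper are illustrative assumptions, not the actual MedlinePlus-derived database.

```python
# Minimal sketch of an offline disease knowledge base ("knowledge brain").
# Sample entries are illustrative, not from the real MedlinePlus database.

KNOWLEDGE_BRAIN = [
    {
        "disease": "Mpox (Monkeypox)",
        "symptoms": "fever, rash, swollen lymph nodes, muscle aches",
        "tests_treatments": "PCR test of a lesion sample; supportive care",
        "medications": "tecovirimat (for severe cases)",
    },
    {
        "disease": "Influenza",
        "symptoms": "fever, cough, sore throat, fatigue, body aches",
        "tests_treatments": "rapid antigen test; rest and fluids",
        "medications": "oseltamivir",
    },
]

def lookup(keyword: str) -> list[dict]:
    """Return entries whose text mentions the keyword (simple term match)."""
    kw = keyword.lower()
    return [e for e in KNOWLEDGE_BRAIN if kw in " ".join(e.values()).lower()]

print(lookup("rash"))  # -> the Mpox entry
```

Because the database lives outside the model, new diseases or specialties can be added by editing these entries, with no retraining involved.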

2. ChatDoctor Methodology

2.1. Development of Autonomous ChatDoctor with Knowledge Brain

QA Using External Knowledge Database
  • A mechanism is devised to enable ChatDoctor to autonomously retrieve the information needed to answer queries. This is accomplished by constructing appropriate prompts to input into the ChatDoctor model.
Prompts and Instructions

Keyword mining prompts (Figure 4) are designed as the initial step, prompting ChatDoctor to extract key terms from patient queries for relevant knowledge search.

  • Based on these keywords, top-ranked information is retrieved from the knowledge brain using a term-matching retrieval system [13].
  • Given the LLM’s word limit (token size), the retrieved texts are divided into equal sections, and each section is ranked by the number of keyword hits.

The ChatDoctor model then reads the first N sections (N = 5 in this study) sequentially, selecting and summarizing pertinent information via prompts (Figure 5).

Ultimately, the model processes and compiles all the knowledge entries to generate a final response (Figure 6).
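
Putting these steps together, here is a minimal sketch of the retrieval loop, assuming a generic `llm` callable that stands in for the fine-tuned ChatDoctor model; the prompt wording is a hypothetical paraphrase, not the paper's exact prompts (those are in Figures 4-6).

```python
# Sketch of the autonomous retrieval loop: keyword mining -> term-matching
# retrieval -> section ranking -> summarization -> final answer.
# `llm` is a stand-in callable; prompt wording is illustrative.

def rank_sections(text: str, keywords: list[str],
                  section_len: int = 200, top_n: int = 5) -> list[str]:
    """Split text into equal-length sections, rank by keyword hits."""
    words = text.lower().split()
    sections = [" ".join(words[i:i + section_len])
                for i in range(0, len(words), section_len)]
    hits = lambda s: sum(s.count(k.lower()) for k in keywords)
    return sorted(sections, key=hits, reverse=True)[:top_n]

def answer_with_knowledge_brain(question: str, knowledge_text: str, llm) -> str:
    # Step 1: keyword mining (cf. Figure 4).
    keywords = [k.strip() for k in
                llm(f"Extract the key medical terms from: {question}").split(",")
                if k.strip()]

    # Steps 2-3: term-matching retrieval and ranking of equal-length sections.
    top_sections = rank_sections(knowledge_text, keywords)

    # Step 4: read the first N sections sequentially, summarizing each (cf. Figure 5).
    notes = [llm(f"Summarize what is relevant to '{question}':\n{s}")
             for s in top_sections]

    # Step 5: compile all knowledge entries into the final response (cf. Figure 6).
    return llm("Using these notes, answer the patient question.\n"
               + "\n".join(notes) + f"\nQuestion: {question}")
```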

  • This information retrieval approach ensures patients receive precise, well-informed responses backed by credible sources and can serve as a verification method for responses generated by ChatDoctor from prior knowledge.

2.2. Model Training

  • Meta’s publicly accessible LLaMA-7B model is used.
  • Conversations from HealthCareMagic-100k are used to fine-tune the LLaMA model in line with the Stanford Alpaca [5] training methodology.

The model was first fine-tuned with Alpaca’s data to acquire basic conversation skills, followed by further refinement on HealthCareMagic-100k using 6 × A100 GPUs for three hours.
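
For a sense of what this looks like in practice, below is a rough Hugging Face sketch of Alpaca-style supervised fine-tuning of LLaMA-7B on HealthCareMagic-100k. The checkpoint name, JSON field names ("instruction", "input", "output"), and hyperparameters are assumptions for illustration, not the paper's exact settings.

```python
# Rough sketch of Alpaca-style fine-tuning of LLaMA-7B (illustrative
# checkpoint, field names, and hyperparameters; not the paper's settings).
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "huggyllama/llama-7b"  # any LLaMA-7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

data = load_dataset("json", data_files="HealthCareMagic-100k.json")["train"]

def to_features(ex):
    # Alpaca-style prompt: instruction + patient query + doctor response.
    text = (f"### Instruction:\n{ex['instruction']}\n"
            f"### Input:\n{ex['input']}\n### Response:\n{ex['output']}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = data.map(to_features, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="chatdoctor", num_train_epochs=3,
                           per_device_train_batch_size=4, bf16=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```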

3. Results

3.1. Qualitative Results

  • One of these included a question related to “Monkeypox,” which was renamed “Mpox” by the World Health Organization (WHO) on November 28, 2022, making it a relatively novel term.

While ChatGPT was incapable of providing a satisfactory response, ChatDoctor, due to its autonomous knowledge retrieval feature, was able to extract pertinent information about Monkeypox from Wikipedia and deliver a precise answer.

A question about “Daybue,” a drug that received FDA approval in March 2023, was accurately addressed by ChatDoctor.

  • (More examples are shown in the paper)

3.2. Quantitative Results

BERTScore
  • BERTScore leverages pre-trained BERT to match words in the candidate and reference sentences via cosine similarity.
  • It is used to calculate precision, recall, and F1 scores.
  • For a quantitative evaluation of ChatDoctor’s performance, questions from the independently sourced iCliniq database are utilized as inputs, with the corresponding responses from real doctors serving as reference answers.

The fine-tuned ChatDoctor model outperforms ChatGPT across all 3 metrics.
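
For reference, such a comparison can be run with the bert-score package; the candidate and reference strings below are placeholders, not actual iCliniq data.

```python
# Minimal BERTScore comparison sketch (pip install bert-score).
# Candidate/reference strings are placeholders, not iCliniq data.
from bert_score import score

references = ["You should see a dermatologist about this rash."]  # doctors' answers
candidates = {
    "ChatDoctor": ["I recommend consulting a dermatologist about the rash."],
    "ChatGPT": ["It could be many things; please consult a professional."],
}

for name, cands in candidates.items():
    # BERTScore matches tokens via cosine similarity of BERT embeddings,
    # then aggregates into precision, recall, and F1.
    P, R, F1 = score(cands, references, lang="en", verbose=False)
    print(f"{name}: P={P.mean():.3f}  R={R.mean():.3f}  F1={F1.mean():.3f}")
```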
