Temporal modelling of a patient’s medical history, which takes into account the sequence of past events, can be used to predict future events such as a diagnosis of a new disorder or complication of a previous or existing disorder.
MedGPT, a novel Transformer-based pipeline, is proposed; it uses Named Entity Recognition (NER) and Linking tools (i.e., MedCAT) to structure and organize the free-text portion of EHRs and to anticipate a range of future medical events (initially disorders).
(Be aware when searching for MedGPT on the Internet: many products and services are called MedGPT nowadays.)
Outline
MedGPT
Datasets
Results
1. MedGPT
1.1. Model
MedGPT is built on top of GPT-2, which is trained with causal language modeling (CLM).
Given a corpus of patients U = {u_1, u_2, u_3, ...}, where each patient is defined as a sequence of tokens u_i = {w_1, w_2, w_3, ...} and each token is a medically relevant and temporally defined piece of patient data, the objective is the standard language modeling objective, i.e. maximizing L(U) = Σ_i Σ_j log P(w_j | w_1, ..., w_{j-1}; Θ) over all patients u_i.
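As a rough illustration (not the authors' code), this objective is equivalent to a next-token cross-entropy over each patient's token sequence; the sketch below assumes PyTorch tensors holding the model's logits and the patient's token ids.

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, token_ids):
    """Standard causal LM objective: predict token t from tokens < t.

    logits:    (batch, seq_len, vocab_size) output of the Transformer
    token_ids: (batch, seq_len) medically relevant tokens w_1..w_n per patient
    """
    # Shift so that position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = token_ids[:, 1:]
    # Negative log-likelihood = -sum_j log P(w_j | w_1..w_{j-1}; Theta)
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
```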
1.2. 8 Model Variants
Eight different approaches, built on top of the base GPT-2 model, are tried.
2. Datasets
Two EHR datasets were used: King’s College Hospital (KCH) NHS Foundation Trust, UK, and MIMIC-III [10].
No preprocessing or filtering was done on the MIMIC-III clinical notes, and all 2,083,179 free-text documents were used directly.
At KCH, a total of 18,436,789 documents was collected; after the filtering step, 13,084,498 documents remained.
In brief, the Medical Concept Annotation Toolkit (MedCAT [11]) was used to extract disorder concepts from free text and link them to the SNOMED-CT concept database.
(Please refer to the paper for details of the more sophisticated disorder-extraction procedure.)
The concepts were then grouped by patient and only the first occurrence of a concept was kept.
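A minimal sketch of this step, assuming MedCAT's model-pack API (which may differ slightly between versions) and a hypothetical model pack path; filtering to disorder-type concepts and the authors' more sophisticated extraction logic are omitted.

```python
from medcat.cat import CAT

# Hypothetical SNOMED-CT model pack; the path is a placeholder.
cat = CAT.load_model_pack("snomed_medcat_modelpack.zip")

def extract_first_occurrences(documents):
    """documents: iterable of (patient_id, timestamp, text), ordered by time.

    Returns, per patient, the chronologically first occurrence of each
    SNOMED-CT concept (CUI) found in the free text.
    """
    history = {}  # patient_id -> {cui: first timestamp}
    for patient_id, timestamp, text in documents:
        entities = cat.get_entities(text).get("entities", {}).values()
        for ent in entities:
            cui = ent["cui"]
            patient = history.setdefault(patient_id, {})
            if cui not in patient:  # keep only the first occurrence
                patient[cui] = timestamp
    return history
```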
Without any filtering, there were 1,121,218 patients at KCH and 42,339 in MIMIC-III; after removing all disorders with frequency < 100 and all patients with fewer than 5 tokens, 582,548 and 33,975 patients remained, respectively. The length of each sample/patient is limited to 50 tokens.
The resulting dataset was then split into train/test sets with an 80/20 ratio. The train set was further split into train/validation sets with a 90/10 ratio.
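The filtering and splitting described above could look roughly as follows (a sketch under the assumption that `patients` maps each patient id to its chronologically ordered list of first-occurrence concept tokens; not the authors' code).

```python
import random
from collections import Counter

MIN_CONCEPT_FREQ = 100   # drop disorders seen fewer than 100 times
MIN_TOKENS = 5           # drop patients with fewer than 5 tokens
MAX_TOKENS = 50          # limit each patient sequence to 50 tokens

def prepare_splits(patients, seed=42):
    """patients: dict of patient_id -> list of concept tokens (first occurrences)."""
    freq = Counter(tok for seq in patients.values() for tok in seq)
    kept = []
    for pid, seq in patients.items():
        seq = [tok for tok in seq if freq[tok] >= MIN_CONCEPT_FREQ]
        if len(seq) >= MIN_TOKENS:
            kept.append((pid, seq[:MAX_TOKENS]))

    random.Random(seed).shuffle(kept)
    n_test = int(0.2 * len(kept))          # 80/20 train/test split
    test, train = kept[:n_test], kept[n_test:]
    n_val = int(0.1 * len(train))          # 90/10 train/validation split
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```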
3. Results
3.1. Model Variants
Performance of Model Variants
The combined “GLU+Rotary” MedGPT achieves the best results.
Finally, the MedGPT model, which consists of the GPT-2 base model with the GLU+Rotary extension, is tested on the two datasets, KCH and MIMIC-III.
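For context, below is a rough sketch of the two extensions named above, using standard formulations of a gated linear unit (GLU) feed-forward block and rotary position embeddings; the paper's exact variant (e.g. gating activation, head-wise wiring) may differ.

```python
import torch
import torch.nn as nn

class GLUFeedForward(nn.Module):
    """Feed-forward block with a gated linear unit instead of a plain MLP."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(dim, hidden_dim * 2)
        self.out = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        value, gate = self.proj(x).chunk(2, dim=-1)
        return self.out(value * torch.sigmoid(gate))

def rotary_embed(x, base=10000):
    """Apply rotary position embeddings to queries/keys of shape (..., seq, dim).

    dim is assumed even; positions are encoded by rotating pairs of channels.
    """
    seq_len, dim = x.shape[-2], x.shape[-1]
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```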
3.2. Performance Comparison
Comparison with BoC SVM and LSTM
MedGPT outperforms both the bag-of-concepts (BoC) SVM and the LSTM baselines.
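For contrast with MedGPT, a bag-of-concepts SVM baseline discards the temporal order of past disorders and treats prediction of a given future disorder as plain binary classification. The sketch below assumes scikit-learn and hypothetical `histories`/`labels` inputs.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Each patient history is reduced to an unordered bag of SNOMED-CT concept ids,
# e.g. "44054006 38341003 73211009"; the label says whether the target disorder
# occurs next. Temporal ordering, which MedGPT exploits, is discarded here.
def train_boc_svm(histories, labels):
    model = make_pipeline(
        CountVectorizer(token_pattern=r"\S+"),  # one feature per concept id
        LinearSVC(),
    )
    model.fit(histories, labels)
    return model
```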
3.3. Qualitative Analysis
Example 1: A simple binary task on which MedGPT performed well, consistent with the medical literature.
Example 2: The background (cerebral aneurysm) provided the contextual cue for the rarer diagnosis, which MedGPT successfully discerned.
Example 3: Used to test longer-range attention; MedGPT successfully handled the required indirect inference.
Example 4: Similar to Example 3, attention in the presence of distractors was tested by intermixing historical diseases.
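To make the qualitative examples concrete, next-disorder prediction at inference time amounts to feeding the patient's ordered concept tokens to the trained model and ranking the most probable next tokens. The sketch below assumes a Hugging Face-style causal LM whose output object exposes `.logits`; the token ids are hypothetical.

```python
import torch

@torch.no_grad()
def predict_next_disorders(model, history_token_ids, top_k=5):
    """Rank candidate next disorders given a patient's ordered concept history.

    model:             a trained causal LM returning logits of shape (1, seq, vocab)
    history_token_ids: list of ints encoding the patient's past disorder concepts
    """
    input_ids = torch.tensor([history_token_ids])
    logits = model(input_ids).logits      # assumes an HF-style output object
    next_token_logits = logits[0, -1]     # distribution over the next concept
    probs = torch.softmax(next_token_logits, dim=-1)
    top = torch.topk(probs, top_k)
    return list(zip(top.indices.tolist(), top.values.tolist()))
```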