Brief Review — Clinical BERT: Publicly Available Clinical BERT Embeddings

Clinical BERT, BERT Pretrained on 2M Clinical Notes

Sik-Ho Tsang
3 min read · Nov 10, 2023

Publicly Available Clinical BERT Embeddings
Clinical BERT, by MIT CSAIL, Microsoft Research
2019 ClinicalNLP, Over 1500 Citations (Sik-Ho Tsang @ Medium)

Medical NLP/LLM
2017 [LiveQA] 2018 [Clinical NLP Overview] 2019 [MedicationQA] [G-BERT] 2020 [BioBERT] [BEHRT] 2021 [MedGPT] 2023 [Med-PaLM]

  • The aim of this paper is to train and publicly release BERT models adapted to clinical text. The models are initialized from either BERT-Base or BioBERT, and are trained either on all clinical notes or on discharge summaries only.


  1. Clinical BERT
  2. Results

1. Clinical BERT

1.1. Data

  • Clinical text from the approximately 2 million notes in the MIMIC-III v1.4 database, with some preprocessing (Appendix A), is used.
  • Two settings on MIMIC notes:
  1. Clinical BERT/BioBERT, which uses text from all note types, and
  2. Discharge Summary BERT/BioBERT, which uses only discharge summaries in an effort to tailor the corpus to downstream tasks.
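
The two corpus settings above amount to a filter on note type. A minimal sketch of that idea follows; it is not the authors' preprocessing (Appendix A of the paper covers that). The rows are invented toy data, and the `CATEGORY` and `TEXT` field names and the "Discharge summary" label are assumptions about how the MIMIC-III notes table is laid out. The helper `build_corpus` is hypothetical.

```python
# Toy rows shaped like a clinical notes table.
# Assumption: each note has a CATEGORY (note type) and a TEXT field.
notes = [
    {"CATEGORY": "Discharge summary", "TEXT": "Pt admitted with chest pain."},
    {"CATEGORY": "Nursing", "TEXT": "Pt resting comfortably overnight."},
    {"CATEGORY": "Radiology", "TEXT": "CXR: no acute process."},
    {"CATEGORY": "Discharge summary", "TEXT": "Discharged home in stable condition."},
]


def build_corpus(rows, categories=None):
    """Return note texts, optionally restricted to the given note categories."""
    if categories is None:
        # Setting 1: text from all note types.
        return [r["TEXT"] for r in rows]
    wanted = {c.lower() for c in categories}
    # Setting 2: keep only the requested note types.
    return [r["TEXT"] for r in rows if r["CATEGORY"].strip().lower() in wanted]


all_notes_corpus = build_corpus(notes)
discharge_corpus = build_corpus(notes, categories=["Discharge summary"])
print(len(all_notes_corpus), len(discharge_corpus))
```

The discharge-only corpus is a strict subset of the all-notes corpus, so it trades size for a closer match to the downstream text.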

1.2. Models

  • Two BERT models are trained on clinical text:
  1. Clinical BERT, initialized from BERTBase, and
  2. Clinical BioBERT, initialized from BioBERT.
  • For all downstream tasks, the BERT models are fine-tuned, and the output BERT embedding is passed through a single linear layer for classification.
  • The entire embedding model pretraining procedure took roughly 17 - 18 days of computational runtime using a single GeForce GTX TITAN X 12 GB GPU.
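
The classification head described above, a single linear layer on top of the BERT output embedding, can be sketched in plain Python. This is only an illustration of the head, not the authors' code: the embedding is random rather than produced by BERT, the sizes are toy values (BERT-Base uses a 768-dimensional hidden state), and the softmax is added here just to turn logits into probabilities. During fine-tuning the encoder weights are updated as well, which this sketch does not show.

```python
import math
import random


def linear_head(embedding, weights, bias):
    """Single linear layer: logits[k] = dot(weights[k], embedding) + bias[k]."""
    return [sum(w * h for w, h in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
hidden_size, num_classes = 8, 3  # toy sizes, chosen for illustration only

# Stand-in for the BERT output embedding of one input sequence.
embedding = [random.uniform(-1, 1) for _ in range(hidden_size)]
# Parameters of the linear layer (randomly initialized, untrained).
weights = [[random.uniform(-0.1, 0.1) for _ in range(hidden_size)]
           for _ in range(num_classes)]
bias = [0.0] * num_classes

probs = softmax(linear_head(embedding, weights, bias))
predicted = max(range(num_classes), key=probs.__getitem__)
print(probs, predicted)
```

Keeping the head this small means that differences between models on a task mostly reflect the pretrained encoder rather than task-specific architecture.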

1.3. Downstream Tasks

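  • Five downstream tasks are evaluated: MedNLI, i2b2 2006, i2b2 2010, i2b2 2012, and i2b2 2014, of which i2b2 2006 and i2b2 2014 are de-identification (de-ID) tasks.
  • (Please read the paper for more details.)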

2. Results

2.1. Quantitative Results


On 3 of the 5 tasks (MedNLI, i2b2 2010, and i2b2 2012), clinically fine-tuned BioBERT shows improvements over BioBERT or general BERT.

  • However, on the 2 de-ID tasks, i2b2 2006 and i2b2 2014, Clinical BERT offers no improvement over BioBERT or general BERT. This is not surprising, since the de-ID challenge data presents a different data distribution than MIMIC text.

2.2. Qualitative Results


Clinical BERT retains greater cohesion around terms relevant to medicine or clinical operations than BioBERT does.

For example, the word “Discharge” is most closely associated with “admission,” “wave,” and “sight” under BioBERT, yet only the first of these seems relevant to clinical operations. In contrast, under Clinical BERT, the associated words are all meaningful in a clinical operations context.
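
A nearest-neighbor comparison of this kind can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the closeness measure, which this review does not specify. The 3-dimensional vectors are made up for the example and do not come from BioBERT or Clinical BERT, and the word “transfer” is added here purely as a toy vocabulary entry. The helper names `cosine` and `nearest` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(query, vocab, k=3):
    """Return the k words whose vectors are most similar to the query word."""
    scored = [(cosine(vocab[query], vec), word)
              for word, vec in vocab.items() if word != query]
    return [word for _, word in sorted(scored, reverse=True)[:k]]


# Invented toy embeddings, NOT taken from any real model.
toy_vocab = {
    "discharge": [0.9, 0.1, 0.0],
    "admission": [0.8, 0.2, 0.1],
    "transfer": [0.7, 0.3, 0.0],
    "wave": [0.0, 0.1, 0.9],
    "sight": [0.1, 0.9, 0.2],
}

print(nearest("discharge", toy_vocab, k=2))
```

Running the same query against embeddings from two different models and comparing the returned lists gives the kind of side-by-side view discussed above.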


