Brief Review — MedAug: Contrastive Learning Leveraging Patient Metadata Improves Representations for Chest X-Ray Interpretation

MedAug Leverages Patient Metadata to Select Positive Pairs

Sik-Ho Tsang
Dec 7, 2022


MedAug: Contrastive Learning Leveraging Patient Metadata Improves Representations for Chest X-Ray Interpretation,
MedAug, by Stanford University,
2021 MLHC, Over 20 Citations (Sik-Ho Tsang @ Medium)
Self-Supervised Learning, Medical Imaging, Medical Image Analysis, Image Classification

  • MedAug selects positive pairs by leveraging patient metadata, which improves the representations learned by self-supervised learning on medical images.
  • This is a paper by Prof. Andrew Ng’s research group.


  1. MedAug
  2. Results

1. MedAug

1.1. Dataset

  • CheXpert is a large collection of de-identified chest X-ray images.
  • The dataset consists of 224,316 images from 65,240 patients, labeled for the presence or absence of 14 radiological observations. All of these images are used for pretraining, and random samples of 1% of them are used for fine-tuning.
  • The test set consists of 500 additional labeled images from 500 studies not included in the training set.

1.2. Self-Supervised Learning

  • ResNet-18 is used as the backbone.
  • MoCo v2 is used for self-supervised learning.
  • Given an input image x, an encoder g, and a set of augmentations T, most contrastive learning algorithms minimize the InfoNCE loss (CPC / CPCv1).
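
For reference, a common way to write the InfoNCE loss in this notation is sketched below. It assumes K negative samples z_i and omits the temperature scaling that MoCo-style implementations usually apply; both choices are assumptions made for illustration.

```latex
\mathcal{L}(x) = -\log
\frac{\exp\big(g(\tilde{x}_1) \cdot g(\tilde{x}_2)\big)}
     {\exp\big(g(\tilde{x}_1) \cdot g(\tilde{x}_2)\big)
      + \sum_{i=1}^{K} \exp\big(g(\tilde{x}_1) \cdot g(z_i)\big)}
```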

The positive pair (~x1 = t1(x), ~x2 = t2(x)), with t1, t2 ∈ T, consists of two augmentations of the same input image x.

  • The negative pairs (~x1, zi) are pairs of augmentations of different images.
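
The role of the positive and negative pairs in the loss can be illustrated with a minimal pure-Python sketch. The function name `info_nce`, the `temperature` parameter, and the toy embeddings are hypothetical and only serve the illustration; this is not the MoCo v2 implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=1.0):
    """InfoNCE loss for a single query embedding.

    query     -- embedding of the first augmented view
    positive  -- embedding of the second view (the positive)
    negatives -- embeddings of augmented views of other images
    temperature -- assumed scaling factor; 1.0 means no scaling
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, z) / temperature for z in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    # Equals -log(exp(positive logit) / sum of exp over all logits).
    return log_denominator - logits[0]


# Toy embeddings: a well-aligned positive gives a lower loss than a misaligned one.
query = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_aligned = info_nce(query, [1.0, 0.0], negatives)
loss_misaligned = info_nce(query, [0.0, 1.0], negatives)
```

The loss falls as the query moves closer to its positive and further from the negatives, which is why the choice of positive pairs matters for the learned representation.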

1.3. MedAug

Selecting positive pairs for contrastive learning with patient metadata

MedAug uses patient metadata that goes beyond the disease labels, such as patient number, study number, laterality, and patient history, to create appropriate positive pairs.

  • Formally, patient metadata is used to obtain an enhanced augmentation set that depends on x: the augmentations in T are applied not only to x itself but to any image in Sc(x).
  • Here, Sc(x) is the set of all images satisfying some predefined criteria c in relation to the properties of x. The criteria for using the metadata could be informed by clinical insights about the downstream task of interest.
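
In the notation of this section, the enhanced augmentation set can be written roughly as follows. This is a sketch derived from the definitions of T and Sc(x) given here; the symbol name T_enhanced is an assumption.

```latex
T_{\text{enhanced}}(x) = \{\, t_i(x') \mid t_i \in T,\; x' \in S_c(x) \,\}
```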

Every set Sc(x) contains only images from the same patient as x. The criteria then further restrict it to images from distinct, the same, or all studies, and with distinct, the same, or all lateralities.
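
The selection of Sc(x) from metadata can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the `XRay` record, its field names, the laterality strings, and the helper names are all hypothetical, and whether x itself belongs to Sc(x) is an assumption of this sketch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class XRay:
    # Hypothetical metadata record; field names are assumptions.
    image_id: str
    patient_id: str
    study_id: str
    laterality: str  # e.g. "Frontal" or "Lateral"


def matches(a, b, mode):
    """Compare two metadata values under a 'same', 'distinct', or 'all' criterion."""
    if mode == "all":
        return True
    if mode == "same":
        return a == b
    if mode == "distinct":
        return a != b
    raise ValueError(f"unknown mode: {mode}")


def candidate_set(x, dataset, study="same", laterality="all"):
    """Return Sc(x): images of the same patient that satisfy the study and laterality criteria."""
    return [
        img for img in dataset
        if img.patient_id == x.patient_id
        and matches(img.study_id, x.study_id, study)
        and matches(img.laterality, x.laterality, laterality)
    ]


def sample_positive(x, dataset, rng=random, **criteria):
    """Pick the image whose augmentation is paired with an augmentation of x."""
    candidates = candidate_set(x, dataset, **criteria)
    # Fall back to x itself (standard augmentation) if no image qualifies.
    return rng.choice(candidates) if candidates else x


dataset = [
    XRay("a", "p1", "s1", "Frontal"),
    XRay("b", "p1", "s1", "Lateral"),
    XRay("c", "p1", "s2", "Frontal"),
    XRay("d", "p2", "s1", "Frontal"),
]
same_study_all_lat = candidate_set(dataset[0], dataset, study="same", laterality="all")
```

With the toy records above, the "same study, all lateralities" criterion pairs image "a" with itself or with "b", and never with images from another study or another patient.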

2. Results

  • The best result is obtained with S_same study, all lateralities(x), the set of images from the same patient and the same study as x regardless of laterality. It yields gains of 0.029 (3.4%) and 0.021 (2.4%) in AUC for the linear model and the end-to-end model, respectively.

When a modified random crop scale of [0.95, 1] is used as well, the best pretrained model achieves a linear fine-tuning AUC of 0.883 and an end-to-end fine-tuning AUC of 0.906 on the test set, significantly outperforming previous baselines.
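
To make the crop scale concrete, here is a minimal sketch of sampling a crop box whose area is a random fraction of the image within [0.95, 1]. It is a simplified stand-in for a random-resized-crop augmentation: it keeps the image's aspect ratio, and the function name `sample_crop_box` and the 320x320 image size are assumptions for illustration.

```python
import math
import random


def sample_crop_box(height, width, scale=(0.95, 1.0), rng=random):
    """Sample (top, left, crop_h, crop_w) covering a random fraction of the image area.

    scale -- (min, max) fraction of the original area kept by the crop.
    The crop keeps the original aspect ratio (a simplification).
    """
    area_fraction = rng.uniform(scale[0], scale[1])
    side_fraction = math.sqrt(area_fraction)  # same factor on both sides
    crop_h = min(height, max(1, round(height * side_fraction)))
    crop_w = min(width, max(1, round(width * side_fraction)))
    # Place the crop uniformly at random inside the image.
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return top, left, crop_h, crop_w


top, left, crop_h, crop_w = sample_crop_box(320, 320, rng=random.Random(0))
```

With a scale this close to 1, every crop keeps almost the entire X-ray in view, so the two views of a positive pair differ only slightly through cropping.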

  • The authors also tried selecting hard negative pairs using metadata, but observed no improvement.
  • The authors provide further ablation results as well; please refer to the paper for details.


