Review — A Deep Learning System for Differential Diagnosis of Skin Diseases

A Dataset for Skin Diseases, Inception-v4 Used with Both Image Data and Clinical Metadata as Input

Sik-Ho Tsang
Jul 17, 2022

A Deep Learning System for Differential Diagnosis of Skin Diseases, Liu NatureMedicine’20, by Google Health, University of California, Advanced Clinical, Adecco Staffing, Massachusetts Institute of Technology, and Medical University of Graz
2020 Nature Medicine, Over 200 Citations, Impact Factor of 53.44 (Sik-Ho Tsang @ Medium)
Image Classification, Biomedical, Differential Diagnosis, Dermatology

  • A dataset of 16,114 de-identified cases (photographs and clinical data) is collected from a teledermatology practice serving 17 sites and used for the differential diagnosis of skin conditions.
  • A deep learning system (DLS) is proposed that provides a differential diagnosis for these cases using an image classification network.


  1. Dermatology Dataset
  2. Deep Learning System (DLS)
  3. Experimental Results

1. Dermatology Dataset

Representative examples of challenging cases missed by non-dermatologists
  • A temporal split is applied to teledermatology cases: the first 80% of the cases (years 2010–2017) for development and the last 20% (years 2017–2018) for validation.
  • The reference standard for each case was determined by the aggregated opinions of multiple dermatologists who reviewed the case independently.
  • After excluding cases with multiple skin conditions and those that were non-diagnosable, 16,114 cases (64,837 images) were used for development and 3,756 cases (14,883 images) for validation (validation set ‘A’; a smaller subset ‘B’ was used for comparison with clinicians and is described in the relevant sections).
  • In total, 64,878 dermatologist reviews were collected for development and 11,268 reviews for validation.
  • (One of the most difficult parts is gathering so many cases from different sites, which requires extensive collaboration among institutions.)
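The temporal split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `id` and `date` fields and the `temporal_split` helper are hypothetical names, and the toy cases are made up. The point is only that cases are ordered by time and cut once, without shuffling, so the validation cases all come after the development cases.

```python
from datetime import date

def temporal_split(cases, dev_fraction=0.8):
    """Sort cases by date and split chronologically (no shuffling)."""
    ordered = sorted(cases, key=lambda c: c["date"])
    cut = int(len(ordered) * dev_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical toy cases, deliberately given in reverse chronological order.
cases = [{"id": i, "date": date(2010 + i, 1, 1)} for i in range(9, -1, -1)]
dev, val = temporal_split(cases)

# The earliest 80% of cases go to development, the latest 20% to validation.
print([c["id"] for c in dev], [c["id"] for c in val])
```

A chronological cut like this evaluates the model on cases that arrived later than anything it was trained on, which is closer to how it would be used in practice than a random split.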

2. Deep Learning System (DLS)

Overview of Deep Learning System (DLS)

2.1. Input

  • For each case, the DLS takes as input one to six de-identified skin photographs and 45 metadata variables such as demographic information and medical history (left side of the overview figure).

2.2. Model

  • The DLS then processes the images using Inception-v4 modules with shared weights before applying an average pool and concatenating with the metadata features.
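The data flow of this step can be sketched in plain Python. This is a toy under stated assumptions, not the paper's network: `encode_image` is a hypothetical stand-in that replaces Inception-v4 with a single linear map, and the sketch assumes the average pool is taken across the images of one case. It only shows the shape of the idea: the same weights encode every image, the per-image features are averaged, and the metadata features are appended.

```python
def encode_image(image, weights):
    """Stand-in for the shared image encoder: one linear map (assumption)."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]

def fuse_case(images, metadata, weights):
    """Encode each image with the same weights, average, append metadata."""
    if not 1 <= len(images) <= 6:
        raise ValueError("expected one to six images per case")
    feats = [encode_image(img, weights) for img in images]
    # Average each feature dimension over the images of the case.
    pooled = [sum(col) / len(feats) for col in zip(*feats)]
    return pooled + list(metadata)

# Toy values: two 3-pixel "images", a 2x3 weight matrix, two metadata features.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
images = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
metadata = [0.5, 1.0]
features = fuse_case(images, metadata, weights)
print(features)  # [2.0, 4.0, 0.5, 1.0]
```

Because the images are averaged, the fused vector has the same length whether a case has one image or six, which is what lets a single classification layer handle a variable number of photographs.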

2.3. Output

  • The primary output of the classification layer of the DLS is the relative likelihood of 27 categories (26 skin conditions plus ‘other’).
  • The secondary output is the relative likelihood of the full set of 419 skin conditions seen in this work. These conditions were chosen based on a granularity that could guide a non-dermatologist clinician to next steps in clinical care.

2.4. Labels

  • The labels used to develop and validate the DLS were provided by board-certified dermatologists (one or more dermatologists per case for training and three dermatologists per case for the validation set).
  • For each case, each dermatologist provided their top three differential diagnoses. The multiple differential diagnoses are then aggregated into a single ranked list.
  • During training, each diagnosis in the aggregated ranked list has an associated aggregated ‘confidence’ score, and these confidences are the target ‘soft’ labels for the DLS.
  • The DLS therefore learns from both the primary (top-ranked) diagnosis and the lower-ranked diagnoses. In this way, the DLS is trained to provide a differential diagnosis instead of a single prediction output.
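To make the idea of soft labels concrete, here is a small sketch of one way to aggregate several ranked differentials into per-diagnosis confidences. The inverse-rank weighting is an assumption chosen for illustration; this review does not spell out the exact aggregation rule, so treat `aggregate_soft_labels` and the toy diagnoses as hypothetical.

```python
from collections import defaultdict

def aggregate_soft_labels(differentials):
    """Turn several ranked differentials into one confidence per diagnosis."""
    scores = defaultdict(float)
    for ranked in differentials:
        # Assumed scheme: weight 1/rank, normalized per dermatologist.
        raw = {dx: 1.0 / rank for rank, dx in enumerate(ranked, start=1)}
        total = sum(raw.values())
        for dx, w in raw.items():
            scores[dx] += w / total
    # Normalize across dermatologists so the confidences sum to 1.
    norm = sum(scores.values())
    soft = {dx: s / norm for dx, s in scores.items()}
    return dict(sorted(soft.items(), key=lambda kv: kv[1], reverse=True))

# Toy example: three dermatologists, each giving up to three diagnoses.
differentials = [
    ["eczema", "psoriasis", "tinea"],
    ["eczema", "tinea"],
    ["psoriasis", "eczema"],
]
soft_labels = aggregate_soft_labels(differentials)
print(soft_labels)
```

The resulting dictionary is ordered from most to least confident, and its values form a distribution over diagnoses that a classifier could be trained against instead of a single one-hot label.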

3. Experimental Results

Importance of different metadata inputs to the DLS
  • When the values of an important feature are permuted (shuffled across cases), there is a large drop in accuracy.

As seen in the figure, the self-reported skin problem is one of the important metadata features.
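The permutation test can be sketched as below. This is a generic illustration of permutation importance, not the paper's evaluation code: the `predict` function is a toy model that only looks at one feature, and the field names `self_reported_problem` and `age` are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of cases where the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, feature, seed=0):
    """Accuracy drop after shuffling one feature column across cases."""
    rng = random.Random(seed)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    permuted = [{**r, feature: v} for r, v in zip(rows, column)]
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)

# Toy data: the label is fully determined by the self-reported problem.
rows = [
    {"self_reported_problem": "rash" if i % 2 else "mole", "age": 20 + i}
    for i in range(20)
]
labels = [r["self_reported_problem"] for r in rows]
predict = lambda r: r["self_reported_problem"]  # toy model, ignores age

drop_problem = permutation_importance(predict, rows, labels, "self_reported_problem")
drop_age = permutation_importance(predict, rows, labels, "age")
print(drop_problem, drop_age)
```

Shuffling a feature the model ignores (`age` here) leaves accuracy unchanged, so its drop is zero, while shuffling a feature the model relies on can only lower the accuracy of this toy model.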

Importance of different inputs to the DLS
  • The blue line illustrates the impact on the top-1 accuracy of different numbers of input images for the same DLS (that was trained using all images and metadata).
  • The red line illustrates a similar trend when the clinical metadata are absent from this same DLS.
  • Finally, the green line illustrates the trend, but for a DLS retrained without using clinical metadata (so that the DLS cannot depend on the presence of clinical metadata).

Using both metadata and image data obtains the highest accuracy.

Also, increasing the number of input images per case improves the accuracy.

(I am not an expert in the biomedical field, so I only present this paper from the deep learning perspective. The paper contains many more dataset statistics and results; please feel free to read it.)


[2020 Nature Medicine] [Dermatology]
A Deep Learning System for Differential Diagnosis of Skin Diseases
