Brief Review — Using Clinical Natural Language Processing for Health Outcomes Research: Overview and Actionable Suggestions for Future Advances

Clinical NLP Overview: Challenges and Opportunities

Sik-Ho Tsang
3 min read · Oct 21, 2023
NLP Model Development

Using Clinical Natural Language Processing for Health Outcomes Research: Overview and Actionable Suggestions for Future Advances
Clinical NLP Overview, by King’s College London, KTH, The Australian National University, University of Turku, University of Warwick/Alan Turing Institute, University College London, University College London NHS Foundation Trust, The University of Melbourne, Camden and Islington NHS Foundation Trust, South London and Maudsley NHS Foundation Trust, and University of Utah
2018 J. Bio. Info., Over 170 Citations (Sik-Ho Tsang @ Medium)

Medical LLM
2020 [BioBERT] [BEHRT] 2021 [MedGPT] 2023 [Med-PaLM]

  • In this paper, the authors provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa.
  • (Before the advent of LLMs, applying NLP to clinical/medical research was very challenging.)


  1. Challenges
  2. Opportunities

1. Challenges

1.1. Evaluation Criteria

  • NLP evaluation criteria can be intrinsic (evaluating an NLP system in terms of directly measuring its performance on attaining its immediate objective) and extrinsic (evaluating an NLP system in terms of its usefulness in fulfilling an overarching goal where the NLP system is perhaps part of a more complex process or pipeline).
  • The goal of clinical research studies, on the other hand, typically relates to assessing the effect of a treatment or intervention. Clinical NLP method development has mainly focused on internal, intrinsic evaluation metrics.
  • For instance, the current state of the art in medical concept classification is an F-score above 80% [7], which is close to human agreement on the same task; however, if such a system were deployed in clinical practice, any error rate above 0% might be seen as unacceptable.
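The F-score gap described in the bullets above can be made concrete with a minimal sketch of intrinsic evaluation for a binary concept classifier. The names `Scores` and `score_binary` and the toy labels are hypothetical, not taken from the paper or from [7]; they only show that an F-score of 0.80 still leaves misclassified instances.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float
    errors: int  # false positives + false negatives


def score_binary(gold: list[bool], pred: list[bool]) -> Scores:
    """Intrinsic evaluation: compare per-instance predictions with gold-standard labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1, fp + fn)


if __name__ == "__main__":
    # Hypothetical toy data: 10 mentions, 5 of which truly express the concept.
    gold = [True] * 5 + [False] * 5
    pred = [True] * 4 + [False] + [True] + [False] * 4
    s = score_binary(gold, pred)
    print(f"F1={s.f1:.2f}, errors={s.errors}/{len(gold)}")  # F1=0.80, errors=2/10
```

Scaled up to a large EHR cohort, the same F-score corresponds to many misclassified patients, which is why a score that looks strong intrinsically can still be hard to accept clinically.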

Thus, when using outputs from NLP approaches in clinical research studies, it is not always clear how best to incorporate and interpret NLP performance metrics.
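One standard way to carry intrinsic metrics into a downstream estimate is the Rogan-Gladen correction, a textbook epidemiological technique swapped in here for illustration rather than something the paper prescribes. If NLP-derived labels give an apparent prevalence, the sensitivity and specificity measured against a gold-standard sample can be used to correct it. The function name `rogan_gladen` and the numbers below are hypothetical.

```python
def rogan_gladen(apparent: float, sensitivity: float, specificity: float) -> float:
    """Correct a prevalence estimated from NLP-derived labels for known misclassification.

    Rogan-Gladen estimator: p_true = (p_apparent + Sp - 1) / (Se + Sp - 1), clipped to [0, 1].
    Se and Sp are assumed to come from an intrinsic evaluation on a gold-standard sample.
    """
    for name, value in (("apparent", apparent), ("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1 for the correction to be defined")
    corrected = (apparent + specificity - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, corrected))


if __name__ == "__main__":
    # Hypothetical: NLP flags 20% of patients; Se=0.85 and Sp=0.95 from a gold-standard sample.
    print(f"{rogan_gladen(0.20, 0.85, 0.95):.2f}")  # 0.19
```

Even with 95% specificity, the naive 20% figure overstates prevalence in this toy case. How much NLP error matters depends on the downstream quantity being estimated, which is the interpretation problem the paragraph above describes.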

1.2. Data Variety

  • Clinical informatics resources are becoming larger and more comprehensive as a result of text-derived meta-data.

A variety of data sources are amenable to clinical research, such as social media, wearable device data, and audio and video recordings of team discussions and interactions.

1.3. Clinical NLP Applied to Mental Health Records

Using NLP methods to derive and identify mental health concepts from EHRs holds great promise, but requires careful methodological design.

1.4. Using NLP for Large-Sample Clinical Research

An increase in the depth of data provided by NLP can come at a cost to study reproducibility and research transparency.

  • Also, the NLP algorithm may produce different results if applied to new data for the same task.

2. Opportunities

2.1. Shareable Data

  • The ethical and legal policies that protect privacy complicate data storage, use, and exchange from one study to another.

Synthetic clinical data has been developed as one way to ease these restrictions.

2.2. Intrinsic Evaluation and Representation Levels

Typically, to evaluate clinical NLP methods, a gold-standard corpus with instance annotations is developed and used to measure whether an NLP approach correctly identifies and classifies these instances.
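Comparing against such a gold-standard corpus involves choices that change the reported score. The sketch below assumes annotations are character-offset spans with a label and contrasts exact-match and overlap ("lenient") matching at the mention level with a document-level view, one example of evaluating at different representation levels. The `Span` type and the function names are hypothetical and are not the paper's evaluation setup.

```python
from typing import NamedTuple


class Span(NamedTuple):
    doc_id: str
    start: int  # character offset, inclusive
    end: int  # character offset, exclusive
    label: str


def _f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def _overlaps(a: Span, b: Span) -> bool:
    # Same document, same label, and the half-open character ranges intersect.
    return a.doc_id == b.doc_id and a.label == b.label and a.start < b.end and b.start < a.end


def mention_f1(gold: list[Span], pred: list[Span], lenient: bool = False) -> float:
    """Mention-level F1: exact offset match by default, any same-label overlap if lenient."""
    gold_set, pred_set = set(gold), set(pred)
    if lenient:
        matched_pred = sum(any(_overlaps(p, g) for g in gold_set) for p in pred_set)
        matched_gold = sum(any(_overlaps(g, p) for p in pred_set) for g in gold_set)
    else:
        matched_pred = matched_gold = len(gold_set & pred_set)
    precision = matched_pred / len(pred_set) if pred_set else 0.0
    recall = matched_gold / len(gold_set) if gold_set else 0.0
    return _f1(precision, recall)


def document_f1(gold: list[Span], pred: list[Span], label: str) -> float:
    """Document-level F1: a document is positive for `label` if it has any such mention."""
    gold_docs = {s.doc_id for s in gold if s.label == label}
    pred_docs = {s.doc_id for s in pred if s.label == label}
    tp = len(gold_docs & pred_docs)
    precision = tp / len(pred_docs) if pred_docs else 0.0
    recall = tp / len(gold_docs) if gold_docs else 0.0
    return _f1(precision, recall)


if __name__ == "__main__":
    # Hypothetical annotations: the first prediction has a slightly short right boundary.
    gold = [Span("d1", 10, 22, "symptom"), Span("d2", 0, 8, "symptom")]
    pred = [Span("d1", 10, 20, "symptom"), Span("d2", 0, 8, "symptom")]
    print(mention_f1(gold, pred))  # 0.5 (exact match)
    print(mention_f1(gold, pred, lenient=True))  # 1.0 (overlap match)
    print(document_f1(gold, pred, "symptom"))  # 1.0 (document level)
```

The same system output scores 0.5 or 1.0 depending on the matching rule and level. A patient-level score would aggregate the same way over patient identifiers; which level to report depends on what the downstream clinical study actually consumes.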

2.3. Beyond Electronic Health Record (EHR) Data

  • e.g., work using computational language analysis of speech transcripts to study communication disturbances in patients with schizophrenia [65] or to predict the onset of psychosis [66,67].
  • Work by Tsakalidis et al. [71] is the first to use both language and heterogeneous mobile phone data to predict mental well-being scores of individual users over time calibrated against psychological scales.

Therefore, as actionable guidance for future work, the authors suggest efforts on data availability, evaluation workbenches, and reporting standards.


