Brief Review — The imperative for regulatory oversight of large language models (or generative AI) in healthcare

Regulatory Oversight of LLM

Sik-Ho Tsang
Jan 30, 2024
Ten example use cases of LLMs for medical professionals, and ten for patients.

The imperative for regulatory oversight of large language models (or generative AI) in healthcare
Regulatory Oversight of LLM, by The Medical Futurist Institute, Semmelweis University, and Scripps Research
2023 npj Digit Med, Over 90 Citations (Sik-Ho Tsang @ Medium)

Medical/Clinical NLP/LLM
2017 … 2023 [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [GPT-4 in Radiology] [ChatGPT & GPT‑4 on USMLE]

  • The regulation of GPT-4 and generative AI in medicine and healthcare without damaging their exciting and transformative potential is a timely and critical challenge to ensure safety, maintain ethical standards, and protect patient privacy.
  • The authors argue that regulatory oversight should ensure that medical professionals and patients can use LLMs without causing harm or compromising their data or privacy.


  1. LLM Problems
  2. Regulatory Challenges & Expectations

1. LLM Problems

  • LLMs have transformative potential, with use cases ranging from clinical documentation to personalized health plans. At the same time, introducing these models into healthcare amplifies risks and challenges.
  • LLMs already pose a new challenge for physicians: patients now arrive at appointments with answers obtained not only from googling their symptoms but also from ChatGPT-like chatbots.

LLMs can sometimes “hallucinate”, that is, generate outputs that are not grounded in the input data or in factual information.

  • Hallucination poses a significant risk in the medical setting, where unreliable or outright false answers might have serious consequences.
  • Another issue is bias: biased LLM outputs can affect clinical decision-making, patient outcomes, and healthcare equity. Examples of bias include underrepresentation of certain demographic groups, overemphasis on specific treatments, and outdated medical practices.

Biased outputs from GPT-4 may lead to incorrect diagnoses or suboptimal treatment recommendations, potentially causing harm to patients or delaying appropriate care.

  • The application of GPT-4 in healthcare raises ethical concerns that warrant a regulatory framework. Issues such as transparency, accountability, and fairness need to be addressed to prevent potential ethical lapses.

2. Regulatory Challenges & Expectations

2.1. Regulatory Challenges

  • Early LLMs could analyze text only.
  • Now that GPT-4 can analyze images as well as text, the model can be expected to grow to analyze uploaded documents, research papers, hand-written notes, sound, and video in the near future (Table 2).
  • A regulation that focuses on language models only, without taking these future additions into consideration, could miss important updates by the time those updates become widely accessible.
  • The above table shows the LLM regulatory challenges.

2.2. Expectations

  • The authors expect regulators to:
  1. Create a new regulatory category for LLMs.
  2. Provide regulatory guidance for companies and healthcare organizations on how they can deploy LLMs.
  3. Create a regulatory framework that covers not only text-based interactions but also possible future iterations, such as analyzing sound or video.
  4. Provide a framework for making a distinction between LLMs specifically trained on medical data and LLMs trained for non-medical purposes.
  5. Similar to the FDA’s Digital Health Pre-Cert Program, regulate companies developing LLMs instead of regulating every single LLM iteration.


