Brief Review — Accuracy of a Chatbot in Answering Questions that Patients Should Ask Before Taking a New Medication

ChatGPT on Patients Medication

4 min read · Jun 18, 2024

Accuracy of a Chatbot in Answering Questions that Patients Should Ask Before Taking a New Medication
ChatGPT on Patients Medication, by University of Arizona
2024 JAPH (Sik-Ho Tsang @ Medium)
Medical/Clinical/Healthcare NLP/LLM
2017 … 2023 [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [GPT-4 in Radiology] [ChatGPT & GPT‑4 on USMLE] [Regulatory Oversight of LLM] [ExBEHRT] [ChatDoctor] [DoctorGLM] [HuaTuo] 2024 [ChatGPT & GPT-4 on Dental Exam] [ChatGPT-3.5 on Radiation Oncology] [LLM on Clinical Text Summarization] [Extract COVID-19 Symptoms Using ChatGPT & GPT-4]

I came across this paper because my project leaders recently shared it for us to review for our project's needs.
In this paper, the authors evaluate the accuracy of answers provided by a chatbot (ChatGPT) in response to questions that patients should ask before taking a new medication.

Outline

1. ChatGPT on Patients Medication
2. Results

1. ChatGPT on Patients Medication

1.1. Method

12 questions obtained from the Agency for Healthcare Research and Quality (AHRQ) were posed to a chatbot (ChatGPT) for each of the top 20 drugs.
The top 20 drugs are listed below with issues noted:

**Top 20 Drugs** (Quite a number of them are "3-hypers" drugs, i.e., drugs for hypertension, hyperglycemia, or hyperlipidemia)

In total, 12 questions were asked for each of the 20 medications, generating 240 individual responses from the model.
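
As a rough illustration of how the 12 × 20 query grid yields 240 responses, here is a minimal Python sketch. The question and drug labels are hypothetical placeholders (the actual AHRQ questions and drug names are not reproduced here), and the prompt template is an assumption for illustration, not the wording used by the authors.

```python
from itertools import product

# Hypothetical placeholders: the real AHRQ questions and the top-20 drug
# names are not reproduced here.
questions = [f"Question {i}" for i in range(1, 13)]  # 12 questions
drugs = [f"Drug {j}" for j in range(1, 21)]          # 20 medications

# One prompt per (drug, question) pair. The template is an assumption.
prompts = [
    {"drug": d, "question": q, "prompt": f"{q} (medication: {d})"}
    for d, q in product(drugs, questions)
]

print(len(prompts))  # 12 * 20 = 240
```

Each entry would correspond to one chatbot response to be rated by the reviewers.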

1.2. Correctness and Completeness

2 reviewers independently evaluated and rated each response on a 6-point scale for correctness and a 3-point scale for completeness, with a completeness score of 2 considered adequate.
The 6-point correctness scale was: 1 = completely incorrect; 2 = more incorrect than correct; 3 = approximately equal correct and incorrect; 4 = more correct than incorrect; 5 = nearly all correct; and 6 = completely correct. Accuracy was determined using clinical expertise and a drug information database.
3-point completeness scale (1 = incomplete [addresses some aspects of the question, but significant parts are missing or incomplete]; 2 = adequate [addresses all aspects of the question and provides the minimum amount of information required to be considered complete]; and 3 = comprehensive [addresses all aspects of the question and provides additional information or context beyond what was expected]).
After the independent reviews, the 2 reviewers met to compare answers, discuss any discrepancies, and assign a consensus score for correctness and completeness.
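
The two rating scales and the "adequate" threshold can be written down as a small data structure. This is only a sketch of how I would encode the rubric described above; the `Rating` class and the helper names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Scale labels as described in the text above.
CORRECTNESS = {
    1: "completely incorrect",
    2: "more incorrect than correct",
    3: "approximately equal correct and incorrect",
    4: "more correct than incorrect",
    5: "nearly all correct",
    6: "completely correct",
}
COMPLETENESS = {1: "incomplete", 2: "adequate", 3: "comprehensive"}


@dataclass
class Rating:
    """One reviewer's scores for one response (hypothetical container)."""
    correctness: int   # 1..6
    completeness: int  # 1..3

    def __post_init__(self):
        if self.correctness not in CORRECTNESS:
            raise ValueError("correctness must be between 1 and 6")
        if self.completeness not in COMPLETENESS:
            raise ValueError("completeness must be between 1 and 3")


def is_adequate(rating):
    """A completeness score of 2 or higher counts as adequate."""
    return rating.completeness >= 2


def needs_discussion(a, b):
    """Reviewers discuss a response when their independent ratings differ."""
    return a != b


reviewer_1 = Rating(correctness=5, completeness=3)
reviewer_2 = Rating(correctness=6, completeness=3)
print(needs_discussion(reviewer_1, reviewer_2))  # True
print(is_adequate(reviewer_1))                   # True
```

How the consensus score is reached after discussion is a human judgment, so it is deliberately not modeled here.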

1.3. Reproducibility

To assess reproducibility, responses that scored less than 6 for correctness were reassessed 14 days later. On reassessment, a response could improve in correctness, show no change, or decrease in correctness.
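
The reassessment rule can be sketched as two small functions: one selects responses for a repeat query, the other labels the outcome. The function names and the example scores are hypothetical and serve only to illustrate the logic described above.

```python
def needs_reassessment(correctness):
    """Responses scoring below 6 (completely correct) are queried again."""
    return correctness < 6


def classify_change(initial, repeat):
    """Label how the correctness score changed between the two queries."""
    if repeat > initial:
        return "improved"
    if repeat < initial:
        return "decreased"
    return "no change"


# Hypothetical (initial, repeat) correctness scores, not data from the paper.
examples = [(5, 6), (4, 4), (5, 3), (6, 6)]
for initial, repeat in examples:
    if needs_reassessment(initial):
        print(initial, repeat, classify_change(initial, repeat))
```

The last pair is skipped because an initial score of 6 does not trigger a repeat query.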

2. Results

2.1. Correctness

Out of 240 responses, 222 (92.5%) were assessed as completely correct. Of the remaining 18 responses, 10 (4.2%) provided information that was nearly all correct, 5 (2.1%) were more correct than incorrect, 2 (0.8%) were equal parts correct and incorrect, 1 (0.4%) was more incorrect than correct, and none (0%) were completely incorrect.
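
As a quick sanity check, the reported percentages can be recomputed from the counts above. This is just my own arithmetic check in Python, not code from the paper.

```python
# Correctness score -> number of responses, taken from the text above.
counts = {6: 222, 5: 10, 4: 5, 3: 2, 2: 1, 1: 0}

total = sum(counts.values())
percentages = {score: round(100 * n / total, 1) for score, n in counts.items()}
below_six = total - counts[6]

print(total)        # 240
print(percentages)  # {6: 92.5, 5: 4.2, 4: 2.1, 3: 0.8, 2: 0.4, 1: 0.0}
print(below_six)    # 18 responses scored below 6
```

The 18 responses that scored below 6 are the ones eligible for the repeat query.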

2.2. Completeness

Of the 240 responses, 194 (80.8%) were rated comprehensive. A score of 2 was considered adequate, and 235 (97.9%) scored 2 or higher, indicating at least an adequate level of completeness. The remaining 5 (2.1%) were considered incomplete.
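
The completeness figures also imply how many responses scored exactly 2 ("adequate"), which is not stated directly. A small Python sketch of that derivation, using only the counts quoted above:

```python
total = 240
comprehensive = 194       # completeness score of 3
adequate_or_better = 235  # completeness score of 2 or 3

adequate_only = adequate_or_better - comprehensive  # score of exactly 2
incomplete = total - adequate_or_better             # score of 1


def pct(n):
    """Percentage of all responses, rounded to one decimal place."""
    return round(100 * n / total, 1)


print(adequate_only, pct(adequate_only))  # 41 17.1
print(incomplete, pct(incomplete))        # 5 2.1
print(pct(comprehensive), pct(adequate_or_better))  # 80.8 97.9
```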

2.3. Reproducibility

When the 18 items that scored below 6 for correctness were reassessed, responses were scored the same as the initial query for 6 items, decreased in quality for 5 items, and improved in quality for 7 items.

The median correctness score was 5 (IQR 4–5) with the initial query and 4.5 (IQR 2–6) with the repeat query (p=0.64).
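
The initial-query median and quartiles can be reconstructed from the correctness counts in Section 2.1, since the 18 reassessed items are exactly the responses that scored below 6. The sketch below uses Python's `statistics` module with its default quartile method, which is my assumption; the paper does not say which quartile convention was used. The repeat-query scores are not itemized in the text, so they cannot be reconstructed the same way.

```python
import statistics

# Initial correctness scores of the 18 reassessed items (from Section 2.1).
initial_counts = {5: 10, 4: 5, 3: 2, 2: 1}
initial_scores = sorted(
    score for score, n in initial_counts.items() for _ in range(n)
)

median = statistics.median(initial_scores)
q1, q2, q3 = statistics.quantiles(initial_scores, n=4)  # default method

print(len(initial_scores))  # 18
print(median)               # 5.0
print(q1, q3)               # 4.0 5.0
```

This reproduces a median of 5 with quartiles of 4 and 5 for the initial query.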

2.4. Discussions & Limitations

The variation in scores between the initial and repeat queries raises some concerns regarding the consistency of accuracy if chatbots are used in the clinical setting.

Pharmacists are uniquely trained to counsel patients on the most important aspects of a medication. Unlike ChatGPT, they do not need a patient to phrase a question correctly in order to cover a counseling point.

If a chatbot is used as a source to inquire about medication information, specific, singular prompts may be required for accurate responses.

With only 2 investigators, there is a risk of personal bias and subjective interpretation.
An additional limitation to this study is that it was conducted using English only.
Using a chatbot to answer questions commonly asked by patients is mostly accurate, but the answers may include inaccurate information or lack information that is valuable to patients. Educating patients is important.

Written by Sik-Ho Tsang