Brief Review — Accuracy of a Chatbot in Answering Questions that Patients Should Ask Before Taking a New Medication
ChatGPT on Patient Medication
Accuracy of a Chatbot in Answering Questions that Patients Should Ask Before Taking a New Medication
ChatGPT on Patient Medication, by the University of Arizona
2024 JAPhA (Sik-Ho Tsang @ Medium)
Medical/Clinical/Healthcare NLP/LLM
2017 … 2023 [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [GPT-4 in Radiology] [ChatGPT & GPT‑4 on USMLE] [Regulatory Oversight of LLM] [ExBEHRT] [ChatDoctor] [DoctorGLM] [HuaTuo] 2024 [ChatGPT & GPT-4 on Dental Exam] [ChatGPT-3.5 on Radiation Oncology] [LLM on Clinical Text Summarization] [Extract COVID-19 Symptoms Using ChatGPT & GPT-4]
==== My Other Paper Readings Are Also Over Here ====
- I came across this paper because my project leaders recently shared it for review, as it is relevant to our project's needs.
- In this paper, the authors evaluate the accuracy of answers provided by a chatbot (ChatGPT) in response to questions that patients should ask before taking a new medication.
Outline
- ChatGPT on Patient Medication
- Results
1. ChatGPT on Patient Medication
1.1. Method
- 12 questions obtained from the Agency for Healthcare Research and Quality (AHRQ) were posed to a chatbot (ChatGPT) for each of the top 20 drugs.
- The top 20 drugs, with any issues noted, are listed in the paper.
Therefore, 12 questions were asked for each of the 20 medications, generating 240 individual responses from the model, as sketched below.
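As a rough illustration of this setup, here is a minimal Python sketch of the question-by-drug grid; the question and drug strings are placeholders, since the exact AHRQ wording and the specific drug list come from the paper:

```python
from itertools import product

# Placeholder stand-ins -- the real 12 AHRQ questions and top-20 drug list
# come from the paper, not from this sketch.
ahrq_questions = [f"AHRQ question {i}" for i in range(1, 13)]  # 12 questions
top_20_drugs = [f"drug {i}" for i in range(1, 21)]             # 20 medications

# One chatbot prompt per (question, drug) pair.
prompts = [f"{q} (medication: {d})" for q, d in product(ahrq_questions, top_20_drugs)]
assert len(prompts) == 240  # 12 questions x 20 drugs = 240 responses
```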
1.2. Correctness and Completeness
- 2 reviewers independently evaluated and rated each response on a 6-point scale for correctness and a 3-point scale for completeness, with a completeness score of 2 considered adequate.
- 6-point correctness scale (1 = completely incorrect; 2 = more incorrect than correct; 3 = approximately equal correct and incorrect; 4 = more correct than incorrect; 5 = nearly all correct; and 6 = completely correct). Accuracy was determined using clinical expertise and a drug information database.
- 3-point completeness scale (1 = incomplete [addresses some aspects of the question, but significant parts are missing or incomplete]; 2 = adequate [addresses all aspects of the question and provides the minimum amount of information required to be considered complete]; and 3 = comprehensive [addresses all aspects of the question and provides additional information or context beyond what was expected]).
- After the independent reviews, the 2 reviewers met to compare answers, discuss any discrepancies, and assign a consensus score for correctness and completeness (a minimal sketch follows).
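Below is a minimal sketch of this dual-review workflow, assuming one correctness score (1–6) and one completeness score (1–3) per reviewer per response; the class and function names are illustrative, not from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rating:
    correctness: int   # 1 = completely incorrect .. 6 = completely correct
    completeness: int  # 1 = incomplete .. 3 = comprehensive

def needs_discussion(r1: Rating, r2: Rating) -> bool:
    """True when the two independent reviews disagree on either scale."""
    return r1 != r2

# In the study, any disagreement was resolved in a meeting where the two
# reviewers assigned a single consensus score per response.
assert needs_discussion(Rating(5, 2), Rating(6, 2))  # discuss, then reach consensus
```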
1.3. Reproducibility
- To assess reproducibility, responses that scored less than 6 for correctness were reassessed after 14 days. On reassessment, a response could improve in correctness, show no change, or decrease in correctness (see the sketch below).
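A small sketch of this reassessment rule, assuming initial and repeat correctness scores are compared pairwise per response (the function name is illustrative):

```python
def reproducibility_bin(initial: int, repeat: int) -> str:
    """Bin a repeat-query correctness score against the initial score."""
    if repeat > initial:
        return "improved"
    if repeat < initial:
        return "decreased"
    return "no change"

# Only responses with an initial correctness score below 6 were re-queried
# after 14 days.
assert reproducibility_bin(4, 6) == "improved"
assert reproducibility_bin(5, 3) == "decreased"
```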
2. Results
2.1. Correctness
Out of 240 responses, 222 (92.5%) were assessed as completely correct. Of the 18 remaining responses, 10 (4.2%) were nearly all correct, 5 (2.1%) more correct than incorrect, 2 (0.8%) equal parts correct and incorrect, 1 (0.4%) more incorrect than correct, and none (0%) completely incorrect.
2.2. Completeness
Of the 240 responses, 194 (80.8%) were comprehensively complete. A score of 2 was considered adequate, and 235 (97.9%) scored 2 or higher, indicating at least an adequate level of completeness; 5 (2.1%) were considered incomplete.
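As a quick sanity check, the percentages reported in Sections 2.1 and 2.2 can be reproduced from the counts; note that the "adequate" count of 41 is derived as 235 − 194, since it is not stated directly:

```python
# Score counts reported in the paper, n = 240 responses per scale.
correctness = {6: 222, 5: 10, 4: 5, 3: 2, 2: 1, 1: 0}
completeness = {3: 194, 2: 41, 1: 5}  # 41 derived: 235 scored >= 2, 194 scored 3

for name, counts in [("correctness", correctness), ("completeness", completeness)]:
    assert sum(counts.values()) == 240
    for score, count in sorted(counts.items(), reverse=True):
        print(f"{name} {score}: {count} ({100 * count / 240:.1f}%)")
# e.g., correctness 6: 222 (92.5%); completeness 3: 194 (80.8%)
```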
2.3. Reproducibility
When the 18 items that scored less than 6 for correctness were reassessed, responses were scored the same as the initial query for 6 items, decreased in quality for 5 items, and improved in quality for 7 items.
- The median correctness score was 5 (IQR 5–4) with the initial query and 4.5 (IQR 6–2) with the repeat query (p = 0.64).
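The initial-query summary follows directly from the score distribution in Section 2.1 (10 fives, 5 fours, 2 threes, and 1 two among the 18 responses below 6). Here is a sketch using Python's statistics module, printing the IQR in the paper's Q3–Q1 order; the repeat-query raw scores are not reported, so the 4.5 (IQR 6–2) figure cannot be reproduced the same way:

```python
from statistics import median, quantiles

# Initial correctness scores of the 18 reassessed responses, from Section 2.1.
initial = [5] * 10 + [4] * 5 + [3] * 2 + [2]

q1, _, q3 = quantiles(initial, n=4, method="inclusive")
print(f"median = {median(initial)}, IQR = {q3}-{q1}")  # median = 5.0, IQR = 5.0-4.0
```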
2.4. Discussion & Limitations
This variability on repeat querying raises some concerns regarding the consistency of accuracy if chatbots are used in the clinical setting.
- Pharmacists are uniquely trained to counsel patients on the most important aspects of a medication. Unlike ChatGPT, they do not need the patient to phrase a question correctly in order to cover a counseling point.
If a chatbot is used as a source to inquire about medication information, specific, singular prompts may be required for accurate responses.
- With only 2 investigators, there is a risk for personal bias and subjective interpretations.
- An additional limitation of this study is that it was conducted in English only.
- Utilizing a chatbot to answer questions commonly asked by patients is mostly accurate, but responses may include inaccurate information or lack information valuable to patients. Educating patients remains important.