Brief Review — MedicationQA: Bridging the Gap Between Consumers’ Medication Questions and Trusted Answers

MedicationQA: 674 Question-Answer Pairs

Sik-Ho Tsang
Oct 23, 2023
Word Cloud Representing the Consumer Questions about Drugs that We Used to Create the Gold Standard Corpus.

Bridging the Gap Between Consumers’ Medication Questions and Trusted Answers
MedicationQA, by the U.S. National Library of Medicine
2019 SHTI (Sik-Ho Tsang @ Medium)

Medical Dataset

  • This paper addresses the task of answering consumer health questions about medications.
  • A gold standard corpus for Medication Question Answering is created from real consumer questions. It consists of 674 question-answer pairs annotated with the question focus, the question type, and the answer source.
  • A recurrent network (Bi-LSTM-CRF) is used for question focus recognition, and a convolutional neural network (CNN) is used for question type identification.
  • (Med-PaLM is evaluated on this dataset, which it refers to as “MedicationQA”.)
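To make the annotation scheme concrete, each gold-standard entry can be pictured as a small record. The sketch below is a hypothetical Python representation: the class name `MedQAPair` and the field names are my own choices, not the released file format of the dataset, and the example entry is illustrative rather than taken from the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedQAPair:
    """One question-answer pair with its annotations (hypothetical schema)."""
    question: str  # consumer question as asked
    focus: str     # question focus; always a drug name in this dataset
    qtype: str     # question type, e.g. "Dose", "Interaction", "Side effects"
    answer: str    # ground-truth answer text
    source: str    # website the answer was taken from


# Illustrative entry only; not taken from the dataset.
example = MedQAPair(
    question="What is the usual adult dose of amoxicillin?",
    focus="amoxicillin",
    qtype="Dose",
    answer="(answer text)",
    source="MedlinePlus",
)
```

A frozen dataclass keeps each annotated pair immutable, which suits a fixed gold standard.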


  1. MedicationQA Dataset
  2. Results

1. MedicationQA Dataset

  • Each question is manually annotated with:
  1. Question focus (always a drug name in this dataset),
  2. Question type (e.g., Dose, Interaction, Side effects).
  • The ground-truth answer is retrieved from the following sources in order of priority, falling back to the next source only when the previous one has no answer:
  1. MedlinePlus and DailyMed.
  2. Other NIH or U.S. government websites.
  3. Other trustworthy websites (e.g., the Mayo Clinic) or academic institutions’ websites.
  4. Other websites returned by a Google search.
  • The final gold standard contains 674 question-answer pairs with their associated annotations, covering 25 question types. Table 1 lists these types with examples.
  • The answer sources are summarized in Figure 4.
  • Table 2 shows token- and sentence-level statistics for the questions and answers in the dataset.
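The answer-selection rule amounts to a priority-ordered fallback: the answer comes from the highest-priority source tier that has one. A minimal sketch, assuming a hypothetical `lookup` callable per tier that returns an answer string or `None`; the function name and the toy lookups are illustrative, and the code only shows the ordering, not how the authors actually collected the answers.

```python
from typing import Callable, Optional, Sequence, Tuple

# Source tiers in the priority order listed above.
TIERS = (
    "MedlinePlus/DailyMed",
    "Other NIH or U.S. government websites",
    "Other trustworthy or academic websites",
    "Other websites from a Google search",
)

Lookup = Callable[[str], Optional[str]]  # hypothetical per-tier search


def first_available_answer(
    question: str, lookups: Sequence[Tuple[str, Lookup]]
) -> Optional[Tuple[str, str]]:
    """Return (tier, answer) from the first tier whose lookup finds an answer."""
    for tier, lookup in lookups:
        answer = lookup(question)
        if answer:  # None or "" means "not available in this tier"
            return tier, answer
    return None  # no tier had an answer


# Toy lookups: the first tier has nothing, so the second tier is used.
lookups = [
    (TIERS[0], lambda q: None),
    (TIERS[1], lambda q: "Answer from an NIH page"),
    (TIERS[2], lambda q: "Answer from an academic site"),
    (TIERS[3], lambda q: "Answer from a search result"),
]
result = first_available_answer("How should I store insulin?", lookups)
```

Here `result` is the second tier paired with its answer, because the lookup for the first tier returned nothing.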

2. Results

  • Focus Recognition: A Bi-LSTM-CRF network is trained on 80% of the data by minimizing a CRF-based loss.

It obtains an F1 score of 74% for exact span matching and 90% for partial span matching.
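The gap between the two focus-recognition scores comes from how a predicted span is matched to the gold span. The sketch below computes span-level F1 under both criteria. It assumes spans are `(start, end)` token offsets with an exclusive end, and it assumes "partial" means any token overlap; the paper's exact partial-matching rule may differ. The helper names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], partial: bool = False) -> float:
    """Span-level F1: exact match by default, any-overlap match if partial=True."""
    if not gold or not pred:
        return 0.0
    match = overlaps if partial else (lambda a, b: a == b)
    # Precision: fraction of predicted spans that match some gold span.
    precision = sum(any(match(p, g) for g in gold) for p in pred) / len(pred)
    # Recall: fraction of gold spans matched by some predicted span.
    recall = sum(any(match(g, p) for p in pred) for g in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one exact hit and one boundary error.
gold = [(3, 5), (10, 11)]
pred = [(3, 5), (9, 11)]
exact = span_f1(gold, pred)                  # (9, 11) != (10, 11)
partial = span_f1(gold, pred, partial=True)  # (9, 11) overlaps (10, 11)
```

In the toy example the boundary error costs half the credit under exact matching (F1 = 0.5) but none under partial matching (F1 = 1.0), which mirrors why the partial score is higher.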

  • Question Type Identification: A CNN is trained by minimizing a softmax-based loss.

The CNN achieves an average accuracy of 75.7% over 5 runs, with variations in the [0, 2.5%] range.
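The softmax-based objective for question type identification can be written out in a few lines. This sketch assumes the usual softmax cross-entropy and computes it for one question from the CNN's raw scores (logits) over the question types; the CNN itself is not shown, and the three-class logits are made-up numbers.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-probability of the gold question type under the softmax."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Hypothetical scores over three question types, e.g. Dose, Interaction, Side effects.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
loss = softmax_cross_entropy(logits, target=0)
```

Computing the loss through log-sum-exp, rather than taking `log(softmax(...))`, avoids overflow and underflow for large or very negative scores.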


