Brief Review — COVID-QA: A Question Answering Dataset for COVID-19

RoBERTa on 2,019 COVID Questions, COVID-QA Dataset

Sik-Ho Tsang
2 min read · May 7, 2024

COVID-QA: A Question Answering Dataset for COVID-19, by deepset GmbH, Intel Corporation, and Lawrence Livermore National Laboratory
2020 ACL Workshop NLP-COVID19, Over 80 Citations

Medical/Clinical/Healthcare NLP/LLM
2017 … 2024 [ChatGPT & GPT-4 on Dental Exam] [ChatGPT-3.5 on Radiation Oncology]

  • COVID-QA is proposed, which is a Question Answering dataset consisting of 2,019 question/answer pairs annotated by volunteer biomedical experts on scientific articles related to COVID-19.
  • RoBERTa-base is used for benchmarking.




1.1. Dataset

  • 147 scientific articles, mostly related to COVID-19, are selected from the CORD-19 collection (The White House Office of Science and Technology Policy, 2020; accessed May 9, 2020) and annotated by 15 experts.
  • The annotations are created in SQuAD style: annotators mark a text span as the answer and formulate the corresponding question.
  • COVID-QA differs from SQuAD in three ways: answers come from longer texts (6118.5 vs. 153.2 tokens), answers are generally longer (13.9 vs. 3.2 words), and it contains neither an n-way annotated development set nor an n-way annotated test set.
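To make the SQuAD-style annotation concrete, here is a minimal sketch of what one such example could look like in code. The `QARecord` class, its field names, and the toy text are my own assumptions for illustration; they are not the dataset's actual schema or content.

```python
from dataclasses import dataclass


@dataclass
class QARecord:
    """One SQuAD-style example: the answer is a character span of the context."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer inside the context


def is_consistent(rec: QARecord) -> bool:
    """Check that the stored offset really points at the answer text."""
    end = rec.answer_start + len(rec.answer_text)
    return rec.context[rec.answer_start:end] == rec.answer_text


def mean_answer_words(records: list[QARecord]) -> float:
    """Average answer length in whitespace-separated words."""
    return sum(len(r.answer_text.split()) for r in records) / len(records)


# Toy placeholder text, not taken from COVID-QA.
context = "The annotator marks a span in the article and then writes a question for it."
answer = "a span in the article"
record = QARecord(
    context=context,
    question="What does the annotator mark?",
    answer_text=answer,
    answer_start=context.index(answer),
)
```

A statistic such as the average answer length quoted above would be computed over all records in this way, although the exact tokenization used for the reported numbers may differ.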

1.2. Model

  • RoBERTa-base is used in two settings: the baseline model and the same model finetuned on COVID-QA.
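Since the articles average several thousand tokens, a whole article cannot be passed to an encoder with a fixed input length in one go. A common workaround in extractive QA is to split the document into overlapping windows and run the model on each one. The sketch below shows the idea in plain Python. The function name `sliding_windows` and the default sizes (384-token windows, 128-token overlap) are my own assumptions, and this review does not state how the paper itself handles long inputs.

```python
def sliding_windows(
    tokens: list[str], window: int = 384, overlap: int = 128
) -> list[tuple[int, list[str]]]:
    """Split a token list into overlapping windows.

    Returns (start_offset, window_tokens) pairs so that an answer span
    predicted inside a window can be mapped back to the full document.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return chunks
```

With these assumed defaults, a 6,118-token document is split into 24 windows, so an answer near a window boundary is still fully contained in at least one window as long as it is shorter than the overlap.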

2. Results


As shown above, finetuning the model on COVID-QA results in a significant improvement on both metrics, though the overall scores remain low compared to SQuAD.

  • It is hypothesized that the low scores stem from more complex question/answer pairs on much longer documents and from the lack of multiple annotations per question.
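To see why a single annotation per question can depress scores, consider how SQuAD-style evaluation typically works: a prediction is compared against every reference answer and the best score is kept. The sketch below assumes the two metrics are exact match and token-level F1, as in the usual SQuAD evaluation; the review text only says "both metrics". All function names and the toy strings are mine.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction: str, references: list[str]) -> float:
    """Score against every reference and keep the best one."""
    return max(metric(prediction, ref) for ref in references)


# Toy strings, not taken from COVID-QA.
prediction = "the long answer span"
one_annotator = ["long answer span marked by one annotator"]
two_annotators = one_annotator + ["long answer span"]
```

Here `best_score(exact_match, prediction, one_annotator)` is 0.0, while the same prediction scores 1.0 against `two_annotators`. With only one reference per question and long answers, a reasonable but differently bounded span is penalized more heavily.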


