Brief Review — COVID-QA: A Question Answering Dataset for COVID-19

RoBERTa on 2,019 COVID Questions, COVID-QA Dataset

Sik-Ho Tsang
May 7, 2024
RoBERTa for COVID-QA

COVID-QA: A Question Answering Dataset for COVID-19 (COVID-QA), by deepset GmbH, Intel Corporation, and Lawrence Livermore National Laboratory
2020 ACL Workshop NLP-COVID19, Over 80 Citations (Sik-Ho Tsang @ Medium)

Medical/Clinical/Healthcare NLP/LLM
2017 ... 2024 [ChatGPT & GPT-4 on Dental Exam] [ChatGPT-3.5 on Radiation Oncology]
==== My Other Paper Readings Are Also Over Here ====

  • COVID-QA is proposed: a Question Answering dataset of 2,019 question/answer pairs, annotated by volunteer biomedical experts on scientific articles related to COVID-19.
  • RoBERTa-base is used for benchmarking.

Outline

  1. COVID-QA
  2. Benchmarking Results

1. COVID-QA

1.1. Dataset

  • 147 scientific articles, mostly related to COVID-19, are selected from the CORD-19 collection (The White House Office of Science and Technology Policy, 2020; accessed May 9, 2020) and annotated by 15 experts.
  • The annotations are created in SQuAD style: annotators mark a span of text as the answer and formulate the corresponding question.
  • COVID-QA differs from SQuAD in three ways: answers come from longer texts (6118.5 vs. 153.2 tokens), answers are generally longer (13.9 vs. 3.2 words), and there are no n-way annotated development or test sets.
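
To make the SQuAD-style annotation concrete, here is a minimal sketch of what one record could look like. The context, question, and ID are hypothetical examples written for illustration and are not taken from COVID-QA; the field names follow the usual SQuAD layout, which I assume the dataset mirrors.

```python
# Hypothetical SQuAD-style record; the text is illustrative, NOT from COVID-QA.
context = (
    "Coronaviruses are enveloped RNA viruses. "
    "The incubation period is estimated to be between 2 and 14 days."
)
answer_text = "between 2 and 14 days"

record = {
    "context": context,
    "qas": [
        {
            "id": "example-0",  # made-up identifier
            "question": "How long is the estimated incubation period?",
            "answers": [
                # answer_start is a character offset into the context
                {"text": answer_text, "answer_start": context.index(answer_text)}
            ],
        }
    ],
}


def answer_is_aligned(context: str, answer: dict) -> bool:
    """Check that the character offset really points at the answer text."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])] == answer["text"]


aligned = all(
    answer_is_aligned(record["context"], answer)
    for qa in record["qas"]
    for answer in qa["answers"]
)
print(aligned)
```

Each question here has exactly one answer, which matches the point above that COVID-QA has no n-way annotated sets.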

1.2. Model

  • RoBERTa-base is used: a baseline model is compared against the same model finetuned on COVID-QA.
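
RoBERTa-base accepts at most 512 tokens per input, far fewer than the average COVID-QA document length quoted above. One common workaround is to split the document into overlapping windows and run the model on each. The sketch below shows that idea only; it is not necessarily what the authors did, and `window_size=384` and `overlap=128` are assumed values, not settings from the paper.

```python
from typing import List


def sliding_windows(tokens: List[str], window_size: int = 384, overlap: int = 128) -> List[List[str]]:
    """Split a token list into overlapping windows of at most window_size tokens."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be non-negative and smaller than window_size")
    step = window_size - overlap  # distance between consecutive window starts
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows


# A dummy document of roughly the average COVID-QA length (placeholder tokens).
document = [f"tok{i}" for i in range(6118)]
windows = sliding_windows(document)
print(len(windows), max(len(w) for w in windows))
```

The overlap exists so that an answer span cut by one window boundary still appears whole in a neighbouring window. In practice the question also consumes part of the 512-token budget, which is why the window is kept below the limit.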

2. Benchmarking Results

Results

Finetuning the model on COVID-QA gives a significant improvement on both metrics, though the overall scores remain low compared to those on SQuAD.

  • It is hypothesized that the low scores stem from the more complex question/answer pairs, the much longer documents, and the lack of multiple annotations per question.
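
For reference, SQuAD-style evaluation usually reports exact match (EM) and token-level F1, and I assume these are the two metrics meant here. The sketch below is a simplified re-implementation in the spirit of the SQuAD evaluation script, not the authors' evaluation code, and the prediction and gold strings are made up.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example: the prediction misses one word of the gold answer.
gold = "between 2 and 14 days"
prediction = "2 and 14 days"
em = exact_match(prediction, gold)
f1 = f1_score(prediction, gold)
print(em, round(f1, 3))
```

With several reference answers per question, SQuAD takes the maximum score over the references, so a prediction only needs to match one of them. With a single annotation, as in COVID-QA, a reasonable but differently worded span gets no such second chance, which fits the hypothesis above.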
