Brief Review — Performance of Generative Artificial Intelligence in Dental Licensing Examinations
ChatGPT-3.5 & GPT-4 on Dental Licensing Exam
Performance of Generative Artificial Intelligence in Dental Licensing Examinations
ChatGPT & GPT-4 on Dental Exam, by The University of Hong Kong and Hong Kong Chu Hai College
2024 Int. Dental J. (Sik-Ho Tsang @ Medium)
Medical/Clinical NLP/LLM
2017 … 2023 [MultiMedQA, HealthSearchQA, Med-PaLM] [Med-PaLM 2] [GPT-4 in Radiology] [ChatGPT & GPT‑4 on USMLE] [Regulatory Oversight of LLM] [ExBEHRT]
==== My Other Paper Readings Are Also Over Here ====
- A total of 1461 multiple-choice questions from question books for the US and the UK dental licensing examinations were input into two versions of ChatGPT, namely ChatGPT-3.5 and GPT-4, for evaluation.
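The review does not describe exactly how the questions were fed to the models. Purely as an illustration, a minimal evaluation loop might look like the Python sketch below; the OpenAI chat API usage, the model names, the prompt wording, and the letter-extraction grading rule are all my assumptions, not the paper's setup.

```python
# Hedged sketch: batch-evaluating multiple-choice questions with two ChatGPT versions.
# Model names, prompt wording, and grading rule are illustrative assumptions only.
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # stand-ins for "ChatGPT-3.5" and "GPT-4"

def ask(model: str, question: str, options: dict) -> str:
    """Send one MCQ and return the single option letter the model picks."""
    prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in options.items())
    prompt += "\nAnswer with the letter of the single best option."
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    text = resp.choices[0].message.content
    match = re.search(r"\b([A-E])\b", text.upper())
    return match.group(1) if match else ""

def evaluate(model: str, items: list) -> float:
    """Fraction of questions in `items` answered correctly by `model`."""
    correct = sum(ask(model, q["question"], q["options"]) == q["answer"] for q in items)
    return correct / len(items)
```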
1. Dataset
- The Integrated National Board Dental Examination (INBDE) and the Overseas Registration Examination (ORE) are the US and the UK dental licensing examinations, respectively, and were selected as the sampling base for this study.
- The passing rates of the US and UK dental examinations were 75.0% and 50.0%, respectively.
- A total of 1461 questions were selected for this study, including 745 questions for the INBDE and 716 questions for the ORE.
- 32 questions containing figures or tables, all from the ORE question books, were excluded, leaving 684 ORE questions and 1429 questions in total for evaluation.
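As a quick sanity check of these counts (simple arithmetic on the numbers above, not something reported separately in the paper):

```python
# Question counts used in the Results section (sanity check only).
inbde, ore_raw = 745, 716           # questions sampled per exam
excluded = 32                       # ORE questions with figures/tables, removed
ore = ore_raw - excluded            # 684 ORE questions actually evaluated
total_sampled = inbde + ore_raw     # 1461 questions selected
total_evaluated = inbde + ore       # 1429 questions fed to the models
print(total_sampled, total_evaluated)  # 1461 1429
```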
2. Results
- Of the 1429 questions evaluated, GPT-3.5 answered 805 correctly, scoring 56.3%, whereas GPT-4 answered 1030 correctly, scoring 72.1%, a clear improvement over GPT-3.5.
- For the 745 INBDE questions, GPT-3.5 correctly answered 509, achieving a score of 68.3%, whereas GPT-4 correctly answered 601, achieving a score of 80.7%, a better result than GPT-3.5.
- Of the 684 ORE questions, GPT-3.5 answered 296 correctly, scoring 43.3%, whilst GPT-4 answered 429 correctly, scoring 62.7%, again an improvement over GPT-3.5.
- Detailed results are shown in the figure below.
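Independently of that figure, the headline percentages above follow directly from the raw counts. Here is a short cross-check sketch; the pass thresholds are the 75.0% and 50.0% passing rates quoted earlier, treated as score cut-offs, which is my interpretation rather than something stated explicitly in this summary.

```python
# Recomputing the reported accuracies from the raw correct counts.
counts = {
    "GPT-3.5": {"INBDE": (509, 745), "ORE": (296, 684)},
    "GPT-4":   {"INBDE": (601, 745), "ORE": (429, 684)},
}
pass_mark = {"INBDE": 0.75, "ORE": 0.50}  # quoted US and UK thresholds

for model, exams in counts.items():
    total_correct = sum(c for c, _ in exams.values())
    total_asked = sum(n for _, n in exams.values())
    print(f"{model} overall: {total_correct}/{total_asked} = {total_correct / total_asked:.1%}")
    for exam, (correct, asked) in exams.items():
        score = correct / asked
        verdict = "pass" if score >= pass_mark[exam] else "fail"
        print(f"  {exam}: {correct}/{asked} = {score:.1%} ({verdict})")
```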