Brief Review — BERT-Based Medical Chatbot: Enhancing Healthcare Communication through Natural Language Understanding

Medical Chatbot Using DistilBERT

Sik-Ho Tsang
4 min read · Jul 27, 2024
[Figure: AI chatbot framework]

BERT-Based Medical Chatbot: Enhancing Healthcare Communication through Natural Language Understanding
BERT-Based Medical Chatbot, by Koneru Lakshmaiah Education Foundation
2024 Elsevier J. RCSOP (Sik-Ho Tsang @ Medium)

Medical/Clinical/Healthcare NLP/LLM
2017 … 2024 [ChatGPT & GPT-4 on Dental Exam] [ChatGPT-3.5 on Radiation Oncology] [LLM on Clinical Text Summarization] [Extract COVID-19 Symptoms Using ChatGPT & GPT-4] [ChatGPT on Patients Medication]
==== My Healthcare and Medical Related Paper Readings ====
==== My Other Paper Readings Are Also Over Here ====

Outline

  1. BERT-Based Medical Chatbot
  2. Results

1. BERT-Based Medical Chatbot

1.1. Data

  • The proposed system makes use of the MIMIC-III, BioASQ, PubMed, and COVID-19 datasets.
  • MIMIC-III is a comprehensive dataset containing clinical notes, diagnoses, medications, lab results, and other medical information.
  • PubMed facilitates access to a vast collection of biomedical literature, including abstracts and full-text articles.
  • The COVID-19 dataset includes pandemic-related medical information, research articles, and clinical data.
  • BioASQ contains biomedical questions and answers for information retrieval.

1.2. Model

[Figure: DistilBERT configuration]
  • Although the passage does not explicitly state that DistilBERT is used, Fig. 4 indicates that it is.
  • After processing by BERT or DistilBERT, a sequence of contextualized embeddings H is obtained, one for each input token, with each embedding characterized in the context of the entire input sequence:
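In standard notation (the symbols below are assumed, not quoted from the paper), for an input token sequence X = (x_1, x_2, …, x_n):

H = BERT(X) = (h_1, h_2, …, h_n)

where h_i is the contextualized embedding of token x_i.

As a minimal sketch of this step, assuming the Hugging Face Transformers library and the distilbert-base-uncased checkpoint (the paper does not specify an implementation):

```python
# Minimal sketch: obtain the contextualized embeddings H from DistilBERT.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

# A hypothetical medical query (not taken from the paper's dataset).
query = "What are the common symptoms of diabetes?"
inputs = tokenizer(query, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

H = outputs.last_hidden_state  # shape (1, num_tokens, 768): one vector per token
pooled = H.max(dim=1).values   # max pooling over tokens, used later for intent classification
```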

1.3. Entity Recognition

  • In entity recognition tasks, such as identifying medical conditions or symptoms in the query, the authors mention that a Conditional Random Field (CRF) is used with softmax activation on top of H:
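In the same assumed notation, with W_e and b_e as learnable parameters of the entity head:

Entities = CRF(softmax(W_e H + b_e))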
  • where Entities represents the recognized entities in the input query.

1.4. Intent Classification

  • For intent classification tasks, such as determining the user’s intent when asking a question about symptoms or treatments, a softmax layer is used over the pooled embeddings:
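Again with assumed learnable parameters W_i and b_i:

Intent = softmax(W_i pool(H) + b_i)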
  • where pool(H) denotes the max pooling operation over the sequence of word embeddings.

1.5. Response Generation

  • For generating responses in a dialogue-based chatbot, a decoder model is adopted that takes H as input and generates the response sequence token by token.
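In standard encoder–decoder notation (an assumption here), each response token is drawn from the conditional distribution

P(y_t | y_1, …, y_{t-1}, H),  t = 1, …, m

so the response Y = (y_1, …, y_m) is produced one token at a time.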

1.6. Fine-Tuning

  • BERT is fine-tuned on medical-domain data to adapt it to medical language and context. This involves training the model on a specific medical dataset and adjusting its weights to improve performance on medical tasks.
  • Let X represent the input sequence for the downstream task (a sequence of tokens) and Y represent the ground-truth labels for the task. The cross-entropy loss is represented as:
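A standard formulation (assumed here) with one-hot ground-truth labels Y and predicted class distribution Ŷ:

L = − Σ_i Σ_c Y_{i,c} log Ŷ_{i,c}

where i indexes training examples and c the classes.

As a minimal fine-tuning sketch under assumed details (5 intent classes, AdamW, Hugging Face Transformers; none of this is the authors' code):

```python
# Minimal fine-tuning sketch with cross-entropy loss: classify a medical
# query into one of 5 hypothetical intent classes.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5)  # 5 intents assumed
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(["What are the symptoms of flu?"], return_tensors="pt")
labels = torch.tensor([2])                # hypothetical intent label

outputs = model(**batch, labels=labels)   # loss = cross-entropy over intents
outputs.loss.backward()                   # one gradient step adjusts the weights
optimizer.step()
optimizer.zero_grad()
```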

2. Results

  • The dataset consists of around 11,000 medical questions and answers extracted from the MIMIC-III, BioASQ, PubMed, and COVID-19 datasets.
  • Clinical experts from diverse medical domains and healthcare practitioners, including nurses and general practitioners, provided valuable insights into patient interactions and common health-related inquiries.
  • Language experts and cultural sensitivity consultants were actively engaged in the process.
  • Ethical considerations are addressed through collaboration with bioethicists and legal experts.
  • etc.
  • Sentences containing information on symptoms, labeled with respect to medical specialty, are collected.
  • (The paper does not describe in detail what these labels are.)

The proposed BERT-based Medical Chatbot achieved the highest accuracy, 94%, demonstrating its superior performance.

  • It had a precision of 0.92, indicating high accuracy in query responses.
  • The AUC-ROC score of 0.97 suggests excellent power to predict specific diseases based on user queries and symptoms.
  • A recall of 0.95 indicates that the chatbot rarely misses cases where the condition is actually present in medical diagnosis.
  • F1 score of 0.93 provides a balanced measure of precision and recall.
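These figures are mutually consistent: F1 = 2 × precision × recall / (precision + recall) = 2 × 0.92 × 0.95 / (0.92 + 0.95) ≈ 0.93.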

The proposed BERT outperformed all baseline models (LSTM, SVM, and Bi-LSTM) across all metrics; its contextual understanding and pre-trained embeddings give it a significant advantage in capturing nuanced linguistic patterns.

  • However, challenges exist, including the computational demands of BERT-based models, potential biases in training data influencing performance, and the interpretability of complex decision-making processes.
  • Continuous learning for adaptation to evolving healthcare scenarios and addressing data privacy concerns also pose operational considerations. Additionally, the model’s effectiveness may vary in handling uncommon medical cases with limited training data.
