NLP for Symptom Detection | Pranav M. Murugan

Full text:

A summary is shown below.

Objective

We identify symptom-related conversation segments in physician-patient dialogue using natural language processing (NLP) techniques, with the goal of developing an automated pipeline for symptom detection in unstructured clinical conversation. Our work helps address the high symptom burden placed on patients, improving care satisfaction and decreasing healthcare costs.

Dataset Overview

Turn-level conversation data between patients and their healthcare providers
Roughly 79,000 turns spanning over 181 unique conversations and 94 unique patients
Around 13% of the turns are symptom-related
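The class balance above affects how the results should be read. As a rough sketch, the arithmetic below uses the rounded figures from this overview (79,000 turns, 13% symptom-related), so the counts are approximate; the function name `class_balance` is ours, for illustration only. It shows that a chance-level classifier would sit at a precision of about 0.13, and that always predicting "not symptom-related" would already score about 87% accuracy, which is why precision-recall and ROC curves are more informative here than raw accuracy.

```python
def class_balance(n_turns, positive_rate):
    """Approximate class counts and trivial baselines from rounded summary figures."""
    n_pos = round(n_turns * positive_rate)
    n_neg = n_turns - n_pos
    return {
        "positives": n_pos,
        "negatives": n_neg,
        # Precision of a classifier that flags turns at random equals the prevalence.
        "chance_precision": n_pos / n_turns,
        # Accuracy obtained by labeling every turn as not symptom-related.
        "all_negative_accuracy": n_neg / n_turns,
    }


# Rounded figures taken from the dataset overview; exact counts are not given.
summary = class_balance(79_000, 0.13)
print(summary)
```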

Model Comparison

Results

Left: Precision-recall curves for each method tested. Right: Receiver operating characteristic (ROC) curves for each method. The dashed gray line indicates the curve for a random (chance-level) classifier. The associated AUROC for each of these curves can be found in Table 1. Error bars show the 95% confidence interval and were estimated with a bootstrap method using 250 resamples per threshold value.
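The caption does not spell out the bootstrap procedure, so the following is only a minimal sketch of one common way to obtain such error bars: a percentile bootstrap over (label, score) pairs at a fixed threshold. The function names, the nearest-rank percentile, the convention of reporting precision as 1.0 when nothing is flagged, and the synthetic data are all assumptions for illustration; only the 250 resamples and the 95% level come from the caption.

```python
import random


def precision_recall(labels, scores, threshold):
    """Precision and recall when a turn is flagged if score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    # Assumed convention: precision is 1.0 when no turn is flagged.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 1]."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[idx]


def bootstrap_ci(labels, scores, threshold, n_resamples=250, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for precision and recall at one threshold."""
    rng = random.Random(seed)
    n = len(labels)
    precisions, recalls = [], []
    for _ in range(n_resamples):
        # Resample turns with replacement, keeping each label with its score.
        idx = [rng.randrange(n) for _ in range(n)]
        p, r = precision_recall(
            [labels[i] for i in idx], [scores[i] for i in idx], threshold
        )
        precisions.append(p)
        recalls.append(r)
    precisions.sort()
    recalls.sort()
    lo, hi = alpha / 2, 1 - alpha / 2
    return {
        "precision": (percentile(precisions, lo), percentile(precisions, hi)),
        "recall": (percentile(recalls, lo), percentile(recalls, hi)),
    }


# Synthetic example: about 13% positives, with positives scoring higher on average.
rng = random.Random(42)
labels = [1 if rng.random() < 0.13 else 0 for _ in range(500)]
scores = [min(1.0, max(0.0, rng.gauss(0.7 if y else 0.3, 0.2))) for y in labels]
ci = bootstrap_ci(labels, scores, threshold=0.5)
print(ci)
```

Repeating this at each threshold on the curve gives one interval per plotted point. If turns from the same conversation are correlated, resampling whole conversations rather than individual turns may be the more conservative choice.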

Quantitative metrics for each model on the non-preprocessed data.

SHAP Plot using XGBoost.

Takeaways

The transformer-based BERT model performed best; the LSTM was second best and is less computationally intensive
BioBERT performed poorly, suggesting that conventional NLP models are sufficient for this task
Bag-of-words models are highly interpretable; next steps may focus on interpreting the deep learning models
A future goal is to incorporate symptom detection into an automated pipeline used during patient care
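To illustrate why bag-of-words models are easy to interpret, here is a small sketch that assigns each token a smoothed log-odds weight, in the style of naive Bayes. This is not the model used in this work (the SHAP plot above is based on XGBoost); the example turns, labels, and function names are invented for illustration. The point is only that each word gets a single signed weight that can be read directly.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would do more."""
    return text.lower().split()


def token_log_odds(turns, labels, smoothing=1.0):
    """Smoothed log-odds of each token appearing in symptom vs. other turns."""
    pos, neg = Counter(), Counter()
    for text, y in zip(turns, labels):
        (pos if y == 1 else neg).update(tokenize(text))
    vocab = set(pos) | set(neg)
    # Add-one style smoothing so unseen tokens do not produce log(0).
    pos_total = sum(pos.values()) + smoothing * len(vocab)
    neg_total = sum(neg.values()) + smoothing * len(vocab)
    return {
        t: math.log((pos[t] + smoothing) / pos_total)
        - math.log((neg[t] + smoothing) / neg_total)
        for t in vocab
    }


# Invented toy turns (1 = symptom-related, 0 = not); not from the dataset.
turns = [
    "i have had a lot of pain",
    "the nausea is worse at night",
    "see you next week",
    "did you park outside",
]
labels = [1, 1, 0, 0]

weights = token_log_odds(turns, labels)
# Tokens with the largest positive weights point toward symptom-related turns.
for token, w in sorted(weights.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{token}: {w:+.2f}")
```

Deep models such as the LSTM and BERT have no such per-word weights, which is why post hoc attribution methods like SHAP are a natural next step for them.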

Github: