The most talked-about research of the week was easily the JAMA Internal Medicine study that found that licensed medical professionals preferred ChatGPT’s responses to patient questions over physicians’ responses 79% of the time – in part because ChatGPT’s responses were rated as more empathetic.
What wasn’t discussed as much was the fact that the physicians’ responses were sourced from answers posted on Reddit – AKA from doctors with the time and empathy to share their thoughts in the first place – so it’s possible that ChatGPT’s lead is even wider than it’s getting credit for.
Here’s how it worked. Researchers randomly pulled 195 exchanges from the r/AskDocs subreddit where a verified physician responded to a patient question.
- ChatGPT responses were generated by entering each original question into a fresh session (with no prior questions to work with). The anonymized responses were then evaluated by five licensed medical professionals.
- Evaluators chose “which response was better,” then judged “the quality of information provided” (very poor, poor, acceptable, good, or very good) and “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, or very empathetic). The ratings were then mapped to a 1-to-5 scale.
The results were a blowout. The proportion of responses rated as “good” or “very good” quality was 3.6x higher for ChatGPT, with 78.5% of ChatGPT responses scoring ≥4 points versus… 22.1% for physicians.
- The proportion of responses rated as “empathetic” or “very empathetic” was a whopping 9.8x higher for ChatGPT, with 45.1% of ChatGPT responses scoring ≥4 points versus just 4.6% for physicians.
- Disclaimer: It probably wasn’t too hard for the judges to distinguish between ChatGPT’s well-punctuated prose (not to mention its 211-word average response length) and the off-the-cuff Reddit comments of the physicians (52-word average).
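For readers who want to check the multiples quoted above, here is a minimal sketch that recomputes them from the percentages and word counts reported in this piece. The variable and function names are illustrative only, and the ratio shown is a plain division of the two figures, not necessarily the exact statistical method the study authors used.

```python
# Figures as reported in the article: share of responses scoring >= 4
# on the 5-point scale, plus average response length in words.
reported = {
    "quality": {"chatgpt": 78.5, "physician": 22.1},
    "empathy": {"chatgpt": 45.1, "physician": 4.6},
    "avg_words": {"chatgpt": 211, "physician": 52},
}


def ratio(chatgpt_value, physician_value):
    """Return how many times larger the ChatGPT figure is (simple division)."""
    return chatgpt_value / physician_value


# Round to one decimal place to match how the multiples are quoted.
ratios = {
    name: round(ratio(vals["chatgpt"], vals["physician"]), 1)
    for name, vals in reported.items()
}

for name, value in ratios.items():
    print(f"{name}: {value}x")
```

The quality and empathy multiples come out to 3.6x and 9.8x, matching the figures above, and the same arithmetic shows ChatGPT’s responses were roughly four times longer on average.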
The Takeaway
When it comes to answering the health questions of random Reddit users, tireless AI robots appear to be far better than physicians donating their time. That said, overflowing EHR inboxes remain a leading contributor to physician burnout, and the authors summed it up perfectly with: “Despite the limitations of this study and the frequent overhyping of new technologies, studying the addition of AI assistants to patient messaging workflows holds promise with the potential to improve both clinician and patient outcomes.”