Tool performs well for textbook examples but fails with ambiguous symptoms
ChatGPT Health fails to refer patients to appropriate acute care in more than half of emergency cases. A new study from Mount Sinai in New York shows that OpenAI's AI tool underestimated urgency, especially for ambiguous symptoms. As a result, millions of users who link their medical records may be advised to wait and see when they actually need life-saving care.
Lead researcher Ashwin Ramaswamy argued that the language model particularly struggled with situations where the danger was not immediately apparent. In tests involving 60 clinical scenarios, the AI recognised early signs of respiratory failure in an asthmatic patient, yet advised him to wait rather than seek help. The contrast is striking, because the chatbot did respond correctly in obvious emergencies such as a severe allergic reaction. The researchers concluded that the technology is currently no substitute for a doctor's clinical judgement.
The study also uncovered a worrying pattern in the detection of self-harm. ChatGPT Health was supposed to refer users with suicidal thoughts directly to helplines. In practice, however, the banner with this information appeared only intermittently. Paradoxically, the system responded more reliably to general expressions of distress than to users who already mentioned a specific method of self-harm. Precisely when the situation grew more serious, the built-in safety mechanisms more often failed to activate.
Despite these results, the authors do not advocate a total ban on AI health tools. Co-author Alvira Tyagi pointed out that these technologies are already in the hands of millions of people. She said it is essential to learn how to integrate these tools thoughtfully into healthcare without treating them as substitutes for human expertise. Because AI models evolve constantly, ongoing evaluation remains necessary.
For now, the advice is to contact a healthcare provider directly when alarm symptoms such as chest pain or breathlessness occur, and not to rely on a chatbot.
Business AM


