ChatGPT diagnoses emergency room patients as well as a doctor, study finds

In emergency departments, the artificial intelligence (AI) chatbot ChatGPT diagnosed patients at least as well as doctors and in some cases outperformed them, Dutch researchers have found.

The researchers have said that AI could "revolutionise the medical field".

The authors of the study – which was published on Wednesday – stressed, however, that emergency doctors' days were not yet numbered: the chatbot may be able to speed up diagnosis but not replace a human's judgement and experience.

The researchers examined 30 cases treated in an emergency department in the Netherlands in 2022, feeding ChatGPT the patients' histories, laboratory test results and doctors' observations. The chatbot was then asked to suggest five possible diagnoses. The correct diagnosis appeared in the doctors' lists in 87% of cases, compared with 97% for ChatGPT version 3.5.

The chatbot was "able to make medical diagnoses in much the same way as a human doctor would have done", summed up Hidde ten Berg of the emergency department at Jeroen Bosch Hospital in the Netherlands.

Study co-author Steef Kurstjens stressed that the study did not conclude that computers could one day run emergency departments, but that AI could play a vital role in helping doctors under pressure.

The chatbot can "help with a diagnosis and can perhaps suggest ideas that the doctor hadn't considered," he told AFP. He noted, however, that such tools are not designed as medical devices, and he also raised concerns about the confidentiality of sensitive medical data entered into a chatbot.

ChatGPT also showed limitations: its reasoning was "sometimes medically implausible or inconsistent, which can lead to misinformation or incorrect diagnosis, with significant consequences," the study notes.

The scientists also acknowledge some shortcomings in their research, such as the small sample size. In addition, only relatively simple cases were examined, in which patients presented with a single chief complaint. How well the chatbot performs in complex cases remains unclear.

In some cases, the correct diagnosis was not among any of ChatGPT's five suggestions, Kurstjens explained, notably in a case of abdominal aortic aneurysm, a potentially life-threatening condition in which the aorta, the body's main artery, swells. The report also points to medical "blunders" made by the chatbot, such as diagnosing anaemia (low haemoglobin in the blood) in a patient whose haemoglobin level was normal.

The results of the study, published in the specialist journal Annals of Emergency Medicine, will be presented at the 2023 European Congress of Emergency Medicine (EUSEM) in Barcelona.