Med-PaLM 2 and ChatGPT: complementary tools to improve diagnosis and clinical reasoning

Based on the article: Toward expert-level medical question answering with large language models. Nature Medicine. Published online 8 January 2025 (see reference 1).

Artificial intelligence (AI) has revolutionised the speed of computation, but above all it has made it possible to ‘talk’ in real time with a machine whose algorithms make it feel almost human, with its follow-up questions or the simple act of saying thank you. In addition, access to an almost unlimited body of information compensates for another human limitation: memory.

Of course, these systems are not perfect: in the end they rely on information available online, which can carry biases, especially when certain diseases or even social groups are underrepresented.

AI and Clinical Diagnosis

Human pathology is vast, yet many different processes present with common signs and symptoms, and all of this introduces a degree of uncertainty into the diagnostic process. The human mind is capable of establishing causal relationships, and this, for the moment, is something that AI models cannot do.

A doctor’s ability to reach a possible diagnosis depends on the context in which he or she practises and on his or her experience, but above all on memory, which is limited in the human brain. AI models can largely cover this aspect of memory, but not only that. AI can also act as a form of quality control, since the software can serve as an interlocutor that increases confidence in the decision to be made, posing questions to its user as a way of reducing uncertainty.
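As a purely illustrative aside (the article describes no implementation), here is a minimal sketch of how such a diagnostic ‘interlocutor’ might be built on top of a general-purpose LLM API. The openai Python client call is real, but the model name, the prompt wording and the example case are assumptions of ours:

```python
# Minimal sketch only: an LLM asked to act as a diagnostic "interlocutor"
# that poses a clarifying question before committing to a differential.
# Model name, system prompt and case summary are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

messages = [
    {"role": "system",
     "content": ("You are a clinical reasoning assistant. Given a case "
                 "summary, list a ranked differential diagnosis and then "
                 "ask the clinician the single question that would most "
                 "reduce the remaining diagnostic uncertainty.")},
    {"role": "user",
     "content": ("62-year-old man, acute pleuritic chest pain and dyspnoea, "
                 "recent long-haul flight, SpO2 91% on room air.")},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)  # differential + clarifying question
```

In practice the clinician’s reply would be appended to `messages` and the call repeated, turning the exchange into the kind of uncertainty-reducing dialogue described above.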

What AI tools do we really have?

It is well known that diagnostic radiology is incorporating AI tools that may outperform the human eye, for example in reading mammograms or interpreting biopsy samples, as well as in the already proven ability to recognise diabetic retinopathy early. Here, however, we want to focus on the clinician who handles patients in the emergency department or on the ward. What do we have so far? For now the front-runner is ChatGPT, developed by OpenAI, in its latest version based on GPT-4. It is the most widely used application and is free to use, although a paid version has also been introduced.

On the other hand, Google (through Google DeepMind) has developed Med-PaLM 2, also oriented toward medical diagnosis and differential diagnosis, but it remains in private, experimental use. Other applications are in progress, but we want to focus on these two.

Which application is more useful for diagnosis?

It is hard to answer this question clearly, as many of the published papers are complex to analyse.

At the moment the ‘winner’ is ChatGPT simply because it is widely available, while Med-PaLM 2 is not yet in free use; even so, we can offer a preliminary opinion on it.

Most AI applications have been tested against multiple-choice question banks; in general, ChatGPT’s accuracy is over 90% on USMLE-style questions, while Med-PaLM 2’s is around 86%.

ChatGPT tends to give fuller explanations when making a differential diagnosis, whereas Med-PaLM 2 answers more tersely. The latter has been trained with clinical guidelines, UpToDate and PubMed. In an article published about this application (1), both general practitioners and specialists rated the programme’s answers as being as reliable as those of doctors.

Conclusions

Diagnostic aid tools are not new. Decision tools such as the Isabel Healthcare programme, well known in the Anglo-Saxon world, produced lists of diagnostic possibilities but did not assign probabilities, and their databank was limited. Even so, the final diagnosis usually appeared on the list, and, when compared against the final diagnosis, both clinicians and the tool reached a diagnostic accuracy of about 60% at the start of the diagnostic process (2).

The new AI applications have an advantage over earlier tools: their databank is enormous, they ‘learn’ with use, and they allow a ‘conversation’ to refine the final diagnosis.

So far, no clear diagnostic advantage of AI applications has been demonstrated over medical specialists in their own field; accuracy figures for both generally fall in the range of 60-65% at the beginning of the diagnostic process, when compared against the final diagnosis.

We don’t really know how many medical or nursing professionals are already using these tools for diagnosis, but they will undoubtedly be a revolution in helping to retrieve data from memory, to train oneself in clinical reasoning, and to improve the safety of both the patient and the professional.

Author: Lorenzo Alonso Carrión

FORO OSLER
