AI Language Models Outperform Medical Models in Diagnosing Complex Cases, Israeli Scientists Say

AI (Shutterstock)

The research has several practical applications in healthcare, primarily by improving the speed and accuracy of medical diagnoses.

By Pesach Benson, TPS

A team of researchers from Ben-Gurion University of the Negev has developed a new database to test the ability of AI language models to diagnose complex medical cases.

Their findings, presented at the Association for the Advancement of Artificial Intelligence (AAAI) conference in Philadelphia, suggest that general-purpose models such as GPT-4o may be more effective than models designed specifically for medicine.

Traditionally, AI language models have been tested on simpler medical cases, such as exam questions or common diseases.

However, these models have not been evaluated on the kind of complex, real-world cases doctors often face.

To fill this gap, the researchers built a database of 3,562 medical case reports from the BMC Journal of Medical Case Reports, featuring detailed descriptions of unusual medical cases and their diagnoses.

The cases were presented using both multiple-choice and open-ended questions, mimicking real-life diagnostic scenarios.

The results were surprising. GPT-4o, a general-purpose language model, outperformed medical models like Meditron-70B and MedLM-Large in diagnosing these complex cases.

GPT-4o achieved 87.9% accuracy on multiple-choice questions and 76.4% accuracy on open-ended questions.
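For readers curious how a multiple-choice accuracy figure of this kind is typically computed, the following is a minimal Python sketch. It is not the researchers' code: the `MultipleChoiceCase` class, the `accuracy` function and the toy data are all hypothetical, and the only assumption is that each case has one correct option and the model picks one option per case.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceCase:
    """Hypothetical benchmark item: a case report plus candidate diagnoses."""
    case_text: str
    options: list[str]
    correct_index: int  # position of the true diagnosis in `options`


def accuracy(cases, predictions):
    """Return the fraction of cases where the chosen option is the correct one."""
    if len(cases) != len(predictions):
        raise ValueError("need exactly one prediction per case")
    if not cases:
        raise ValueError("cannot score an empty case list")
    hits = sum(
        1 for case, pred in zip(cases, predictions) if pred == case.correct_index
    )
    return hits / len(cases)


# Invented toy data for illustration only; not drawn from the real database.
cases = [
    MultipleChoiceCase("case A", ["dx1", "dx2", "dx3", "dx4"], 2),
    MultipleChoiceCase("case B", ["dx1", "dx2", "dx3", "dx4"], 0),
    MultipleChoiceCase("case C", ["dx1", "dx2", "dx3", "dx4"], 3),
    MultipleChoiceCase("case D", ["dx1", "dx2", "dx3", "dx4"], 3),
]
predictions = [2, 0, 1, 3]  # option index a model chose for each case

score = accuracy(cases, predictions)
print(f"{score:.1%}")
```

Open-ended answers cannot be scored by a simple index comparison like this, since a free-text diagnosis has to be judged against the reference diagnosis in some other way.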

“We were surprised to see that general models, like GPT-4o, performed better than those adapted for medicine,” said Ofir Ben-Shoham, one of the researchers. “We showed that large language models can be used to diagnose complex medical cases.”

This research is significant because it demonstrates that AI models like GPT-4o could help diagnose challenging medical conditions more efficiently.

The CUPCase database the team created could become a valuable tool for testing new AI models in the future. The database is open for use and can be expanded with additional cases as new models are developed.

“The goal was to create a system that could evaluate how well language models diagnose real-world, complex cases, not just the common ones,” said doctoral student Uriel Peretz.

Dr. Nadav Rapoport, another member of the research team, explained that diagnosing complex cases can be a lengthy and uncertain process, leading to delays and higher costs for patients.

The CUPCase database, by providing detailed real-world cases, can help speed up this process and improve patient care.

The research has several practical applications in healthcare, primarily by improving the speed and accuracy of medical diagnoses.

AI models like GPT-4o could assist doctors in diagnosing complex medical cases more quickly, reducing diagnostic delays and enhancing patient outcomes.

The CUPCase database, featuring a collection of real-world cases, can serve as a valuable clinical decision support tool, helping doctors make more accurate decisions, especially for difficult or rare cases.

Additionally, AI models could aid in training medical professionals, offering an interactive resource for learning complex diagnostic processes.

AI-powered tools could also expand access to expert-level diagnostic support in underserved areas, where specialists may be limited. In critical care settings, AI models could provide real-time diagnostic assistance.
