ChatGPT can pass US medical licence exams, study claims

AI-generated answers showed ‘new, non-obvious and clinically valid’ insights in tests usually taken by students after years of study

February 9, 2023

Answers generated by artificial intelligence can pass the examinations needed to be granted a medical licence in the US, a new study has claimed.

Researchers said OpenAI’s software ChatGPT scored at or around the 60 per cent pass threshold across the series of three tests that make up the United States Medical Licensing Examination (USMLE), with “coherent” responses that “contained frequent insights”.

Achieving a pass in the “notoriously difficult” assessments – usually taken by medical students after at least two years of study – was seen as a “milestone” for the development of AI tools that could have wide-reaching implications for medical education, according to the study’s authors.

But other academics questioned the validity of the findings, published in the open access journal Plos Digital Health, and called the study a publicity stunt for the healthcare company that backed the researchers involved.

Author Tiffany Kung – a clinical fellow in anaesthesia at Massachusetts General Hospital, part of Harvard Medical School – and colleagues used 350 questions from the June 2022 USMLE, incorporating most medical disciplines from biochemistry to diagnostic reasoning.

Their paper found that, after indeterminate responses were removed, ChatGPT scored between 52.4 per cent and 75 per cent across the exams, which usually have a pass threshold of around 60 per cent.


They add that ChatGPT also demonstrated 94.6 per cent concordance across all its responses and produced at least one significant insight – defined as “something that was new, non-obvious and clinically valid” – for 88.9 per cent of its responses.

These were higher scores than those achieved by another AI chatbot, PubMedGPT, which had been trained exclusively on biomedical domain literature. It scored 50.8 per cent on an older dataset of USMLE-style questions.

The authors note that the sample size of questions used was relatively small but feel their study provides “a glimpse of ChatGPT’s potential to enhance medical education, and eventually, clinical practice”.

A preprint of the article circulated on social media had listed ChatGPT as an author because the researchers had asked it to “synthesise, simplify and offer counterpoints to drafts in progress”. The chatbot’s citation was removed ahead of final publication, but Dr Kung stressed that it had “contributed substantially to the writing of [our] manuscript”.

Reacting to the study, Peter Bannister, executive chair of the Institution of Engineering and Technology, said ChatGPT “continues to demonstrate an impressive ability to generate logical content in numerous settings” and the results “serve to highlight the limitations of written tests as the only way of assessing performance in complex and multidisciplinary professions such as medicine”.

“While the results may be of great interest, the study has important limitations that call for caution,” warned Lucía Ortiz de Zárate Alcarazo, a pre-doctoral researcher in the ethics and governance of artificial intelligence at the Autonomous University of Madrid.

“We will have to wait and see what results are obtained when ChatGPT is applied to a larger number of questions and, in turn, is trained with a larger volume of data and more specialised content,” she said.

Ms Ortiz de Zárate Alcarazo added that the results had only been evaluated by two doctors and further studies would need to employ a larger number of qualified evaluators to be able to endorse the findings. 

Collin Bjork, senior lecturer in science communication at Massey University, said the claim that ChatGPT could pass the exams was “overblown and should come with a lengthy series of asterisks”.

He noted that all but one of the authors work for Ansible Health, a Silicon Valley-based healthcare start-up that was likely to need more investment capital soon. “The media splash from this well-timed journal article will certainly help fund their next round of growth,” Dr Bjork said.

He added that claims about the insight shown by the chatbot were “misleading” because of the “vague” definition the researchers used for what constituted an insight. Claims that AI would one day be able to teach medicine were “naive”, Dr Bjork said. “How can an unaware learner distinguish between true and false insights, especially when ChatGPT only offers ‘accurate’ answers on the USMLE a little more than half the time?”



Reader's comments (2)

Anyone reading the actual paper with a medical background will realise that there is zero visibility on the MCQ sample that ChatGPT is supposed to have successfully answered. Most likely, there is a strong bias towards questions not requiring differential diagnosis or pathophysiological reasoning, namely those for which the answer exists in near-literal form in one of the corpora crawled by the LLM.
So back to viva voce exams, in person, with no external links or practical exams in labs?