‘Inconsistent’ AI detection ‘should prompt assessment rethink’

Study finds detectors struggle to accurately identify amount of AI content when papers have been partially human written

Published on June 14, 2026; last updated June 14, 2026

The minor use of large language models (LLMs) by students in their work may be overstated by artificial intelligence (AI) detection tools, according to a paper.

At the same time, the research suggests, the tools may be undercounting a heavier reliance on programs such as ChatGPT.

For the study, published in Education and Information Technologies, researcher Lucky E. Atamhenwan fed 81 sample essays into Turnitin. The scripts ranged from those that were 100 per cent LLM-generated – either by ChatGPT, Copilot or Gemini – to those written solely by people.

Turnitin did not flag any of the essays that were 100 per cent human written as being generated by AI.

And whenever the detector flagged AI-generated words, the sample did in fact contain LLM-generated text.

But the software struggled with scripts that were partially AI-written, consistently failing to report the correct proportion of LLM-generated text they contained.

For essays with a low percentage of LLM-generated words – between 15 per cent and 40 per cent – Turnitin’s AI score, which declares how much of a submission it considers to have been produced by the technology rather than by a human, was often higher than the actual amount.

But for scripts that had a high percentage of LLM-generated words – between 70 per cent and 100 per cent – the score was consistently lower.

Atamhenwan, the founder of AI company Genducate Learning and an academic at Central Queensland University, said the results should prompt universities to design assessments that do not require the use of detectors.

“In most student cohorts, the majority are ethical learners who avoid academic misconduct. Consequently, these findings suggest that students who use generative AI transparently and in line with institutional policies have nothing to fear,” he told Times Higher Education.

“Most institutional guidelines specify that an AI detector score alone does not prove misconduct. The findings confirm that relying solely on these scores would be erroneous. Instead, an AI score, especially above 60 per cent, should be treated as one key indicator alongside institutional generative AI usage and academic misconduct policies, and student transparency to evaluate potential academic misconduct.”

Sam Illingworth, a professor researching AI literacy at Edinburgh Napier University, said that the study raised serious questions about the use of AI detection tools.

He described the use of AI by students whose first language is not English as a legitimate application of the technology, but noted that such use could be unjustly flagged by some detection tools. Students who need AI's help to "structure slightly" their essays could similarly fall foul of the tools.

“Why are we policing our students?” Illingworth said. “That’s not why I became an educator. Students should be co-curators of knowledge with us; we should be operating from a position of trust.”

In a statement, Josh Johnston, vice-president of AI at Turnitin, said that detecting AI-generated writing “should serve as a kick-off to a conversation” between teachers and their students.

“We developed the tool to minimise unfounded accusations, which is why we do not report AI writing less than 20 per cent, and we test to keep false positives under 1 per cent. Core to our design principle is the trade-off of missing some AI-written text in order to build a better student experience.

“That said, no detection tool is perfect. The study’s results show that Turnitin’s AI writing scores move in the right direction: papers with more AI writing receive higher AI scores. At the same time, given the study is looking at a small set of artificially mixed human- and AI-written documents, score similarities or differences could play out differently in real student submissions.”

georgia.luckhurst@timeshighereducation.com