AI text detectors: a stairway to heaven or hell?

The emergence of GPTZero, OpenAI’s text classifier and Turnitin’s AI detector brings a risk of over-reliance on AI classifiers. Are they a solution or a further problem to be solved?

Miguel de Carvalho
4 Apr 2023

Created in partnership with

University of Edinburgh

It is claimed that artificial intelligence (AI) text classifiers are able to check if a text has been written by a human or by AI – and they are being developed by a variety of players, such as OpenAI, Turnitin and GPTZero. Clearly, these powerful tools are welcome given the current widespread use of ChatGPT, but, as with any AI tool, there are a few risks associated with their use. In particular, there is the risk of over-reliance – that is, of users blindly accepting an AI recommendation that might be wrong.

Can proper reliance be achieved?

The performance of AI text classifiers is reported using standard benchmarks, centred on the concepts of true positives and false positives, with which many of us became familiar during the pandemic when assessing the performance of Covid-19 tests. In the context of AI text classifiers, true positives are AI-written texts that are correctly flagged as such, while false positives are human-written texts that are wrongly assessed as written by AI. For example, OpenAI claims that:

Our classifier correctly identifies 26% of AI-written text (true positives) as ‘likely AI-written’, while incorrectly labelling human-written text as AI-written 9% of the time (false positives).

Another popular classifier, known as GPTZero, created by a senior at Princeton University, claims that it “classifies 99% of the human-written articles correctly, and 85% of the AI-generated articles correctly”.

And more recently, Turnitin claims that its forthcoming classifier “identifies 97 per cent of ChatGPT and GPT3-authored writing, with a very low, less than 1/100 false-positive rate”.

Although these summary metrics are an excellent reference, it is important to appreciate what aspects they miss.

First, the reported performance, as measured by true-positive and false-positive rates, might change across strata. In other contexts where these benchmarks are used, such as in medical statistics, they are often reported over different subpopulations, because it is well known that the accuracy of a medical classifier can change, for example, along with a patient’s age.

Similarly, in the context of AI text classifiers, performance can vary across fields, such as biology or philosophy, and it can also depend on features of the text, such as its number of characters. Proper reliance on AI text classifiers thus requires users to be aware of this caveat, noting that the reported metrics might not apply exactly to their field. Hopefully, edtech will provide additional information on this soon, but for the moment there is scant detail available.
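To illustrate what reporting by stratum would look like, here is a minimal Python sketch. The function name `rates_by_stratum`, the record layout and the eight toy records are all invented for illustration; they are not data from any of the detectors discussed here.

```python
from collections import defaultdict


def rates_by_stratum(records):
    """Compute true-positive and false-positive rates per stratum.

    `records` is an iterable of (stratum, is_ai, flagged) tuples, where
    `is_ai` is the ground truth and `flagged` is the classifier's verdict.
    This layout is an assumption made for the sketch.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for stratum, is_ai, flagged in records:
        if is_ai:
            key = "tp" if flagged else "fn"
        else:
            key = "fp" if flagged else "tn"
        counts[stratum][key] += 1

    rates = {}
    for stratum, c in counts.items():
        ai_total = c["tp"] + c["fn"]        # all truly AI-written texts
        human_total = c["fp"] + c["tn"]     # all truly human-written texts
        rates[stratum] = {
            "tpr": c["tp"] / ai_total if ai_total else None,
            "fpr": c["fp"] / human_total if human_total else None,
        }
    return rates


# Invented toy records, for illustration only (not real detector output).
toy_records = [
    ("biology", True, True),
    ("biology", True, True),
    ("biology", False, False),
    ("biology", False, False),
    ("philosophy", True, True),
    ("philosophy", True, False),
    ("philosophy", False, True),
    ("philosophy", False, False),
]

stratum_rates = rates_by_stratum(toy_records)
for field, r in stratum_rates.items():
    print(field, r)
```

In this toy example the two fields have very different rates, which a single pooled figure would hide. Real strata would need far more texts per field before the rates could be trusted.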

Second, it is important to emphasise that the reported performance tells us nothing about the probability that a specific text is AI-written, given that the classifier claims it was AI-written. On the contrary, the reported rates are informative only about the reverse – that is, about the probability that the classifier claims a text is AI-written, given that it was indeed AI-written. In other words, the reported rates are not directly informative about how likely it is that a specific text was truly written by AI; rather, they are mainly informative about the general performance of the AI text classifiers.
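The gap between the two probabilities can be made concrete with Bayes’ rule. The sketch below plugs in the rates quoted earlier, but the prevalence values (the share of submitted texts that are truly AI-written) are assumptions chosen for illustration, since none of the vendors report them. The function name `prob_ai_given_flag` is likewise hypothetical, and Turnitin’s false-positive rate is taken at its stated upper bound of 1/100.

```python
def prob_ai_given_flag(tpr, fpr, prevalence):
    """Probability that a flagged text is truly AI-written (Bayes' rule).

    tpr: P(flagged | AI-written), the reported true-positive rate
    fpr: P(flagged | human-written), the reported false-positive rate
    prevalence: P(AI-written) among the texts being checked (an assumption)
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    flagged_ai = tpr * prevalence
    flagged_human = fpr * (1.0 - prevalence)
    total_flagged = flagged_ai + flagged_human
    if total_flagged == 0:
        raise ValueError("the classifier never flags anything at these rates")
    return flagged_ai / total_flagged


# (true-positive rate, false-positive rate) as quoted in the article.
# GPTZero: 99% of human texts classified correctly, so 1% are false positives.
# Turnitin: "less than 1/100", so 0.01 is an upper bound, not the exact rate.
REPORTED_RATES = {
    "OpenAI": (0.26, 0.09),
    "GPTZero": (0.85, 0.01),
    "Turnitin": (0.97, 0.01),
}

# Hypothetical prevalences: 5%, 20% and 50% of texts truly AI-written.
ASSUMED_PREVALENCES = (0.05, 0.20, 0.50)

for name, (tpr, fpr) in REPORTED_RATES.items():
    for prevalence in ASSUMED_PREVALENCES:
        p = prob_ai_given_flag(tpr, fpr, prevalence)
        print(f"{name}: prevalence {prevalence:.0%} -> P(AI | flagged) = {p:.2f}")
```

The same reported rates lead to very different answers depending on the assumed prevalence, which is why the rates alone cannot tell a marker how likely it is that a particular flagged text was written by AI.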

Seeing beyond reported performance measures

As noted above, GPTZero reports far superior performance to that of OpenAI’s classifier in terms of true and false positives. However, keeping in mind the disclaimers made earlier about what these summary measures miss, let’s run two simple experiments on GPTZero and OpenAI’s classifier. Let’s start with Stairway to Heaven, a song by Led Zeppelin that was released in 1971, more than 50 years before the rise of ChatGPT. The outcome of GPTZero for this song is: “Your text may include parts written by AI”, and it highlights the following parts:

Ooh ooh ooh ooh ooh

And she’s buying a stairway to heaven

There’s a sign on the wall

But she wants to be sure

Oh whoa-whoa-whoa, oh-oh

If there's a bustle in your hedgerow, don't be alarmed now

It’s just a spring clean for the May Queen

Yes, there are two paths you can go by, but in the long run

And there’s still time to change the road you’re on

And it makes me wonder

Oh, whoa

Your head is humming and it won’t go

In case you don’t know

Robert Plant and Jimmy Page can rest assured that AI will not bring them another long copyright case such as the one the band has already faced over this song. Evidently, this is a false positive from GPTZero. Running the same lyrics through the OpenAI Text Classifier yields: “The classifier considers the text to be unlikely AI-generated.” Nevertheless, an important question that a marker would ask remains unanswered: How unlikely? What is the actual probability of that text being AI-generated?

Let’s now consider Bohemian Rhapsody by Queen. Again, GPTZero claims that “your text may include parts written by AI”, and it highlights the following parts:

He’s just a poor boy from a poor family,

Spare him his life from this monstrosity

Easy come, easy go, will you let me go


No, we will not let you go

OpenAI’s classifier, on the other hand, reports that “the classifier considers the text to be very unlikely AI-generated”.

The much-needed AI text detectors are welcome, but we will need to keep a holistic view in mind whenever we are judging academic misconduct cases. When they are assessing poor scholarship cases, educators must rely on human expertise and judgement and regard these classifiers as add-ons, whose conclusions require critical analysis. We must remember that it is much more harmful to falsely flag human-written text as AI-written than to miss a text that was written by AI.

Regardless of their potential, these classifiers must never be accepted as error-free oracles that accurately classify all human or AI-written text. If universities are going to use these tools extensively for marking, training should be offered in order to mitigate the risk of inappropriate use. Edtech will benefit from offering further detail on the performance of their detectors, and from inviting scrutiny by external researchers. Proper reliance will require combined action from both edtech and its users.

*The advice in the article is that of the author, and it does not necessarily reflect the University of Edinburgh’s position on the subject.

 Miguel de Carvalho is a reader in statistics at the University of Edinburgh.
