Can academics tell the difference between AI-generated and human-authored content?

A recent study asked students and academics to distinguish between scientific abstracts generated by ChatGPT and those written by humans. Omar Siddique analyses the results

15 Feb 2024

Created in partnership with

The University of Adelaide


There has been an unprecedented increase in the use of artificial intelligence (AI) software since OpenAI’s launch of ChatGPT in November 2022; as of December 2023, ChatGPT had 180 million regular users. Students and researchers can take advantage of ChatGPT’s capabilities to craft text that mimics human-authored content. But the technology is so recent that there is limited research exploring the practicality of using ChatGPT in research writing.

There is also limited understanding of whether individuals can successfully distinguish between AI-generated and human-authored text, or whether they regard AI-generated text as higher in quality. A recent experimental study we conducted aimed to investigate the following questions:

  1. Can psychology students and researchers distinguish between AI-generated and human-authored journal abstracts?
  2. Does a researcher’s level of experience influence their ability to distinguish between AI-generated and human-authored content?
  3. Do psychology students and researchers evaluate the textual features of AI abstracts differently to the way they evaluate human abstracts?

As AI evolves and more organisations take advantage of models such as ChatGPT, it will become more important than ever to explore their ethical and practical use so we can understand what role they can play in education and research.

Structure of the study

Our sample consisted of 56 participants from accredited psychology institutes, of whom 27 were undergraduates, 15 were postgraduate students and 14 were researchers who had completed postgraduate study. Each participant was given a survey consisting of 10 abstracts from a range of fields, drawn from top psychology journals. Five of these abstracts were human-authored and five were generated by ChatGPT. The input, or “prompt”, we provided to ChatGPT was consistent across all generated abstracts: we asked it to generate an abstract based on information we supplied, such as the title, journal name, methodology and results.

Within the survey, we asked participants to classify each abstract as AI-generated or human-authored. We also asked them to rate, on a five-point scale, textual features of the abstract such as fluency, readability and clarity. This set of ratings, which we called the Text Evaluation Scale (TES), was scored out of 25.

What did we find?

We discovered that participants were generally poor at discerning between AI-generated and human-authored content: they classified abstracts correctly only 52 per cent of the time, barely above chance. Interestingly, the participants with the most research experience were the least successful at this task. Undergraduates and current postgraduates scored similarly to each other, and both performed better, at a statistically significant level, than researchers who had completed postgraduate study.

Even more interestingly, participants rated AI-generated abstracts as having significantly better textual quality than human-authored ones. Our findings, though significant, should be interpreted with caution: our sample size was relatively small, which makes it difficult to generalise these findings to the wider population. Furthermore, the TES is a newly created scale that has not yet undergone sufficient psychometric testing to be considered a validated measure of textual quality. Further research is needed to consolidate our findings.

What does it all mean?

Our study highlights that AI text generated by ChatGPT can effectively mimic human-authored text, which has a range of implications. First, it suggests that researchers could incorporate AI-generated content into their work to make the research process more efficient, focusing on the more critical aspects of research while delegating written content to AI.

Indeed, because participants rated AI-generated content as higher in textual quality, researchers could use AI to improve their written work. But the inability to discern AI-generated text means breaches of academic integrity could go undetected. It also makes the attribution of authorship unclear: can human authors claim authorship of content generated by AI?

ChatGPT also means that researchers who are not native English speakers could use AI to compose high-quality English text, allowing them to publish in international journals and fostering further collaboration and development within science. However, some institutions could restrict researchers’ use of such technology, which would create inequity within the research community.

Pathway to the future

Our study highlights the need for AI literacy to be incorporated into academic education. Students and academics alike must be made aware of AI’s capabilities and its ethical use through academic programmes or specialised training in university settings. They can take advantage of software such as ChatGPT to generate well-worded work but must tread carefully where their institutions regulate such use. ChatGPT raises questions about authenticity and authorship that academics must consider thoroughly.

There is significant potential for models such as ChatGPT to become effective research tools. Academics can build on our findings and add new perspectives to this growing and important area of study. Our research focused on psychology, but the use of AI to craft written work has practical implications across a range of fields; academics could, for example, compare ChatGPT’s creative writing aptitude with human-written content. The implications of ChatGPT are vast, and time will tell how effectively we can use this ground-breaking technology.

Omar Siddique is a postgraduate student at the University of Adelaide.


