An academic ChatGPT needs a better schooling

AI agents are what they ingest. Rather than scraping the internet, better to confine their diets to books and encyclopedias, says Sorin Adam Matei

November 28, 2023
Image: A robot reading (Source: iStock)

If you know how ChatGPT works, you won’t be surprised to learn that AI detection filters consider it highly likely that the chatbot had a large hand in writing the US Constitution and the Book of Genesis. Nor will you be surprised that ChatGPT is biased towards the latest intellectual ideas, skewing liberal.

AI agents are prediction engines that use the web as their memory. They do no more than predict which words are most likely to follow any given word or group of words in a language. When you ask ChatGPT a question, it breaks the question into words and their sequence and returns the words most likely to follow that sequence. It might sound like a simple trick, and it is; the secret sauce is the sheer volume of text the AIs are trained on.

Of the very heterogeneous mix of content used to train ChatGPT, 60 per cent was a hotchpotch of information culled from websites, blogs or social media. Another 20 per cent was content shared on Reddit and evaluated relatively highly by the users. The rest was books typically found in the public domain (mostly older and general purpose), with a bit of Wikipedia (3 per cent) mixed in for good measure.

AIs store, for each word, the probability that any other word will follow it. The quality and value of these predictions depend heavily on how often, and in how many contexts, the software encounters any two (or more) words in proximity, how long a sentence runs and which sentence tends to follow another. Taken together, these predictions favour the most influential texts of a given culture, the ones that shaped generation upon generation of English-language teachers and the students they educated.
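The word-following mechanism described above can be illustrated with a toy bigram model. This is a minimal sketch under heavy simplifying assumptions: real systems such as ChatGPT use neural networks over sub-word tokens and long contexts rather than a literal table of word pairs, and the three-line corpus and the function names below are hypothetical, invented for illustration. What it does show is how a phrase that recurs often in the training text ends up dominating the predictions.

```python
from collections import Counter, defaultdict


def train_bigrams(texts):
    """Count how often each word is directly followed by each other word."""
    follows = defaultdict(Counter)
    for text in texts:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word_probs(follows, word):
    """Convert the raw follow-counts for `word` into probabilities."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def continue_text(follows, prompt, n_words=3):
    """Greedily append the single most probable next word, n_words times."""
    words = prompt.lower().split()
    for _ in range(n_words):
        probs = next_word_probs(follows, words[-1])
        if not probs:  # no known continuation for this word
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# Hypothetical training "diet": the Genesis line is quoted twice as often
# as the physics line, mimicking a text that is echoed across a culture.
corpus = [
    "in the beginning god created the heaven and the earth",
    "in the beginning god created the heaven and the earth",
    "in the beginning there was a singularity",
]

follows = train_bigrams(corpus)
print(next_word_probs(follows, "beginning"))  # "god" is twice as likely as "there"
print(continue_text(follows, "In the beginning"))  # → in the beginning god created the
```

Because the Genesis line appears twice and the physics line only once, the model prefers "god" over "there" after "beginning": frequency in the training diet, not truth or relevance, drives the continuation.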

Raised on the incantations of Shakespeare and the literature that grew out of the King James Bible, AIs steeped in this traditional English thought pattern could not help but regenerate the Bible or the Constitution as if they were common knowledge. Yet when asked about everyday issues, AI agents are more likely to adopt a liberal-secular tone, because this perspective dominates conversations on the web.

Frequently, AI content mixes heavenly and earthly perspectives. For example, when you tempt ChatGPT with the prompt “Continue the story: In the beginning there was…” it will promptly deliver a Genesis-style Feynman physics lecture, “In the beginning, there was a profound stillness that seemed to stretch for eternity. Within this void, a single point of unimaginable density and energy existed. This singularity held within it the potential for all that would come to be. Then, in an instant that defied the very concept of time, the singularity erupted in a cataclysmic explosion known as the Big Bang.” (Try it, although your answer might vary.)

The overlap of old and new in ChatGPT-generated texts is not the cause but the result of the American mind's ongoing cultural strife with itself. This tension should not lead to finger-pointing. But we do need a healthy conversation about the origins and uses of ChatGPT and its siblings, such as Google’s Bard, Meta’s LLaMA or Anthropic’s Claude.

First, is such training, which jumps from green energy and trans rights to sermons and pro-life arguments in one click, appropriate for a tool used in the academy? If we raised AI models on a diet of 80 per cent books and 20 per cent curated encyclopedias, including Britannica, they would be less focused on the vagaries of the present and more concerned with the age-old dilemmas and hard-won certainties of academic knowledge.

Creating AI agents that cater to academic needs could, of course, be an expensive proposition. However, given the enormous resources of the leading US and European universities, this could be a stimulating problem for a large consortium of higher education institutions, such as the Association of American Universities (AAU) or the European University Association, to solve. GPT-4 cost “merely” $100 million (£79 million) to train. The AAU universities, a group of 69 large state and private universities, received $31 billion in funding in 2021.

Second, ChatGPT was created with a “just in case” mentality: it was meant to answer all questions for all purposes. This leads to tentative, “he said, she said” answers, even to questions whose answers we should be sure of, such as whether vaccines save lives or whether Communism is as genocidal as Nazism. An AI trained on specialised information should express more confidence about the matters that truly count.

Third, ChatGPT speaks like a parrot because it does not adjust its delivery to the purpose of each request. More research and engineering are needed to calibrate the tool to the real-life intentions and consequences of each request. In academic learning, its use should be confined to the stages before and after research proper: finding arguments and packaging them for public consumption.

The in-between stage, the moment of discovery, should be reimagined in future pedagogies so that it scaffolds around AI agents rather than falling back on them. Assignments must connect to specific competencies demonstrated across written, multimedia and oral presentations. A return to in-class written or oral exams (horribile dictu) should not be out of the question.

In their current forms, ChatGPT and its siblings are like those three-year-olds who can recite entire stories read to them only once. But turning a three-year-old into a learned person takes 20 years of strenuous, structured education. It is time to stop reading AI agents stories and send them to a real school.

Sorin Adam Matei is associate dean of research and graduate education at Purdue University’s College of Liberal Arts.