AI no substitute for human judgement? Even on your 99th exam script?

Humans’ epistemic arrogance belies the fact that subject knowledge is always incomplete and cognitive bandwidth is strictly finite, says Prince Sarpong 

Published on March 12, 2026
An academic with their head on their desk surrounded by exam scripts
Source: EyeEm Mobile GmbH/iStock

The discourse surrounding artificial intelligence in higher education has settled into an entirely predictable binary. On one side, proponents herald a new era of efficiency; on the other, sceptics fiercely defend academic integrity and the “human touch”.

Both sides raise valid points, but if we are to have a truly holistic debate about the future of academia, we must put all the cards on the table. Currently, a crucial consideration is missing from the conversation: the biological limit of the human evaluator.

Human cognitive bandwidth is a strictly finite resource. When an academic sits down to manually parse and provide feedback on 100 postgraduate exam scripts – in my subject of finance, tracking cascading numerical errors and verifying theoretical integrations – they are engaging in an act of severe cognitive endurance.

Educational psychology has long understood this through cognitive load theory. Working memory can only hold and manipulate a limited number of novel elements simultaneously. When the brain is forced to process this heavy cognitive load repeatedly, it inevitably hits a threshold of exhaustion.


The resistance to delegating marking often comes from a place of deep professional pride, but this insistence leads to decision fatigue. By the time an examiner reaches the 40th script in a stack, their brain shifts from analytical reasoning to a reliance on heuristics: mental shortcuts that require less energy but yield shallower judgements. Therefore, the depth, fairness and rigour of the feedback a student receives on script number 40 – never mind number 85 – is all but guaranteed to be inferior to the feedback provided on script number five. This is not a moral failing or a lack of dedication; it is a biological reality.

Conversely, we could perceive some moral failing in the epistemic arrogance of a marker or external examiner who believes that their judgement is unimpeachable even when there are gaps in their subject knowledge. Consider an external examiner on a PhD dissertation in finance. No doubt they will be a finance professor, but if the thesis uses a highly specific, niche econometric technique – for example, a complex machine-learning approach to selection bias – they may find themselves in a bind.


No doubt they will be unable to dedicate the dozens of hours that would be required to relearn the underlying mathematics and so, encouraged by a system that rewards them for enacting infallibility, they skim the methodology and rely on surface-level heuristics and their own credentialed authority to pass judgement.

We must weigh this reality against the current empirical landscape of machine intelligence. Frontier AI models have met the hardest professional benchmarks in the world. They pass the United States Medical Licensing Examination, score in the 90th percentile on the Uniform Bar Exam, and, highly relevant to the finance sector, can now pass all three levels of the Chartered Financial Analyst exam, including the notoriously difficult Level III essay questions that require strategic wealth planning.

If a machine can process and apply the codified knowledge required to pass such exams in seconds, we must ask ourselves a difficult question: is human intellect truly best spent manually checking basic tax calculations or theoretical definitions in a student’s script?

The integration of AI into evaluation is nothing to fear. It is not an abdication of the professor’s role. It is a strategic reallocation of their cognitive bandwidth.


Large language models excel at the mechanical application of codified knowledge. They are highly reliable at identifying structural deviations, mapping facts against a rubric and calculating baseline scores. But machines struggle with the uncodified: the nuanced context of a client’s behavioural biases, the ethical grey areas of a case study and the overarching philosophical intent behind a student’s argument.

By treating AI as a cognitive assistant that absorbs the crushing weight of data processing and algorithmic verification, the academic elevates their role from a line-by-line auditor to a high-level editor and adjudicator. They review the AI’s generated critiques, spot-check complex integrations and refine the feedback based on their own professional judgement.

Ultimately, this might not save a great deal of human time. But it will radically improve marking quality and consistency. The academic who integrates these systems will maintain a level of analytical rigour that a purely manual marker cannot physically sustain.

If we want to protect the “human touch” in education, we must first protect the human mind’s capacity to deliver it.


Prince Sarpong is associate professor in the School of Financial Planning Law at the University of the Free State, South Africa.
