Judgment calls

Amid worries about examining practices, Times Higher Education asked ten academics to mark a first-year paper. Verdicts ranged from zero to a 2:1, but the markers identified an inherent consensus, says Rebecca Attwood

September 18, 2008

In the era of top-up fees, we are frequently told that the beer-swilling, work-shy, daytime-TV-addicted student is, for the most part, a relic of history.

Instead, with today's undergraduates investing thousands of pounds in their education and many employers turning down graduates with a 2:2 at the first hurdle, students are single-mindedly fixed on one goal: getting the "right" class of degree.

But undermining all this effort is the assertion by some academics that an element of chance is involved in assessment because marking is not entirely reliable.

In a recent Times Higher Education opinion article, Martin Luck, associate professor of animal physiology at the University of Nottingham, argued that marking is an "imperfect art" and that academics should acknowledge its subjectivity.

A report published last year by the Quality Assurance Agency says that the class of degree a student achieves depends on the marking practices in the subject studied and that, in general, universities have only "weak control" over the marking practices of examiners.

Educationists are also concerned. Sue Bloxham, co-author of Developing Effective Assessment in Higher Education, says research on marking consistency is "depressing".

She has suggested that students will become increasingly litigious about marking and that procedures could struggle to stand up to legal challenge if the sector persists in claiming that marks are completely accurate.

So, with this in mind, Times Higher Education decided to conduct a simple experiment into marking.

With permission, we obtained a first-year philosophy essay from a student who has now completed his studies. We sent it to ten academics in philosophy departments around the UK who agreed to mark it on condition that their anonymity - and that of their university - was preserved. The title of the essay was "How successful are Descartes' arguments for the real distinction of mind from body? Upon which would you put the most weight?"

Our aim was to examine the breadth of marks a single essay would be given by different markers, to explore the reasons for the decisions and to generate discussion about the processes involved.

Markers were told that the essay had been submitted as 50 per cent of the assessment for a ten-credit first-year undergraduate philosophy module taken by philosophy majors and, as an elective, by students on other joint and single-honours programmes.

They were asked to mark the paper by applying the general principles, conventions or marking scheme for the assessment of first-year philosophy essays used in their department, and to identify its strengths and weaknesses.

The resulting marks were certainly different - they ranged from a good 2:1 to zero (see box below). But if the highest and the lowest marks are taken away, a different picture emerges, with most marks sitting on or around the border between a 2:2 and a 2:1 - three lecturers awarded a mark of 60 and one a mark of 58.

Within the sample, those who taught at new universities gave the lowest average mark, even when the zero was excluded - apparently debunking the popular perception that new universities are less academically rigorous than their older counterparts.

The zero mark came from Marker 4, an academic at a new university, who said the essay contained extensive paraphrasing and insufficient referencing, which, in the academic's view, placed it in the category of "plagiarism and/or poor study skills". Under normal circumstances, the academic would have asked the student to resubmit the work.

Strikingly, Marker 4 was the only one to identify the paraphrasing, which was taken from a commonly used textbook, John Cottingham's Descartes. Cottingham is professor of philosophy at the University of Reading.

One other marker (Marker 1) mentioned plagiarism, and would have considered calling in the student for a discussion about research and plagiarism given the "slightly worrying" use of a referenced quote from an essay bank.

But for the most part, the paraphrasing went undetected. And even after being shown the results of the experiment, and the paraphrased sections set against the original text (see box, page 37), not all markers considered it a case of plagiarism, given that the essay was the work of a first-year student and the textbook was properly referenced elsewhere in it.

George MacDonald Ross, senior lecturer in philosophy at the University of Leeds and director of the Higher Education Academy's Subject Centre for Philosophical and Religious Studies, says the zero grade was "an astute bit of marking".

"There would be very good grounds for saying that this is sufficient plagiarism to get a zero mark. It is well known that most plagiarism isn't spotted," he says.

He ran the essay through the plagiarism-detection software Turnitin and confirmed that the software would not have alerted a marker to the paraphrasing.

So even a "systematic" approach to detecting plagiarism would not have uncovered it, leaving the final decision to individual academics, drawing on their experience and judgment.

Among the markers, there was no universal agreement about whether plagiarism had occurred. Marker 9 agreed that the mark of zero was correct, given the definition of plagiarism at their university.

"What this indicates is that there is a difficulty in determining whether an essay is plagiarised where lecturers are not familiar with the curriculum. A further question that this issue raises is whether all institutions adhere to the same definition of plagiarism," Marker 9 says.

But Marker 5 was not convinced that this would constitute plagiarism in the first year and thought a mark of zero was too harsh.

"Plagiarism is a big problem - I have been involved in marking this week and have caught a few cases," Marker 5 notes. "But if a student in the first year paraphrases, and shows an understanding of what is being paraphrased and provides the reference, I wouldn't have regarded it as a case of plagiarism."

According to an expert on plagiarism, these comments imply that a wide range of criteria is in use. Jude Carroll, a deputy director of the Assessment Standards Knowledge Exchange (ASKe) Centre for Excellence in Teaching and Learning at Oxford Brookes University, says: "These markers' reactions ranged from a quite harsh zero to a quite gentle 'talk'. Just as course assessors need to discuss criteria to be sure they all agree, they also need to discuss how quickly and how sternly they apply course regulations about 'your own work'.

"Students would struggle to know what the best way forward might be if all these markers were operating in the same programme," she says.

Beyond the issue of plagiarism, there was clearer consensus among the academics that the two highest marks were way out of the normal range. "If that had happened at this university, I would have been very suspicious," says one marker.

The lecturer who awarded a mark of 66, the highest, says the results had offered "something to think about".

"Among my colleagues, I am one of the people who uses more of the marking range than others. So I can be generous, but I am also prepared to give much lower marks, with justification, than others. I also seem to supply more comments."

Markers were quick to highlight the experiment's clear limitations. Under normal conditions, the essay would have been subject to moderation or second marking, or both, and perhaps even scrutiny by an external examiner. Many markers were concerned that they knew nothing of the content of the course that the student had taken, meaning they had no idea what proportion of the essay might have been taken from lecture notes or picked up from seminars.

Despite this, several chose to regard the results as evidence that marking procedures are reasonably reliable.

"If you take away the two extreme marks, I think that under the conditions it is rather a compliment to those involved," says Marker 5.

Another agrees: "The outcome of the project indicates that we can be confident about the comparability of assessment at different universities. The comments revealed a robust consensus concerning the strengths and weaknesses of the essay, and the caveats of the markers sufficed to explain the divergences in the marks awarded."

Ross says that, if anything, he was surprised that the marks were not more divergent because "the markers had not been given detailed criteria".

In this experiment, there is less variation among the marks than research studies in a variety of disciplines have reported, according to marking expert Margaret Price, director of ASKe.

But she notes that the markers' comments showed differing emphases in their concern about content and arguments, and it was noteworthy that several markers said they might adjust their marks by as much as 10 per cent if they had known more about the context.

"This suggests that the marks, while in the same ballpark, may have been reached by a different route that may come to light in the marking of a more advanced-level essay, where standards might be more specific," Price says.

Meanwhile, the use of borderline marks "suggests that each marker is making a professional judgment about whether it is a 2:2 or 2:1 and fine-grading beyond that."

Most of the academics in the experiment felt that their marking decisions normally mirror those of their colleagues closely.

"When we have a discrepancy between a 2:1 and a 2:2 - let's say of ten points - then that is worrying, and we send it to an external. But it is rather rare. Usually you mark very close to each other," says Marker 5.

But some admit that they do not always feel confident about the marks they award.

Marker 4 says: "Marking is difficult. There is always this sense of contingency - why give this a 55, not a 59? Why give this three, not 71? You feel rather exposed.

"(Sometimes) you are reading an essay and you sort of have a 'feeling' about it - which again seems very contingent. You start off thinking it is a 63, then it goes down to a 58, then it's: 'This paragraph is a 70.' Then at the end you drag all those feelings together and come up with a mark.

"After the 25th piece of work you sit there thinking, is this a 2:2 or is this a first? We have university-wide criteria, but it is hard to apply those straightforwardly."

Bloxham says marking is a complex process involving judgment on a wide range of factors such as structure, argument, knowledge and use of evidence. While it might be relatively easy for two tutors working in the same department to agree on a grade, it is important to examine the variations between different groups of tutors.

"Universities have made great efforts to increase consistency and transparency in marking through devices such as assessment criteria and grade descriptors; but these have to be interpreted by the individual marker - and this will be influenced by their prior experience of marking, the influence of colleagues, seeing others' marking etc," she remarks.

"Consequently there is always likely to be some difference in marks across different tutors, and there is considerable debate about how much (such devices) improve consistency in marking."

However, the variation should not cause undue concern, Bloxham emphasises, because students on a typical degree course complete 40 or so assignments and examinations and the marks given are subject to moderation. "The influence of any one tutor is limited," she says.

Despite the apparent complexities, most markers in our sample said they had never received any formal training in marking, although some had taken university courses designed for new staff.

"I was supervised, but it was definitely through practice rather than formal training. I think it was fine. I don't know what training would really add," says Marker 4.

Marker 9 says: "One of the reasons that I found the exercise reassuring was that I have received no formal instruction in marking.

"I learnt to mark by showing my marks and my comments to a senior colleague, who advised me whether, in their opinion, I was marking to the correct standard."

Marker 9 pointed out that academics from different universities are never called together explicitly to standardise marking. The academic went on to suggest that, given the lack of instruction in marking, it might be helpful for HEA subject centres to do some work in this area, investigating the comparability of marks on a larger scale and running courses or colloquia where academics could compare experiences.

ASKe agrees that, rather than increasing regulation, a consensual, collegiate approach would be the best way forward. The centre's staff say that academic "community engagement" is a low priority in mass higher education, and they fear that pressure on staff to increase their productivity could damage assessment standards.

To have confidence in professional judgment, there must be forums for the development and sharing of standards within and between disciplinary communities, ASKe believes.

Berry O'Donovan, a deputy director of ASKe, says: "A key ASKe message is that sharing understandings of a 'good piece of work' is pivotal not only for consistency when staff are marking work but also for students, so that they understand what they are aiming for when they are writing their assignments.

"It is exactly this sort of activity - the marking of exemplar assignments - that we would suggest that all large marking teams and their students should undertake."


Ten academics assessed a paper on: "How successful are Descartes' arguments for the real distinction of mind from body? Upon which would you put the most weight?" Here they explain their marks

Marker 1 - (Mark given 63/100)

I'd consider having the student in for a discussion about research and plagiarism, given the slightly worrying use of an essay bank. The descriptions of Descartes' arguments are clear and sensible, and the student raises a number of reasonable objections to them. The discussion of the problem of continuous thought is particularly good.

The essay is fragmentary in the second half: the student mentions, but doesn't really explain or use, a range of different criticisms of Descartes, and the impression is that they are trying to cram in everything they've found about the issue, rather than construct an argument.

Marker 2 - (58/100)

There is evidence of a fair amount of reading, some of it broad. (The essay) sticks to the question, and is organised well in terms of three possible arguments for the "real distinction". However, some of these arguments are not well discussed.

In an essay that was not done under exam conditions one would expect these arguments to be fuller. The claim that Descartes' argument is immune to arguments that adduce scientific evidence for the dependence of mind upon brain is also not supported; it is more like an assertion.

Marker 3 - (53/100)

The essay is reasonably well informed, though there's little reference to the text concerned. It's rather formulaic and sometimes gives the impression that its author hasn't fully understood some of the arguments made/quoted; it's an essay that rather worthily reports arguments rather than making them.

There is some rather inaccurate reasoning. It's quite good on the interaction problem ... but doesn't really answer the second question asked. (It) waffles at the end ... (and there are) some referencing problems. I can imagine that in a different context, it might have 10 per cent more.

Marker 4 - (0/100)

This is evidence of paraphrase, if not outright plagiarism. I would stop reading the essay at this point and make an appointment to see this student.

I wouldn't be too harsh, though, if this was the first piece of work at university level. I don't think the plagiarism is malicious. However, due to the scale of the paraphrasing from (the text by) Cottingham and the use of an unattributed - most likely internet - source, I would ask the student to resubmit the work, ensuring that they understand what referencing involves, and the different kinds of plagiarism that exist.

Marker 5 - (56/100)

The writer gives a reasonable answer to the essay question. The essay sets out clearly what arguments the writer intends to discuss, but the discussion of the three arguments differs in quality.

(There is) some lack of clarity, some arguments not properly thought through.

Overall, the essay is relatively well written. It is a promising start, but the author will have to learn to set out the arguments more clearly and tie them into a coherent structure.

Marker 6 - (60/100)

By the standards of a first-year student, the essay provides a decent statement of Descartes' ideas, and it shows that the student has understood the main points.

Its weak point is how it deals with the second part of the question. This is an invitation to critically evaluate Descartes' arguments and possibly develop some independent ideas. The student doesn't take up the opportunity.

Marker 7 - (60/100)

The essay starts well. Over the course of the first two pages the student engages in some useful analysis, appears to have a good grasp of the main points, seems well informed and makes some useful references.

At times the essay reads like it might have been based quite closely on handouts and/or lecture notes and, if this were a student of mine, I would have been able to judge that. A number of points on the first two pages could have been given more depth, clarity and consideration.

The final third of the essay is much weaker. It backtracks to points already covered, includes some new points with too little depth and analysis and finishes with a number of broad conclusions that are insufficiently supported.

I was unable to verify the website reference included on page three as it is from a subscription website. Another reference is incomplete. Finally, there are some basic errors on page three.

Marker 8 - (66/100)

Descartes' arguments were nicely explained. The author picked out some things that many students miss. I would have liked to have seen a little more sustained criticism to justify top marks - say, in the 75-80 range. There were a few criticisms rattled off in one sentence each on page three, whereas a really good essay would have taken one or two problems and pursued them in more detail.

Marker 9 - (60/100)

This is a quite well-structured essay. It is mature in tone and it is supported by secondary reading. It addresses the question in a serious way. I think it indicates that the student who wrote it has both the desire and the potential to do well.

However, from a very encouraging first paragraph, it deteriorates. Grammatical errors start to creep in, a lack of accuracy of expression and argumentation becomes apparent, and there is often a failure to render Descartes' own claims accurately.

Points are adduced from secondary authors but not explained in enough detail for them to mean much. Different paragraphing conventions are adopted; the referencing is poor, as is the bibliography.

Marker 10 - (17/20)

I would say that it is a good essay. The author uses interesting bits of the literature and manages to structure his arguments clearly.

Unfortunately, they are not always as clear as they could be, and some points remain a bit sketchy. For instance, the logical structure of the three arguments never becomes really clear, and the author does not discuss their (possible) interrelation. It looks like a good upper-second.

