A system that is wide of the mark

October 26, 2007

The idea of fair assessment is not only meaningless, but the ways in which we try to guarantee it are also counterproductive, says Sue Bloxham. Most assessment is a matter of professional judgment. A doctor will make an informed judgment when diagnosing symptoms, but the more complex and difficult the symptoms, the more the diagnosis is open to question. A second opinion is sought for just that reason. But it is only a second opinion, not necessarily any more correct.

Likewise, when we make judgments about student work, the more complex the subject matter or task, the greater the number of factors to take into account and the more likely that we will come to different decisions about the correct mark. A second opinion may confirm or contradict our view, but neither will be "correct" because, in most cases, there is no such thing as a correct grade. Marking involves an element of professional judgment, informed by our own views of the standards in our disciplines, and those standards are enormously difficult to codify and apply consistently to student work.

But the myth of the correct (moderated) mark is sustained by an array of measures, such as second marking, anonymous marking, assessment criteria, grade descriptors and marking schemes, designed to provide "transparent and fair mechanisms for marking and for moderating marks" (Quality Assurance Agency code of practice for the assurance of academic quality and standards in higher education, 2006). This is despite there being little research evidence that they make a difference to reliability. Indeed, if work could be marked reliably and accurately (as in a multiple-choice test, for example), no one would worry about anonymous marking. It is precisely because most work requires a degree of professional judgment that there are worries that marking is open to abuse.

Experienced staff are often certain about the accuracy of their marking, but research suggests that we develop marking habits that may contradict published assessment criteria.

A small survey by J. Hartley et al (2006) found that tutors gave significantly higher marks on average to essays printed in 12-point type than to those in 10-point, suggesting that some assessment criteria are hidden from markers as well as from students.

Subjectivity is undoubtedly unavoidable; what should change is the increasing focus on equitable and consistent assessment procedures at the expense of student learning. The problem has been made significantly worse by the fact that most student work now counts towards progression or award, so quality assurance measures are applied to every piece of assessment, often 40 or more items on a typical degree.

These measures can eat up hours of staff time and may have repercussions for student learning. For example, despite evidence that timely feedback is important, extensive internal moderation can delay the return of students' work; procedures such as anonymous marking create impersonal forms of feedback that appear irrelevant or inaccessible; fear of negative reactions from external examiners leads staff to abandon rich but unusual assessment methods; and markers are asked not to write comments on work as this may prejudice the double marker, despite the fact that such comments could help future learning.

This is not a tirade against quality assurance, more a plea to get the balance right in assessment. We need to look for light-touch moderation processes, perhaps at programme level, that examine patterns of marks and direct our moderation activities at students whose overall profile is confused or borderline or appears to reflect some form of bias. It is relatively easy for two tutors to agree on the grade for an individual item, perhaps by "splitting the difference" between their individual marks. However, only in the broader patterns of marking will we begin to identify systematic differences in marks between different groups of students and different tutors, and it is perhaps to those that we should be paying more attention.

It would not be uncommon for a doctor to share any uncertainty over a diagnosis with the patient, particularly if a set of symptoms pointed to differing diagnoses. But we maintain the myth to our students that there is a correct judgment about the value of their work, to within a percentage point, and that we are capable of making it.

Perhaps it's time to let students into the secret that grading work is a fragile enterprise involving interpretation of criteria that are often only tacitly understood. Only then can we involve them more in their own assessment, encouraging them to develop the skills of professional judgment and justification so relevant in many graduate professions.

The Burgess group final report, which was published this month, focuses on the classification of degrees, but also advocates that UK universities should collectively pay attention to issues of assessment and their fitness for purpose through a national debate. I can only add my voice in support of a rethink of marking and moderation activities that add little to student learning.

Sue Bloxham is professor of academic practice and head of the Centre for the Development of Learning and Teaching at Cumbria University. These are the views of the author and not a reflection of her institution's views.
