Are we missing the mark?

September 3, 1999

Percentages? Grades? A mixture of the two? Paul Bridges wonders, is our marking system really fair?

Is our marking fair? We would like to think so. We devote care to the design of assessments in order to find out what students know, understand and can do. We try to ensure that marks awarded reflect the standard appropriate to the level of the module.

But what about the marking systems we use? There is a broad spectrum of marking systems used in universities and colleges, ranging from the traditional percentage scale to grading scales with quality descriptors for each grade. In the middle of the spectrum, some institutions use a system that combines percentages and grades.

When marking an assignment, seminar presentation or examination paper, we attempt to translate our judgement of the standard of student performance into a mark or grade in accordance with the rules of the designated marking system.

But to what extent does our subject background influence how we do this? Do assessors in different subjects use the same marking scale in the same way? Do assessors in the same subject use the same marking scale in the same way? If there are differences, what are the implications for the aggregate marks that are calculated or the array of grades that are used to decide the awards to be given to students? In short, are our marking systems fair?

The Student Assessment and Classification Working Group (SACWG) has been trying to answer these questions. It has collected mark distribution data for ten subjects at undergraduate levels 2 and 3 in seven English universities and looked at the frequency of module marks and grades.

The group focused initially on mark distributions from subjects using the traditional percentage scale. The study showed that mark distributions for subjects requiring qualitative judgements of student performance, for example English and history, had a narrow spread from about 30 per cent to 80 per cent, leaving 50 per cent of the scale unused. These distributions appear to reflect the relative confidence of assessors in using different parts of the percentage scale. Examiners are confident in attributing marks in the central part of the scale, but this confidence appears to ebb dramatically for very poor or very strong student performances.

In English and history, a mark of 80 per cent at levels 2 and 3 denotes work of an exceptionally high standard. It is difficult for the assessor to justify awarding a mark in the 90-99 per cent bracket, which would imply perfection.

There is also a class boundary effect. Immediately below the boundary mark for each class and division there is a conspicuous drop in frequency compared with the adjacent marks. This also appears to reflect insecurity in attributing these marks. Can the examiner working with the percentage scale - which, of course, has 100 fine divisions - be absolutely certain the student's work is worth the mark immediately below the class boundary and no more?

The examiner is on more secure ground if he or she goes a mark lower or higher. Hence the percentage immediately below each boundary becomes a partial mark exclusion zone. The mark immediately above the boundary also registers an anomalously low frequency because there is usually no pressure to decide whether a work that has attained the boundary mark is worth one more mark.

Subjects that require quantitative judgements on the part of examiners, such as computer studies and mathematics, have a much greater spread, extending across almost the full scale. There is no evidence of reluctance on the part of examiners to give marks of less than 10 or more than 90 in respect of work of a very low or very high standard.

There is some evidence of a partial mark exclusion zone immediately below the class boundaries, but it is less conspicuous than in the mark distributions of qualitative subjects.

SACWG found that biology, business administration, fine art, French, law and sociology show mark distribution patterns that are intermediate between the contrasting distributions described above.

Some observers may argue that these mark distributions simply tell us that the very best students in mathematics, computer studies and related physical science and engineering subjects are more outstanding than the very best students in the humanities, arts and social sciences. While such differences are always possible on a local scale, this is not a tenable view when one considers a range of universities or indeed the whole sector. SACWG is in little doubt that the differences in the patterns of mark distributions are a reflection of how different groups of examiners interact with the traditional percentage scale.

What then are the implications for consistency and equity in assessment outcomes for students? The example below illustrates how two students of equal ability in their respective subjects may receive different classes of award simply as a result of the patterns of mark distributions described above. In the example, a student who combines a qualitative and a quantitative subject (student B) gains a higher award than a student who combines two qualitative subjects (student A).
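Since the original diagram is not reproduced here, the figures below are a hypothetical sketch consistent with the distribution patterns described above; the marks and the 70 per cent first-class boundary are illustrative assumptions, not SACWG data. Suppose both students are equally able and the award is based on the arithmetic mean of two module marks:

  Student A (two qualitative subjects):   (66 + 66) / 2 = 66   upper second
  Student B (qualitative + quantitative): (66 + 78) / 2 = 72   first

The two performances are of the same standard. Student B's higher award arises solely because the quantitative distribution extends further up the scale, so the same quality of work earns 78 rather than 66, lifting the mean across the class boundary.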

The situation is reversed if the two students are average in one subject, but weak in their second subject. If the award is based on the calculation of an arithmetic mean, the pattern of the mark distributions can have a significant impact on the class of honours degree to be awarded. Although the implications are most evident for multidisciplinary and joint honours programmes, subject-related mark distributions can also cause inequity of outcome for students on single honours programmes if there is a choice of qualitative and quantitative elements.

So what is the answer to the problem? Some universities have sought to overcome the problem of excessive precision in the percentage system by grouping the marks into bands that carry grades. This does not overcome the problem of differential spread because the traditions of percentage marking in different disciplines continue to underlie the marking process.

A more satisfactory approach is a marking system that has a moderate number of divisions, each of which corresponds to a grade. Each grade carries a descriptor that defines the quality of performance to be reflected by the grade.
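As a purely hypothetical sketch - the number of grades and the wording of the descriptors below are illustrative, not taken from the SACWG study - such a scale might look like this:

  A  Outstanding: authoritative command of the material, well beyond the requirements of the level
  B  Very good: thorough understanding, with only minor lapses
  C  Good: sound grasp of the main ideas, with some gaps in detail
  D  Satisfactory: basic competence meeting the threshold learning outcomes
  E  Marginal fail: limited understanding, falling just short of the threshold
  F  Fail: little or no evidence of the required learning

Because each grade is anchored to a descriptor rather than to habits formed under percentage marking, assessors in qualitative and quantitative subjects share a common reference point across the whole scale.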

The SACWG analysis shows that this kind of marking system produces wider mark distributions for qualitative subjects than percentage systems do. Critics might argue that grade systems are not suitable for calculating average marks as indicators of awards, and that they may display other aberrations. Nevertheless, they have considerable potential. There is a need to understand marking systems rather better than we do.

SACWG is an inter-university research group coordinated by Mantz Yorke of Liverpool John Moores University.

The work outlined in this article will be published in Assessment and Evaluation in Higher Education this autumn.

Paul Bridges is dean of academic planning at the University of Derby.
