Discovering why girls outperform boys at school but fail to do as well at university requires five years' detailed study, argues Gillian Sutherland.
People seem surprised that degree results show the impress of gender. Yet why should they not? It is a commonplace that social factors affect assessment at the ages of seven, 11, 14, 16 and 18. There is a large and technically sophisticated literature on the identification of these and the problems of controlling for and minimising their role. Why should assessment of 21 or 22-year-olds escape?
The surprise has its roots in the amateurism of much assessment at university level. Most university examining practices are untouched by the technical sophistication which prevails elsewhere.
Universities are the last bastion of the notion of the absolute standard and new lecturers are often expected to absorb the criteria for measuring good - and bad - performances by a process of osmosis. The concern of the Higher Education Funding Council for England assessors with delivery, with effective teaching and learning, seems to have stopped short of outcomes. Yet these are surely the ultimate measures of effectiveness.
The single most gender-sensitive subject at British universities, according to the path-breaking work of Gerry McCrum (Oxford Review of Education, March 1994), is history. More detailed analysis of results at Oxford and Cambridge has shown a potent interaction between gender and class, particularly in Oxford. Bottom of the heap come girls from maintained schools; next come girls from independent schools; then come boys from independent schools; and the kings of the castle are boys from maintained schools.
The Cambridge history faculty had its attention forcibly drawn to the apparent under-performance of its women students two and a half years ago by some of those students and has been much preoccupied with it ever since. Its gender working party reported in March 1994 and the faculty board now has an equal opportunities standing committee.
In the summer of 1994 the tripos examiners surprised themselves and everyone else by breaking with the pattern of the past ten to 15 years and awarding roughly equal proportions of firsts to men and women in both parts of the tripos. In 1995 it began to look as though older habits were reasserting themselves. Roughly equal proportions of firsts were awarded to men and women in Part II; but in Part I almost three times as many men as women were awarded firsts.
How to make sense of this? How to account for either the dominant pattern of the past decade or so, when girls have been improving their performance in every other kind of examination, or the wobbles of 1994 and 1995? Since scripts are anonymous and there are no vivas, the latter cannot be dismissed as political correctness.
There are no simple answers; in Cambridge, indeed, there remains some uncertainty about what exactly is to be explained. There are, however, a number of ways forward, all of them rooted in a much greater self-consciousness about the processes in which university teachers are engaged as examiners; and experience as an external examiner suggests they have a relevance and a use well beyond Cambridge.
Self-consciousness and self-scrutiny need to develop in two ways in particular. First, we need to look carefully and rather elaborately at what we do when we assess, evaluate and grade; and second, we need to recognise that assessment is not so much a magical exercise, akin to water-divining, as a set of techniques, with its own expertise.
When we construct a marking scale we need to be much more detailed and explicit than we often are about what it is we are rewarding. How much importance do we attach to eye-catching interpretation, aggressive argument? Is a minimum of information required? What level of information distinguishes a "good" from a "moderate" answer?
There is a general view among teachers and taught that men students find it easier than women students to develop a vigorous polemic, whether they believe its propositions or not; and that jaded examiners often respond over-generously to this. It benefits both examiners and examined to explore these views openly in discussion. One of the most interesting notions to emerge from the current round of discussions in Cambridge is that of the "incremental first", the candidate who does not lead off like a columnist in the up-market weeklies, but rather builds up an answer quietly and steadily, with an accumulation of points and the evidence to support them.
Such a detailed spelling-out of marking conventions and weightings becomes even more of a necessity as large numbers of institutions take to modularisation, often with marking scales of breathtaking brevity and coarseness.
Underpinning a detailed marking convention, we need a scrutiny of the profile of each examiner, to spot easily those who consistently over-mark and those who consistently under-mark - and those who mark entirely within a range of about ten marks - typically 55 to 65. Such profiles are an essential tool in reducing some of the larger absurdities revealed by blind double-marking and clearing the way for concentration on the really difficult cases. Finally, there should be systematic provision for the induction of new and/or inexperienced examiners.
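The kind of examiner profile described above can be produced with very simple arithmetic. The sketch below, in Python, is purely illustrative: the mark data, the five-mark leniency threshold and the ten-mark "narrow band" threshold are all assumptions, not anything prescribed by any examining body.

```python
# Sketch: flag examiners whose marking profiles look anomalous.
# Assumes marks on a 0-100 scale; data and thresholds are illustrative only.
from statistics import mean

def profile(marks_by_examiner, lenient=5, narrow=10):
    """Return a per-examiner summary with simple flags.

    marks_by_examiner: {examiner_name: [marks awarded]}
    lenient: how far a mean may sit from the cohort mean before flagging
    narrow: a max-minus-min range at or below this is flagged as narrow
    """
    all_marks = [m for marks in marks_by_examiner.values() for m in marks]
    cohort_mean = mean(all_marks)
    report = {}
    for examiner, marks in marks_by_examiner.items():
        avg = mean(marks)
        spread = max(marks) - min(marks)
        flags = []
        if avg - cohort_mean > lenient:
            flags.append("consistently over-marks")
        elif cohort_mean - avg > lenient:
            flags.append("consistently under-marks")
        if spread <= narrow:
            flags.append(f"marks within a {spread}-mark band")
        report[examiner] = {"mean": round(avg, 1), "range": spread, "flags": flags}
    return report

# Hypothetical examiners: A is generous, B marks only within 56-64, C varies widely.
marks = {
    "A": [70, 74, 68, 82, 67],
    "B": [56, 58, 61, 57, 64],
    "C": [45, 62, 71, 38, 80],
}
print(profile(marks))
```

A real exercise would of course control for the scripts each examiner happened to see; the point of the sketch is only that the "larger absurdities" can be surfaced mechanically before blind double-marking begins.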
Along with a scrutiny of examiners should go a scrutiny of the pattern of candidates' performances. Do men and women choose the same options; do they do equally well in the ones they choose? Do broad survey papers and/or open-ended questions favour one rather than the other? If there is variety in the modes of assessment in use, is there a gender dimension to this?
We need to do this kind of analysis paper by paper, option by option, mode by mode for at least five years. Then we shall have a respectable body of evidence from which to draw some conclusions.
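The paper-by-paper tabulation this calls for is equally straightforward to mechanise. The following Python sketch is again illustrative only: the records and the first-class boundary of 70 are invented for the example, not drawn from any actual class list.

```python
# Sketch: tabulate first-class rates by gender for each paper or option.
# The records and the first-class boundary (70) are illustrative assumptions.
from collections import defaultdict

def firsts_by_gender(results, first_boundary=70):
    """results: iterable of (paper, gender, mark) tuples.

    Returns {paper: {gender: (n_candidates, share_awarded_firsts)}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [n, n_firsts]
    for paper, gender, mark in results:
        cell = counts[paper][gender]
        cell[0] += 1
        if mark >= first_boundary:
            cell[1] += 1
    return {
        paper: {g: (n, round(f / n, 2)) for g, (n, f) in cells.items()}
        for paper, cells in counts.items()
    }

# Invented marks for two hypothetical papers.
results = [
    ("Survey paper", "M", 72), ("Survey paper", "M", 68),
    ("Survey paper", "F", 65), ("Survey paper", "F", 66),
    ("Special subject", "M", 64), ("Special subject", "F", 71),
]
table = firsts_by_gender(results)
```

Run over five years of real results, option by option and mode by mode, output in this shape would show at a glance whether broad survey papers or open-ended questions favour one gender rather than the other.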
Variation in modes of assessment raises much larger issues. It seems no coincidence that the so-called "gender deficit" is most marked at two universities whose methods of assessment are among the most traditional. Although both the Cambridge History Tripos and the Oxford History School now have a dissertation option in the final year and Cambridge has just introduced a compulsory long essay into Part I, Oxbridge assessment is still dominated by the three-hour sudden-death examination - a format which puts a premium on argument and the deployment for maximum effect of sometimes sketchy knowledge.
Cambridge is in the throes of a move from an alphabetical marking convention - betas and alphas - to a numerical scale; and much of the debate has focused on the alleged loss of subtlety when highly inflected marks - such as β+?+?α?γ - can no longer be given. The role of a hieroglyph like this in offering a profile of a candidate's qualities has been much urged; although sometimes its supporters seem to forget that eventually candidates still have to be ordered on a simple linear scale: class I, class II(i), class II(ii), class III.
A more substantial and potentially more informative profile could be achieved by varying the modes of assessment in use; by looking at candidates' performances in a range of situations, long essay and/or dissertation, a selection of course-work essays, even take-home examination papers, as well as the conventional three-hour marathon.
It may be asked, finally, what has all this to do with gender? May we not emerge in Cambridge and elsewhere with a similar pattern at the end? Possibly; although the coincidence between course work and the improved performance of girls at 16+ is a suggestive one. But the pursuit of best and most sophisticated practice in university examining cannot be bad for either women or men.
Gillian Sutherland is vice principal and director of studies in history at Newnham College, Cambridge.