Despite some whispers that the research excellence framework (REF) might be scrapped, the government’s higher education Green Paper, published earlier this month, indicates that it will remain – possibly subject to a metrics-based interim “refreshment”. There is even a proposal to introduce a version for teaching.
That is a pity. The REF subpanel to which I was appointed for the 2014 exercise proved so incapable of reliably assessing relative departmental quality that I felt compelled to resign.
Many academics, nervous about the ability of metrics to assess research quality, reluctantly fall in behind the REF. At least it involves academic judgement, they say. Panellists, many of them distinguished academics, offer their time to help ensure a well-founded evaluation.
But it became clear to me that, in spite of everyone’s best efforts, the system does not constitute peer review in any meaningful sense. There is simply too much material to assess with the care that would rightly be expected of reviews of research grants, publications or promotions.
I had to read about 75 books and 360 articles or chapters, as well as numerous other outputs, including cross-referrals from other panels. I was not given any leave from my institution, and although I spent most of spring and summer at my desk, I could often give only an hour or so to “reading” books, and no more than 20 minutes to articles or chapters. Some colleagues had an even heavier assessment burden.
I understood when I signed up that assessment would be demanding. I resigned only after doing all the work (but before I became aware of my own institution’s results), when it became apparent to me just how much our high-speed review was distorting results. I know of colleagues who, before submission, had spent hours deliberating over whether to submit outputs deemed to be on the borderline between the unfunded 2* grade and the magic 3*. Yet subpanellists often read and discussed those very outputs with alarming brevity.
I was also concerned about how reviewing was allocated. Our efforts would have been much more effective if we had been primarily reading outputs in our own areas of expertise, broadly defined. But – unusually – our subpanel allocated the whole of each institution’s outputs to just two reviewers. In early discussions, some experienced colleagues expressed concern that institutions allocated to “more generous” assessors would benefit unfairly. We asked to see the average scores of each assessor, and the marked disparities suggested that this was a very real danger.
In the 2008 research assessment exercise, one department widely viewed as in decline did extremely well. In the REF, it surprisingly repeated its success. I was shocked to discover that one individual had reviewed nearly all its outputs on both occasions. That reviewer was in no way acting corruptly, and was teamed with another on both occasions. But it seemed incredible that one person could have so much influence over a department’s fate.
A third reason for my resignation concerned rampant grade inflation. We were shown the average scores of all the subpanels under our main panel. It was hard not to conclude that some were finding astonishingly high levels of world-leading research. This had the consequence of making other subject areas look artificially weak, and it put great pressure on other subpanels to protect their fields by raising their own grades.
This happened to us. Confronted by figures suggesting that we had given lower scores than the other subpanels, even though we all felt that our discipline was producing a very considerable amount of excellent research, we undertook a rather farcical and hasty process of “rereading”. Often grades were simply raised at random.
It’s true that institutions that rank high in the REF subject league tables are usually recognisable as good departments, and vice versa. Few good submissions do badly. But my experience taught me that league tables produced from REF scores (especially those based on grade-point average) are in no way a reliable guide to comparative quality.
Evidence suggests that citation metrics would not change the distribution of income much. And they would save us the wretched six-yearly drama of hope, futility, idiocy and waste. But in many subject areas, citations produce almost laughably distorted pictures of quality. The “top” journals are often barely read; the “leading” academics frequently have little real influence on thinking.
Academics should reject the false choice between REF-style “peer review” and metrics. Money should be distributed on the basis of measures that are simple yet do not distort. These could include PhD completions, research-active staff employed and research grant income. A competitive pot could be put aside to enable less research-intensive universities to develop their research, to prevent an ossified elite capturing all the cash.
Such a change could be achieved only by strong leadership from across disciplines and universities – and in my view academic unions should lead the campaign for it. Without a shift, robust assessment will continue to be obstructed by the impossibility of properly reading submissions in the time available, and the understandable tendency of academics to defend their own.
The author wishes to remain anonymous.