Why I had to quit the research excellence framework panel

With no time for proper peer review and with grade inflation inevitable, one academic felt compelled to resign

November 19, 2015
James Fryer illustration (19 November 2015)
Source: James Fryer

Despite some whispers that the research excellence framework (REF) might be scrapped, the government’s higher education Green Paper, published earlier this month, indicates that it will remain – possibly subject to a metrics-based interim “refreshment”. There is even a proposal to introduce a version for teaching.

That is a pity. The REF subpanel to which I was appointed for the 2014 exercise was so unable to provide a reliable assessment of relative departmental quality that I felt compelled to resign.

Many academics, nervous about the ability of metrics to assess research quality, reluctantly fall in behind the REF. At least it involves academic judgement, they say. Panellists, many of them distinguished academics, offer their time to help ensure a well-founded evaluation.

But it became clear to me that, in spite of everyone’s best efforts, the system does not constitute peer review in any meaningful sense. There is simply too much material to assess with the care that would be rightly expected for reviews for research grants, publications or promotions.

I had to read about 75 books and 360 articles or chapters, as well as numerous other outputs, including cross-referrals from other panels. I was not given any leave from my institution, and although I spent most of spring and summer at my desk, I could often give only an hour or so to “reading” books, and no more than 20 minutes to articles or chapters. Some colleagues had an even heavier assessment burden.

I understood when I signed up that assessment would be demanding. I resigned only after doing all the work (but before I became aware of my own institution’s results), when it became apparent to me just how much our high-speed review was distorting results. I know of colleagues who, before submission, had spent hours deliberating over whether to submit outputs deemed to be on the borderline between the unfunded 2* grade and the magic 3*. Yet subpanellists often read and discussed those very outputs with alarming brevity.

I was also concerned about how reviewing was allocated. Our efforts would have been much more effective if we had been primarily reading outputs in our own areas of expertise, broadly defined. But – unusually – our subpanel allocated the whole of each institution’s outputs to just two reviewers. In early discussions, some experienced colleagues expressed concern that institutions allocated to “more generous” assessors would benefit unfairly. We asked to see the average scores of each assessor, and the marked disparities suggested that this was a very real danger.

In the 2008 research assessment exercise, one department widely viewed as in decline did extremely well. In the REF, it surprisingly repeated its success. I was shocked to discover that one individual had reviewed nearly all its outputs on both occasions. That reviewer was in no way acting corruptly, and was teamed with another on both occasions. But it seemed incredible that one person could have so much influence over a department’s fate.

A third reason for my resignation concerned rampant grade inflation. We were shown the average scores of all the subpanels under our main panel. It was hard not to conclude that some were finding astonishingly high levels of world-leading research. This had the consequence of making other subject areas look artificially weak, and it put great pressure on other subpanels to protect their fields by raising their own grades.

This happened to us. Confronted by figures suggesting that we had given lower scores than the other subpanels, even though we all felt that our discipline was producing a very considerable amount of excellent research, we undertook a rather farcical and hasty process of “rereading”. Often grades were simply raised at random.

It’s true that institutions that rank high in the REF subject league tables are usually recognisable as good departments, and vice versa. Few good submissions do badly. But my experience taught me that league tables produced from REF scores (especially those based on grade-point average) are in no way a reliable guide to comparative quality.

Evidence suggests that citation metrics would not change the distribution of income much. And they would save us the wretched six-yearly drama of hope, futility, idiocy and waste. But in many subject areas, citations produce almost laughably distorted pictures of quality. The “top” journals are often barely read; the “leading” academics frequently have little real influence on thinking.

Academics should reject the false choice between REF-style “peer review” and metrics. Money should be distributed on the basis of measures that are simple yet do not distort. These could include PhD completions, research-active staff employed and research grant income. A competitive pot could be put aside to enable less research-intensive universities to develop their research, to prevent an ossified elite capturing all the cash.

Such a change could be achieved only by strong leadership from across disciplines and universities – and in my view academic unions should lead the campaign for it. Without a shift, robust assessment will continue to be obstructed by the impossibility of properly reading submissions in the time available, and the understandable tendency of academics to defend their own.

The author wishes to remain anonymous.


Print headline: As the panel could not give a reliable view of quality, I had to resign




Readers' comments (8)

Hitler, Stalin and other autocrats have offered strong leadership, so I don't see leadership, strong or otherwise, as being any solution to the 'problem'. The root of the 'problem' comes down to differences in world views and pedagogy vs politics. I have come across no better account of world views than Stephen Pepper's World Views: A Study in Evidence first published in 1942, a book which few seem to have read or acted upon. Although he later extended his four 'rational' world views to five, when the 'con' is deleted from his contextualist world view, the resulting 'textualist' view (a world of doing, enduring and enjoying) is identical to that of a pedagogy identified by collegiate action. This is not a model easily compatible with his other world views of Formism, Mechanicism, or Organicism. Karl Popper was implicitly in agreement when commenting to the effect that the focus should be on problems not on 'disciplines', for disciplines arose through historical accidents and are maintained for administrative purposes. Pepper's account offers the most parsimonious account of why 'experts' even in the same field so frequently disagree with each other. Mary Douglas also tackled this issue in her How Institutions Think. Either willful ignorance or blindness, to the validity and legitimacy of other world views, is the root of the problem. The difficulty, with reference to Imre Lakatos, is how to effect a change from the ever increasing degenerative problem shifts to the relatively rare progressive problem shifts.
I've been thinking a lot about the audit culture and these exercises of late. Here's my brief analysis: Higher Education’s Silent Killer http://briarpatchmagazine.com/articles/view/higher-educations-silent-killer
If you put the resources that went into the administration of the REF, including the TIME spent on the whole exercise by the academics and support staff at the participating institutions, and put it directly into actual research, the UK would have no problem producing consistently world leading research, and academics would be saner, calmer people and better able to do the research and teaching the taxpayer reasonably expects us to do. Given that there is not THAT much movement between rankings and therefore funding decisions between each round, why not just allocate the funding along the same lines annually and save us all the heartache? REF literally drives people MAD: I have seen so many colleagues suffer burnout and stress because of it, and as the person responsible for our panel submission from our institution the workload involved contributed to a serious mental health breakdown that saw me off work for 6 months, and I am by no means the only one. This is NOT good use of highly skilled people's resources. Imagine the contribution I could have made to research in my discipline if I had put all that time and effort into carrying out theoretical and empirical research, publishing it, using it to help inform policy and practice in my area, and applying it to my teaching - not to mention imagine how much of my sanity I would have saved! Now multiply me by the thousands of research-active staff in higher education and you see the huge wasteful cost of this divisive and flawed exercise.
As a late-stage PhD student considering embarking on an academic career this article is worrying. Unfortunately it is backed up by my observations of faculty in my department and in other institutions. The pressure which is piled on researchers does not go unnoticed by their students and I know a number of highly talented early career researchers (often women incidentally) who have decided that such a stressful existence just can't be for them.
The analysis is spot on but the suggested solutions would introduce horrible distortions. If PhD completions counted towards a REF, institutions would instantly rush to award PhDs to everyone and their dog. The standards for PhD completion in the UK are already risibly low, with supervisors naming friends as examiners for weak candidates so that they pass and do not stain their supervision records. Counting research grant income rather than scientific output would be tragic, as grants are allocated in a manner just as debatable as REF scores. Furthermore there is a conceptual problem in rewarding the input rather than the output: funds are only good because/if they enable people to do good research. Universities are already too keen to use them as a criterion for promotion (they get overhead after all); there is no need to make this even more extreme.
Congratulations on your honesty; would that most of those who collaborate with the dismal exercise in bullying and mendacity which is the REF would follow your example. But the fact that you remain anonymous is a testimony to the regime of fear by which UK universities are run. Fortunately, despite the lies of vice-chancellors and HMG, no other nation regards UK universities as anything other than a lost cause, ruined in large part by the intellectual and moral cowardice of their inmates.
I have to say that my experience on a 2014 REF subpanel was entirely different to that of 'Contributor'. Yes, the time taken was huge (not much less than 1000 hours, certainly more than 2hr per day over most of a year). No assessor-pair handled all submissions from one institution. Scores within pairs were remarkably similar - including detection of hidden review articles, or vast and expensive studies where the output could have been written at the same time as the grant, or papers where the title/abstract either understated or overstated the quality of the research (all of which would be hard to spot by metrics). Calibrations and duplicate output scores were also consistent. Not surprisingly, there is a decent correlation between metrics, and grant income, and last-times-RAE, and REF score (if not, quite, funding allocated by the 'REF' for 1864 - as some commentators would by extension advocate), but I don't think the correlations are good enough to spend something over £1 billion a year. Looking at the results afterwards not only from my sub-panel, there were a number of institutions where leadership or appointments/losses had changed their position, up or down, and the peer review was able to recognize that. I do agree with Contributor, though, that the effort and way many Universities handled the submissions left much to be desired.
I was on a subpanel the REF before last and refused to sign the final documents. I think the chair declared me ill. My reasons were very close to those in the article except we had to read 1000 papers each. Quite impossible, so all kinds of other features (name, reputation, institution) actually determined the grades awarded. The subpanel took all kinds of statistical precautions to ensure uniformity and avoid bias but it was to no avail, I believe. The system is absurd.