The REF is perverse to ignore journal hierarchies

The research excellence framework’s reliance on hasty peer review by generalists limits sample size and accuracy, three academics argue

October 4, 2018

In 2014, more than 150 UK higher education institutions submitted nearly 200,000 research outputs and 7,000 impact studies to the research excellence framework (REF), at an estimated total cost of nearly £250 million. Those overall figures are not expected to be reduced this time around, so what do we get for a quarter of a billion pounds? How effective is the REF at assessing quality?

The draft guidance on REF 2021 associates quality with originality, significance and rigour, but its grading criteria remain hazy and subject to variation between units of assessment (UoAs). What counts as “world-leading” originality, for instance? How accurately can small panels of multidisciplinary reviewers determine how far an output would need to fall below the “highest standards of excellence” before it is rated 3* instead of 4*?

Then there is the question of sample size. In 2021, institutions must submit an average of 2.5 outputs per academic in a UoA over the seven-year qualification period. Compared with 2014’s requirement of four articles per academic selected for submission, this is an inclusive approach intended to engage a wider proportion of the academic community. However, it is only a selective snapshot of productivity for active researchers and may not fully differentiate between groups. Moreover, such selectivity seems unnecessary when modern electronic systems are able to cope with huge datasets.

Each of the 34 assessment subpanels consists of about 15 experts. Based on 2014 submission figures, each panellist will need to review more than 700 outputs over a few months, assuming each submission is assessed by two people. The impossibility of doing so with the appropriate level of critical insight is exacerbated by the diversity of topics within each UoA, rendering particularly perverse the instruction that panels must disregard journal hierarchies.
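The workload figure can be checked with some rough arithmetic. The sketch below uses only the numbers quoted in this article, rounded: 200,000 outputs (the 2014 total was "nearly" that), 34 subpanels of 15 experts, and two reviewers per output. It also assumes, purely for illustration, that reviews are spread evenly across all panellists, which will not hold exactly in practice. The function name is ours and is not part of any REF guidance.

```python
def outputs_per_panellist(total_outputs, subpanels, experts_per_subpanel,
                          reviewers_per_output):
    """Average number of outputs each panellist must read.

    Assumes an even split of reviews across every panellist
    (an illustrative simplification, not an official REF rule).
    """
    total_reviews = total_outputs * reviewers_per_output
    total_panellists = subpanels * experts_per_subpanel
    return total_reviews / total_panellists


# Rounded figures taken from the article; 200,000 slightly overstates
# the 2014 total, which was "nearly" that number.
workload = outputs_per_panellist(
    total_outputs=200_000,
    subpanels=34,
    experts_per_subpanel=15,
    reviewers_per_output=2,
)
print(round(workload))  # 400,000 reviews / 510 panellists -> 784
```

Even allowing for the rounding, the result sits comfortably above the 700 outputs per panellist cited above.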

A decade ago, a study put the cost of journal peer reviewing at £1.9 billion a year. Although the efficacy of the system is debated, it is a fundamental principle of publication that assessment of papers is undertaken by reviewers selected for their specialist knowledge of the specific topic in question. Such review is likely to be more rigorous than anything the REF panels’ generalists can manage. Surely it would be a much better use of taxpayers’ money to drop this duplication and free up the panellists to focus on higher-order evaluations, such as the coherence of work and its impact.

Australia’s REF equivalent, known as Excellence in Research for Australia (ERA), is a case in point. It recently closed its consultation period for compiling the discipline-specific journal rankings on which it largely relies to assess scientific subjects. These rankings do much more than apply a simple journal impact factor: they recognise the prestige of the publication with respect to each area of research, on the understanding that a journal that is highly prestigious in one field may be less so in a neighbouring one.

The rankings make the plausible assumption that if a discipline agrees that a particular journal carries a 4* ranking then most articles published therein will be of that quality. Clearly there is no guarantee of that in all cases but that doesn’t matter at the macro level, particularly if the assessment takes in all outputs published in the relevant period, rather than a REF-style sample.

Apart from being more transparent than the current REF methodology, a fuller desktop evaluation of outputs based on agreed subject-specific publication rankings could be carried out more frequently than every seven years. This would inevitably give a truer insight into each research group’s productivity relative to its quality, and provide a stronger basis for the distribution of research funds.

Andrew Edwards is head of the School of Human and Life Sciences at Canterbury Christ Church University. Tomasina Oh is associate dean of research at Plymouth Marjon University. Florentina Hettinga is reader in the School of Sport, Rehabilitation and Exercise Sciences at the University of Essex. Views expressed are the authors’ own.


Print headline: The REF should rank journals


Reader's comments (4)

It's not just REF, which at least makes explicit it's about research. 'World University Rankings' measure excellence using research publications as proxy, and then only in a relatively small number of English-medium publications. Even measures supposedly of teaching excellence and reputation have measures of research publication or citation built in - are you an institution filled with excellent researchers and inspirational teachers who unfortunately can't write too well in a second language? Too bad, down the rankings you go unless you invest in a massive translation and proofreading programme at a time of swingeing budget cuts and demands for fiscal prudence. REF is flawed, undeniably. It's considerably less so than the reliance placed on it and other highly situation-specific research-based measures to judge institutional quality.
There are questions to be asked about the workloads of REF panel members but the authors of this piece appear not to have read The Metric Tide report or the background to the San Francisco Declaration on Research Assessment, which warns of the perverse and distorting effects of using journals to benchmark individual researchers or research papers, and which is supported by analysis showing the highly variable citation performance of papers in any given journal, though of course considerations of the quality of individual work must go well beyond citation counting. A return to journal-based measures would undo much of the recent progress in developing more robust and holistic ways of assessing the qualities of research outputs.
I did a simple exercise when research dean at our institution, which was simply to weight each publication by the 5 year CIF, and then aggregate this up to a total score. It basically replicates the rankings without much issue (I did the same exercise when in Australia with similar results). In reality, no one publication matters much when the sample is 300 or 400 in a subject area and the reading exercise does little other than to add a different type of noise in the system (you replace random flawed human evaluation with random flawed citation data) but that noise is not really much different other than random. While individual academics do not like the metrics (they want their papers evaluated by peers -- believing that their work is somehow different from the norm of the process) this is an institutional flaw introduced by university managers pushing the aggregate evaluation system down onto individual level assessments. While every university says they do not do this, every university actually does this and hence the system becomes not a measure of overall research quality of a group but a process of enforcement of individual KPIs by managers. The requirement that everyone be in the process mitigates this a bit (other than then creating two classes of academics) as does the minimum requirement of 1 article, but the reality is that it is not the intent of the process that creates issues but how it is implemented. Similarly, journal lists are skewed to local journals ... indeed, in business we have the CABS list, which includes a number of dubious 4* journals ... one of which a colleague calls the "British Journal of Last Resort" because it gives the illusion that you are publishing in a leading journal when it is a leading journal in the UK but not a leading journal anywhere else.
The process could easily be simplified but the reality is that the process is not meant to determine who is the set of the truly best (this is blindingly obvious) but who is in what tier, particularly for schools wanting to be able to advertise they are in the single digits of the rankings.
I think it may have been William Starbuck who found (some years ago) that a great deal of the most innovative research had been published in second and third tier journals because the top-tier journals tended to be more risk-averse. It would be interesting to see if that's still the case.

