Do national research assessment exercises still pass peer review?
“It’s bureaucratic insanity with predictable results – the proliferation of journal articles that no one reads, science that is not reproducible and millions of pounds spent on pointless assessment.”
That is the verdict of Julia Lane on what she regards as the “complete madness” of the UK’s research excellence framework (REF). “It also does little to convince politicians to hand over more money for research,” adds Lane, a labour economist at New York University, who was herself asked by President Obama to make the case for higher science spending, resulting in the Star metrics programme.
Of course, questions about the REF’s utility and cost, estimated at almost £250 million for the 2014 exercise, have been raging for years. But while these issues have been scrutinised by numerous reviews – most recently, the 2016 report by Lord Stern of Brentford – the REF’s critics claim that the resulting changes have amounted to little more than tinkering round the edges of a process used to distribute more than £1 billion a year in research block grants. With the submissions phase of the 2021 exercise now over and a major international review under way, many observers expect a far more radical rethink this time around.
Even those in charge of UK science would seem to have little love for the REF, at least in its current incarnation. Four years ago, Dame Ottoline Leyser, chief executive of UK Research and Innovation – whose Research England subdivision runs the REF – argued that the exercise’s focus on individuals should be abolished. Instead, universities should be required to submit a set number of “outputs” to each subpanel, regardless of who wrote them, based on the number of their academics who work in that particular research area. This reform, argued Leyser, in her previous capacity as the Royal Society’s policy lead, would remove the obligation universities feel to only recruit applicants “with the ‘right’ sort of outputs”, allowing them to take a broader view of scientific contribution.
Ministers and Whitehall mandarins, meanwhile, are likely to be more anxious than ever for cost savings in the wake of the pandemic, so the REF’s historical reliance on peer review could be replaced, at least in part, by bibliometrics. The appointment of an international advisory panel, rather than a UK insider, to review the exercise further opens the door to wholesale change.
That said, an international consensus on whether and how research evaluation should be carried out is far from apparent. The UK is not the only country that has been taking a hard look at whether its own framework is fit for purpose. In June, the Australian Research Council agreed to implement recommendations from a wide-ranging review of its Excellence in Research for Australia (ERA) exercise, ahead of the next iteration in 2023. Changes include a recalibration of its “world standard” rating, after 90 per cent of Australian research in the most recent exercise in 2018 met or exceeded that standard: a “scarcely believable” figure that critics said had devalued the exercise. ERA’s dual approach of assessing science via citations and humanities by peer review is also claimed to unfairly penalise the latter.
In New Zealand, doubts have been raised over the transaction costs of the individual-focused, peer review-driven Performance-Based Research Fund (PBRF). Since its inception in 2002, the exercise has awarded at least half of its funding to two universities – Auckland and Otago – leaving others scrapping over what is left of the NZ$315 million (£160 million) in associated funding. Amid concerns that the exercise also undervalues locally focused research, a government review of the exercise was conducted in 2020, although the resulting changes, announced in July, were modest.
In Italy, meanwhile, some scholars say the bibliometrics-centred approach of its own exercise, introduced in 2010, has encouraged cheating by means of “citation exchange clubs”, leading to a more insular academy and the mere illusion of improvement.
Nevertheless, no other country has sought to copy the REF. “We discussed different types of research assessment for a whole day in Copenhagen – not one person said they wanted a REF-like system,” recalls Gunnar Sivertsen, research professor at the Nordic Institute for Studies in Innovation, Research and Education in Oslo, Norway, on his involvement in Denmark’s talks on research assessment two years ago. Sweden – where, as in the Netherlands, each institution runs an internal research evaluation with the help of international experts – also rejected a UK-inspired evaluation system back in 2016, he adds.
Institutional self-evaluation, rewarded with government funding, is, for Sivertsen, a better way to encourage universities to pursue excellence in their respective strengths. “All countries make a distinction between research-intensive and ordinary universities and allocate research funding accordingly – do you really need a REF to justify how you do it?” he asks.
This is one of the big questions that UKRI’s eight-strong advisory panel, containing figures from industry, academia and overseas research bodies, is beginning to address. Its chair, New Zealand’s former chief scientist Sir Peter Gluckman, tells Times Higher Education that the panel will not shy away from “provocative” questions about the REF’s future. “I am not someone who is afraid to say the emperor has no clothes,” says Gluckman, who believes that one of the big strengths of the REF – known in its pre-impact incarnation as the research assessment exercise – is that “it has never been static and has always been open to frank review”.
However, those hoping for a complete overhaul may be disappointed, he suggests. “We shouldn’t throw out the baby with the bathwater or ignore what has been achieved through the REF and other research practices, namely turning Britain into a science superpower under some difficult circumstances,” he says. “There are, of course, some logistical issues – some people say it is very bureaucratic, though some would argue that this comes from universities rather than the state itself.”
Gluckman’s panel has yet to get into the weeds of the framework and has instead focused on fundamental questions that will inform its thinking. One key issue is “what does excellence mean?”, he explains. “You could have a team that has a hypothesis that is very interesting and does a brilliant piece of work, but the results are disappointing and appear in a minor research journal. Another team might do some rather less impressive research that produces an outcome of fundamental importance that is published in Nature.” A world-class system should have space for both outcomes because having a “culture that promotes excellent research activity, even when the results are not that exciting” is vital, he says.
Leyser recently repeated her concern that despite this REF iteration’s reduction in the required number of outputs per submitted researcher from four to one, the exercise remains too focused on individuals. Echoing such sentiments, the review panel will also consider how team science can be more effectively supported, says Gluckman. “There is no doubt that the nature of enquiry has changed,” he explains. “While I accept there is a big difference between how genomics and humanities researchers operate, there is a much greater sense that, for many subjects, team-based, transdisciplinary research is becoming much more impactful when it comes to solving global and national problems. The Stern report recognised this problem in 2016 but I’d argue the world has moved on a lot since then.”
Gluckman emphasises that while there is no “perfect system”, whichever one is adopted “must be seen to be fair” given that the futures of departments and people’s livelihoods are at stake. “And you cannot have that without transparency,” he adds. “No one would say it was fair if five experts came in from around the world and decided to give this much to Oxford and Cambridge, or this much to Bristol.”
Some might also see that comment as a coded signal that the primary role of peer review by UK academics will be retained, given the controversies that continue to swirl around the obvious and frequently touted alternative: metrics.
At present, some 400 subject experts, assessors and specialist advisers across four main panels and 34 subpanels are scrutinising the thousands of outputs submitted by universities before the 31 March submission deadline – a 10-month effort that, in the 2014 iteration, was estimated to have cost £19 million in panellists’ time. A far larger sum – £212 million, or about £4,000 per submitted researcher – was spent by universities collectively on the submission process, while £55 million was spent preparing some 7,000 impact statements, according to official estimates.
Adam Tickell, vice-chancellor of the University of Sussex, is leading a government-commissioned review of research bureaucracy that is due to report its interim findings in the autumn. Tickell, who will take over at the University of Birmingham in January, is aiming for “bold rather than timid” recommendations on cutting red tape, and he concedes that “the REF creates quite a burden for institutions in the run-up to the deadline”. However, the exercise comes around only every six or seven years, and “if you compare the REF to individual [grant funding] rounds, it is much more efficient and much less burdensome,” he argues.
But others fear that the official £250 million cost of the REF may be a considerable underestimate, while a significant number of academics insist that citation metrics would produce similar assessments at a fraction of the cost.
“I just can’t see the added value of reassessing publications that have already been assessed by two to five experts in the field,” says Anne-Wil Harzing, professor of international management at Middlesex University, who was able to create a REF ranking virtually identical to the 2014 outcomes over the course of two hours on a rainy Sunday afternoon in 2017.
Nor is she the only academic to claim that metrics offer a much cheaper, quicker and more up-to-date snapshot of research quality that leads to similar distributions of research funding. But the landmark review of metrics carried out by James Wilsdon, Digital Science professor of research policy at the University of Sheffield, in the wake of the 2014 REF came down against any significant expansion in the use of bibliometrics because of widespread scepticism about their reliability and a sense that peer review remains the “gold standard” of research assessment.
However, the case for a “more streamlined, metrics-driven system that isn’t a nightmare to administer” has become increasingly compelling since then, says Steve Fuller, Auguste Comte professor of social epistemology at the University of Warwick. For instance, the REF’s system of disciplinary subpanels doesn’t encourage cross-disciplinary research, Fuller contends. “There is also the fact that each iteration makes less and less difference to the outcomes – things do not change that much so the return on investment is diminishing,” he adds.
More broadly, Fuller wonders if the REF may have outlived its initial purpose when it was conceived in the 1980s to bring some accountability to a research sector where institutional laxity let some scholars produce almost nothing while outstanding young minds were forced to leave for the US. “There was a lot of concern about Britain’s brain drain and general levels of research productivity, so the REF [known then as the research selectivity exercise] was partly about bringing in new people to create a research culture that was lacking in Britain,” recalls Fuller, who himself arrived in the UK from the US in 1994 as part of a REF hiring round.
If team science is the new priority then an assessment involving bibliometrics would be a better way to do it, Fuller contends. “If you look at those people with enormous h-indexes, it’s usually because they are working in teams – their citations reflect how they work across a number of groups,” he says.
But framing the future of the REF as a face-off between bibliometrics and peer review is a trap that has hampered previous reviews, says Sivertsen, an adviser to several national research evaluation reviews across Scandinavia, who will give evidence to the Gluckman panel. “Academics will always choose peer review, so the changes made are always limited,” he says.
Could the inclusion of non-bibliometric indicators offer a better way forward? Under New Zealand’s PBRF, 60 per cent of funds are allocated based on peer review assessments of individual researchers’ portfolios, but the remaining 40 per cent rides on institutional-level evaluations of the number of research degrees completed (25 per cent) and the amount of research income received from external sources (15 per cent).
“One reason for having universities is that they generate the high-level conceptual thinking that a workforce needs, so research degree completions recognises this,” says Roger Smyth, an independent tertiary education consultant and the former head of tertiary education policy in New Zealand’s Ministry of Education. “Research income from the private sector also gets a higher weighting, as does income generated from overseas,” he adds, because this incentivises collaboration with industry and international institutions.
One frequent complaint about the PBRF from academic unions is the “elaborate CVs” and evidence portfolios that each researcher must compile for submission, admits Smyth, who was involved in the exercise’s creation in the early 2000s, when he was a government adviser. Could a more team-based approach, as suggested by Leyser, solve such concerns? “Many people believe this would be a good thing, but the problem, as the latest review panel concluded, is that there is much more potential for game-playing,” he argues. Universities might, for example, seek to hide their weaker research scientists by recording them in the history department, thereby skewing results, suggests Smyth.
That said, the impact of game-playing matters less in New Zealand, which distributes a far lower proportion of research income via its PBRF than the UK does via the REF, says Smyth; Auckland’s annual budget of NZ$1.1 billion is almost four times the size of the entire national PBRF budget, for instance. “There has always been anxiety about a British system because the financial consequences are so high, including the threat of closing some departments down and moving them to service teaching if they don’t perform,” he explains. “That’s less of a risk with individual assessment, where you might have one or two excellent researchers of value to an institution, though their department isn’t outstanding.”
What about the Australian model? There has been “a lot of back and forth” in policy terms between the UK and Australia over the years, “but they will now need to look at very different things”, says Gemma Derrick, senior lecturer in higher education at Lancaster University, who was part of the advisory board for the ERA review. Derrick predicts that the UK’s review will seek to broaden the variety of outputs that can be submitted – adding software to the list, for instance – and expand the REF’s scope to include research-related staff currently not eligible for submission. “Looking beyond authorship to the wider research community in universities is a good idea but will also lead to more administrative paperwork and increase the REF’s burden,” she notes.
Paul Wellings, who recently retired after a decade as vice-chancellor of the University of Wollongong, having previously led Lancaster for nine years, also sees different challenges for the REF and ERA, given their divergent emphases. “The REF has focused on quality by building up islands of excellence and has been a strong driver of reputation, while the ERA has never been used for reputational advantage – it assesses research, but its impact is modest,” he says.
For Wellings, the growing number of research outputs may pose a headache for future ERAs, even though peer review is limited to subjects where citation analysis is not commonly used. “I’m not sure the Australian model is sustainable…with the volume of research outputs growing 9 per cent every year,” he says. Moreover, the massive loss of overseas fee income that has propped up Australian research activity could precipitate a rethink of the ERA’s stated role as a nationwide stocktake of discipline strengths and areas for development. “It has become a managerial exercise that will be severely challenged over the next few years,” he says.
With the Covid pandemic intensifying interest in the impact of universities’ research, Wellings also wonders whether UK policymakers will be quite so interested in a framework that is largely unknown to the public and is of interest mainly for the institutional rankings it generates. “Governments are starting to ask how to reset the economy and will wonder if knowledge exchange is more important than just measuring excellence in research,” he says. In this respect, the nascent knowledge exchange framework (KEF) and Australia’s “more conservative” efforts in this area “may become a challenge to the REF and ERA”, he says.
Of course, the REF also measures research impact. But in pursuit of economic growth, governments seem increasingly willing to direct research funding towards priority areas. A case in point is the UK’s recently unveiled innovation strategy, which prioritises seven “strategic technologies”, following up on former science minister David Willetts’ “eight great technologies” initiative, launched in 2013. Such emphases could undermine the case for an assessment that is still primarily based on the excellence of publications and other “outputs”, according to Wellings (60 per cent of REF 2021 scores depend on outputs, versus 25 per cent for impact and 15 per cent for research environment).
An increased emphasis on teaching – reflected in another recent UK initiative, the teaching excellence framework – could also undermine the REF and ERA, Wellings thinks. He notes that some universities seem willing to discard subject areas where they excel in the REF if undergraduate recruitment is weak or simply no longer aligns with their more vocational focus: “We’ve seen this in Australia where some institutions are happy to have some areas which are vocational and teaching-only, rather than having a strong research culture and attempting to drive their reputation through research,” he says. “It feels like one of those moments in history…when institutions are having a larger reset as society asks if they are still serving its needs.” This confluence of factors may be “fairly lethal to the REF”.
So what about Lane’s approach to tracking the impact of research spending? The US’ diverse research sector, with its variety of missions and funding sources, has not lent itself to a national research assessment exercise. But then along came the Star metrics programme (“Science and Technology for America’s Reinvestment: Measuring the Effects of Research on Innovation, Competitiveness, and Science”), a partnership between the federal government and research institutions that mined existing datasets to examine the effect of university funding not only on regional employment and corporate prosperity but also on health and other socio-economic factors.
Lane and various collaborators then followed up with Umetrics (Universities: Measuring the Impacts of Research on Innovation, Competitiveness and Science), which takes efforts to track the impact of research spending even further. Lane’s approach goes far beyond the REF’s focus on the direct effects of the research itself, and, she argues, does a far better job than the REF of documenting the full impact of the national research budget. Umetrics, for instance, takes account of how many people’s salaries depend on the country’s $100 billion annual research budget (about 720,000). Its latest report found that nearly 31,000 PhD students who were funded by research grants went on to get jobs in the US, while, in their first three years after leaving university, 64 per cent of research-trained graduate students were employed in the private sector, where they made an average of $94,000 per year.
“Ideas are transmitted through people, not publications, so any evaluation of whether research funding is successful or not should centre on people,” she says – hence her consistently harsh criticism of the REF.
Whether the UK will go in such a direction is open to question, particularly given that the KEF covers some of the elements that Lane focuses on. Moreover, while the review panel contains figures from Canada, Australia, the Republic of Ireland, South Africa and Europe, as well as New Zealand, there is no US representative.
Gluckman promises that “as an expert panel, all we can do is be honest, critical and provocative”. But he acknowledges that however honest, critical and provocative his report may be, its implementation will be entirely hostage to political fortune. Even so, bold changes may be ahead. The UK government’s recent R&D People and Culture Strategy, for instance, states that “across the whole sector, there is a strong business case for increasing the diversity of people and ideas and for working in partnership to drive progress based on what works”. And “frameworks, assessment and incentives at an institutional level” should encourage this.
“It is up to the UK government to determine what is done,” Gluckman reflects. But if history is any guide, the UK academic community will have a great deal to say on the matter, too. And if the verdict is largely negative, it will take a determined government to press on regardless.