Academic colleagues were ‘initially like rabbits in the headlights, absolutely panicking because they had never evaluated a case study before’
UK research can seldom have witnessed a cat placed more emphatically among the metaphorical pigeons than when the inclusion of “impact” in the 2014 research excellence framework was first mooted in 2009. So the coos of relief and triumph emanating from the sector since the REF results were announced on 18 December are doubtless eliciting purrs within the UK funding bodies.
Although the concept of assessing the impact of research on the basis of case studies had originally been developed for Australia’s abortive Research Quality Framework in the mid-2000s, this was the first time that such assessment would be carried out in practice. And the fact that the annual £1.6 billion quality-related (QR) research budget would partially ride on the outcome made a lot of academics extremely nervous, if not downright hostile.
The funding bodies were swift to make clear that cultural impact would score just as well as its economic equivalent, provided it had comparable “reach” and “significance”. However, hearts continued to flutter about the likely interpretation of these terms by the REF’s 36 assessment subpanels, and some observers predicted that the influence of impact – which counted for 20 per cent of institutions’ overall scores – would severely clip the wings of at least some established research powers.
The academics appointed to the panels felt the pressure, too. Malcolm Skingle, director of academic liaison at pharmaceutical firm GlaxoSmithKline, was one of the “research users” recruited to Main Panel A, which oversaw the life sciences. According to him, his academic colleagues were “initially like rabbits in the headlights, absolutely panicking because they had never evaluated a case study before”.
However, a series of “calibration exercises” early in the process helped to identify “hawks and doves” in scoring terms and establish a consensus on the standards to be applied.
“I was quite sceptical at first but I think [the assessment of impact] was wholly transparent and fair, and I fail to see how it could have been done much better,” Skingle says. “Compared with the outputs, case studies were pretty easy to review and assess. They were only four pages long, had a start and a finish, and if you weren’t sure about whether they were making fair claims, you could check the audit trail back to original research, or ask for corroboration if you needed it – but for the most part you didn’t.”
The REF results made two things immediately apparent. First, except at the margins, the established pecking order had not been overturned by impact’s influence: generally, universities that scored well for outputs also scored well for impact. Second, impact scored very highly across the disciplines, being awarded an overall grade point average of 3.24 (out of 4), compared with 2.90 for outputs.
One interpretation of the high scores is that the academics on the panels had marked leniently, lest their disciplines should be seen by funders and politicians to have lower impact than others. Willy Maley, professor of Renaissance studies at the University of Glasgow and a member of the English language and literature subpanel, admits that some academics did require a “reality check” in the calibration exercise from the research users (people from outside academia whose input he regards as invaluable) about how much impact they were really having beyond the college walls. But, according to Skingle, the opposite was true in the life sciences.
“The academics would look at something absolutely stellar and give it a 4*. So anything less, in their view, had to be marked lower than that. The users and international members eventually convinced them to imagine a 6* rating for the really stellar stuff. [That meant] you could still have case studies rated 4* that weren’t quite at that level but were still 4* by anybody’s reckoning.”
It has been widely noted that impact scores – and, hence, scores overall – were particularly high in the life sciences: the overall GPA given under Main Panel A was 3.50, compared with 3.17 for Panel B (physical sciences), 3.14 for Panel C (social sciences) and 3.13 for Panel D (arts and humanities). But, according to Skingle, this is only to be expected given the amount of funding pumped into those disciplines in recent years.
“They would need their arses kicking if they couldn’t get impact from that level of investment,” he says.
Whether the REF results will lead to even higher QR funding levels for the life sciences, as some have predicted, will depend on the details of the funding formulas. England’s formula will be announced by the Higher Education Funding Council for England towards the end of March. David Sweeney, director of research, education and knowledge exchange at Hefce, believes that the published impact scores are “fair and reasonable”, but also invites “all those interested in university research” to read the case studies, which were published in January, and “form their own judgements”.
Sweeney is anxious to see the evaluation of impact assessment currently being carried out by RAND Europe (also scheduled to be unveiled at the end of March). But he believes panel feedback already entitles him to say that the case study approach “worked effectively, has differentiated [between universities] and produced results the community is accepting”.
He adds: “The case studies confirm to me that academic research makes a vast contribution to society, and I am particularly pleased that its contribution to policy development and cultural life has been captured. It is not just about money.”
Even the disapproval of such an ardent critic of impact as Philip Moriarty, professor of physics at the University of Nottingham, has been mildly assuaged by the high scores achieved by some non-commercial impacts: “I couldn’t go so far as to say my opposition to impact has mellowed, but it is encouraging, at least, that public engagement seems to have been taken seriously,” he says, citing Nottingham’s Sixty Symbols online science videos as an example.
However, as Dorothy Bishop, professor of developmental neuropsychology at the University of Oxford, has pointed out (“Good works”, Times Higher Education, 29 January), public engagement “only really counted [in the REF] if you could point to a piece of research that changed people’s behaviour”.
Given the largely favourable reception, it seems inconceivable that impact will not be part of the next REF, likely in 2020. Nevertheless, the results this time around have thrown up significant concerns that need to be addressed.
One is the heavy weight, in terms of overall score, carried by each impact case study. This was because, roughly speaking, only one case study was required for every 10 academics submitted, meaning that the difference between a 4* (outstanding) and a 3* (very considerable) rating could be significant. One solution would be to require universities to submit more case studies. However, given the concerns about the workload involved in preparing them – acknowledged by Sweeney – this seems highly unlikely.
But the issue will only become more marked if the funding councils fulfil their original intention of raising the impact weighting from 20 to 25 per cent in the next REF – as they were urged to do by Encouraging a British Invention Revolution: Sir Andrew Witty’s Review of Universities and Growth (2013).
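A rough back-of-envelope sketch shows why a single case study weighs so heavily. It rests on an assumption rather than on the funding bodies' stated method: that the case studies in a submission share equally in the case-study portion of the impact score, which panellists quoted in this article put at 80 per cent, with the impact template taking the other 20. The function name and the submission sizes are illustrative only.

```python
# Back-of-envelope sketch: how much of a submission's overall profile rides on
# a single impact case study. All names here are illustrative.

CASE_STUDY_SHARE = 0.80  # templates counted for the other 20% of the impact score


def weight_per_case_study(n_case_studies: int, impact_weight: float = 0.20) -> float:
    """Fraction of the overall profile carried by one case study.

    Assumes case studies are weighted equally within the impact element.
    """
    if n_case_studies < 1:
        raise ValueError("a submission needs at least one case study")
    return impact_weight * CASE_STUDY_SHARE / n_case_studies


for n in (2, 3, 5):
    current = weight_per_case_study(n, impact_weight=0.20)
    proposed = weight_per_case_study(n, impact_weight=0.25)
    print(f"{n} case studies: {current:.1%} each at 20% weighting, "
          f"{proposed:.1%} each at 25%")
```

On these assumptions, a submission with two case studies has about 8 per cent of its overall profile resting on each one, rising to about 10 per cent if the impact weighting were raised to 25 per cent.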
According to one observer, the effective weighting of impact is already more than 25 per cent. In an analysis published today on his blog, Seb Oliver, professor of astrophysics at the University of Sussex, reveals that because the scores for impact (and, indeed, environment) typically show a wider variation than for output, they in effect count for more than their nominal weighting in determining the overall scores. Only in public health, health services and primary care did impact have an effective weighting of less than 20 per cent (namely, 19.6), while in physics and sociology it reached almost 39 per cent: higher, in both cases, than the effective weighting of outputs. Overall, impact’s average effective weighting in the REF was 29 per cent, while for outputs – which officially counted for 65 per cent of overall profiles – it was 47 per cent.
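The arithmetic behind an "effective weighting" can be sketched as follows. This is one plausible reading of the idea, not Oliver's published method: it assumes that an element's effective weight is proportional to its nominal weight multiplied by the spread (standard deviation) of its scores across submissions. The scores are invented for illustration and are not REF data.

```python
from statistics import pstdev

# Nominal weightings of the three REF elements, as given in the article.
NOMINAL = {"outputs": 0.65, "impact": 0.20, "environment": 0.15}


def effective_weights(scores_by_element: dict) -> dict:
    """Rescale nominal weights by the spread of each element's scores.

    Assumption: effective weight is proportional to nominal weight x standard deviation.
    """
    scaled = {
        name: NOMINAL[name] * pstdev(scores)
        for name, scores in scores_by_element.items()
    }
    total = sum(scaled.values())
    if total == 0:
        raise ValueError("scores show no variation, so weights are undefined")
    return {name: value / total for name, value in scaled.items()}


# Hypothetical grade point averages for five submissions in one unit of assessment.
hypothetical_scores = {
    "outputs": [2.8, 2.9, 3.0, 3.1, 3.2],        # tightly bunched
    "impact": [2.4, 2.9, 3.2, 3.6, 3.9],         # widely spread
    "environment": [2.5, 3.0, 3.25, 3.5, 4.0],   # widely spread
}

for name, weight in effective_weights(hypothetical_scores).items():
    print(f"{name}: nominal {NOMINAL[name]:.0%}, effective {weight:.0%}")
```

In this invented example, the wide spread of impact scores pushes impact's effective weight well above its nominal 20 per cent, and above that of outputs, which is the pattern Oliver describes for some disciplines.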
Oliver speculates that impact’s wider spread of scores is partly because of its novelty, meaning that “some units of assessment didn’t know how best to present or select their best impact”. By 2020, views are likely to have crystallised around what constitutes a good case study in each discipline. But Oliver also notes that the low number of case studies compared with outputs inherently presents a larger “margin for error” in submissions.
Another possible way to compensate might be to cap the amount of QR funding distributed on the basis of impact scores at 20 per cent of the total. But this would still not correct impact’s disproportionate effect on the scoring itself, which is also important in reputational terms. Oliver suggests that the funders should consider standardising the scores for the REF’s different elements before combining them.
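One way to picture standardisation is sketched here. It assumes a conventional z-score approach (subtract the mean, divide by the standard deviation) applied to each element before the nominal weights are applied. The article does not specify which method Oliver has in mind, and the scores are invented for illustration.

```python
from statistics import mean, pstdev

# Nominal weightings of the three REF elements, as given in the article.
NOMINAL = {"outputs": 0.65, "impact": 0.20, "environment": 0.15}


def standardise(scores: list) -> list:
    """Convert raw scores to z-scores so every element has the same spread."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # No variation: every submission sits exactly on the average.
        return [0.0 for _ in scores]
    return [(score - mu) / sigma for score in scores]


def combine_standardised(scores_by_element: dict) -> list:
    """Weighted sum of z-scores, one overall figure per submission."""
    lengths = {len(scores) for scores in scores_by_element.values()}
    if len(lengths) != 1:
        raise ValueError("every element needs one score per submission")
    z = {name: standardise(scores) for name, scores in scores_by_element.items()}
    n_submissions = lengths.pop()
    return [
        sum(NOMINAL[name] * z[name][i] for name in z)
        for i in range(n_submissions)
    ]


# Hypothetical grade point averages for five submissions (not REF data).
hypothetical_scores = {
    "outputs": [2.8, 2.9, 3.0, 3.1, 3.2],
    "impact": [2.4, 2.9, 3.2, 3.6, 3.9],
    "environment": [2.5, 3.0, 3.25, 3.5, 4.0],
}

print([round(value, 2) for value in combine_standardised(hypothetical_scores)])
```

Because every element has the same spread after standardisation, a widely scattered element such as impact can no longer count for more than its nominal weighting simply by virtue of that scatter.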
“I am not anti-impact, but if it has such a high effect on their overall position in league tables, it will drive the universities to focus disproportionately on that metric,” he says. “There is a danger that all the academics in the country [could] start diverting significant effort away from their research and into impact and I am not sure we want to go that far. I am not sure policymakers were intending [the effect of assessing impact] to be that significant.”
Indeed, there is evidence that universities have already cottoned on to the huge significance of the quality of each impact case study they submit. That would explain the highly disproportionate number of REF submissions that contain staff numbers just below the threshold for submitting an extra case study, as highlighted by THE in January.
Moriarty says: “It was as clear as day right from the start – to all but Hefce, it seems – that this type of game-playing would happen. Researchers across the country were excluded from the REF – with the concomitant morale-sapping effect this has – so that their departments could ‘play the numbers’ on impact cases. That’s a pretty strong distortion: it remains to be seen to what extent exclusion could affect the careers of these researchers.”
Graeme Rosenberg, REF manager in the funding bodies’ REF team, admits the issue needs to be “looked at”.
Other issues likely to be examined by the funding bodies include whether greater “granularity” of impact grading could be attained by formally adopting Skingle’s imaginary extra star categories. The impact template, in which institutions set out how their case studies fit into an overall strategy for maximising impact, is also likely to be revisited. According to Maley, many institutions “struggled” with it, and he suggests it might be better rolled into the environment section of the exercise (which counts for 15 per cent of the overall score).
“Impact takes time and institutions will have to think that through,” he says. “Short-term goal-setting and expecting impact everywhere in a hurry doesn’t strike me as very sensible.”
Further enhancement of calibration methods is also possible. Steve Furber, chair of the computer science and informatics subpanel and ICL professor of computer engineering at the University of Manchester, would like hawkish and doveish marking tendencies to be formally corrected for statistically. Meanwhile, Dame Ann Dowling, chair of Main Panel B and professor of mechanical engineering at the University of Cambridge, advocates greater efforts to calibrate impact scoring across the main panels – although Skingle is sceptical that it makes sense to compare impact in the life sciences with that in, say, the humanities.
Then there is the workload issue. The funding bodies are likely to permit updated versions of 2014 case studies to be submitted in 2020, if by that time the impacts have become more mature. Finding evidence of impact is also likely to be made easier by the systems universities have now put in place to help them track it as it occurs. However, Jonathan Adams, chief scientist at technology company Digital Science, which is working with King’s College London on analysing the case studies, speculates that assessors next time around will be less surprised by how much impact universities unearth – and will therefore be harder to impress.
One obvious way to cut the workload would be to ditch case studies and turn to metrics instead – an idea being mulled over by an independent review commissioned by Hefce. As regards impact, “altmetrics”, which capture data such as social media mentions, are sometimes suggested, but no one Times Higher Education spoke to believes that they yet amount to an adequate replacement. The RQF’s replacement, known as Excellence in Research for Australia, focuses on innovation statistics, such as the number of patents registered and the volume of commercialisation income. But Claire Donovan, reader in science and technology studies at Brunel University London and part of the team that developed the RQF, warns the UK sector not to commit “metricide” and embrace a measure it knows to be flawed just because it is weary of REF returns. Even if “wonderful data” could be produced, the information “still needs to be set in some kind of context”, she says. And the fact that metrics typically favour the sciences opens the way to “philistine arguments that humanities don’t have any impact so why should they receive any public funds”.
Adams agrees that a move to any standardised metrics would be “absolutely bonkers” because of the sheer complexity of how research actually makes an impact. And he notes that even a shift to metrics would probably not cut universities’ workloads since they would “still put a wholly disproportionate amount of effort into making sure they maximise their presentation on those indicators. Academic culture is so driven by the focus on the REF that it can’t self-regulate the amount of effort put in.”
Despite the positive noises coming from the panels, some observers continue to regard case studies as fundamentally flawed. Patrick Dunleavy, professor of political science and public policy at the London School of Economics, has memorably dismissed them as “fairy tales of impact”. And while Adams predicts that the rest of the world will be quick to follow the UK’s lead, the US is pioneering an altogether different approach that involves systematically tracking the impact of university trainees (see box, “Tracing impact’s footprints: the case study sceptic”, below). According to one of its architects, Julia Lane, institute fellow at the American Institutes for Research, asking academics to track their own impact is ludicrously amateurish. “If you give me £2 million, I can tell you I have had an impact,” she says. However, she adds, the measure of impact should be relative to an appropriate counterfactual about what would have happened if the money had not been spent.
“How is a biochemist, with his own little view of the world, going to figure that out? That is not science, it is storytelling. You need to unpick the process to inform the way we do research rather than saying: ‘We are just really good, keep sending money’ – which is all case studies do.
“I am not against stories, but you want to be able to summarise it to a minister in fewer than 7,000 case studies. How many ministers are actually reading them?”
But Adams is enthusiastic about the capacity of case studies to demonstrate universities’ “pervasive” impact on education, society, welfare, health, law, policy, the economy, the environment and culture.
“No other country has so much information about what research in universities is actually delivering,” he says.
Skingle agrees, arguing that case studies are well placed to enthuse MPs and the public, especially when the research concerned is located in their own geographical areas.
“The percentage of GDP being spent on R&D means nothing to Joe Public, but if you tell them about which engineering or medical project has come to fruition…that has got to be a good thing,” he says.
He is also clear that the inclusion of impact in the REF – as well as in research council grant applications – has made universities more anxious to engage with industry.
But Barbara Pittam, director of academic services at the London-based Institute of Cancer Research, agrees with Lane that writing case studies is “always going to be painful because it is never your data” they depend on: “It is about what happens to your results externally.”
And despite her institution’s focus on “making a difference to patients” and its top rank for impact in THE’s ranking, she still fears that the impact agenda will distort research priorities.
“Even for us, it still feels like the tail wagging the dog. We have deliberate strategies to create impact, but we are very clear we can’t do so without absolutely fundamental science,” she says.
“There is clearly a value to being able to tell impact stories, but whether that should be part of the assessment of actual research, I don’t know. I am not sure that, politically, that is a question we have been able to ask.”
‘Not only straightforward but also quite interesting’: the panellists’ viewpoint
“We went into the exercise somewhat concerned about how easy it would be to make sensible assessments of impact case studies, and came out rather happy and a little surprised it had turned out to be not only relatively straightforward but also quite interesting.”
This is the view of Steve Furber, chair of the computer science and informatics subpanel and ICL professor of computer engineering at the University of Manchester. He says the impact templates (which counted for 20 per cent of the total impact score) were read by two academics and one “research user”, while the case studies were read by two users and one academic.
The academic led on assessing whether the underpinning research was of at least 2* quality (which was not true in all cases, leading to an “unclassified” grade). For this reason, the burden of impact assessment on the academics was relatively light. Furber’s subpanel registered the lowest impact GPA of any subpanel, but still scored 2.99. And the “huge diversity” of case studies submitted conveyed “a strong message that there is work with impact going on right across the sector – not just where you would expect to find it in high-end institutions”, he says.
Meanwhile, Willy Maley, professor of Renaissance studies at the University of Glasgow and a member of the English language and literature subpanel, also reports finding impact “more readily assessable than people might have expected beforehand”. And while he laboured over impact templates, he found the case studies just as engaging to read as the outputs, and relatively straightforward to grade, too.
Experience of handling submissions referred to more than one panel (such as interdisciplinary research) also convinced him that similar standards were being applied across the board. “I approached impact as a sceptic because I am interested in older ideas of the universities being, in some cases, necessarily insulated so the real, slow work that will produce impact can take place,” he says. “But, by and large, [the REF assessment] worked and if it worked for a discipline such as mine, which might not be obviously oriented in that direction, that to me is a good sign.”
Bulletin sounding board: the case study writer
Chris O’Brien, communications specialist at academic consultancy Bulletin, estimates that his firm had varying degrees of involvement in between 400 and 500 impact case studies across a wide range of disciplines and universities, despite “not widely marketing ourselves as a REF consultancy at the time”.
“There was definitely a level of panic among universities because it was the first time they had written case studies,” he observes.
Many of the examples he saw initially failed to meet even the basics of eligibility, describing impacts that occurred outside the assessment period or were not based on any underpinning research.
As managers and academics got their heads around the guidance, the quality improved. However, Bulletin’s services remained in high demand, with some universities involving the firm in drafting or advising on virtually all their case studies.
Typically, O’Brien would receive academics’ first drafts – which varied wildly in “complexity and coherence” – and then liaise with the authors on improving them, beginning with an hour-long interview. One common tendency, he notes, was for academics to “undersell” their impact. Many also struggled to construct a coherent, digestible 750-word narrative bringing out the key points. The fact that humanities researchers were, in general, better at this offset the greater difficulties they typically had in “quantifying or evidencing their claims succinctly”.
“I saw a lot of case studies relating to medical advancements or technological developments that were still in a very early stage of having an impact [via a marketable product]. So I don’t feel that the fears from some quarters that humanities would be at a disadvantage were borne out,” he says.
O’Brien’s other key task was to make sure every claim was backed up with evidence. This often required prompting academics to seek testimony from the organisations they were claiming to have influenced. In a small number of cases – especially when the university’s interest in the case study was motivated by a desire to recognise “the research excellence and profile of the author” – the evidence proved impossible to marshal, prompting O’Brien to recommend that the institution abandon the study.
“[By doing that] we might well have saved the university from slipping a few places down the rankings,” he notes.
Tracing impact’s footprints: the case study sceptic
Julia Lane, institute fellow at the American Institutes for Research, is pioneering an approach to assessing impact that differs fundamentally from the REF.
For her, the key impact research makes is not via academic papers but through the training of students and postdoctoral researchers, who then move into other areas of the economy, taking their knowledge with them.
She cites the example of the range of fonts available on early Apple computers: the result, she says, not of “some calligrapher writing a paper”, but of founder Steve Jobs attending a calligraphy class at university.
She also points out that while the impact of papers is not geographically constrained, since they are available all over the world, the creation of a “thoughtful, literate workforce” is likely to have more local benefit, creating a “powerful story” to tell funders and politicians.
“If you can say that 70 per cent of graduate students and postdocs went into industry and the high wage sector, and those firms grew faster and had more exports than other firms in the economy, that is a compelling story and it is evidenced,” she says.
Such information is being made available to universities in the US, on an automated basis, through Lane’s federally funded STAR metrics and the related UMetrics programmes.
Lane’s next challenge is to capture research’s “additional social impact”. Her plan is to identify university leavers who “are going off and saving the world” by moving, for instance, into the non-profit sector. Although world-saving initiatives might be achieved by other means than employment, she notes that “in reality, in order to do anything, you have to have some kind of footprint, so we are trying to capture those footprints electronically”.
She admits it will be another step again to assess success in saving the world.
“But this is difficult stuff,” she says. “If this were easy it would have been done a long time ago. But, in five years, we have come a very long way down a much less burdensome path [than the UK].”
Case study examples from the 2014 REF
University of Bath research into engines’ “parasitic losses” led to an estimated 40,000 tonne reduction in carbon dioxide emissions by Ford engines in 2012, saving over £18.7 million in fuel
Algorithms developed at Cardiff University to improve data security in printing and network environments led to the development of patented software by Hewlett-Packard
University of Exeter research influenced pedagogy by pointing out that pupils with special educational needs often don’t require specialist teaching
In psychology, psychiatry and neuroscience, a new international standard of loudness arose from University of Cambridge research into how sound is perceived. This is now widely used by industry
A University of Birmingham academic’s monograph made a major contribution to developing the law of duress in Singapore and the Commonwealth
An Edge Hill University academic’s media appearances and campaigns improved public understanding of the Israel-Palestine conflict by “extending the range and improving the quality of argumentation and evidence”
King’s College London research into the contribution of Lord Byron and Romanticism to the creation of the Greek nation state in the early 19th century challenged the modern perception of Greek national identity
Research by a University College London academic provided the framework for a charity’s pioneering work on self-directed support in social care, which influenced the government’s Putting People First strategy