Evolution of the REF

As the 2014 REF census date approaches, Paul Jump talks to the architects of previous rounds of assessment about how it all began and their views on the research excellence framework

October 17, 2013


When Peter Swinnerton-Dyer exchanged the vice-chancellor’s office at the University of Cambridge for the chairmanship of the University Grants Committee in 1983, he was puzzled to find that no one on the committee could give him an “intelligible” explanation of how it allocated research funding.

Then, as now, teaching funding was distributed between universities predominantly on the basis of the number of students taught by each institution. But the committee, which consisted largely of senior academics, distributed the research funding passed to it by the Department of Education and Science according to a process “shrouded in mystery”.

“I think in fact they looked at last year’s grant and adjusted it for [recent] circumstances,” Swinnerton-Dyer, now 86, recalls. “Universities could go up or down but they had to make an ad hoc case” for increased funding on the basis of “some new development”.

However, the squeeze being applied to higher education funding by the Thatcher government in the 1980s quickly led the number theorist to conclude that such a lack of transparency would soon become untenable – not so much because of ministerial demands for more accountability, but because universities needed to be reassured that their shrinking allocations were determined fairly.

“There had always been rumblings of discontent but one could see the rumblings were bound to increase,” Swinnerton-Dyer says.

The UGC’s “very limited” resources meant the new mechanism had to be one that “didn’t demand very much effort on the part of the committee and, consequently, didn’t demand very much effort on the part of the universities”.

Swinnerton-Dyer’s solution was to establish a “research selectivity exercise”, the first of which ran in 1986, to assess the quality of university research. This was the precursor to the research assessment exercises that – unbeknown to him – would ultimately become such a significant feature of academic life in the UK.

Today, with only two weeks to go until the census date (the cut-off point for including staff) for the 2014 research excellence framework, it is a subject much discussed. But just how different was the process in those early days?

The 1986 exercise certainly differed in scale. The UGC’s existing subject subcommittees were charged with assessing departments in their area of expertise, chiefly on the basis of just five research outputs – books, papers or patents – from the previous five years, on which the department in question would be “content for its total output to be assessed”. This compares with four outputs per academic submitted today.

“We didn’t say ‘best outputs’ because that would have led to all sorts of disputes from philosophers,” Swinnerton-Dyer says.

Departments were also permitted to submit up to four pages of general description about their strengths, which could include industrial links as well as pure research prowess. Not everyone, however, followed this guidance. “One department totally ignored that and sent in 52 pages – we pulped the last 48,” Swinnerton-Dyer reveals.

He would have been prepared to confine the selectivity exercise to the sciences, which accounted for the bulk of research spending and which he sensed would find it most acceptable. But, in the event, humanities departments were keen to be involved.

One subject committee was so confident it already knew all it needed to know about each university’s departmental quality that it produced a provisional classification before it received any submissions – and “when it got all the extra evidence it saw no reason at all to alter any of the classifications”. But, for Swinnerton-Dyer, the subject committees’ pre-existing depth of knowledge – gathered partly through their regular departmental “visitations” – was a distinct strength of his design, and he laments that by the time of the second selectivity exercise in 1989, they, along with the UGC itself, had been abolished.

In the UGC’s place came the Universities Funding Council, an essentially “non-expert body” chaired by Lord Chilver, former vice-chancellor of the Cranfield Institute of Technology (now Cranfield University). Swinnerton-Dyer became its inaugural chief executive.

Some academics complained that the first exercise had been biased towards larger departments and lacked consistency across subjects. However, judging by the scarcity of complaints he received, Swinnerton-Dyer concluded that universities were “essentially unworried” by it. “They welcomed the more transparent allocation machinery and accepted that it must involve a research assessment process of some kind and [recognised that] the process was very undemanding.”

But the Committee of Vice-Chancellors and Principals – the forerunner of Universities UK – did have one “real” complaint: that departments “hadn’t had the opportunity to put forward their full strength”. Wary of vice-chancellors complaining to ministers about his obstinacy, Swinnerton-Dyer permitted universities to submit two publications for every member of staff for the 1989 exercise – even though he thought it was “a silly idea”. Information was also sought on a department’s total volume of publications; Swinnerton-Dyer had chosen not to collect this in 1986 on the grounds that it would “only encourage the production of low-quality papers”.

There were other major changes, too. Consultation with universities resulted in the replacement of 1986’s 37 cost centres (subject areas) with 152 subject units of assessment, assessed by nearly 70 panels assembled for the purpose, significantly increasing the scale of the operation. Each panel now judged the quality of research on a new five-point scale, whose necessarily “woolly” universal definitions related each point to national and international standards. However, Swinnerton-Dyer emphasises that this did not make the quality of research in different subjects comparable on an absolute scale because, for instance, “you can be world-leading in dentistry research but still be pretty mediocre”.

Lord Chilver also vetoed Swinnerton-Dyer’s plans to further develop assessment of departments’ industrial links, which he accepts were “not very satisfactory” in the first selectivity exercise.

On funding, accounts of the history of the exercise differ. Swinnerton-Dyer recalls using the selectivity exercise to allocate the whole of the UGC’s and UFC’s research-related funding. However, other accounts, such as Roger Brown and Helen Carasso’s recent book, Everything for Sale? The Marketisation of UK Higher Education, cite evidence that only about 40 per cent was allocated on the basis of panels’ assessments, with the rest being distributed according to staff and student numbers, research grant income and volume of external research contracts.

Bahram Bekhradnia, director of the Higher Education Policy Institute, also recalls that the research selectivity exercise was not a “primary instrument” of resource allocation. As director of policy for the Higher Education Funding Council for England (which replaced the UFC in 1992), Bekhradnia was charged with deciding how the research funding mechanism should respond to 1992’s abolition of the “binary line” between universities and polytechnics.

By then, Swinnerton-Dyer and all other senior figures associated with previous selectivity exercises had moved on and, rather than poring over the past, Bekhradnia devised the rules afresh for what was to be known as the research assessment exercise or RAE.

He concluded that allocating research funding partly on the basis of student numbers would no longer be tenable once the teaching-focused polytechnics joined the system. The large expansion in university numbers would not be matched by a proportional rise in the overall research budget so, if large transfers of funding from research-active staff in pre-1992 universities to largely teaching-focused staff in the post-1992s were to be avoided, it would be necessary to allocate all research funding on the basis of the RAE.

But this meant that the RAE had to become “much more robust and rigorous” than the “embryonic” system Bekhradnia had inherited. Principally, this entailed setting out the rules more clearly and checking outcomes more thoroughly, since “when so much was at stake, we were going to be subject to legal challenge if we didn’t do it right”.

Despite Bekhradnia’s best efforts, the 1992 results were indeed challenged in the courts by the now defunct Institute of Dental Surgery. The funding council won the case, but the judge was “unimpressed” with its processes and “warned that administrative law was moving very fast in the direction of transparency and it was no good experts simply saying they used their expert judgement to decide whether this was a good or bad submission”.

Hence, panels in the subsequent RAEs were required to be even more explicit about their criteria and working methods. Rules about which staff could have work entered also had to be tightly stipulated given the decision to allow universities to submit only academics they deemed “research active”: this was a response to the fact that many academics in former polytechnics focused exclusively on teaching.


The 1992 exercise was also the first to stipulate that eligible staff must be in post on a specific census date; previous exercises had assessed all staff in post for any part of the review period, sparking criticism that they were unduly retrospective given that staff often moved on. The resulting phenomenon of universities “poaching” each other’s top research staff ahead of the census date was addressed in the 2001 exercise by allowing departments to share credit for academics who had moved during the assessment period. But Rama Thirunamachandran, director of research, innovation and skills at Hefce between 2002 and 2008, disapproved of this measure and repealed it for the 2008 RAE.

“Although there was some staff movement around the time of each RAE it was not excessive,” he argues. “The key issue is that the results inform funding for the following six years. It was therefore important that funding followed individuals, rather than funding excellence relating to the past at someone’s former institution.”

Another one-off was 1992’s separate assessment scheme for pure and applied research, introduced to counter the accusation that the latter had previously been unfairly disadvantaged. According to Bekhradnia, this was dropped from subsequent exercises because most panels were unable to distinguish clearly between pure and applied work.

In 1996, Bekhradnia moved to end any suggestion that RAE success depended more on volume than on quality by no longer seeking any information about the volume of research produced – although he admits that this “silly” misconception lingers to this day.

The next major change to the RAE came in 2008. According to Thirunamachandran – who took up the post of vice-chancellor of Canterbury Christ Church University on 1 October – the single grades that had hitherto been awarded to departments hid a “multitude of sins”.

Large research institutions were able to “hide a very long tail [of lesser work] and still get a 5*”, he recalls. Since such departments received 5*-level funding for every academic they submitted, those in the tail were effectively being over-funded. Conversely, pockets of excellent researchers in low-graded departments were unable to access any quality-related funding. Thirunamachandran’s solution, which will be preserved for the 2014 research excellence framework, was to introduce more detailed “quality profiles”, which indicate the proportion of each department’s research in each quality category, with funding awarded to high ratings wherever they are found.
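To make the arithmetic concrete, the sketch below shows one plausible reading of how funding can follow a quality profile: each grade carries a weight, and a unit’s allocation is proportional to its submitted staff volume multiplied by its weighted profile. The weights, staff numbers and profiles used here are illustrative assumptions, not Hefce’s published rates.

```python
# A minimal sketch, under assumed weights, of quality-profile funding:
# money follows highly rated work wherever it sits, rather than rewarding
# a single blanket departmental grade. All figures are hypothetical.

GRADE_WEIGHTS = {"4*": 3.0, "3*": 1.0, "2*": 0.0, "1*": 0.0, "u/c": 0.0}

def weighted_volume(fte, profile):
    """Quality-weighted volume: submitted staff (FTE) times the weighted
    sum of the unit's quality profile (proportions summing to 1.0)."""
    return fte * sum(GRADE_WEIGHTS[g] * p for g, p in profile.items())

# A large unit with a long tail, and a small unit with a pocket of excellence.
big = weighted_volume(40, {"4*": 0.15, "3*": 0.35, "2*": 0.40, "1*": 0.10, "u/c": 0.0})
small = weighted_volume(8, {"4*": 0.40, "3*": 0.40, "2*": 0.20, "1*": 0.0, "u/c": 0.0})

pot = 1_000_000  # illustrative funding pot, in pounds
total = big + small
print(f"large unit: £{pot * big / total:,.0f}")
print(f"small unit: £{pot * small / total:,.0f}")
```

Under a single blanket grade, the small unit’s strong pocket would have attracted nothing; under a profile, it draws funding in proportion to its highly rated share.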

This new system had the added advantage, for Thirunamachandran, of mitigating the concentrating effects of confining funding almost exclusively to the top two grades on the scale, which “isn’t good for the dynamism of the system”.

However, Thirunamachandran’s efforts to cut the workload associated with the ever-expanding RAE – which, by now, had become a major concern – were not successful. A review he carried out with the late Sir Gareth Roberts in 2003 proposed an institutional opt-out for teaching-focused institutions in return for a base level of funding to sustain research capacity. However, this was rejected by the institutions, which, in Thirunamachandran’s estimation, judged that opting out of the “mainstream competition for research funding” would have sent the wrong message to potential funders and business partners.

Nor was the Treasury any more successful in its efforts to trim the cost of conducting the exercise, which in 2008 is estimated to have amounted to some £47 million across English universities. In 2006, then chancellor Gordon Brown announced – “completely out of the blue”, according to Bekhradnia – that the RAE’s reliance on peer review was to be replaced by an assessment that, especially in the sciences, was based on metrics such as citations, research income and postgraduate numbers.

Thirunamachandran did not regard such a system as plausible even in the sciences but, after convincing the chancellor that the 2008 exercise had proceeded too far under the old rules to be abandoned, he dutifully went about mapping out and consulting on Brown’s proposed system, before concluding that it was unworkable. However, as an inducement for the Treasury to agree to drop its proposal, Thirunamachandran offered instead to introduce an assessment of “impact” to what he had renamed the research excellence framework. (The new name aimed to free it from some of the negativity that had grown up around the RAE.)

“The Treasury was interested in the not-unreasonable question of ‘what does £1.6 billion a year in QR [quality-related] funding buy in practical, lay terms, economically, socially and culturally?’ That can only be articulated in terms of impact and, to some extent, it was a sort of unstated compact that if we could demonstrate and assess impact, the sort of mindless cry for metrics would subside from the Whitehall end,” he says.

He regards that compact as a “decent” one; it was his successor at Hefce, David Sweeney, who reached the controversial conclusion in 2011 that impact should count for 20 per cent of total scores in the 2014 REF.

Thirunamachandran admits that the assessment of research “impact” has added another element to the process of assessment but, because other elements have been simplified, he does not believe the effort required from universities and panellists has increased overall. He also argues that a significant amount of the REF-related workload is imposed by universities themselves, as it is “in our nature to cross the t’s and dot the i’s because we want the best possible outcome for our university”.

Both Thirunamachandran and Bekhradnia point out that, per pound distributed, the RAE and REF are vastly cheaper than distributing the same sums entirely via grant applications to the research councils. They also say there is no doubt that the introduction of the RAE has led to an increase in the quality of UK research, measured against international comparisons. In Thirunamachandran’s view, this increase had also convinced successive governments to invest vastly more resources in university research.

“The RAE has put a focus on research that didn’t exist pre-1986, when research was a residual activity that a proportion of academics did. Now it is really core and mainstream,” he says.

But he also acknowledges that this success has a downside: universities’ comparative neglect of teaching. Hefce has sought to address this with a variety of initiatives, such as the creation of the Higher Education Academy, an organisation focused on the quality of university teaching. But, according to Bekhradnia, a satisfactory answer will always be difficult to find when research funding is competitive and teaching funding is not. Nor would it be sensible to allocate teaching funding competitively, for the “utterly compelling” reason that it would not be “ethically sustainable” to say to a student: “You are going to study history at a university that is not very good at teaching history and, by the way, you are going to have fewer books in the library and less money per student and a worse staff-student ratio [than other institutions].”

Another of Bekhradnia’s regrets about the RAE is the “gamesmanship” that the necessity of stating the rules more explicitly has promoted. “The ideal RAE would be one where people wake up and find they got a score but don’t know how they got it. If being explicit helps people to change their practices in a way that makes things better then that is desirable, but I doubt if that is the case,” he says.

However, his objection is chiefly to the “wasteful” energy and money that institutions expend on game-playing. Except in a few marginal cases, he doubts that universities’ efforts to skew the results to their advantage are successful.

Indeed, the increasing stability of results over successive RAEs has only heightened suggestions that it simply is not worth all the time and effort that goes into it, and the process has been likened to “a Frankenstein’s monster” that is out of control.

According to Swinnerton-Dyer, no one has ever seriously doubted that his 1986 scheme was “adequate for getting the estimates roughly right”. This is not surprising, since, in his view, “if the best five elements [of a department’s output] are good, the overall quality is probably good as well”.

For him, post-1986 iterations of the RAE “try to produce more information in more detail than a tolerable process could do”. He believes the rot really set in when vice-chancellors ceased to see the RAE as a funding mechanism and regarded it, instead, as a “free-standing assessment of research quality”, with the added advantage of being “useful as a means to get rid of people not doing any research or to make them do more teaching”.

“If that is what vice-chancellors want, they can conduct their own internal processes, but, nationwide, I don’t think such an exercise is justified,” Swinnerton-Dyer says. “I don’t think you can run an assessment of individuals without an absolutely intolerable level of work.”

So what does he think of the design of the 2014 REF and its much-debated and controversial requirement for universities to demonstrate the impact of their research? Despite his enthusiasm for assessing industrial links, Swinnerton-Dyer dismisses impact assessment, describing it as “a licence for lying” – since the evidence, in his view, is “uncheckable”. And while the presenting of next year’s results through quality profiles will give a “slightly more nuanced” picture of departmental quality, he doubts whether the additional work involved in obtaining it is merited.

“The one question a modern civil servant fails to ask,” says Swinnerton-Dyer despairingly, “is ‘is it worth the extra effort?’.”


A developmental process: how research assessment has changed

1986

The first “research selectivity exercise” was launched by the University Grants Committee. Subjects were divided into 37 “cost centres”. For each one, universities were asked to submit five outputs – which could include patents – and up to four pages of general description of the unit’s research and industrial strength.

Assessments were made by the UGC’s subject subcommittees and results were scored on a four-point scale from “below average” to “outstanding”.

1989

The Universities Funding Council oversaw an exercise involving 152 subject units of assessment, evaluated by nearly 70 peer review panels. Units were asked to submit up to two publications for every member of staff. Information was also gathered on research student numbers and research income. Results were presented on a five-point scale related to national and international standards.

1992

The Higher Education Funding Council for England ran the exercise, now called the RAE, on behalf of the four UK funding councils. For each “research active” member of staff in post on the census date, universities were asked to submit up to two publications and two other forms of public output. A letter grade in the results indicated the proportion of staff in each department submitted. An extra year was added to the assessment period for arts and humanities.

Separate assessments were planned for applied and basic research in science and engineering, but many panels struggled to differentiate the two.

About 2,800 submissions in 72 units of assessment were rated by 63 subpanels.

Submissions were audited for accuracy. No funding was given to departments assigned the lowest of the five grades.

1996

Universities filed up to four publications per academic submitted.

Sixty panels assessed work in 69 units of assessment produced over four years for the sciences and six for the humanities. Each sub-panel published its assessment criteria and working methods ahead of submissions and more outside assessors, including academics from abroad and people working beyond the sector, were recruited. Data on output volume were no longer sought.

Two extra levels were added to the now seven-point assessment scale: 3 was split into 3a and 3b and 5* was added at the top. Departments in the lowest two categories received no funding.

2001

Nearly 2,600 submissions were made to 69 units of assessment. Five “umbrella groups” of panel chairs in related disciplines met to try to achieve greater consistency and fairer assessment of interdisciplinary work. Submissions awarded top grades were reviewed by international experts. Reductions in the number of research outputs required were allowed for some staff in certain circumstances.

More feedback on results was given to vice-chancellors, and panels’ assessment of the strength of their disciplines was published.

Universities whose staff were poached ahead of the census date were permitted to submit two of their outputs, but all the funding still went to their new departments. Departments rated in the top two categories contained nearly 40 per cent of academics, compared with only 13 per cent in 1992.

Funding was withdrawn from departments rated 3b and the amount of funding for those rated 4 was steadily reduced.

2008

The 67 subpanels were overseen by 15 main panels. Explicit criteria were introduced for the assessment of applied, practice-based and interdisciplinary research.

Results were presented as “quality profiles”, setting out the proportion of each department’s submissions that fell into five quality categories. Research in the top three categories was originally funded, but this has since been reduced to the top two.

2014

The renamed research excellence framework will involve four main panels overseeing 36 subpanels and will incorporate an assessment of non-academic impact, accounting for 20 per cent of the final marks. Treasury proposals for the exercise to be metrics-driven were abandoned. Figures collected by the higher education funding bodies indicate that UK universities plan to submit the research of some 54,300 academic staff for assessment in the REF, up from 52,400 in the 2008 RAE.

Sources: Hefce; The Evolution of the UK’s Research Assessment Exercise: Publications, Performance and Perceptions by Valerie Bence and Charles Oppenheim (Journal of Educational Administration and History, 2005); Everything for Sale? The Marketisation of UK Higher Education by Roger Brown and Helen Carasso (2013).

Reader's comments (1)

There are two problems with all the methods described above [except metrics]: 1. Peer review is always to some extent corrupt – you favour your mates. 2. All these methods attempt to evaluate departments, yet high-impact research is created by individuals. The best and fairest method is to evaluate individuals by citations to their published work. About 20 per cent will be found to be high-impact researchers and they should receive about 80 per cent of the available funds, the rest going to people new to the game who are trying to be recognised.
