The (predicted) results for the 2014 REF are in

Research team hopes that predictions will help to clarify the value of metrics in assessment

November 27, 2014

Who comes out on top based on h-indices?

Source: Getty

A team of researchers is hoping that its predictions of the results of the research excellence framework in four disciplines, based on the submitting departments’ “h-index”, will help to resolve whether the next REF should rely more heavily on metrics.

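Broadly, the h-index combines the number of academic papers a department has produced with the number of citations those papers have received: it is the largest number h such that h of the department’s papers have each been cited at least h times. A department that has published 50 papers that have each been cited 50 times or more, but not 51 papers each cited at least 51 times, has an h-index of 50. The index is sometimes preferred to average citation counts because it supposedly captures both productivity and quality.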
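
For readers who want to see the arithmetic, the h-index can be computed from a plain list of per-paper citation counts. The following is a minimal sketch in Python, not the procedure used in the arXiv paper; the function name and the sample citation counts are purely illustrative.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    `citations` is a list of per-paper citation counts (illustrative input,
    not real departmental data).
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            # The top `position` papers all have at least `position` citations.
            h = position
        else:
            # Counts only fall from here on, so h cannot grow further.
            break
    return h


# Hypothetical department with five papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))
```

For the hypothetical counts above, four papers have at least four citations each but there are not five papers with at least five, so the function returns 4.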

Dorothy Bishop, professor of developmental neuropsychology at the University of Oxford, claimed in her blog last year that a ranking of psychology departments based on their h-indices over the assessment period of the 2008 research assessment exercise “predicted the RAE results remarkably well”.

This month, a paper was posted on the arXiv preprint server that reports similar correlations for another four units of assessment from 2008: chemistry, biology, sociology and physics. The paper, “Predicting results of the research excellence framework using departmental h-index”, claims that the correlation is particularly strong in chemistry and biology. It also uses calculations of departments’ h-indices over the REF assessment period to produce predictions for the 2014 REF, the actual results of which will be published on 18 December. One of the paper’s authors, Ralph Kenna, reader in mathematical physics at Coventry University, said that as the results are not yet known, the predictions could be considered to be unbiased.

Some subpanels in the 2014 REF are allowed to refer to metrics but not to rely on them. The Higher Education Funding Council for England has commissioned an independent review into the use of metrics in research assessment. Professor Bishop has made a submission supporting the use of departmental h-indices.

Noting that the calculation for psychology departments took her only about three hours, she wrote: “If all you want to do is to broadly rank order institutions into categories that determine how much funding they will get, then it seems to me it is a no-brainer to go for a method that could save us all from having to spend time on another REF.”

But Dr Kenna questioned whether a correlation between h-indices and peer-review rankings of even about 80 per cent, as he had calculated for chemistry, could be considered acceptable, as that would still mean that some departments would suffer the “tragedy” of being inaccurately ranked. He also feared that the adoption of the h-index by the REF would amount to a “torpedo to curiosity-driven research” as researchers would seek to maximise their own indices.
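
Dr Kenna’s point, that a high correlation can coexist with many individual misrankings, can be illustrated with a toy calculation. The sketch below assumes a Spearman rank correlation, which the article does not confirm is the coefficient used in the paper, and the two rankings are invented for illustration rather than taken from any RAE or REF data.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have the same length, at least 2")
    # Sum of squared differences between the two ranks of each item.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical rankings of ten departments (1 = best); not real data.
peer_review_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
h_index_ranks = [4, 1, 2, 3, 9, 5, 6, 7, 8, 10]

rho = spearman_rho(peer_review_ranks, h_index_ranks)
# Count departments whose position differs between the two rankings.
displaced = sum(1 for a, b in zip(peer_review_ranks, h_index_ranks) if a != b)

print(f"rho = {rho:.2f}, ranked differently: {displaced} of {len(peer_review_ranks)}")
```

In this made-up example the correlation comes out at roughly 0.8 even though nine of the ten hypothetical departments change position, which is the kind of outcome the concern about inaccurate ranking refers to.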

“If we are honest, we hope the correlation doesn’t hold. But we are just trying to do a neutral job that anyone can check,” he said.

Reader's comments (11)

If metrics can explain an overwhelming majority of the results of the REF, then why not abandon the REF and go entirely based on metrics? Think of the savings in time and effort to the academics, who could actually get back to doing impactful research and teaching!
Any suggestion of metrics produces howls of protest from those who think it is too gross a way to measure quality. The problem is that the alternative - peer review - is also deeply flawed, as argued here in a blogpost by Derek Sayer: My original argument was purely pragmatic: of course there are numerous problems with an H-index but the question is whether it is any worse than alternatives - especially given its cost-effectiveness. In fact, I have recently done an analysis that suggests we could actually do away with the peer-reviewed REF and with metrics, and just allocate funding in relation to N active researchers in a department. See
Hi Dorothy I would agree with much of what you say - especially your proposal to do away with (or severely reduce) intrusive and distortive approaches. In earlier work we found that there is often a linear correlation between RAE measures of quality and group size up to a certain size which we termed "upper critical mass". This is actually a "Dunbar number" - it represents a limit on the amount of communication which can take place in a group. The academic Dunbar number, or upper critical mass, is discipline dependent (e.g., about 4 for pure mathematics, about 25 for experimental physics). Beyond the Dunbar number, quality is mostly independent of quantity and funding is essentially proportional to N, the number of academics submitted. This is in line with your own observation. For such "large" research groups, one could indeed allocate funding in simple proportion to N. Of course, one would have to have a good method to decide what N is, to stop game-playing -- to establish that each of the N academics submitted are bona fide researchers producing research in an identified discipline and above a quality threshold while at the same time nurturing early-career researchers. Additionally, peer evaluation could continue to help gauge the environments provided by universities and hence encourage their continuous improvement. For smaller groups, quality tends to be size-dependent. These may continue to need a greater degree of peer review to encourage their universities to support them and help them to grow in quantity (to promote cooperation) and quality. For this reason I don't think we can do away with evaluation altogether for small/medium groups. RAE has been good for many newer universities (the ones who typically have smaller research groups - including many "pockets of excellence" found by RAE2008) and has shown some of these universities the value of research and encouraged them to invest in research. 
So we may still need a driver to encourage support for small/medium groups to bolster them. Of course one would have to have a think to plug any gaps in an N-based system, but another advantage of it would be to put a halt to meaningless media rankings, wherein one group may be ranked above another even though they differ marginally in their RAE/REF scores (which do not come with error bars). Ralph Kenna
One problem that all this discussion ignores is that it considers the h-index as it is now. If income were dependent on h-index, it doesn't take much imagination to think about the bullying emails from managers urging you to increase your h-index by fair means or foul. The result would be an even greater corruption of science than has already resulted from obsession with silly metrics. The standard of evidence produced by bibliometricians for the efficacy of their snake oil is barely better than one would expect from a homeopath.
I get the point that metrics would be far faster and cheaper to use and that peer review has its flaws, but I have some concerns over the proposal that metrics alone be used. 1. The evidence of the link between citations and quality, as far as I am aware, largely comes from comparing RAE/REF outcomes to citations. To what extent did the panel members use citations to help them decide on rankings? If they used them (whether officially or not) then this puts a question mark over the findings. A correlation would be inevitable regardless of the validity. 2. The evidence generally shows a positive correlation between citations and peer review ranking but there is always a wide spread, so that a simplistic application of some formula would result in many individual papers being ranked differently to how a panel would rank them. If we regard the panel as the expensive 'gold standard' then this is a problem. If we say, well the panel is wrong in those cases, then we undermine the whole basis of the evidence linking quality to citations. 3. The use of metrics may be more reliable when averaged over a department but, realistically, within a department it is individuals who will be assessed by this process, with all the problems associated with point 2 above.
John - You are referring to what was a journalistic misinterpretation, since corrected in the above text. You are advised to read the draft paper on the arXiv, which the article helpfully links to, for further details. Ralph.
Thanks John. Regarding (a), I think nobody assumes that the REF is error-free. The REF is riddled with flaws and these are well documented. But the point is that REF is special because it exists already - it is in use already and accepted by the powers that be. So if a new (cheaper and less intrusive) scheme matches REF well, perhaps that will be accepted instead. Obviously we are not talking about anything close to exactness here. Regarding (b), yes - that is the point of Dorothy's suggestion to match funding with N. This is quite reasonable, I think, as a first approximation. But one should keep in mind that REF (or whatever) can be (should be) used as a driver to improve the volume and quality of research by improving the conditions that allow top-quality, curiosity-driven research to thrive. A simple metrics-based system alone will not do that. E.g., using the h-index alone will spur people to chase h-indices by working strategically on fashionable areas rather than chasing curiosity. Many (myself included) would regard curiosity-driven research as a raison d'être of universities and a foundation for science. (Industry does finance-driven research, for example, which is a different thing.) A system based on metrics would undermine that. That is one reason why metrics are so dangerous. Ralph
Brave of you to publish your predictions, and I am sure that every REF lead dreams of an algorithmic REF, whilst at the same time dreading the algorithmic game playing that would result. It's good to get the debate started early for 2020. I was rather taken aback by the throwaway line in the caption to the prediction tables: 'Scopus data were not available for some HEIs due to technical reasons and these are omitted from the corresponding lists.' I found only a cursory description of the filtering process by which the data had been extracted from Scopus, and I'd like to know more. In the case of Physics (the only table I checked) I think that ten of the 42 REF2008 departments have been excised. This indicates rather a large hole in the data. I expect that correlations and rankings survive omitting departments, but it left me wondering how complete the data is for the remaining departments. After all, the most important problem for bibliometrics is ensuring good data as a starting point; this is a particular problem in my area (computing) but I would have expected the Physics data in Scopus to be rock solid. Do you have a more detailed discussion available of the process that you used to collect the data? It is quite hard to reproduce your results as the paper stands.
The results do not seem to match these predictions, to say the least!
If you want to know the outcome of these predictions, see the report in Times Higher from February 2015: "Hit and miss metrics: ‘Throw of dice would give more accurate REF prediction’", THE number 2115, 12-18 February 2015, page 6.
Here is the link: