The (predicted) results for the 2014 REF are in

Research team hopes that predictions will help to clarify the value of metrics in assessment

November 27, 2014

Who comes out on top based on h-indices?

A team of researchers is hoping that its predictions of the results of the research excellence framework in four disciplines, based on the submitting departments’ “h-index”, will help to resolve whether the next REF should rely more heavily on metrics.

Broadly, a department’s h-index is the largest number h such that h of its papers have each been cited at least h times. A department that has published 50 papers that have each been cited 50 times or more (but does not have 51 papers cited at least 51 times) has an h-index of 50. The index is sometimes preferred to average citation counts because it supposedly captures both productivity and quality.
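As a minimal sketch of how such an index can be computed from nothing more than a list of per-paper citation counts (the numbers below are illustrative, not drawn from any REF submission):

```python
# Minimal sketch: compute a departmental h-index from per-paper citation counts.
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Five papers with these citation counts give an h-index of 3:
# three papers have at least three citations each, but not four with at least four.
print(h_index([10, 7, 3, 1, 0]))  # -> 3
```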

Dorothy Bishop, professor of developmental neuropsychology at the University of Oxford, claimed in her blog last year that a ranking of psychology departments based on their h-indices over the assessment period of the 2008 research assessment exercise “predicted the RAE results remarkably well”.

This month, a paper was posted on the arXiv preprint server that reports similar correlations for another four units of assessment from 2008: chemistry, biology, sociology and physics. The paper, “Predicting results of the research excellence framework using departmental h-index”, claims that the correlation is particularly strong in chemistry and biology. It also uses calculations of departments’ h-indices over the REF assessment period to produce predictions for the 2014 REF, the actual results of which will be published on 18 December. One of the paper’s authors, Ralph Kenna, reader in mathematical physics at Coventry University, said that as the results are not yet known, the predictions could be considered to be unbiased.
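The comparison at issue is between a ranking by departmental h-index and the peer-review ranking. As a rough illustration of that kind of check (the figures below are invented and are not the paper’s data), a Spearman rank correlation can be computed as follows:

```python
# Illustrative only: rank-correlating departmental h-indices with peer-review
# quality scores, using invented numbers rather than the paper's data.
from scipy.stats import spearmanr

h_indices = [52, 47, 40, 33, 28, 21]          # hypothetical departmental h-indices
peer_scores = [3.1, 3.2, 2.9, 2.7, 2.8, 2.4]  # hypothetical RAE/REF grade point averages

rho, p_value = spearmanr(h_indices, peer_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```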

Some subpanels in the 2014 REF are allowed to refer to metrics but not to rely on them. The Higher Education Funding Council for England has commissioned an independent review into the use of metrics in research assessment. Professor Bishop has made a submission supporting the use of departmental h-indices.

Noting that the calculation for psychology departments took her only about three hours, she wrote: “If all you want to do is to broadly rank order institutions into categories that determine how much funding they will get, then it seems to me it is a no-brainer to go for a method that could save us all from having to spend time on another REF.”

But Dr Kenna questioned whether a correlation between h-indices and peer-review rankings of even about 80 per cent, as he had calculated for chemistry, could be considered acceptable, as that would still mean that some departments would suffer the “tragedy” of being inaccurately ranked. He also feared that the adoption of the h-index by the REF would amount to a “torpedo to curiosity-driven research” as researchers would seek to maximise their own indices.

“If we are honest, we hope the correlation doesn’t hold. But we are just trying to do a neutral job that anyone can check,” he said.

paul.jump@tesglobal.com

Reader's comments (17)

If metrics can explain an overwhelming majority of the results of the REF, then why not abandon the REF and go entirely based on metrics? Think of the savings in time and effort to the academics, who could actually get back to doing impactful research and teaching!
Humanities....coughs....
Any suggestion of metrics produces howls of protest from those who think it is too gross a way to measure quality. The problem is that the alternative - peer review - is also deeply flawed, as argued here in a blogpost by Derek Sayer: http://cdbu.org.uk/problems-with-peer-review-for-the-ref/. My original argument was purely pragmatic: of course there are numerous problems with an h-index but the question is whether it is any worse than alternatives - especially given its cost-effectiveness. In fact, I have recently done an analysis that suggests we could actually do away with the peer-reviewed REF and with metrics, and just allocate funding in relation to N active researchers in a department. See http://deevybee.blogspot.co.uk/2014/10/some-thoughts-on-use-of-metrics-in.html
Hi Dorothy, I would agree with much of what you say, especially your proposal to do away with (or severely reduce) intrusive and distortive approaches. In earlier work we found that there is often a linear correlation between RAE measures of quality and group size up to a certain size, which we termed "upper critical mass". This is actually a "Dunbar number": it represents a limit on the amount of communication which can take place in a group. The academic Dunbar number, or upper critical mass, is discipline dependent (e.g., about 4 for pure mathematics, about 25 for experimental physics). Beyond the Dunbar number, quality is mostly independent of quantity and funding is essentially proportional to N, the number of academics submitted. This is in line with your own observation. For such "large" research groups, one could indeed allocate funding in simple proportion to N.

Of course, one would have to have a good method to decide what N is, to stop game-playing: to establish that each of the N academics submitted is a bona fide researcher producing research in an identified discipline and above a quality threshold, while at the same time nurturing early-career researchers. Additionally, peer evaluation could continue to help gauge the environments provided by universities and hence encourage their continuous improvement.

For smaller groups, quality tends to be size-dependent. These may continue to need a greater degree of peer review to encourage their universities to support them and help them to grow in quantity (to promote cooperation) and quality. For this reason I don't think we can do away with evaluation altogether for small/medium groups. RAE has been good for many newer universities (the ones which typically have smaller research groups, including many "pockets of excellence" found by RAE 2008) and has shown some of these universities the value of research and encouraged them to invest in it. So we may still need a driver to encourage support for small/medium groups to bolster them.

Of course one would have to think about how to plug any gaps in an N-based system, but another advantage of it would be to put a halt to meaningless media rankings, wherein one group may be ranked above another even though they differ only marginally in their RAE/REF scores (which do not come with error bars). Ralph Kenna
One problem that all this discussion ignores is that it considers the h-index as it is now. If income were dependent on h-index, it doesn't take much imagination to think about the bullying emails from managers urging you to increase your h-index by fair means or foul. The result would be an even greater corruption of science than has already resulted from obsession with silly metrics. The standard of evidence produced by bibliometricians for the efficacy of their snake oil is barely better than one would expect from a homeopath.
I get the point that metrics would be far faster and cheaper to use and that peer review has its flaws, but I have some concerns over the proposal that metrics alone be used.
1. The evidence of the link between citations and quality, as far as I am aware, largely comes from comparing RAE/REF outcomes to citations. To what extent did the panel members use citations to help them decide on rankings? If they used them (whether officially or not) then this puts a question mark over the findings: a correlation would be inevitable regardless of the validity.
2. The evidence generally shows a positive correlation between citations and peer-review ranking, but there is always a wide spread, so a simplistic application of some formula would result in many individual papers being ranked differently to how a panel would rank them. If we regard the panel as the expensive 'gold standard' then this is a problem. If we say, well, the panel is wrong in those cases, then we undermine the whole basis of the evidence linking quality to citations.
3. The use of metrics may be more reliable when averaged over a department but, realistically, within a department it is individuals who will be assessed by this process, with all the problems associated with point 2 above.
The original version of the article had suggested that with a correlation of 80%, there would still be 20% of departments suffering the tragedy of inaccurate ranking. Amusingly, that statistical misinterpretation appears hastily to have been removed (in truth the non-shared variance would have been distributed throughout the population), though no one noticed that correlations scale from -1 to +1, and the quoted figure is probably a coefficient of determination instead. Even so, the logic is still highly questionable. Surely no one claims any system, least of all peer review, is completely accurate (or even, as Derek Sayer notes, whether it is really 'peer' review). So why is all the error or inaccurate ranking (whether a departmental tragedy or the catching of a lucky break) assumed to be the responsibility of the metric? Any system will create relative winners and losers. The point at issue here is the notion that, with respect to transparency, cost effectiveness and the investment of scholarly and administrative time, the alternative to current REF practices is a clear winner, whilst delivering an outcome that closely matches what we currently have. It comes with issues, sure, but that's a good starting point.
John - You are referring to what was a journalistic misinterpretation, since corrected in the above text. You are advised to read the draft paper on the arXiv, which the article helpfully links to, for further details. Ralph.
Thanks: I had indeed read that paper before posting. I felt it was a useful analysis and informative contribution, and it was brave to make the prediction. The key issues remain though, as represented in the piece we are commenting on: (a) the common presumption that any deviation from a REF judgement, using an alternative such as the metric you use, is attributable entirely to inaccuracy in that alternative. Why do we not consider whether it is the REF ranking that is a tragedy? Why is the REF system assumed to be error-free? (b) at what point do we stop focusing on the mismatch and recognise instead the magnitude of the overlap, given the financial and logistical complexity of the current system set against the simplicity and cost effectiveness of the exercise you undertook?
Thanks John. Regarding (a), I think nobody assumes that the REF is error-free. The REF is riddled with flaws and these are well documented. But the point is that REF is special because it exists already: it is in use and accepted by the powers that be. So if a new (cheaper and less intrusive) scheme matches REF well, perhaps that will be accepted instead. Obviously we are not talking about anything close to exactness here.

Regarding (b), yes, that is the point of Dorothy's suggestion to match funding with N. This is quite reasonable, I think, as a first approximation. But one should keep in mind that REF (or whatever replaces it) can and should be used as a driver to improve the volume and quality of research by improving the conditions that allow top-quality, curiosity-driven research to thrive. A simple metrics-based system alone will not do that. E.g., using the h-index alone will spur people to chase h-indices by working strategically on fashionable areas rather than following curiosity. Many (myself included) would regard curiosity-driven research as a raison d'être of universities and a foundation for science. (Industry does finance-driven research, for example, which is a different thing.) A system based on metrics would undermine that. That is one reason why metrics are so dangerous. Ralph
Brave of you to publish your predictions, and I am sure that every REF lead dreams of an algorithmic REF, whilst at the same time dreading the algorithmic game-playing that would result. It's good to get the debate started early for 2020. I was rather taken aback by the throwaway line in the caption to the prediction tables: 'Scopus data were not available for some HEIs due to technical reasons and these are omitted from the corresponding lists.' I found only a cursory description of the filtering process by which the data had been extracted from Scopus, and I'd like to know more. In the case of Physics (the only table I checked) I think that ten of the 42 REF2008 departments have been excised. This indicates rather a large hole in the data. I expect that correlations and rankings survive omitting departments, but it left me wondering how complete the data are for the remaining departments. After all, the most important problem for bibliometrics is ensuring good data as a starting point; this is a particular problem in my area (computing), but I would have expected the Physics data in Scopus to be rock solid. Do you have a more detailed discussion available of the process that you used to collect the data? It is quite hard to reproduce your results as the paper stands.
Dear Adrian, indeed 10 HEIs for Physics were omitted (6 for Biology and 1 for Sociology). This is because some HEIs are not included in the "Affiliation" list after refining the search results. To be able to compare RAE scores, h_2008 and h_2014, we considered only HEIs which appear in all three lists. Naturally, 10 of 42 is a really big gap, but we rank the remaining 32 HEIs independently, without taking the skipped ones into account, so this should not affect the ranks of the 32. Concerning the search procedure: (i) first, we searched for all papers with "United Kingdom" in the "Affiliation Country" field; (ii) secondly, we refined the results to keep only papers published between 2001 and 2007 (for h_2008) or between 2008 and 2013 (for h_2014); (iii) finally, we chose the corresponding HEI from the "Affiliation Name" list. Some HEIs are not included in this list (probably HEIs with a small number of papers matching the search request). Regarding the quality of the data, our approach is fast and cheap and could certainly be improved upon (if this were considered a worthwhile endeavour). E.g., the outputs and individuals involved in our determination of the departmental h-indices from Scopus are surely different to those used for RAE/REF (but one expects some overlap). We mention this in the paper, the spirit of which is rather to see what sort of job the h-indices might do, as a first approximation. If it is good, it may be worthwhile pursuing this further. If it is bad, perhaps one should abandon the search for a metric to replace REF. Our paper is not yet published (except for the draft on the arXiv) and we are grateful for comments on how to improve it and clarify aspects like the above. Thank you for your interest!
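[Editorial illustration: a minimal sketch of the three-step filtering described in the comment above, assuming the records have already been exported from Scopus into a local CSV file. The file name and column names are hypothetical, and this is not the authors' actual workflow.]

```python
# Hypothetical sketch of filtering steps (i)-(iii) described above, applied to
# a local export; the column names are assumptions, not real Scopus fields.
import pandas as pd

papers = pd.read_csv("scopus_export.csv")  # hypothetical export file

# (i) papers with a United Kingdom affiliation
uk = papers[papers["affiliation_country"] == "United Kingdom"]

# (ii) restrict to one assessment window, e.g. 2008-2013 for h_2014
window = uk[uk["year"].between(2008, 2013)]

# (iii) group by affiliation name and compute each department's h-index
def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

dept_h = window.groupby("affiliation_name")["citations"].apply(
    lambda c: h_index(c.tolist())
)
print(dept_h.sort_values(ascending=False))
```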
Why don't we just use post code? Of course, to an extent that already happens. While REF might conceivably be assessed in an objective manner, the formula for the distribution of funds isn't generated until after the results are known. The reason for this is presumably to ensure that an 'appropriate' proportion of money goes to the correct post codes within the golden triangle. Based on the past 2 exercises, despite having improved our rank, we received less money. Go figure.
I think Mike Eysenck and I were among the first to correlate publication metrics with RAE/REF results and, reading the foregoing comments, I am struck by how little has changed in the intervening 12+ years. For one discipline (psychology), we correlated citations made in 1998 with the 1996 and 2001 RAE results and found correlations above 90%. The limitations of metrics voiced then were the same as those in circulation now (we discussed many of them in our report; some of them are valid). Our bottom line was that whatever it is that the RAE (REF) measures, citation metrics measure the same thing. Either approach can be criticised as severely as one wishes, but in the face of such a high correlation, neither can be singled out for greater criticism than the other. There was in 2001, and still is now, a strong prima facie case for replacing the RAE/REF expert panels with metrics, purely on pragmatic grounds: saved time and expense. The argument was not heeded for REF 2014 but perhaps, with confirmation that the now-popular h-index also correlates strongly with REF, it will be considered for REF 2020 on these relative grounds, rather than being shot down for its absolute weaknesses. Our original report can be downloaded here: www.pc.rhul.ac.uk/vision/citations.pdf
The results do not seem to match these predictions, to say the least!
If you want to know the outcome of these predictions, see the report in Times Higher from February 2015: "Hit and miss metrics: ‘Throw of dice would give more accurate REF prediction’", THE number 2115, 12-18 February 2015, page 6.
Here is the link: http://www.timeshighereducation.co.uk/news/academic-estimates-real-cost-of-ref-exceeds-1bn/2018493.article
