The research excellence framework (REF) is often held up as the “gold standard” in national research assessment: a giant exercise in peer review that no metrics-driven exercise could hope to replicate. But while the REF might do a good job of identifying the quality of research within particular subjects, we believe it falls short when it comes to comparing quality across fields.
Comparability between the grades awarded by the 36 “subpanels” that oversee particular subject areas is supposed to be ensured by the four “main panels” that preside over them. However, our analysis of the results under Main Panel C, which covers the social sciences, suggests there are demonstrable inconsistencies.
A higher proportion of 3* and 4* grades was awarded in the 2014 REF than in the 2008 research assessment exercise (RAE). The Higher Education Funding Council for England (Hefce) justifies this by arguing that international citations increased over the same period, adding that a comparison of the average proportion of 3* and 4* scores given by each subpanel demonstrates that they acted consistently.
However, the real discrepancy is in the allocation of 4* scores. This is the category that shows the biggest proportionate increase between the 2008 and 2014 exercises, rising by 42 per cent for outputs (impact case study scores, of course, are new for 2014). It is also the category with the greatest financial return under the present funding formula and is, therefore, a crucial part of Hefce’s overall responsibilities.
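The financial leverage of the 4* band can be illustrated with a toy calculation. The weights below (4 for 4*, 1 for 3*, nothing lower) are purely illustrative assumptions, not Hefce’s actual formula, but they show how moving even a few percentage points of a quality profile into the 4* band shifts the funding-relevant score disproportionately:

```python
# Illustrative weights only -- NOT Hefce's actual funding formula.
ILLUSTRATIVE_WEIGHTS = {4: 4, 3: 1}  # grades below 3* attract nothing

def qr_score(profile, weights=ILLUSTRATIVE_WEIGHTS):
    """Toy funding score: sum of profile percentages times per-grade weights."""
    return sum(weights.get(star, 0) * pct for star, pct in profile.items())

# Two hypothetical profiles with the same combined 3*/4* share (70 per cent):
modest = qr_score({4: 20, 3: 50, 2: 25, 1: 5})  # 4*20 + 1*50 = 130
strong = qr_score({4: 30, 3: 40, 2: 25, 1: 5})  # 4*30 + 1*40 = 160
```

Shifting ten percentage points from 3* to 4* lifts the score from 130 to 160, even though the share of “internationally excellent or better” work is unchanged, which is why the allocation of 4* grades matters so much.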
Looking at changes in the proportion of outputs rated 4* between the 2008 RAE and the 2014 REF, the subpanels of Main Panel C show very different patterns. The social work and social policy subpanel rated a significantly higher proportion of outputs 4* than in 2008, while politics and international studies saw a moderate increase and sociology saw very little.
Then there is the distribution of 4* grades. While sociology had a slightly higher overall grade point average (GPA) than social work and social policy in all areas of assessment in 2014, its highest-rated departments had lower GPAs than the highest-rated departments in social work and social policy. This meant that even departments ranked highly within sociology often had lower GPAs than other high-ranking social science subjects within their own institution. This potentially puts even the highest-rated departments at risk of closure, particularly if the funding formula continues to favour 4* activities.
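The GPA comparisons above rest on a simple calculation: a submission’s GPA is the mean star rating implied by its quality profile. A minimal sketch with made-up profiles (not actual REF results) also shows how a department can win on GPA yet lose under a 4*-weighted funding formula:

```python
def gpa(profile):
    """Grade point average of a quality profile given as {star: percentage}."""
    return sum(star * pct for star, pct in profile.items()) / 100

# Hypothetical profiles: dept_a has the higher GPA, but dept_b has more 4* work,
# which is what a 4*-weighted funding formula rewards.
dept_a = gpa({4: 25, 3: 60, 2: 15})  # (100 + 180 + 30) / 100 = 3.10
dept_b = gpa({4: 35, 3: 35, 2: 30})  # (140 + 105 + 60) / 100 = 3.05
```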
The anomaly is made all the more stark when you consider that sociology had a higher proportion of submissions from research-intensive institutions and, therefore, had an expectation of higher scores (an expectation realised for other subjects at those institutions). Within sociology, 13 of the 29 submissions were from the Russell Group of research-intensive universities, covering 52 per cent of the staff submitted. A further 7 per cent were at the University of Essex, the top recipient of Economic and Social Research Council funding in 2013-14. Meanwhile, in social work and social policy, 15 of the 62 submissions were from the Russell Group, covering 36 per cent of staff. (The Panel C average was 48 per cent of staff, ranging from 67 per cent in economics to 15 per cent in sport and exercise sciences, leisure and tourism.)
It is also worth bearing in mind that the number of submissions to the sociology subpanel fell from 38 in 2008 to 29 in 2014, largely owing to the withdrawal of departments that had appeared in the bottom part of the 2008 ranking.
Consideration of Gini coefficients, which measure statistical dispersion, reveals that sociology (along with anthropology and development studies) has the most evenly spread – or egalitarian – distribution of 4* grades (weighted by staff numbers) under Main Panel C. Economics and econometrics and social work and social policy are the least egalitarian. We might expect a relatively equal distribution when the departments are largely from similar types of institution but, as with inequality more generally, egalitarian outcomes are partly produced by lowering the top as well as raising the bottom.
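For readers unfamiliar with the measure: the Gini coefficient runs from 0 (every department holds an identical share of 4* grades) towards 1 (one department holds them all). A minimal unweighted sketch, using invented shares rather than REF data and ignoring the staff-number weighting applied in the analysis above:

```python
def gini(values):
    """Gini coefficient of non-negative values via the sorted-rank formula.

    Returns 0.0 for a perfectly equal distribution; approaches 1.0 as the
    total becomes concentrated in a single value.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    rank_sum = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * rank_sum / (n * total) - (n + 1) / n

equal = gini([10, 10, 10, 10])  # 0.0: every department holds the same share
skewed = gini([0, 0, 5, 35])    # high: most 4* work sits in one department
```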
It was the explicit function of the main panels to secure comparability of judgements, and it is worrying that Main Panel C did not notice these evident disparities. In the current debate over metrics, we conclude that peer review is robust only if proper procedures are used to monitor and correct for its possible distortions.