Clarity of purpose in the TEF and the REF

Ministers must demonstrate that assessing ‘excellence’ does more good than harm, says Dorothy Bishop

Published on March 3, 2016

Last updated July 13, 2016

The UK’s research excellence framework has come in for a lot of criticism. It is now under review by a panel chaired by Nicholas Stern, with a call for evidence that closes later this month. At the same time, we have a Green Paper setting out plans for a teaching excellence framework. This is motivated in part by the view that the attention given to research and teaching has got out of balance. The REF has provided universities with strong incentives to put resources into research, and teaching has consequently been neglected, so the argument goes (although it has been challenged). So what do we need to even things up? A TEF.

The problem for both the REF and the TEF is that they ultimately aim for a single scale on which universities can be rank-ordered so that we can compare quality. But everyone agrees that the things we are measuring – research and teaching excellence – are complex and multifactorial.

There are basically two ways forward. Option A is to use some kind of proxy measure, recognising its limitations but taking the view that it is good enough for purpose. Option B involves trying to measure the complex multifactorial construct in all its richness.

There are a number of factors that influence choice of approach. Because everyone recognises that things are complex, option A is unlikely to be acceptable to the academic community. Simple measures are often easy to game. On the other hand, the complex multifactorial measures of option B can be debated endlessly, often involve elements of subjective judgement, are not immune to gaming, can be extremely expensive to administer, and can be hard to integrate into a single ranking.

James Wilsdon has noted, with regard to the REF, that before deciding which system of measurement to use we need a clear idea of what we are trying to achieve. The purpose of the REF has mutated over the years. It started out with a pretty simple goal: to find a formula to determine allocation of quality-related funding from central government to universities. However, as Wilsdon notes, it has subsequently been used for four additional purposes: to demonstrate accountability, to provide a measure of reputation, to influence research culture, and as a tool within universities for managing academics. He notes that: “If all we want from the REF is a QR allocation tool, then we can certainly do that in an algorithmic, metric-based way” (ie, option A). But he argues that the REF needs to fulfil the other functions too, and, as was amply demonstrated in his report The Metric Tide, a simple metrics-based system is inadequate for those other purposes.

I agree with much of what Wilsdon says, but I think that we could save ourselves a lot of trouble by reverting to the original purpose of the REF, ie, treat it purely as a mechanism for allocating funding. As I have argued previously, if that is all you want to do, then you don’t even need to bother with metrics of the kind discussed in his report. A simple measure of the number of active researchers present in a department gives a remarkably high correlation with the amount of QR funding received, and this works well for most subjects in arts and humanities as well as sciences.

But what about gaming? When I proposed this idea a couple of years ago, people said, wouldn’t universities just designate the departmental cleaner as an active researcher, or take on more research staff? I don’t see these problems as insuperable. It would be important to specify stringent criteria for research staff to meet: these would include terms of employment (casual staff would be excluded), as well as evidence of research activity. If one counted only those staff who had been employed at the institution for some minimum period, such as three to four years, this should prevent institutions catapulting in overseas researchers on Mickey Mouse contracts, or taking on short-term staff to give a temporary blip in researcher numbers.
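To make those criteria concrete, here is a minimal sketch of how such a headcount might be computed. It is an illustration only: the record fields, the three-year default threshold and the sample staff are all hypothetical, and a real scheme would need a much more careful definition of “evidence of research activity”.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StaffMember:
    # All fields are hypothetical, chosen only to mirror the criteria in the text.
    name: str
    start_date: date        # date employment at the institution began
    casual: bool            # True for casual or short-term contracts
    research_active: bool   # True if evidence of research activity was accepted


def years_employed(start: date, census: date) -> float:
    """Approximate length of employment in years at the census date."""
    return (census - start).days / 365.25


def count_eligible(staff, census: date, min_years: float = 3.0) -> int:
    """Count staff who are non-casual, research active and employed long enough."""
    return sum(
        1
        for s in staff
        if not s.casual
        and s.research_active
        and years_employed(s.start_date, census) >= min_years
    )


# Invented example department, not real data.
staff = [
    StaffMember("A", date(2010, 9, 1), casual=False, research_active=True),
    StaffMember("B", date(2015, 1, 1), casual=False, research_active=True),   # too recent
    StaffMember("C", date(2009, 1, 1), casual=True, research_active=True),    # casual contract
    StaffMember("D", date(2008, 1, 1), casual=False, research_active=False),  # not research active
]
census = date(2016, 3, 1)
eligible = count_eligible(staff, census)
print(eligible)
```

In this invented department only one of the four staff would count. Recent hires and casual staff add nothing to the total, which is the property that is meant to discourage last-minute recruitment.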

A more serious objection to my proposal is that there is no explicit measure of research quality – an institution could take on a large number of weak researchers and look as good as a competitor with an equal number of excellent researchers. But would this happen? Remember, researchers would need to be on the institutional payroll for a period of three to four years prior to the evaluation, so the institution would need to commit to the expense of employing them. This would not be worthwhile if staff then failed to meet the criteria set for research-active staff. Academics who did not count as active researchers would end up being a net cost to the institution.

I’m not saying that it would be easy to fine-tune such a system to avoid gaming or unintended consequences, just that it could be done, and I suspect would be much less difficult than devising an entirely separate system for evaluating research quality.

My case falls apart if, like Wilsdon (and many other people who have been involved in the REF) you think that the REF should fulfil additional purposes. Then, because no one measure is suitable for all purposes, you need something much more complicated. But I do agree with Wilsdon that, if that’s what you want, you need to be clear about it – and about the need for a diverse set of measures appropriate to different goals.

What about the TEF? Well, when you dig beneath the surface, you find that the parallels between the REF and the TEF are purely superficial. The purpose of the TEF is not to allocate funding – there is no funding to allocate. The stated purposes are as complex and multifactorial as the notion of teaching excellence itself: to help students select courses, to increase access of under-represented groups to higher education, to provide a basis for allowing universities to raise fees, and to provide criteria for “new entrants” (ie, private institutions) that wish to enter the higher education market. According to a recent Business, Innovation and Skills Committee report, it’s also intended to provide incentives “to ensure that higher education institutions meet student expectations and improve on their leading international position”. Quite what it means to improve on a leading international position is not specified.

In attempting to develop a measure that will cover all these functions, those promoting the TEF have tied themselves in knots, as illustrated by this wonderfully circular statement from the same select committee report: “In the absence of any agreed definition or recognised measures of teaching quality, the government is proposing to use measures, or metrics, as proxies for teaching quality. Therefore the challenge is to identify those metrics which most reliably and accurately measure teaching quality, as opposed to other factors that contribute to the results achieved by students.”

This is worrying. The only positive thing one can say is that there are signs that government may be starting to recognise some of the problems. The select committee report cautions against rushing into a TEF, and notes reservations both about the measures proposed and about the proposed link between the TEF and fee-raising powers. The report concludes by encouraging academics to work with BIS to develop appropriate metrics for the TEF – the impression is that government is aware that if it gets this wrong, universities may just decide not to play ball. One of the members of the select committee, Amanda Milling, wrote in Times Higher Education that “the higher education sector has a responsibility to engage with TEF to make it work”.

But do we? I would argue that the responsibility lies with the minister, to make a proper case for the TEF.

As the select committee report points out: “It is important to note the high quality of teaching generally available in our higher education system at present…The debate around teaching excellence should therefore be viewed within the context of enhancing an already excellent system or, as the minister for universities and science put it, ‘to continue to make a great sector greater still’”. These weasel words mean that if universities resist the TEF, they can be accused of complacency. But where’s the evidence that the TEF will “make a great sector greater still”? A considerable amount of time and money will be sucked up by this exercise, which has multiple confused aims and has the potential to tie up a great sector in pointless bureaucracy and waffle. The whole idea is seriously misconceived and has been rushed through without adequate justification or cost-benefit analysis.

We are now being told that the TEF will be introduced by degrees, with measures being developed over time, but I am not reassured. If the government wants academics on side, it needs to demonstrate more coherent arguments, with clear specification of the goals of the TEF, and evidence of validity of the measures it proposes to achieve those goals. And most of all, it needs to show us that more good than harm will result from this exercise.

Dorothy Bishop is professor of developmental neuropsychology at the University of Oxford. This post originally appeared on her blog.