The results of year two of the UK’s teaching excellence framework (TEF) have been published this week. Although far fewer universities entered this time around, the results will no doubt be pored over by newspapers, students and parents alike. But should they be?
Back in 2016, the Royal Statistical Society responded to the Department for Education’s consultation exercise on year two of the TEF by expressing some serious concerns. Like several others, our response focused on the exercise’s many statistical and scientific shortcomings, as well as on our general unease about the proposed methodology.
The Office for Statistics Regulation, part of the UK Statistics Authority, understood our position and subsequently wrote to the department, asking it to ensure that our concerns were “addressed and published”. Quite simply, this never happened in any meaningful way. Worse still, the problems now risk being repeated.
In March, the DfE launched a new consultation on the subject-level TEF, and we find, worryingly, that many of our previous reservations remain valid. Few, if any, of the substantive concerns have been properly addressed, and the latest proposals give us little confidence in the future trustworthiness, quality or value of the TEF as a whole.
As an organisation championing the good use of statistics, the RSS has to begin by questioning whether a consultation is really the most appropriate way of addressing a range of important statistical and scientific questions.
We estimate that, in 2016, approximately three-quarters of the consultation’s responses came from education providers or students’ unions. It is only right that these constituencies are consulted because they will undoubtedly raise important points. However, it seems wrong that key statistical design issues could be decided by unscientific opinion polls. It is a bit like asking learner drivers to influence the contents of the driving test, or how examiners should assess them.
In this year’s consultation, which has now closed, “opinion poll” questions abound. For example, it asks whether the government should adopt one of two possible designs for the TEF. Model A is “a ‘by exception’ model, giving a provider-level rating and giving subjects the same rating as the provider where metrics performance is similar, with fuller assessment (and potentially different ratings) where metrics performance differs”. Model B is “a bottom-up model, fully assessing each subject to give subject-level ratings, feeding into the provider-level assessment and rating”.
The RSS believes that an exercise aimed at assessing teaching excellence should probably attempt to assess some teaching – in the same way, maybe, that the research excellence framework assesses actual research. But in our opinion both of these models are flawed in many ways. The flaws would take several pages to explain fully, but, in short, we believe that Model A would inadvertently introduce biases into the assessment. We are also concerned about its curious proposed system to feed subject rankings back into the provider ranking, which strikes us as unnecessary and statistically dangerous.
Model B partly assesses subjects in groups. One problem is that the number of disciplines per group varies widely: for example, arts consists of just one subject (creative arts and design) whereas humanities contains no fewer than eight. Inevitably, the style and content of the subject submissions would differ purely because of the varying numbers in each group.
The proposals are even more problematic when you consider that different institutions often have different subject mixtures and varying subject “homes” within each institution’s faculty structure. There is also the perennially thorny question of how to cope with joint and multi-subject programmes.
Furthermore, the consultation recognises that there are not only substantial differences in scale for the various subject metrics but also wildly different metric clustering, both of which are nigh impossible to reconcile into a set of simple, meaningful statistics. The consultation concludes that such differences reflect real differences in teaching and outcomes. However, we see no evidence to support this conclusion.
On measuring teaching intensity, the consultation is open-minded and asks one of its “what do you think?” questions. It also states that “any teaching measure should encompass what is most relevant to students”. This implies that simplistic measures, such as combinations of student numbers and contact hours, do not really capture the essence of whether a teaching activity is successful. More complex measures are proposed, but these would require new forms of data collection and, ironically, the government is apparently not keen on this where it would impinge on an institution’s autonomy or ethos.
The RSS is also mindful of Goodhart’s law (“when a measure becomes a target, it ceases to be a good measure”) and the strong likelihood of institutional gaming, which can be played out most effectively within systems such as the TEF.
Overall, we believe that the consultation’s statistically inadequate approach would lead to distorted results, misleading rankings and a system that lacks validity and integrity.
Even worse, significant resources are clearly being devoted to an already discredited system. Now is the time for the government to pause and review whether a TEF system is really the right, or cost-effective, way to assess excellence in higher education teaching.
Guy Nason is vice-president for academic affairs at the Royal Statistical Society.