NSS and teaching excellence: the wrong measure, wrongly analysed

How the government is proposing to use NSS data to assess teaching excellence is nonsense, says Dorothy Bishop

January 4, 2016

Like most of my academic colleagues, I have for years been pretty uninterested in university politics.

We moan at bureaucratic exercises, such as the research excellence framework (REF), and at endless monitoring of our performance, but we mostly go along with it, sensing that there is an inevitability about how things are structured. Indeed, for some people, there is fear that protest could be dangerous, because there is ample precedent in the sector for redundancies to be announced with little warning.

Even though I disagree with the ideology of the current government, I had assumed that those formulating their policies were basically competent and intelligent. In the past few weeks, however, such illusions have been dispelled.

I have been looking at the higher education Green Paper Fulfilling our Potential: Teaching Excellence, Social Mobility and Student Choice. This consultation document, which outlines the government’s plans for higher education in England, presents proposals for radical change that are so incoherent and poorly justified that they insult the intelligence of the academic community.

The Green Paper is a long document, and there is so much wrong with it that it is impossible to give a full critique in a single blogpost. I will, therefore, just briefly illustrate one of the problems inherent in the paper: evaluation of teaching excellence.

In a previous post, I noted how a dubious analysis of data from the National Student Survey (NSS) appeared to be the sole basis for the claim in the Green Paper that “there are many examples of excellent teaching within the higher education system but…teaching quality is variable”.

The first problem, as many others have noted, is that NSS data cannot be regarded as indicative of teaching quality. Indeed, there is empirical evidence that student satisfaction is higher when students are given straightforward assignments and awarded high grades. Relying on NSS to measure teaching quality is like using people’s choice of food to evaluate nutritional content.

Further, if you are going to use NSS data to identify inadequate student satisfaction, then you need to define inadequate. I have been given further information by the Department for Business, Innovation and Skills since I wrote my last blogpost, and I can confirm that the metric that was used was benchmarked in a way that – in my view – makes it inevitable that a proportion of universities will score badly.

The methodological details are fairly technical, but they essentially boil down to using the complete set of NSS data to build a regression equation that predicts the percentage of satisfied students from background variables such as gender, entry qualifications and so on. Each institution’s obtained score is then compared with the score predicted for it, and those falling furthest below prediction on any one survey item are deemed inadequate. This method is problematic for no fewer than four reasons:

First, it is inevitable that there will be a spread of scores around the average; if the whole distribution of scores shifted up or down while retaining the same spread, you’d get the same result, because the definition of low satisfaction is a relative one: distance from the average. We could draw an analogy with measuring body mass in elite athletes – if you used the BIS method to define obesity, you’d always end up with a proportion of athletes deemed obese, no matter how lean they were in absolute terms.
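
The point about relative cut-offs can be illustrated with a minimal sketch. It is not the BIS calculation: the scores are invented, and the rule used here (flag anything more than one standard deviation below the average) is a simplified stand-in chosen only to show that a relative definition ignores absolute levels.

```python
from statistics import mean, pstdev


def flag_relative(scores, k=1.0):
    """Return indices of scores more than k standard deviations below the mean.

    This is an illustrative stand-in for any 'distance from the average'
    rule, not the actual BIS benchmark.
    """
    centre = mean(scores)
    spread = pstdev(scores)
    return [i for i, s in enumerate(scores) if s < centre - k * spread]


# Hypothetical satisfaction percentages for twelve institutions.
scores = [78, 81, 83, 84, 85, 86, 86, 87, 88, 90, 91, 93]

# Every institution improves by ten points: same spread, higher average.
improved = [s + 10 for s in scores]

print(flag_relative(scores))
print(flag_relative(improved))
```

Both calls flag the same two institutions, even though in the second case every institution has a satisfaction rating of 88 per cent or more.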

Second, the precision of the estimate of student satisfaction will depend on the number of student respondents at that institution. This is recognised in the BIS method, which uses the standard error (a measure sensitive to sample size) to measure departures from prediction. However, this creates the anomaly that it is much harder to achieve an adequate satisfaction rating if you are a large institution than if you are a small one.

In fact, the BIS method recognises that for very large institutions, even a minuscule difference between predicted and obtained percentage may be significant on this criterion. Accordingly, an additional, entirely arbitrary, second criterion is added, whereby a difference from predicted level must be at least three percentage points for the score to be deemed inadequate. Even so, if we split universities into small (fewer than 1,200 students), medium (1,200 to 2,599 students) and large (2,600 or more), we find that on overall satisfaction 1/41 (2 per cent) of small institutions is inadequate, compared with 5/41 (12 per cent) of medium-sized institutions and 9/42 (21 per cent) of large institutions.
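
The size effect described above can be sketched in a few lines. Several things here are assumptions rather than details of the BIS method: the standard error is taken to be the simple binomial one, the cut-off of three standard errors (`z_cut`) is hypothetical, and the percentages and respondent numbers are invented. Only the three-percentage-point minimum gap comes from the description above.

```python
from math import sqrt


def z_shortfall(observed, predicted, n):
    """Gap between observed and predicted proportions, in standard-error units.

    Assumes a simple binomial standard error; the real benchmark may differ.
    """
    se = sqrt(predicted * (1 - predicted) / n)
    return (observed - predicted) / se


def is_flagged(observed, predicted, n, z_cut=3.0, min_gap=0.03):
    """Flag when the shortfall is both 'significant' and at least min_gap.

    z_cut is a hypothetical threshold used for illustration only.
    """
    gap = predicted - observed
    return z_shortfall(observed, predicted, n) < -z_cut and gap >= min_gap


# Same four-point shortfall (82% observed vs 86% predicted), different sizes.
for n in (200, 3000):
    print(n, round(z_shortfall(0.82, 0.86, n), 2), is_flagged(0.82, 0.86, n))
```

With an identical shortfall, the institution with 3,000 respondents is flagged and the one with 200 is not, because the standard error shrinks as the number of respondents grows.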

Third, benchmarking according to student background variables creates problems of its own, because it leads to the situation in which lower levels of satisfaction are acceptable in an institution with a more disadvantaged student intake. Benchmarking makes sense for some measures, but when applied to student satisfaction it simply reinforces the idea that those from poorer backgrounds should expect lower satisfaction with their courses.

Fourth, the most egregious abuse of statistics in the BIS approach is to treat as inadequate any institution that falls below the predicted level on any one of the 22 items in the survey. It is a statistical inevitability that the more items you take into account, the higher the probability that any one institution will fail to meet a predicted target.
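
A rough calculation shows how quickly this adds up. The sketch below assumes, purely for illustration, that an institution has a 5 per cent chance of falling below target on any single item and that items are independent. Neither figure comes from BIS, and real survey items are correlated, so the true probability would be lower than this, though still well above the single-item rate.

```python
def p_at_least_one(p_single, n_items):
    """Probability of falling below target on at least one of n_items,
    assuming independent items that each fail with probability p_single."""
    return 1 - (1 - p_single) ** n_items


# Hypothetical 5 per cent chance per item; 22 is the number of NSS items.
for n_items in (1, 5, 22):
    print(n_items, round(p_at_least_one(0.05, n_items), 2))
```

Under these assumptions, an institution judged on all 22 items has roughly a two-in-three chance of failing at least one of them by chance alone.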

In sum, by applying arbitrary, relative cut-offs to a large number of items, you can get any result you choose. Furthermore, use of benchmarking coupled with a measure that depends on sample size explains the anomaly whereby, on overall satisfaction, the universities of Bristol and Edinburgh, with satisfaction ratings of 84 per cent, are deemed inadequate, whereas 106 other institutions with ratings lower than this – including 21 with ratings below 75 per cent – are rated satisfactory.

Even if the NSS were a valid method for assessing teaching quality, this would be a nonsensical approach to identifying inadequacies in the system.

Dorothy Bishop is professor of developmental neuropsychology in the department of experimental psychology, University of Oxford.
