The UK government’s recent confirmation that the National Student Survey will be part of the teaching excellence framework has made even more urgent the question of whether satisfaction surveys are a reliable measure of teaching quality. My recent scrutiny of the evidence from the US suggests that they are not.
Customer satisfaction surveys are, of course, commonplace in the commercial world. But surprisingly enough, higher education was probably the first sector to adopt them. Student evaluation of teaching (SET) was developed in the 1920s by two US psychologists, Herman Remmers and Edwin Guthrie, and used at their respective institutions, Purdue University and the University of Washington.
Originally, the evaluations were intended only to help instructors improve their teaching. But they were soon adopted by department chairs and faculty deans to help them make important personnel decisions, such as hiring, salary increases and promotions. By now, these ratings have become standard procedure in colleges and universities in the US, as well as in Europe, and are seen as the single most important indicator of teacher effectiveness.
There is one important difference between customer evaluations of commercial and educational service providers. With commercial providers, ratings are unilateral; in the education system, they are mutual. As well as students evaluating their teachers, instructors evaluate their students, for example through their exam performance. In US studies, these ratings have been found to be positively correlated: students who receive better grades also give more positive evaluations of their instructors. Furthermore, courses whose students earn higher grade point averages also receive more positive average ratings.
Proponents of SETs interpret these correlations as an indication of the validity of these evaluations as a measure of teacher effectiveness: students, they argue, learn more in courses that are taught well – therefore, they receive better grades. But critics argue that SETs assess students’ enjoyment of a course, which does not necessarily reflect the quality of teaching or their acquisition of knowledge. Many students would like to get good grades without having to invest too much time (because that would conflict with their social life or their ability to hold down part-time jobs). Therefore, instructors who require their students to attend classes and do a lot of demanding coursework are at risk of receiving poor ratings. And since poor teaching ratings could have damaging effects at their next salary review, instructors might decide to lower their course requirements and grade leniently. Thus, paradoxically, they become less effective teachers in order to achieve better teaching ratings.
There is empirical evidence for the assumption that giving good grades for little work is an effective strategy for achieving good teaching ratings. Studies that added a question about grading leniency to the SET found that instructors who were perceived as lenient graders received more positive ratings. Further evidence comes from an analysis of student ratings of nearly 7,000 professors from 369 institutions in the US and Canada on the RateMyProfessors.com website. One of the rating dimensions is “easiness”, defined as the possibility of getting an “A” without much work. The quality of a course is measured by combining the ratings of a teacher’s helpfulness and clarity of delivery. Researchers found that the quality ratings were highly correlated with the “easiness” ratings. Again, this suggests that teaching ratings reflect course enjoyment rather than course learning.
However, the most disturbing evidence comes from studies that assess students’ learning using the grades they achieved in a subsequent course that built on the knowledge acquired in the previous one. For example, one would expect students who worked hard and learned a great deal in “introductory statistics” to do better in “advanced statistics” than students who worked less hard. Therefore, if SETs measure teaching effectiveness, students of a highly rated introductory course should receive better grades in an advanced course than students of a poorly rated introductory course. Based on this logic, several studies have analysed the association between student ratings of introductory courses and the grades those students receive in subsequent courses. The surprising finding is that students of highly rated introductory courses actually do less well in subsequent courses than students from lower-rated courses.
These findings are difficult to reconcile with the assumption that SETs measure teaching effectiveness.
As the authors of one of these studies – conducted at a European university – conclude: “A more appropriate interpretation is based on the view that good teachers are those who require their students to exert effort.” The problem, according to the paper “Evaluating students’ evaluations of professors”, published in 2014 in Economics of Education Review, is that “students dislike it, especially the less able ones”. As a result, these teachers receive poorer evaluations.
Because student ratings appear to reflect students’ enjoyment of a course, and because teacher strategies that result in knowledge acquisition (such as requiring demanding homework and regular course attendance) decrease that enjoyment, SETs are at best a biased measure of teacher effectiveness. Adopting them as one of the central planks of an exercise that purports to assess teaching excellence and dictates universities’ ability to raise tuition fees seems misguided.
Wolfgang Stroebe is a visiting professor in the Faculty of Behavioural and Social Sciences at the University of Groningen, the Netherlands. His paper “Why good teaching evaluations may reward bad teaching” will be published in the journal Perspectives on Psychological Science.