Student evaluations of teaching: no measure for the TEF

The National Student Survey, one of the pillars of the TEF, is more likely to measure enjoyment than learning, says Wolfgang Stroebe

June 9, 2016
Illustration: Elly Walton

The UK government’s recent confirmation that the National Student Survey will be part of the teaching excellence framework has made even more urgent the question of whether satisfaction surveys are a reliable measure of teaching quality. My recent scrutiny of the evidence from the US suggests that they are not.

Customer satisfaction surveys are, of course, commonplace in the commercial world. But surprisingly enough, higher education was probably the first sector to adopt them. Student evaluation of teaching (SET) was developed in the 1920s by two US psychologists, Herman Remmers and Edwin Guthrie, and used at their respective institutions, Purdue University and the University of Washington.

Originally, the evaluations were intended only to help instructors improve their teaching. But they were soon adopted by department chairs and faculty deans to help them make important personnel decisions, such as hiring, salary increases and promotions. By now, these ratings have become standard procedure in colleges and universities in the US, as well as in Europe, and are seen as the single most important indicator of teacher effectiveness.

There is one important difference between customer evaluations of commercial and educational service providers. Whereas with commercial providers ratings are unilateral, ratings are mutual in the education system. As well as students evaluating their teachers, instructors evaluate their students – such as by their exam performance. In US studies, these ratings have been found to be positively correlated: students who receive better grades also give more positive evaluations of their instructors. Furthermore, courses whose students earn higher grade point averages also receive more positive average ratings.

Proponents of SETs interpret these correlations as an indication of the validity of these evaluations as a measure of teacher effectiveness: students, they argue, learn more in courses that are taught well – therefore, they receive better grades. But critics argue that SETs assess students’ enjoyment of a course, which does not necessarily reflect the quality of teaching or their acquisition of knowledge. Many students would like to get good grades without having to invest too much time (because that would conflict with their social life or their ability to hold down part-time jobs). Therefore, instructors who require their students to attend classes and do a lot of demanding coursework are at risk of receiving poor ratings. And since poor teaching ratings could have damaging effects at their next salary review, instructors might decide to lower their course requirements and grade leniently. Thus, paradoxically, they become less effective teachers in order to achieve better teaching ratings.

There is empirical evidence for the assumption that giving good grades for little work is an effective strategy for achieving good teaching ratings. Studies that added a question about teaching leniency to the SET found that instructors who were perceived as lenient graders received more positive ratings. Further evidence comes from an analysis of student ratings of nearly 7,000 professors from 369 institutions in the US and Canada on the RateMyProfessors.com website. One of the rating dimensions is “easiness”, defined as the possibility of getting an “A” without much work. The quality of a course is reflected by the combined ratings of a teacher’s helpfulness and clarity of delivery. Researchers found that ratings of the quality of a course highly correlated with the “easiness” ratings. Again, this suggests that teaching ratings reflect course enjoyment rather than course learning.

However, the most disturbing evidence comes from studies that assess students’ learning using the grades they achieved in a subsequent course that built on the knowledge acquired in the previous one. For example, one would expect students who worked hard and learned a great deal in “introductory statistics” to do better in “advanced statistics” than students who worked less hard. Therefore, if SETs measure teaching effectiveness, students of a highly rated introductory course should receive better grades in an advanced course than students of a poorly rated introductory course. Based on this logic, several studies have analysed the association between student ratings of introductory courses and the grades those students receive in subsequent courses. The surprising finding is that students of highly rated introductory courses actually do less well in subsequent courses than students from lower-rated courses.

These findings are difficult to reconcile with the assumption that SETs measure teaching effectiveness.

As the authors of one of these studies – conducted at a European university – conclude: “A more appropriate interpretation is based on the view that good teachers are those who require their students to exert effort.” The problem, according to the paper “Evaluating students’ evaluations of professors”, published in 2014 in Economics of Education Review, is that “students dislike it, especially the less able ones”. As a result, these teachers receive poorer evaluations.

Because student ratings appear to reflect their enjoyment of a course and because teacher strategies that result in knowledge acquisition (such as requiring demanding homework and regular course attendance) decrease students’ course enjoyment, SETs are at best a biased measure of teacher effectiveness. Adopting them as one of the central planks of an exercise purporting to assess teaching excellence and dictating universities’ ability to raise tuition fees seems misguided.

Wolfgang Stroebe is a visiting professor in the Faculty of Behavioural and Social Sciences at the University of Groningen, the Netherlands. His paper “Why good teaching evaluations may reward bad teaching” will be published in the journal Perspectives on Psychological Science.

POSTSCRIPT:

Print headline: An easy mistake


Reader's comments (7)

I'm not at all clear how a student responding to the NSS that they have had detailed feedback on their work is doing anything other than answering a question which asks them to consider evidence independent of their feelings. Same for feedback helping clarify things the student didn't understand. Same for whether staff are enthusiastic, or whether the student has been able to contact staff when they needed to. These questions aren't a simple popularity contest - as can be seen by the very wide variation within subjects in some areas, especially for the assessment questions, which can be rated very badly even when students have a very high level of 'overall' satisfaction.
I agree with Polycarpus. I would add two things. Firstly, there is research from the US that suggests that in sophisticated teaching assessment surveys (not RateMyProfessors), students are more fair-minded than they are often given credit for: they don't usually punish "dull" lecturers if they felt they learnt from them, and they usually will punish those who entertain but don't teach them anything (I accept it is better to be both!). Secondly, it is claimed that, as there is a correlation between giving good grades and getting good teaching satisfaction scores, students are being bribed into rating highly. Has it occurred to the author that cause and effect may be the other way round, i.e. students who enjoy their lectures and learn from them will earn better grades?
Another factor that I think we need to look at is enrollment figures over time. In addition to students giving better or worse evaluations, what's also important is whether or not they simply walk away. My own experience has been that the more one works to improve the learning outcomes and the students' acquisition of knowledge, the lower the enrollments from one year to the next. Of course there may be other factors at work here, so it's hard to tell exactly what's going on, but I do get the feeling that a large proportion of students want maximum return for minimum effort, and these students either shop around for the courses that give them that, or just go through a process of elimination, withdrawing from the courses where they're expected to read, listen, talk, write and think. The corollary of 'teaching excellence' is 'learning excellence', ie, thought and work, and there are quite a lot of students today not that interested in that - for all sorts of reasons, including what they describe as anxiety, stress, depression, and so on. Having said all this, I also think that the implication of many of these studies might simply be that we need to become better at generating a persuasive narrative for students about what they've learned. In other words, we probably need, more than we did in the past, to make very explicit what the point of each assessment exercise has been, what they've learned, what skills they've acquired, etc., because while it may seem obvious to us, it isn't to them. I always think of the painting scene in The Karate Kid.
It would have been helpful if in-line citations had been provided. I would love to check the references for your arguments.
The author *does* actually mention the view that more effective teaching leads to higher grades and higher student evals. The author does not mention another recent study showing that evals are also gender-biased. This cleverly-designed study was able to determine gender bias by having students evaluate online instructors, one male and one female. See Inside Higher Ed "Bias Against Female Instructors". This may vary by discipline, but if you are in a very male-dominated field, as I am, the study results sound depressingly familiar. I think another factor to consider in connection with this issue is the trend toward adjunctification in Higher Ed. Part-time workers often have little or no job security and are sometimes under enormous pressure to get stellar evals. At the same time, their heavy workload (teaching multiple intro-level courses, sometimes on several campuses) makes it difficult to give the kinds of assignments (with feedback) that cultivate real learning. Finally, I would like to hear from others on a question that nags at me in regards to student evals. I do not mean this to be inflammatory, but is there something just absurd about asking an 18, 19, or 20 year old to evaluate an expert in the field? I suppose there are some very precocious and gifted students who might be able to grasp the value of a course, but in my experience, most young people at this age are still very much learning how much they don't know, what the value of hard work is, and how a class may subtly shift one's skills and outlook in a way that is not easily quantifiable.
Very keen observation on the biased evaluation system dominating the present survey system. A proper analysis of the important parameters catering to the needs of every student needs to be done before conducting a survey, which would make it fairer.
I'm currently doing a survey on value addition from Kenya's university education (which in turn could affect graduate employability), and one of my variables is students' ratings of the quality of teaching being offered by their faculty. One of the challenges I anticipate is the issue of grade inflation. I would like to suggest a way of controlling biased responses from the students. I think it would help to survey only the about-to-graduate seniors/students in their final year. This is because, at that stage of their studies, they are concerned about obtaining more than just their generic skills and, compared with the fresh students, wouldn't be swayed to give high ratings to lecturers who make them laugh in class. I believe they would be logical in their assessment.
