‘Tide turning’ against using student evaluations to rate staff

Experts predict ripple effect from ruling against Canadian university

July 26, 2018

Experts have predicted a global “sea change” away from the use of student evaluations to measure lecturers’ suitability for promotion or tenure, after a union successfully challenged the practice.

An arbitration case between Ryerson University in Toronto and its faculty association that had stretched on for 15 years finally concluded with a ruling that course surveys can no longer be “used to measure teaching effectiveness for promotion or tenure”.

Arbitrator William Kaplan said that “insofar as assessing teaching effectiveness is concerned – especially in the context of tenure and promotion – SETs [student evaluations of teaching] are imperfect at best and downright biased and unreliable at worst”.

The Ontario Confederation of University Faculty Associations argued that the outcome could pave the way for other Canadian universities to negotiate similar changes. The case had “established an important precedent for faculty associations and lends support to others who have been arguing that student questionnaires are deeply problematic instruments for the purpose of evaluating faculty members’ teaching effectiveness”, the organisation said.

But Philip Stark, associate dean of the Division of Mathematical and Physical Sciences at the University of California, Berkeley, who was an expert witness in the Ryerson case, said that the impact could be much broader.

The result, alongside recent voluntary changes to the use of teaching evaluations at US universities, including the University of Southern California, shows that the “tide is starting to turn”, he said.

“Seeing this happen at the level of a union dispute in Ontario is terrific. I think this is the start of something much bigger. I really hope that there is going to be a sea change in how people administer and rely on student evaluations and the items that they use,” Professor Stark said.

Professor Stark added that he hoped that other unions representing academics in Canada, the US and elsewhere would “negotiate to reduce or eliminate reliance on student evaluations” and that universities of their own accord would “move towards more sensible means of evaluating teaching”.

“I think that the time is right for class-action lawsuits on behalf of women and under-represented minorities against universities that continue to rely on student evaluations as primary input for employment decisions [and that this] will induce universities to do the right thing,” he said.

Professor Stark said that studies have shown that “there is little if any connection between students’ ratings of instructor effectiveness and actual instructor effectiveness, as measured by student performance on anonymously created final exams”, and evidence suggests that any association is “generally negative not positive”.

There is also evidence that students’ ratings are strongly influenced by academics’ personal characteristics, such as gender, race, age and physical attractiveness, he said.

Bob Uttl, professor of psychology at Mount Royal University in Alberta, who has published research showing that student evaluations of teaching and student learning are unrelated, said that the ruling provides a “strong reminder” that “one ought not to make high-stakes personnel decisions based on unreliable and/or invalid assessments”.

He added that the ruling is “likely to have substantial impact and encourage faculty associations to pursue discontinuation of SETs as [a] measure of teaching effectiveness in high-stakes personnel decisions including promotion, tenure, merit pay and teaching awards decisions”.


Reader's comments (3)

The criticisms made here were obvious from the beginning of this practice. It was just a trendy fashion (along with SWOT analysis). Timothy Treffry, retired lecturer, University of Sheffield
The research on SETs is inconsistent, largely because the instruments vary so widely. Their many potential limitations are outlined in the article, though interestingly one of the strongest correlations is not mentioned: giving higher grades almost always boosts your SETs (though the best students sometimes rebel against easy As for all). Nonetheless the article gives no hint as to how teaching effectiveness should be evaluated instead. There are, of course, some academics who think teaching quality shouldn't matter - 'students' learning is their problem, not ours' - or is too controversial or evanescent to be evaluated. Those of us who think otherwise will wonder how else the judgements of those most affected by teaching might be sought. In systems with tenure, we have to worry that, with the cloak of invisibility afforded by lack of SETs, useless but unsackable teachers will proliferate. 'Peer review of teaching', which I've experienced in the UK and Canada, can sometimes provide constructive feedback and advice, but is useless as an evaluative process: academics are usually not qualified or trained in either teaching or the assessment of teaching, and at my university academics seeking tenure or promotion simply ask their friends to review them. The alternative would be some equivalent to Britain's OFSTED, with its trained assessors, but you can imagine how popular that would be; academics subjected to something like OFSTED would soon be clamouring for the reintroduction of SETs. Students are not consumers. Nonetheless, they are service users who incur substantial debt, in most countries, and spend several years of their lives studying for their degrees. Along with their own responsibility for learning, they have a reasonable expectation that universities and scholars will take their opinions seriously. Moreover, they will continue to share their opinions unregulated on social media.
In my experience, the value of students' opinions is proportionate to the seriousness of the instrument used. Thin, generic, silly SET questions get commensurate responses. Sustained discussion lacks anonymity, but yields much more interesting, valuable, considered insights. In other words, SETs are not wrong because they value students' opinions too much, but because they don't take them seriously enough. Student evaluation of teaching is too important to be left to trivializing questionnaires, but universities (who want a cheap instrument) and academics who are disdainful of their own students form a 'bishops and bootleggers' front against anything more effective.
At our university (of Groningen) we also try to ‘turn the tide’: https://www.ukrant.nl/op-ed-change-the-course-evaluations/?lang=en Nicolai Petkov