A slew of research has come out in the last few years (and there’s more forthcoming from my collaborators and me) showing that the end-of-semester ratings that students give teachers, usually on a scale from 1 to 5 or so, are significantly biased against female professors. The obvious question is: if not student evaluations of teaching (SET), how should we evaluate instructors? I recently saw this article on Twitter. It argues that “female faculty should receive an automatic correction” on their SET scores, meaning that the administration would add a fixed number to every female instructor’s score in order to make it comparable to male instructors’ scores. This adjusted score would be used to decide whether the instructor should be rehired to teach, be given tenure, etc.
I don’t believe this can be done, for a number of reasons. There are other biases and confounding variables besides gender that make it impossible to find a single number to add to every female instructor’s score.
- Biases are not consistent across fields. For example, at Sciences Po in Paris, there is a greater proportion of female instructors in sociology than in economics, and the observed gender bias is smaller in sociology than in economics. Any correction to SET would have to vary by subject matter.
- Biases depend on student gender as well. Our research shows that in some schools, male students rate their male instructors significantly higher than their female instructors, while female students tend to rate both the same. This is a problem for adjusting scores because the gender balance in the class will affect the instructor’s score. For instance, imagine a hypothetical male instructor who teaches two identical classes. On average, his male students give him a rating of 4.5 and his female students give him a rating of 4. In the first class, the gender balance is 50/50, so the average rating is 4.25. In the second class, there are 80 male and 20 female students, so the average rating is 4.4. There’s no one magic number to add or subtract from this average to cancel out the gender bias when comparing this score to the SET of other instructors.
- There is some evidence that SET are biased by the instructor’s race and age as well. We lack data on this, but similar work on bias in hiring decisions has shown that people (men and women alike) comparing identical resumes will tend to prefer job applicants with male, European-sounding names. Anecdotally, instructors who have accents or are older than average (even as young as mid-thirties in some places!) fare worse on their SET.
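To make the arithmetic in the two-class example above concrete, here is a minimal Python sketch. The function name `class_average` is made up for illustration, and the ratings are the hypothetical ones from the example, not real data.

```python
def class_average(n_male, n_female, male_rating=4.5, female_rating=4.0):
    """Mean rating for one class, weighting each group's mean rating
    by the number of students in that group.

    The default ratings are the hypothetical values from the example,
    not measurements.
    """
    total = n_male + n_female
    return (n_male * male_rating + n_female * female_rating) / total


# Same instructor, same teaching, different gender balance in the room.
balanced = class_average(50, 50)   # 50/50 class
skewed = class_average(80, 20)     # 80 male and 20 female students

print(balanced)                    # 4.25
print(skewed)                      # 4.4
print(round(skewed - balanced, 2)) # gap caused by class composition alone
```

The same instructor’s average moves by 0.15 points purely because of who enrolled, so a fixed correction that is right for one class is wrong for the other.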
The list could go on. I’m sure there are a ton of other confounding variables, like the time of day of the class, the difficulty of the course material, etc., which affect how students tend to rate their instructors. In order to find a correcting factor for each female instructor, you’d have to look at all of these variables and average them out. In fact, you ought to do that for male instructors too, since gender isn’t the only source of bias. This just highlights the fact that SET aren’t measuring teaching effectiveness in the first place; they’re a better measure of how comfortable or satisfied a student is in the class.
Admittedly, the title of this post sounds combative. But it’s not meant to be: of course something needs to be done about the pervasive gender bias that’s causing female faculty to lose teaching positions and costing them job promotions. I’m merely arguing that it is impossible to effectively “correct” for gender bias, and so alternative, more objective means of evaluating teaching effectiveness should be used instead of SET.