Workload and reduced fecundity — try not to work too hard, ladies

Here’s a vaguely misogynistic study for you.  This article, “Women who work or lift a lot may struggle to get pregnant,” discusses the findings from a recent paper in Occupational and Environmental Medicine.  The authors surveyed women trying to conceive in the Nurses’ Health Study 3, a large cohort of predominantly Caucasian nurses.  Their covariates of interest were how many hours per week women worked and how often they lifted more than 25 lbs in a day; the primary outcome was time to conception.  The authors concluded,

Working more than 40 hours a week was linked with taking 20 percent longer to get pregnant compared to women who worked 21 to 40 hours.

Moving or lifting at least 25-pound loads several times a day was also tied to delayed pregnancy, extending the time to conception by about 50 percent.

The unstated interpretation is that women’s bodies can’t handle working a full day or lifting any weight, so women of reproductive age should think twice about what they do for a living.

Here’s the original paper.

One potential issue is that the study is cross-sectional: the authors didn’t actually follow the women from the time of first interview until they got pregnant.  Instead, they used a single survey to ask how long the women had been trying to conceive, then used a survival analysis method to estimate the time to pregnancy based on the self-reported times.  This way of sampling is biased.  A one-time survey only catches women who are still trying, so women who had no trouble conceiving are underrepresented in the sample and women who have taken a long time to get pregnant are overrepresented.  Furthermore, we don’t know the true outcomes for these women, only the ones estimated by a parametric model.
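Here is a rough sketch of that sampling problem.  It is not the authors' estimator, just a toy simulation with made-up numbers: I assume each woman has her own monthly chance of conceiving somewhere between 5% and 30%, and that a one-time survey catches a woman with probability proportional to how many months she spends trying.  The function names are mine.

```python
import random
import statistics

def simulate_months_to_conception(n_women, rng):
    """Draw hypothetical months-to-conception values.

    Each woman gets her own monthly chance of conceiving (an assumed
    range of 5% to 30%); her waiting time is geometric given that chance.
    """
    durations = []
    for _ in range(n_women):
        monthly_chance = rng.uniform(0.05, 0.30)
        months = 1
        while rng.random() > monthly_chance:  # no conception this month
            months += 1
        durations.append(months)
    return durations

def cross_sectional_sample(durations, k, rng):
    """Sample women with probability proportional to how long they try.

    A single survey of women who are currently trying is more likely to
    catch someone whose attempt lasts many months.
    """
    return rng.choices(durations, weights=durations, k=k)

rng = random.Random(0)
population = simulate_months_to_conception(20_000, rng)
surveyed = cross_sectional_sample(population, 5_000, rng)

population_mean = statistics.mean(population)
surveyed_mean = statistics.mean(surveyed)
print(f"mean months, all women:      {population_mean:.1f}")
print(f"mean months, surveyed women: {surveyed_mean:.1f}")
```

Under these assumptions the surveyed women look much slower to conceive than the full group, purely because of who ends up in the sample.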

My biggest issue with this study is that they attach any meaning to their findings at all, saying that working more has a “detrimental impact on female nurses’ ability to get pregnant”.  They use duration of pregnancy attempt “as a surrogate for fecundity”.  Fecundity implies some biological ability to reproduce.  However, using time to conception as a proxy for fecundity relies on the assumption that everyone is trying equally hard to get pregnant.  If that were the case, then any variation in time to conception would be due to fecundity.  This isn’t something they checked or measured, and differences in women’s ideas of what “trying to get pregnant” means are probably what’s actually driving the trend the authors reported.
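To see how much work that assumption is doing, here is a toy calculation.  Everything in it is invented for illustration: the 3% chance of conceiving per attempt, the number of attempts per month, and the function name.  Both groups have exactly the same fecundity; the only difference is how often they try.

```python
def expected_months_to_conception(per_attempt_chance, attempts_per_month):
    """Expected waiting time, treating each month as an independent trial."""
    # Chance that at least one attempt in a given month succeeds.
    monthly_chance = 1 - (1 - per_attempt_chance) ** attempts_per_month
    # Mean of a geometric distribution with that success probability.
    return 1 / monthly_chance

SAME_FECUNDITY = 0.03  # hypothetical per-attempt chance, identical for both groups

fewer_hours = expected_months_to_conception(SAME_FECUNDITY, attempts_per_month=8)
more_hours = expected_months_to_conception(SAME_FECUNDITY, attempts_per_month=6)

print(f"works fewer hours: {fewer_hours:.1f} months on average")
print(f"works more hours:  {more_hours:.1f} months on average")
print(f"ratio: {more_hours / fewer_hours:.2f}")
```

With these assumed numbers, the group that tries less often takes noticeably longer on average even though nothing biological differs between the two groups.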

The Reuters article quoted someone sensible:

“If this effect is real, it is likely due to the fact that these women are having less frequent intercourse due to their work demands,” Lynch, who wasn’t involved in the study, said by email.

Nobody needed to do a study to figure that out.  Anyway, we could come up with all sorts of other plausible explanations for why women who work more are having less frequent intercourse.  If they redid this study on a cohort of women working in tech, I’m sure they’d find a similar relationship between number of hours worked and time to conception.  The point is, working more hours or picking up 25 lb boxes probably has no effect on anyone’s biological capacity to reproduce.  The authors are making a mountain out of a molehill.

I routinely lift 100 lbs over my head, so I guess I’ll really be screwed when I want to have a baby.

Things I’ve learned from my students

This summer I’m the TA for an undergraduate intro stats class for non-majors.  It’s been an enlightening process.  Obviously I’m learning a lot about teaching, as it’s my first time actually running discussion sections and being fully in charge.  I’m glad to be doing an intro course because I’m really nailing down the basic things I need to know as a statistician, and I don’t need to worry about what I’m presenting so much as how I’m presenting it.  Interacting with the students is the most rewarding part of the experience, and observing them has put the best and worst habits that people have on display.  A few things I’ve noted:

– Top students put in the time.  Even the naturally smart students who do well on tests don’t score as high as the ones who come to class consistently, every day. Those who put in the time get the most out of the class. The takeaway is that even when things seem easy, you can’t get cocky and stop paying attention; there is always more to learn.

– How to hack the brainstem.  I love this idea.  I stole it from my advisor, Philip, who once said that the best way to convey statistical ideas to non-statisticians is to “hack the brainstem using metaphors.”  I have to teach things that I’ve internalized and taken for granted for so many years.  It’s really tough to boil things down to their essential parts when I see how topics connect and relate to each other, but the students just do not.  The metaphor I like relates chance variability in a random sample to measuring the concentration of a chemical solution: if your solution is well-mixed, it doesn’t matter if the drop you sample comes from a test tube or a gallon jug.  It should always have the same concentration.  Similarly, if you take a random sample of people, it doesn’t matter whether the population they came from is 1,000 or 1,000,000 people; as long as the sample is a small fraction of the population, your estimate of the average/percentage of some characteristic based on the sample will have essentially the same accuracy.

Another version I like is hacking the brainstem using hyperbole.  To illustrate confusing concepts, take them to their limit: what happens if you flip a coin 1,000 times, then 10,000 times, then 100,000 times? What happens with non-response bias if all the unhealthy people in your sample die before the survey?

– Kindness makes a difference.  There are a handful of students who say hello, goodbye, and thank you to me every day.  I’m sure they do it without even thinking twice, but I definitely notice.  It’s nice to feel appreciated, and it’s a reminder to show others how much I appreciate their help and support regularly.

– My generation is spoiled by technology.  I look around when the students are working on exercises and I see them all on their phones.  Some have headphones on, some are clearly texting, while the rest are actually using tiny touch screen calculators to solve the problems.  Yes, it’s convenient, but wouldn’t they benefit more from actually talking to each other about the material instead of isolating themselves on their phones?  I’m guilty of this myself.  Sometimes, you just need to shut it off.

I’ve noticed that students overuse technology in other ways.  They think it’s okay to email pictures of their problem sets instead of turning in a hard copy, without even writing a message of explanation.  As if messy math scribbles on paper weren’t hard enough to grade, now the end product is one step removed and only visible on a screen, and it comes without acknowledgement that the student is bending the class rules.  It’s as though the availability of email makes people forget basic politeness.  I hope not to be like that; I try to be cordial, straightforward, and clear in my electronic communications.

– There’s only so much I can do.  And that’s okay.  No matter how much time and effort I put into the class, some students are still going to fail.  It’s not a reflection of me as a TA if the student doesn’t hold up their end of the deal.  Are you emailing me asking for extra help?  You better come to office hours.  Are you asking me to return your graded homework?  Then you better show up to class and pick it up.  The failures of others aren’t a reflection of my work, and I shouldn’t take any of it personally.
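A postscript on the well-mixed solution metaphor, for anyone who wants numbers.  This is a small sketch using the textbook standard error of a sample proportion under simple random sampling without replacement, including the finite population correction.  The 30% share and the sample size of 100 are made up, and the function name is mine.

```python
import math

def standard_error_of_proportion(true_share, sample_size, population_size):
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    sampling_variance = true_share * (1 - true_share) / sample_size
    finite_population_correction = (
        (population_size - sample_size) / (population_size - 1)
    )
    return math.sqrt(sampling_variance * finite_population_correction)

# Hypothetical setup: 30% of people have the characteristic; we sample 100.
for population_size in (1_000, 1_000_000):
    se = standard_error_of_proportion(0.30, 100, population_size)
    print(f"population {population_size:>9,}: standard error = {se:.4f}")
```

The two standard errors come out close to each other.  The small gap that remains is there only because 100 people is a full tenth of the 1,000-person population; shrink the sample relative to the population and the gap disappears.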

Female instructors should not get bonus points to correct for gender bias

A slew of research has come out in the last few years (and there’s more forthcoming from my collaborators and me) showing that the end-of-semester ratings that students give teachers, usually on a scale from 1 to 5 or so, are significantly biased against female professors. The obvious question is: if not student evaluations of teaching (SET), how should we evaluate instructors? I recently saw this article on Twitter.  It argues that “female faculty should receive an automatic correction” on their SET scores, meaning that the administration would add a fixed number to every female instructor’s score in order to make it comparable to male instructors’ scores. This adjusted score would be used to decide whether the instructor should be rehired to teach, be given tenure, etc.

I don’t believe this can be done, for a number of reasons. There are other biases and confounding variables besides gender that make it impossible to find a single number to add to every female instructor’s score.

  • Biases are not consistent across fields. For example, at Sciences Po in Paris, there is a greater proportion of female instructors in sociology than in economics, and the observed gender bias is smaller in sociology than in economics.  Any correction to SET would have to vary by subject.
  • Biases depend on student gender as well. Our research shows that in some schools, male students rate their male instructors significantly higher than their female instructors while female students tend to rate them the same.  This is a problem for adjusting scores because the gender balance in the class will affect the instructor’s score. For instance, imagine a hypothetical male instructor who teaches two identical classes. On average, his male students give him a rating of 4.5 and females give him a rating of 4.  In the first class, the gender balance is 50/50, so the average rating is 4.25.  In the second class, there are 80 males and 20 females, so the average rating will be 4.4.  There’s no one magic number to add or subtract from this average to cancel out the gender bias when comparing this score to the SET of other instructors.
  • There is some evidence that SET are biased by the instructor’s race and age as well.  We lack data on this, but similar work on bias in hiring decisions has shown that people (men and women alike) comparing identical resumes will tend to prefer job applicants with male, European-sounding names.  Anecdotally, instructors who have accents or are older than average (even as young as mid-thirties in some places!) fare worse on their SET.
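The class-composition example above is easy to check with a few lines of arithmetic.  The male instructor's ratings (4.5 from men, 4 from women) are the hypothetical ones from the example.  The equally effective female instructor who gets a 4 from everyone is an extra assumption of mine, added only to show how the gap moves with the gender balance of the class.

```python
def class_average(rating_from_men, rating_from_women, n_men, n_women):
    """Average rating for a class, weighted by how many men and women rated."""
    total = rating_from_men * n_men + rating_from_women * n_women
    return total / (n_men + n_women)

# Hypothetical male instructor from the example above.
balanced = class_average(4.5, 4.0, n_men=50, n_women=50)
mostly_men = class_average(4.5, 4.0, n_men=80, n_women=20)

# Assumed comparison: a female instructor rated 4.0 by men and women alike.
female_balanced = class_average(4.0, 4.0, n_men=50, n_women=50)
female_mostly_men = class_average(4.0, 4.0, n_men=80, n_women=20)

print(f"50/50 class: male {balanced:.2f}, female {female_balanced:.2f}, "
      f"gap {balanced - female_balanced:.2f}")
print(f"80/20 class: male {mostly_men:.2f}, female {female_mostly_men:.2f}, "
      f"gap {mostly_men - female_mostly_men:.2f}")
```

Under these assumptions the gap is 0.25 in the balanced class and 0.40 in the class that is mostly men, so any single bonus added to the female instructor's score would be wrong for at least one of the two classes.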

The list could go on: I’m sure there are a ton of other confounding variables, like time of day of the class, difficulty of the course material, etc., which affect how students tend to rate their instructors.  In order to find a correcting factor for each female instructor, you’d have to look at all of these variables and average them out.  In fact, you ought to do that for male instructors too, since gender isn’t the only source of bias.  This just highlights the fact that SET aren’t measuring teaching effectiveness in the first place; they’re a better measure of how comfortable or satisfied a student is in the class.

Admittedly, the title of this post sounds combative. But it’s not — of course something needs to be done about the pervasive gender bias that’s causing female faculty to lose teaching positions and costing them job promotions.  I’m merely arguing that it is impossible to effectively “correct” for gender bias, and so alternative, more objective means for evaluating teaching effectiveness should be used instead of SET.