Stochastic model of the spread of Ebola

A few weeks ago I heard Oskar Hallatschek from the physics department give a talk about the spatial dynamics of evolutionary spread. He was modeling evolutionary drift, but the stochastic models can apply to epidemics too: mutations popping up in nearby populations are analogous to disease outbreaks. The talk was very timely considering the Ebola crisis. I keep picturing the haunting series of animated simulations of spread events that he showed (you can see a static picture of the end result here). I’m curious how the dynamics of Ebola relate to these stochastic models. It’s intuitively obvious that the way an outbreak spreads depends on how many people are around, how contagious the disease is, and so on, but it’s much less clear how to model and estimate those parameters.

The best study of the dynamics of this Ebola outbreak that I’ve seen so far was published in PLOS. People can speculate about what will happen all day, but I always appreciate when there’s some data backing up the speculation. Plus, their data visualizations were nice enough to be shown as evidence in Congress the other day. Using standard stochastic models for epidemics, the authors estimated the basic reproductive number in West Africa using data from the beginning of July. They found the reproductive number to be around R0 = 1.8, which means that on average one person with Ebola will generate 1.8 new cases. In models like this, if R0 > 1 the disease tends to keep spreading and become an epidemic, while if R0 < 1 each generation of cases is smaller than the last and the outbreak dies out.
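
To get intuition for why the R0 > 1 threshold matters, here’s a minimal branching-process sketch of my own (a toy model, not the one in the paper), assuming each case generates a Poisson-distributed number of secondary cases with mean R0:

```python
import numpy as np

rng = np.random.default_rng(0)

def outbreak_size(r0=1.8, max_cases=10_000):
    """Total cases in one simulated outbreak, modeled as a branching
    process where each case infects Poisson(r0) new people."""
    cases, active = 1, 1
    while active > 0 and cases < max_cases:
        new = rng.poisson(r0, size=active).sum()  # next generation of cases
        cases += new
        active = new
    return cases

sizes = np.array([outbreak_size() for _ in range(1_000)])
took_off = sizes >= 10_000
print(f"outbreaks that took off: {took_off.mean():.0%}")
print(f"median size of the ones that fizzled: {np.median(sizes[~took_off]):.0f}")
```

Even with R0 = 1.8, a decent fraction of simulated outbreaks fizzle out early by chance; the rest grow until something (intervention, immunity) changes the dynamics.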

Then, after selecting models for the local outbreaks in West Africa, the authors combined them with airline traffic data to estimate the international spread of Ebola via simulation. This is something that models like Hallatschek’s evolutionary spread dynamics and typical epidemic models aren’t able to capture, because it incorporates “importation events”: big jumps across distances that wouldn’t be possible but for modern transportation. The authors considered two scenarios, one in which the Nigerian outbreak was contained within the timeline of their simulations and one in which it was not (fortunately, it has since been contained). They estimated both the probability of an importation event (see the plots below) and the number of cases an importation event would generate in each country. Judging by their results, the spread of Ebola across the globe seems quite likely, but vigilant public health efforts should be able to prevent large numbers of cases.
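
The paper does this properly, with real flight networks and full epidemic simulations, but a back-of-the-envelope sketch shows why importation becomes nearly inevitable once case counts grow. Everything here, from the epidemic curve to the per-case travel probabilities, is invented for illustration:

```python
import numpy as np

def p_importation(daily_cases, p_travel):
    """Probability of at least one importation into a given country,
    assuming each active case independently travels there with a tiny
    daily probability (a crude stand-in for real airline flow data)."""
    # P(no importation) = product over days of (1 - p_travel)^cases
    log_p_none = np.sum(daily_cases * np.log1p(-p_travel))
    return 1.0 - np.exp(log_p_none)

days = np.arange(90)
daily_cases = 50 * np.exp(0.02 * days)  # hypothetical growing epidemic curve

for p_travel in (1e-6, 1e-5, 1e-4):
    print(f"daily travel prob {p_travel:.0e} -> "
          f"P(importation) = {p_importation(daily_cases, p_travel):.3f}")
```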

This study came out more than six weeks ago, so I’d be interested to see updated projections that incorporate the data that have come in since then.

Does social dependence compromise statistical independence? Why I still eat conventionally grown foods

The food world is buzzing about the study that came out last week claiming that organic foods contain higher levels of antioxidants and lower levels of pesticide residues than non-organic foods.  It has gotten so much media attention that I just can’t not comment.  I don’t necessarily think it’s nonsense, but I am skeptical of its conclusions.

The study is a meta-analysis of previous papers on the topic.  The authors read through 300+ studies of organic crops and aggregated their data.  The idea is that with a bigger sample size and more data, we should have more power to detect a difference between the compounds in organic and non-organic foods.  Frankly, I didn’t read the paper in great detail; I’m generally mistrustful of meta-analyses.  In their essay “Statistical Assumptions as Empirical Commitments”, Berk and Freedman criticize meta-analyses, first on the grounds that it doesn’t necessarily make sense to assume a treatment (organic farming practices, in this case) has the same effect across all studies:

“If we seek to combine studies with different kinds of outcome measures (earnings, weeks worked, time to first job), standardization seems helpful.  And yet, why are standardized effects constant across these different measures?  Is there really one underlying construct being measured, constant across studies, except for scale?  We find no satisfactory answers to these critical questions.”
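
For concreteness, here’s the basic fixed-effect pooling calculation that a meta-analysis rests on, with made-up effect sizes and standard errors; the inverse-variance weighting is licensed precisely by the “one underlying construct, constant across studies” assumption that Berk and Freedman question:

```python
import numpy as np

# hypothetical standardized effects and standard errors from five studies
effects = np.array([0.10, 0.35, -0.05, 0.42, 0.18])
ses = np.array([0.12, 0.20, 0.15, 0.25, 0.10])

# fixed-effect pooling: weight each study by 1/se^2, which is only
# justified if every study estimates the SAME underlying effect
w = 1.0 / ses**2
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
print(f"pooled effect: {pooled:.3f} +/- {pooled_se:.3f}")

# Cochran's Q is a crude check on that homogeneity assumption
Q = np.sum(w * (effects - pooled)**2)  # ~ chi^2 with k-1 df under homogeneity
print(f"Q = {Q:.2f} on {len(effects) - 1} degrees of freedom")
```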

The studies used in the meta-analysis were done in countries all across Europe.  Certainly there are regulations about what can be called organic, but there’s no telling how differently the farms handled their crops, or how differently the outcomes were measured from study to study.

Furthermore, a successful meta-analysis relies on the assumptions of random sampling and statistical independence.  Since the “units of analysis” are research studies, these assumptions hardly make sense.  The studies clearly were not sampled randomly; the authors carefully read through hundreds of papers and chose the ones that met certain requirements.  The assumption of statistical independence is even less justified.  Berk and Freedman bring up an interesting point about the human side of why studies simply cannot be independent:

“The assumed independence of studies is worth a little more attention.  Investigators are trained in similar ways, read the same papers, talk to one another, write proposals for funding to the same agencies, and publish the findings after peer review.  Earlier studies beget later studies, just as each generation of Ph.D. students trains the next.  After the first few million dollars are committed, granting agencies develop agendas of their own, which investigators learn to accommodate.  Meta-analytic summaries of past work further channel the effort.  There is, in short, a web of social dependence inherent in all scientific research.  Does social dependence compromise statistical independence?  Only if you think that investigators’ expectations, attitudes, preferences, and motivations affect the written word – and never forget those peer reviewers.”
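
A quick simulation (with entirely invented numbers) shows what’s at stake. If the studies’ errors share a common component, the pooled estimate’s nominal standard error is far too small, and a true null effect gets “detected” most of the time:

```python
import numpy as np

rng = np.random.default_rng(42)
k, se, rho = 20, 0.2, 0.6  # 20 studies, equal SEs, pairwise correlation rho

# equicorrelated study errors: every pair of studies shares correlation rho
cov = se**2 * ((1 - rho) * np.eye(k) + rho * np.ones((k, k)))

nominal_se = se / np.sqrt(k)  # the SE reported if studies were independent

indep = rng.normal(0.0, se, size=(10_000, k)).mean(axis=1)
corr = rng.multivariate_normal(np.zeros(k), cov, size=10_000).mean(axis=1)

for name, est in (("independent", indep), ("correlated", corr)):
    # how often a true null effect is "significant" at the nominal 5% level
    false_pos = np.mean(np.abs(est) > 1.96 * nominal_se)
    print(f"{name:11s}: actual SE {est.std():.3f} vs nominal {nominal_se:.3f}, "
          f"false positive rate {false_pos:.2f}")
```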

And here’s the kicker: the study was funded by an organization that funds research in support of organic farming practices.  They state at the end of the paper that the “design and management” weren’t influenced by the funding organization, but it’s not difficult to imagine biases in how the proposal and research questions were formulated from the get-go.

It’s going to take more than a meta-analysis to get me to go organic.

Chart junk from the USDA

If you know me at all in real life, you know I’ve been on board the eat-real-foods train since I started cooking for myself.  I care a lot about making informed decisions about the foods I eat and about enabling others to make the best choices about what to eat.  Marion Nestle is one of my food heroes, and her blog led me to the USDA’s recent report on how consumers use nutrition information labeling in restaurants and fast food places.

I was thoroughly unimpressed by the plots in the report.  After all, charts are often the most effective way to communicate data, especially for the kind of exploratory comparisons this study makes.  I turned to Howard Wainer’s “Dirty Dozen” ways to display data badly for the right words to describe some of these plots.

[Figure 9 from the USDA report]

  • Label illegibly, incompletely, incorrectly, ambiguously:  After studying the plot for a while, I think I’ve figured out what’s going on.  My first question was: what on earth are the numbers above the bars?  I’d guess they’re grams of sugar intake, but the color legend seems to indicate that blue means no sugar and yellow means yes sugar.  But wait, who eats no sugar?
  • Change the scales mid-axis: This isn’t quite a change of measurement scale, but a change of the survey sample mid-axis.  Take a look at the footnote and you’ll see that only the people who answered yes to the first question are included in the second pair of bars, and the same goes for the second and third pairs.  The side-by-side placement of the three pairs leads us to believe we’re comparing different statistics for the same group of people, when in fact the people surveyed for each subsequent question are a subset of the earlier group (see the sketch after this list).
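
Here’s that distinction in miniature, with invented numbers shaped like the figure: three nested yes/no questions, each asked only of the people who said yes to the previous one, so each bar’s denominator shrinks:

```python
import pandas as pd

def answers(yes, no, not_asked):
    return pd.array([True] * yes + [False] * no + [pd.NA] * not_asked,
                    dtype="boolean")

# hypothetical nested survey questions (column names are my own invention)
df = pd.DataFrame({
    "sees_nutrition_info": answers(600, 400, 0),
    "uses_nutrition_info": answers(350, 250, 400),
    "changed_their_order": answers(200, 150, 650),
})

n = len(df)
for question in df.columns:
    yes, asked = df[question].sum(), df[question].notna().sum()
    print(f"{question}: {yes / asked:5.1%} of those asked, "
          f"but {yes / n:5.1%} of all respondents")
```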

[Figure 11 from the USDA report]

  • Emphasize the trivial, ignore the important: Perhaps my confusion here ties back to the unclear numbers on top of the bars, but this one seems pretty trivial to me: if the right-hand three bars are supposed to represent people who ate at fast food restaurants more than 5 times per week, why even include a bar saying that 98.3% of them go to fast food restaurants?  Why isn’t it 100%?  My eye is drawn to those taller blue bars, when I think what they actually meant to convey is in the green bars.

A bonus complaint: the titles!  The data come from NHANES, a national survey, and these plots display summary statistics for the survey sample.  It’s one thing to make statements about the sample, but it’s another to use the sample to draw inferences about all Americans.  The titles suggest that the data definitively tell us things about all Americans (e.g., “Americans who go to fast food/pizza places have higher sugar intake”).  Significance tests were apparently run to claim that these figures differ, but the methods aren’t transparent.  This is very misleading!
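
For instance, here’s the naive two-proportion z-test someone might have run behind titles like these, with invented counts. On NHANES data, this is exactly the sort of calculation that needs to be disclosed: the survey’s weights and clustered design usually inflate the true variance, so a design-ignorant test makes differences look more significant than they really are:

```python
import numpy as np
from scipy import stats

# invented counts: "high sugar intake" among frequent fast-food goers
# vs. everyone else (a real NHANES analysis would need survey weights)
x = np.array([420, 380])   # respondents with high sugar intake per group
n = np.array([900, 1100])  # group sizes

p = x / n
p_pool = x.sum() / n.sum()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n[0] + 1 / n[1]))
z = (p[0] - p[1]) / se
print(f"naive z = {z:.2f}, two-sided p = {2 * stats.norm.sf(abs(z)):.4f}")
```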

All told, the plots aren’t that bad.  They contain some interesting statistics (people in the survey on food stamps, now called SNAP, pay more attention to nutrition facts?) and they make the information more accessible than a giant table of data.