Stochastic model of the spread of Ebola

A few weeks ago I heard Oskar Hallatschek from the physics department give a talk about the spatial dynamics of evolutionary spread. He was modeling evolutionary drift, but the same stochastic models can apply to epidemics too: mutations popping up in nearby populations are analogous to disease outbreaks. The talk was very timely considering the Ebola crisis. I keep picturing the haunting series of animated simulations of spread events that he showed (you can see a static picture of the end result here). I’m curious how the dynamics of Ebola relate to these stochastic models. It’s intuitively obvious that the way outbreaks spread depends on how many people are around, how contagious the disease is, etc., but it’s not so clear how to model and estimate these parameters.

The best study of the dynamics of this Ebola outbreak that I’ve seen so far came out of PLOS. People can speculate about what will happen all day, but I always appreciate when there’s some data backing up the speculation. Plus, their data visualizations were nice enough to be shown as evidence in Congress the other day. Using standard stochastic models for epidemics, the authors estimated the basic reproductive number in West Africa using data from the beginning of July. They found that the reproductive number is around R0 = 1.8, which means that one person with Ebola will generate, on average, 1.8 more Ebola cases. In this model, if R0 > 1, each generation of cases tends to be larger than the last, so the disease will tend to spread and become an epidemic.
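To make the R0 > 1 threshold concrete, here is a minimal sketch of a branching process, one of the simplest stochastic epidemic models. It is not the model from the PLOS study: the assumption that each case infects a Poisson-distributed number of others with mean R0, the function names, and the cap of 500 cases used to call something "an epidemic" are all mine, chosen for illustration.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson(mean) sample using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_outbreak(r0, rng, max_cases=500):
    """Simulate one outbreak seeded by a single case.

    Assumption: every case infects Poisson(r0) new people, independently.
    Returns the total number of cases, capped at max_cases.
    """
    active, total = 1, 1
    while active > 0 and total < max_cases:
        new_cases = sum(poisson(r0, rng) for _ in range(active))
        total += new_cases
        active = new_cases
    return min(total, max_cases)


def epidemic_fraction(r0, runs=1000, seed=0, max_cases=500):
    """Fraction of simulated outbreaks that reach the max_cases cap."""
    rng = random.Random(seed)
    large = sum(
        simulate_outbreak(r0, rng, max_cases) >= max_cases for _ in range(runs)
    )
    return large / runs


if __name__ == "__main__":
    for r0 in (0.8, 1.8):
        print(f"R0 = {r0}: fraction of large outbreaks = {epidemic_fraction(r0)}")
```

Under these assumptions, outbreaks with R0 below 1 essentially always fizzle out, while at R0 = 1.8 most of them take off. Even then, some fraction die out by chance, which is the part a deterministic model would miss.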

Then, after selecting models for the local outbreaks in West Africa, the authors combined them with airline data to estimate the international spread of Ebola via simulations. This approach incorporates “importation events”, big jumps across distances that wouldn’t be possible but for modern transportation, which models like Hallatschek’s evolutionary spread dynamics and typical epidemic models aren’t able to capture. The authors considered two scenarios: one in which the Nigeria outbreak is contained during the timeline of their simulations and one in which it is not (fortunately it has since been contained). They estimated both the probability of importation events (see the plots below) and the number of cases generated by an importation event in each country. Judging by their results, it seems quite likely that Ebola will spread across the globe, but vigilant public health efforts should be able to prevent a large number of cases.
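For intuition about what a "probability of importation" means, here is a back-of-the-envelope sketch. It is not the authors' method, which relies on detailed airline data and simulation. It assumes, purely for illustration, that each traveler is independently infected with the same small probability; the function name and the numbers in the example are hypothetical.

```python
def importation_probability(travelers, prob_infected):
    """Probability that at least one of `travelers` arrivals is infected.

    Assumption: travelers are independent and each is infected with
    probability `prob_infected`, so P(no importation) = (1 - p) ** n.
    """
    if travelers < 0 or not 0.0 <= prob_infected <= 1.0:
        raise ValueError("need travelers >= 0 and 0 <= prob_infected <= 1")
    return 1.0 - (1.0 - prob_infected) ** travelers


if __name__ == "__main__":
    # Hypothetical inputs: 5,000 travelers, each infected with probability 1e-5.
    print(importation_probability(5000, 1e-5))
```

The point of the sketch is only that a tiny per-traveler risk can add up to a non-trivial chance of at least one importation once enough people travel.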

This study came out more than six weeks ago, so I’d be interested to see updated projections that take the newer data into account.


Chart junk from the USDA

If you know me at all in real life, you know I’ve been on board the eat-real-foods train since I started cooking for myself. I care a lot about making informed decisions about the foods I eat and enabling others to make the best choices about what to eat. Marion Nestle is one of my food heroes, and her blog led me to the USDA’s recent report on how consumers use nutrition information labeling in restaurants and fast food places.

I was thoroughly unimpressed by the plots in their report. That’s a shame, because charts are often the most effective way to communicate data, especially for exploratory comparisons like the ones in this study. I looked to Howard Wainer’s “Dirty Dozen” ways to display data badly for the right words to describe some of these plots.


  • Label illegibly, incompletely, incorrectly, ambiguously:  After studying the plot for a while, I think I’ve figured out what’s going on.  My first question was what on earth are the numbers above the bars?  I’d guess that they’re grams of sugar intake, but the color legend seems to indicate that blue means no sugar and yellow means yes sugar.  But wait, who eats no sugar?
  • Change the scales mid-axis: This isn’t quite a change of measurement scale, but a change of the survey sample mid-axis. Take a look at the footnote and you’ll see that only people who answered yes to the first question are included in the second pair of bars, and likewise only people who answered yes to the second question are included in the third pair. The side-by-side placement of the three pairs leads us to believe that we’re comparing different statistics for the same group of people, when in fact the people surveyed for each subsequent question are a subset of the earlier group.


  • Emphasize the trivial, ignore the important: Perhaps my confusion here ties back to the unclear numbers on top of the bars. But to me, the tallest bars seem pretty trivial – if the right-hand three bars are supposed to represent people who ate at fast food restaurants more than 5 times per week, then why would you even include a bar to say that 98.3% go to fast food restaurants? Why isn’t it 100%? My eye is drawn to these taller blue bars, whereas I think what they meant to convey is in the green bars.

A bonus complaint: the titles! The data was taken from NHANES, a national survey, and these plots display summary statistics from the survey group. It’s one thing to make statements about the sample of people, but it’s another to use the sample to draw inferences about all Americans. The titles seem to suggest that the data definitively tells us things about all Americans (e.g. Americans who go to fast food/pizza places have higher sugar intake). Significance tests were done to show that these figures differ, but the methods aren’t transparent. This is very misleading!
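To illustrate the gap between a sample statistic and a claim about a population, here is a minimal sketch of a Wilson confidence interval for a survey proportion. It assumes a simple random sample, which NHANES is not (it uses a complex, weighted design), so treat this as a picture of the idea rather than the right analysis for this report. The counts in the example are made up.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumption: the n respondents are a simple random sample.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 120 of 400 respondents answered yes.
    low, high = wilson_interval(120, 400)
    print(f"sample proportion 0.30, interval ({low:.3f}, {high:.3f})")
```

A title that reported a range like this, instead of a single bar height, would at least signal that the number is an estimate for the population and not a fact about it.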

All told, the plots aren’t that bad.  They do have some interesting statistics in them (people in the survey on food stamps, now called SNAP, pay more attention to nutrition facts?) and make the information more easily accessible than giant tables of data.