This network visualization caught my eye on Twitter, so I just had to investigate. It displays the volume of food imports and exports between states in 2007. The plot does several things well: it uses both color and position on the circle to indicate each state's total volume, rather than naively ordering states alphabetically (the Austria First! principle); it uses line width to indicate the volume of trade between pairs of states; and it uses line color to indicate whether the trade is incoming or outgoing.
The paper that this plot comes from is a straightforward mathematical analysis of the food commodity trade in the U.S. The paper is sterile in the best way possible: they don’t make any claims about what the properties of the network mean beyond the mathematical estimates. There are no modeling assumptions or p-values, just observations about the way food moves throughout the nation. They don’t overstate the importance of their findings. I like it.
They bring up two interesting points about the nodes (trade regions): how many trade neighbors they have, and their strength (volume of trade in and out):
- Distribution of node neighbors: The network of global trade for all commodities is scale-free, meaning that the proportion of trade centers with more than k trade neighbors decays approximately like (1/k)^2. This implies that there are many nodes with a few neighbors and a few nodes with many neighbors. By contrast, the number of trade neighbors in the U.S. food trade network is approximately normally distributed: many nodes have a moderate number of trade neighbors, and few nodes have very many or very few. This distribution is more characteristic of a social network.
- Node strength rankings: The places with the greatest inward flow strength include New Orleans-Metairie-Bogalusa, Texas, Los Angeles-Long Beach-Riverside, and Chicago-Naperville-Michigan City. That's presumably because they're hubs for railroads and ports, so lots of food gets sent there and then leaves the country rather than going on to other states. Conversely, the places with the greatest outward flow strength include Iowa, Illinois, Missouri, and Nebraska, which probably send out enormous volumes of corn and soy, America's favorite cash crops.
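To make the contrast in the first point concrete, here is a toy comparison of the two tail shapes, computed from formulas rather than from the paper's data. The minimum degree `K_MIN` and the normal parameters `MU` and `SIGMA` are arbitrary assumptions chosen only for illustration. Under the (1/k)^2 power law, doubling k cuts the share of nodes with more than k neighbors by a factor of four, while the normal tail collapses to essentially zero a few standard deviations above the mean.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration only; NOT estimates from the paper.
K_MIN = 5           # smallest degree allowed under the power law
MU, SIGMA = 20, 5   # mean and spread for the "approximately normal" case


def power_law_tail(k: float, k_min: float = K_MIN, exponent: float = 2.0) -> float:
    """P(degree > k) for a Pareto-type tail that decays like (1/k)**exponent."""
    if k < k_min:
        return 1.0
    return (k_min / k) ** exponent


def normal_tail(k: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """P(degree > k) when degrees are approximately normally distributed."""
    return 1.0 - NormalDist(mu, sigma).cdf(k)


for k in (10, 20, 40, 80):
    print(f"k={k:>3}  power law: {power_law_tail(k):.4f}  normal: {normal_tail(k):.2e}")
```

The power law keeps a small but real share of highly connected hubs at large k; the normal case essentially never produces them.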
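Node strength in the second point is the total weight on a node's edges, split by direction: in-strength sums incoming volumes and out-strength sums outgoing ones. Here is a minimal sketch computing both from a weighted edge list. The region names are borrowed from the text, but the shipment volumes are invented and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical shipments (origin, destination, volume). The volumes are made up
# for illustration and do not come from the paper.
shipments = [
    ("Iowa", "New Orleans", 120.0),
    ("Iowa", "Chicago", 80.0),
    ("Nebraska", "New Orleans", 60.0),
    ("Illinois", "Chicago", 50.0),
    ("Chicago", "Los Angeles", 30.0),
    ("Illinois", "Los Angeles", 40.0),
]


def node_strengths(edges):
    """Return (in_strength, out_strength) dicts for a weighted directed graph."""
    in_strength = defaultdict(float)
    out_strength = defaultdict(float)
    for origin, destination, volume in edges:
        out_strength[origin] += volume       # flow leaving the origin
        in_strength[destination] += volume   # flow arriving at the destination
    return dict(in_strength), dict(out_strength)


def ranked(strength):
    """Sort regions from largest to smallest strength."""
    return sorted(strength.items(), key=lambda item: item[1], reverse=True)


in_s, out_s = node_strengths(shipments)
print("Top inward:", ranked(in_s)[:2])
print("Top outward:", ranked(out_s)[:2])
```

A region can rank high on one list and not the other; in this toy data, Chicago receives far more than it sends.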
There’s nothing particularly surprising about these results, but it’s nice to see them presented in a clear way supported by the numbers.
If you know me at all in real life, you know I've been on board the eat-real-foods train since I started cooking for myself. I care a lot about making informed decisions about the foods I eat and about helping others make the best choices about what to eat. Marion Nestle is one of my food heroes, and her blog led me to the USDA's recent report on how consumers use nutrition labeling in restaurants and fast-food places.
I was thoroughly unimpressed by the plots in their report, which is a shame, since charts are often the most effective way to communicate data, especially exploratory comparisons like the ones in this study. I turned to Howard Wainer's "Dirty Dozen" ways to display data badly for the right words to describe some of these plots.
- Label illegibly, incompletely, incorrectly, and ambiguously: After studying the plot for a while, I think I've figured out what's going on. My first question was: what on earth are the numbers above the bars? I'd guess that they're grams of sugar intake, but the color legend seems to indicate that blue means no sugar and yellow means yes sugar. But wait, who eats no sugar?
- Change the scales mid-axis: This isn't quite a change of measurement scale, but a change of survey sample mid-axis. Take a look at the footnote and you'll see that only people who answered yes to the first question are included in the second pair of bars, and likewise only people who answered yes to the second question are included in the third pair. The side-by-side placement of the three pairs leads us to believe that we're comparing different statistics for the same group of people, when in fact each subsequent question covers only a subset of the previous group.
- Emphasize the trivial, ignore the important: Perhaps my confusion here ties back to the unclear numbers on top of the bars. But to me, the tallest bars seem pretty trivial: if the right-hand three bars are supposed to represent people who ate at fast-food restaurants more than 5 times per week, why would you even include a bar saying that 98.3% go to fast-food restaurants? And why isn't it 100%? My eye is drawn to these taller blue bars, whereas I think what they meant to convey is in the green bars.
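The "change the scales mid-axis" problem is easy to underestimate. Here is a small arithmetic sketch with made-up shares (not the report's numbers): when each question is asked only of people who said yes to the previous one, the share of the whole sample that said yes to every question so far is the product of the conditional shares. A bar at 50% for the third question can therefore stand for a much smaller slice of everyone surveyed.

```python
# Hypothetical conditional "yes" shares for three chained survey questions.
# Each share is measured only among people who said yes to the previous
# question, mirroring the footnote; the numbers are invented for illustration.
conditional_yes = [0.90, 0.60, 0.50]


def shares_of_full_sample(conditional):
    """Convert chained conditional 'yes' shares into shares of the whole sample."""
    overall = []
    running = 1.0
    for share in conditional:
        running *= share  # fraction saying yes to this AND every earlier question
        overall.append(running)
    return overall


print(shares_of_full_sample(conditional_yes))
```

Plotting the conditional shares side by side hides that the groups shrink from bar to bar, which is exactly what makes the chart read as one population.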
A bonus complaint: the titles! The data was taken from NHANES, a national survey, and these plots display summary statistics for the surveyed group. It's one thing to make statements about the sample of people, but it's another to use the sample to draw inferences about all Americans. The titles suggest that the data definitively tells us things about all Americans (e.g., "Americans who go to fast food/pizza places have higher sugar intake"). Significance tests were done to establish that these figures differ, but the methods aren't transparent. This is very misleading!
All told, the plots aren't that bad. They do contain some interesting statistics (people in the survey who receive food stamps, now called SNAP, pay more attention to nutrition facts?) and make the information more accessible than giant tables of data would.