A few years ago my friend and debate partner rightly scolded me for confusing correlation and causation. Since then I've been very careful to distinguish the two and have pointed out cases where others have committed the same confusion. So this is a post about correlation and causation.
For those who aren't math experts, a definition or two. When scientists study something they take measurements. What they measure is, of course, dependent on what they study. These measurements are usually plotted on a graph. Many times several types of measurements are done and scientists want a way to compare two (or more) sets of measurements.
One way to do that is to use a math equation called correlation. Do the graphs of the two things move up and down together? Does one move up when the other moves down (and the opposite)? Do the up and down movements on the graph not match at all? Scientists put their numbers into the equation and come up with a number somewhere from -1 to 1. If the two sets of data move up and down together the number will be positive, and the more closely they move together the closer the number will be to 1. If the one set moves up while the other moves down the number will be negative and the more closely they move in opposite directions the closer the value will be to -1. If the ups and downs don't match at all the number will be zero or close to it.
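For anyone who wants to see the equation at work, here is a minimal Python sketch of the usual correlation formula (the Pearson correlation coefficient). The function name `pearson` and the three small data sets are made up for illustration; they are not measurements of anything.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of each pair's deviations from its own mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up illustration data
a = [1, 2, 3, 4, 5]
together = [2, 4, 6, 8, 10]   # rises whenever a rises
opposite = [10, 8, 6, 4, 2]   # falls whenever a rises
unrelated = [3, 1, 4, 1, 3]   # ups and downs do not line up with a

print(pearson(a, together))   # close to 1
print(pearson(a, opposite))   # close to -1
print(pearson(a, unrelated))  # close to 0
```

The three calls line up with the three cases described above: moving together, moving in opposite directions, and not matching at all.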
Now we get to the confusion. If the correlation between thing A and thing B is close to 1, did A cause B? Does more of A make more of B happen? If the number is close to -1 does more of A make less of B happen? The correct answer is that we can't tell from just the correlation number. It is possible there is a thing C that makes both A and B happen. Or perhaps C makes B happen and A just happens to match.
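The "thing C" case can be sketched with simulated numbers. In the Python below, everything is invented for illustration: a hidden factor `c` feeds into both `a` and `b`, while `a` and `b` never touch each other. The sizes of the noise and the sample are arbitrary assumptions.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(42)  # fixed seed so reruns give the same made-up data

# Hypothetical hidden cause C, scored from 0 to 100
c = [random.uniform(0, 100) for _ in range(1000)]
# A and B each depend on C plus their own independent noise; neither uses the other
a = [ci + random.gauss(0, 5) for ci in c]
b = [ci + random.gauss(0, 5) for ci in c]

r_ab = pearson(a, b)

# Subtract C's contribution; what is left of A and B is independent noise
a_left = [ai - ci for ai, ci in zip(a, c)]
b_left = [bi - ci for bi, ci in zip(b, c)]
r_left = pearson(a_left, b_left)

print(f"A vs B: {r_ab:.2f}")
print(f"A vs B with C removed: {r_left:.2f}")
```

In this simulation A and B track each other closely even though neither causes the other, and the match largely disappears once C is accounted for. Real data rarely lets us subtract C so neatly, which is part of why the correlation number alone can't settle the question.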
First example: thing A might be level of obesity and thing B might be incidence of high blood pressure. The fatter you are, the more likely you are to have high blood pressure. These apparently have a correlation near 1. Does obesity cause high blood pressure? Or does the stress of being obese in a thin-obsessed culture cause the high blood pressure? Insufficient data.
The second example is one that I discussed back in December. I wrote about inaccuracies in temperature readings, which put holes in the case for global warming (I'm very aware I'm writing this after the warmest winter and warmest February and shortly after 2015 was declared the warmest year). In this example thing A is level of carbon in the atmosphere and thing B is average global temperature. Again, the correlation number appears to be close to 1. Does the level of carbon cause the higher temps or is something else going on, such as the natural cyclical output of energy from the sun? I'll repeat what I said before: I can't tell and I believe we should reduce carbon in the atmosphere either way.
And now for a third example. I've been reading an edition of Mother Jones magazine that I found in Dad's basement, this one from February 2013. One article caught my attention and seems relevant in light of the lead-contaminated water in Flint. The article by Kevin Drum discusses the cause of the high crime rates in the 1990s and the steady drop since then. What made that happen?
Every person studying crime has an answer. It was New York's push to crack down on petty crimes to warn criminals they'd better not attempt the big crimes. It was the push to build more prisons, getting criminals off the streets. It was tougher sentencing on drug offenses. It was because there was a general switch from heroin to marijuana. It was because legalized abortion led to fewer unwanted babies. So which is it?
This is where that correlation number is useful. If the correlation number between thing A and thing B (in this case perhaps heroin use and level of crime) is close to zero then it is pretty clear that A did not cause B, at least not in any simple more-of-A-means-more-of-B way. Yes, high correlation cannot prove A caused B, but correlation near zero is a pretty good indicator that A did not cause B.
For each of the standard explanations of the drop in crime, the data didn't fit for one reason or another. For example, the drop in crime began a few years before New York got tough on petty crimes.
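One caveat on the near-zero case, sketched in Python with made-up numbers: the correlation number only measures how well two sets of data fit a straight-line relationship. Below, `b` is completely determined by `a` (it is `a` squared), yet the correlation comes out as zero because the rise on one side cancels the fall on the other. The helper `pearson` is an illustrative name, not a library function.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: B depends entirely on A, but not along a straight line
a = [-3, -2, -1, 0, 1, 2, 3]
b = [x * x for x in a]

r = pearson(a, b)
print(r)  # zero, even though A fully determines B
```

So a near-zero number is good evidence against a simple "more A, more B" link, which is the kind of link the crime explanations propose; it is weaker evidence against more tangled relationships.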
So what did match the rise and fall of crime?
Rick Nevin was a consultant with the US Department of Housing and Urban Development. A suggestion by someone at HUD got Nevin to explore the issue. He found a very high correlation between the use of lead (tetraethyl lead) in gasoline and the rate of violent crime, if the data is offset by 23 years. Lead was added to gasoline in 1937, its use peaked in the late 1960s, and it was gone in 1986. Violent crime began to rise in 1960, peaked around 1990, and was down significantly by 2009.
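To show what "offset by 23 years" means in practice, here is a Python sketch of a lagged correlation. The numbers are entirely made up and are not Nevin's data: `lead_level` is an invented curve that rises from 1937, peaks in 1968, and hits zero in 1986, and the `crime` series is deliberately built as a 23-year-delayed copy of it. The sketch only shows the mechanics of trying different offsets and keeping the one that matches best.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lead_level(year):
    """Made-up gasoline lead index: rises from 1937, peaks in 1968, zero by 1986."""
    if year <= 1968:
        return 100 * (year - 1937) / 31
    return 100 * (1986 - year) / 18

lead = {year: lead_level(year) for year in range(1937, 1987)}
# Made-up crime index, built on purpose as a 23-year-delayed copy of the lead curve
crime = {year: 200 + 3 * lead_level(year - 23) for year in range(1960, 2010)}

def lagged_correlation(lag):
    """Correlate lead in year y with crime in year y + lag, over years present in both."""
    years = [y for y in lead if y + lag in crime]
    return pearson([lead[y] for y in years], [crime[y + lag] for y in years])

# Try every offset from 0 to 40 years and keep the best match
best_lag = max(range(0, 41), key=lagged_correlation)
print(best_lag, round(lagged_correlation(best_lag), 3))
```

Because the fake crime series was constructed from the lead series, the search recovers the 23-year offset by design. With real data the match would be noisier, and a strong match at some offset is still only correlation.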
Correlation? Yes. Causation? Needs more work.
Nevin found much more correlation. He found the same match in other countries around the world (all countries he examined). He found the same match in various cities around the country and in various neighborhoods within them. Nevin even found data that matched lead levels in children with arrest rates when those children grew into adults. That's a very high level of correlation.
So, we're done? A wise scientist would look for more.
And we do have – some – more.
We have lots of evidence of lead's effect on the brain. It lowers IQ and it also messes with the parts of the brain that handle aggression control, attention, and verbal reasoning. The effect is more pronounced in boys. Those who have been poisoned by lead may have just enough drop in reasoning and aggression control to push them into a life of crime.
Is that enough to say that lead is the cause of the level of crime? Nevin thinks it is. One reason I wrote this post is that his case leans heavily on correlation, and correlation by itself isn't supposed to be proof. Given his conclusion, a few things concern Nevin. Old houses still have lead paint. All that lead from gasoline that was released into the atmosphere has now settled into the soil, and kids play in the dirt (which is why the crime rate in 2009 is higher than the crime rate in 1960, even though the level of lead in gasoline in 2009 is the same as in 1937).
Alas, various people who work in criminology haven't bought Nevin's idea. That would mean giving up on their own pet projects. Prison officers are still likely to push more prisons as the answer. Conservatives still want to blame lefty soft-on-crime policies.
Getting rid of lead in houses and in soil is possible, at the cost of perhaps $20 billion a year for 10 years. But the return (in higher IQ, in lower medical costs, in lower crime rates) would be about $200 billion a year, a hefty 10-to-1 rate of return. We could also save on the cost of building prisons. Alas, the GOP brand of mathematics doesn't work like that.
If lead in the environment and in people is a big contributor to the crime rate, then look for a big spike in the crime rate in Flint around 2037, about 23 years after the water was contaminated.
This ends today's lesson in correlation. I'm sure my friend and debate partner will grade my descriptions of correlation and its use in science. I'll let you know if I flunked.