It seems to me that many statistics are blatant attempts to make one thing look important by comparing it to something nearly unrelated.
Playing doctor
Suppose you were talking to a doctor, and she told you that most of the children she’d seen in the emergency room who’d been playing on swing sets were seriously injured. You would, I hope, realize that it’s to be expected: who else is your doctor friend going to see from the playground? Children who play on swing sets without getting injured?
But you see this sort of statement in newspapers and in political discussions. Sometimes it is as blatant as the above example. Other times such attempts hide behind a comparison such as:
A medical study yesterday found that 66% of playground injuries were suffered on swing sets.
According to new medical data, you are more likely to be killed by an intruder than to kill an intruder.
These statements are worthless except as argumentative tools. So 66% of playground injuries happen on swing sets. Does this mean that swing sets are more dangerous than anything else on the playground? Or does it mean that swing sets are more popular than anything else on the playground? Or is it just high variation caused by a small sample? If (for example) there are only three playground injuries per year in the sample, chances are pretty good that one playground toy is going to get two of those injuries. But saying that there have been two swing set injuries in the past five years isn’t going to sell newspapers; and it isn’t going to get laws passed regulating swing sets or playgrounds.
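How easy is it for one toy to rack up “two-thirds of all injuries” by chance alone? Here’s a minimal simulation sketch; the five toys and the three-injuries-per-year figure are assumptions for illustration, not data from any real study:

```python
import random
from collections import Counter

# Hypothetical setup: five equally popular (and equally dangerous)
# pieces of equipment, three injuries per year, many simulated years.
EQUIPMENT = ["swings", "slide", "monkey bars", "seesaw", "merry-go-round"]
TRIALS = 100_000

lopsided = 0  # years in which one toy collects 2+ of the 3 injuries
for _ in range(TRIALS):
    injuries = Counter(random.choice(EQUIPMENT) for _ in range(3))
    if max(injuries.values()) >= 2:
        lopsided += 1

print(f"Some toy gets at least 2 of 3 injuries in {lopsided / TRIALS:.0%} of years")
# Expect roughly 52% (exactly 1 - (5*4*3)/5**3): about half the time,
# some toy "causes" 66% of the year's injuries through pure chance.
```

Two injuries out of three is exactly the headline’s 66%, and with numbers this small, some toy ends up with that share about half the time even when every toy is equally safe.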
Tautological hand-waving
Thousands of children die every year from automobile accidents. If you own a car, the chances of your being killed with it are astronomically greater than if you don’t. If you have children, you shouldn’t own a car.
Even if you accept that you should get rid of your car to avoid being killed yourself, should you accept the notion that it will save your children? Suppose that the statistics from which this statement was drawn count as ‘children’ anyone up to and including 20 years of age. In other words, three years’ worth of ‘children’ who may very well not even be living with their parents.
Suddenly those “thousands” of children don’t sound like many at all, and even if they do, your getting rid of your car won’t help any. Those ‘children’ own their own cars.
The main weasel phrase we’re looking at here is the middle one: if you own a car, the chances of your being killed with it are astronomically greater than if you don’t. Of course this is true. If you don’t own a car, you can’t be killed by your car. It sounds completely silly, but you’ll see this form of argument, and sometimes it won’t even be disguised.
Blame conflation
We saw an example of this above: the notion that since children die in cars, and you are more likely to die in a car if you own a car, you should get rid of your car to save your children. But there are more blatant examples of this:
Drug and alcohol users are 3.6 times more likely to injure themselves or others in workplace accidents. Workplace drug testing saves lives.
Problem? As it turns out, the following statement is also true: “Alcohol users are 3.6 times more likely to injure themselves or others in workplace accidents.” Of course, drug testing doesn’t follow from the latter, because drug testing doesn’t test for alcohol use, and even if it did it would be easy to avoid.1
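To see how the conflation works, here’s a sketch with invented numbers. Nothing below comes from any actual study; the point is only that a combined “drug and alcohol users” figure can hit 3.6 even when the drug users alone are nowhere close:

```python
# All numbers are invented for illustration: accident rates per
# 1,000 workers per year, by group.
baseline_rate = 10                        # workers who use neither
alcohol = {"workers": 900, "rate": 38}    # alcohol users: 3.8x baseline
drugs = {"workers": 100, "rate": 18}      # drug users: 1.8x baseline

# Pool the two groups into one "drug and alcohol users" category.
total_accidents = (alcohol["workers"] * alcohol["rate"]
                   + drugs["workers"] * drugs["rate"]) / 1000
total_workers = alcohol["workers"] + drugs["workers"]
combined_rate = total_accidents / total_workers * 1000

print(f"Alcohol users alone: {alcohol['rate'] / baseline_rate:.1f}x baseline")
print(f"Drug users alone:    {drugs['rate'] / baseline_rate:.1f}x baseline")
print(f"Combined category:   {combined_rate / baseline_rate:.1f}x baseline")
# The 3.6x headline is almost entirely the alcohol users' 3.8x;
# a drug test targets the group contributing least to the statistic.
```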
When you see two causes lumped together, you need to pay attention: is there some hand-waving going on to shift your attention away from what the statistics really mean? Since such statements are usually followed by calls for a change in policy, it might help to break down the chain of reasoning. What is the danger that needs to be removed? In this case it would seem to be workplace accidents. Why does “drug and alcohol use” cause workplace accidents? By reducing performance. It is, of course, possible to test performance directly, and thus head off workplace accidents caused not only by drugs and alcohol, but also by lack of sleep and simple incompetence. You might also consider that stopping workplace accidents isn’t the true goal of a policy-maker who introduces a ‘middle’ test that focuses on only one possible cause, when it is easy enough to test for the problem directly. Very possibly, the ‘problem’ is simply a tool to put the middle test in place.
Popularity contests
Something else that can happen is that statistics are presented as popularity contests when they aren’t. There are two kinds of fake popularity contests. The most obvious is presenting a fact as true or false because of poll results. For example, if 55% of adults 18 and over in New England state that they believe the moon is made of green cheese, this is presented as evidence that the moon is made of green cheese. But what the moon is made of is a tangible fact unaffected by polls on the matter. It either is green cheese or it isn’t, and no amount of polling can change that.
The second is a little more subtle. You might, for example, see the claim that “only 49% of Bartown residents received education improvements after last year’s education reforms”. The implication is that this is bad reform, because 51% did not see improvements. But without information about what those 51% did see, this is worthless information on which to make a judgement. It might very well be that only 49% of Bartown’s residents needed education reform. The rest were already receiving a good education.
There are valid claims to be made with that statistic, but as a popularity contest it’s irrelevant. If what Bartown is seeing is a 49% improvement in education, that probably beats a 0% improvement in education.
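As a back-of-the-envelope sketch (the population split is invented), the same 49% reads very differently against the right baseline:

```python
# Invented numbers: 1,000 Bartown residents, of whom only 49% were
# getting a poor education before the reforms.
residents = 1000
needed_improvement = 490   # assumption: only these had room to improve
saw_improvement = 490      # the headline's "only 49%"

print(f"Headline:  {saw_improvement / residents:.0%} of residents improved")
print(f"Baseline:  {saw_improvement / needed_improvement:.0%} of those "
      "who needed improvement got it")
# Same data: the first line reads like a failing grade, the second
# like a perfect score. The missing number is how many could improve.
```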
Case-controlled studies
In another case of “playing doctor”, you should be very careful of what are called “case-controlled” studies that purport to be generalizable to a wide population. “Case-controlled” studies are often medical studies or studies done by someone in the medical profession. They are a useful compromise that addresses some of the special needs of medical studies, the main one being that it would be unethical to choose people at random from the general population and give them drugs with unknown effects, or otherwise do things to them that might result in harm. The Tuskegee experiments are probably the classic example of what should not be done in medical research.
For true, generalizable studies, researchers need to take a random sample from the population they’re studying. But because this would be unethical in some medical studies, researchers instead attempt to “control” for all the variables they can think of when researching the effects of a new drug or new cure. This is good: without these techniques it would be difficult to research the effectiveness of new treatments without causing unethical harm. But it also means that there is always the possibility that the researchers forgot to “control” the sample for something important, or that the researchers introduced their own personal bias into the subject selection process. All research has sample selection bias to some extent, but case-controlled studies have it in spades.
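Here’s a minimal sketch of how a forgotten variable in control selection can conjure an effect out of nothing. Every parameter is invented; the ground truth in the simulation is that the exposure has no effect at all:

```python
import random

random.seed(1)
N = 200_000

# Ground truth: 30% of people are "exposed", everyone has the same
# 5% disease risk, so the true odds ratio is exactly 1.0.
people = [(random.random() < 0.30, random.random() < 0.05)
          for _ in range(N)]

cases = [p for p in people if p[1]]
healthy = [p for p in people if not p[1]]

# The uncontrolled flaw: controls are recruited at a clinic where
# exposed people are only half as likely to show up.
pool = [p for p in healthy if random.random() < (0.5 if p[0] else 1.0)]
controls = random.sample(pool, len(cases))

def exposed(group):
    return sum(1 for is_exposed, _ in group if is_exposed)

a, b = exposed(cases), len(cases) - exposed(cases)
c, d = exposed(controls), len(controls) - exposed(controls)
print(f"Estimated odds ratio: {a * d / (b * c):.2f}")
# Prints roughly 2.0: the harmless exposure looks about twice as
# risky, purely because of how the controls were selected.
```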
When case-controlled studies are used to generalize to the population as a whole, you should beware of the procedure’s limitations, especially if the researchers purport to draw conclusions that are not truly medical in nature.