We all know that numbers don’t lie, but we also know that you can lie with statistics. So, which is it? In fact, it is both. The deception is in the presentation, not in the statistical process itself. To illustrate this, we look to the Coin Flip Study as authored by me, right now.
We have in our possession a fair coin. It comes up heads 50% of the time and tails 50% of the time. This is not newsworthy or interesting, but we need a coin that is. What we need is an unfair coin, so unfair that it’s hard to believe it was minted. How do we make our fair coin look like an unfair coin? This is where the power of statistical deception comes into play.
First, we turn to a mathematical tool known as the binomial distribution, which scientists and mathematicians use on a regular basis to calculate the probability of a given number of successes across a series of independent trials. So, in our case, with our fair coin, if we flip it 10 times, we expect to get five heads and five tails. The binomial distribution allows us to determine how likely this is, and how likely other results are as well. As it happens, in a trial of 10 coin flips with a fair coin, you will get exactly five heads and five tails 24.6% of the time, as you can see for yourself by using a binomial distribution calculator.
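If you'd rather not take a calculator's word for it, the figure is easy to check with a few lines of Python. This is a minimal sketch of the binomial formula using only the standard library; the function name `binom_pmf` is just my own label:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k heads in n flips, where p is the heads probability."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly five heads in 10 flips of a fair coin: 252/1024
print(round(binom_pmf(5, 10), 4))  # 0.2461, i.e. 24.6%
```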
Our goal is now to figure out what sort of result would look so statistically improbable for a fair coin that we could point to it in our study and say that this coin is most likely unfair. Let’s say we’re looking to market our coin as one that comes up heads more often than a fair coin. What, then, if a trial of 10 flips showed nine heads? How likely is that outcome with a fair coin? We see from our calculations that a trial that results in at least nine heads out of 10 flips is only 1.07% likely. That is quite improbable. If someone handed you a coin and you flipped it 10 times and got nine heads, you would be right to be a little suspicious of that coin. The best course of action would be to repeat the trial and see if it happened again. If it did, you could be confident in assuming it really is an unfair coin. A 1.07% event happening twice in a row is a 0.0115% event, or about a 1 in 8,700 event. That would be the sensible course of action, but that is not what someone who is setting out to lie with statistics does.
The statistical deceiver would see that a trial of 10 coin flips results in nine heads 1.07% of the time and then commission many studies until the desired results are obtained. How many times would you have to repeat a trial of 10 coin flips before it was likely that one of those trials resulted in nine or more heads? If the probability of a trial of 10 flips showing nine or more heads is 1.07%, then the probability of a trial not showing nine or more heads is 98.93%. The probability of two trials in a row both failing to show nine or more heads is found by multiplying this by itself: 98.93% x 98.93% = 97.87%. Each additional trial you’re willing to conduct makes it more likely that you will eventually have one where you obtain a result of nine or more heads. After 10 trials, it is only 89.8% likely that you will not have seen a trial with nine or more heads; after 100 trials, it is only about 34% likely that you will not have seen a trial with nine or more heads. The 50/50 threshold is passed at 65 trials, so by the 65th trial you are more likely than not to have seen at least one trial that gives you the results you want. By simply conducting the same trial over and over, you will eventually get a result you can point to to claim, with statistical backing, that your coin is unfair.
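That run of percentages is easy to reproduce; a minimal sketch, using the exact probability rather than the rounded 1.07%:

```python
from math import comb, log

# Chance a single 10-flip trial shows nine or more heads (about 1.07%)
p_hit = sum(comb(10, k) for k in (9, 10)) / 2**10

def prob_no_hit(trials: int) -> float:
    """Chance that every one of `trials` independent 10-flip trials misses 9+ heads."""
    return (1 - p_hit) ** trials

print(round(prob_no_hit(10), 3))   # 0.898 — still no lucky trial after 10 tries
print(round(prob_no_hit(100), 2))  # 0.34  — still no lucky trial after 100 tries
# Smallest number of trials at which a hit becomes more likely than not:
print(log(0.5) / log(1 - p_hit))   # about 64.2, so 65 trials
```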
This, of course, is dishonest. What you have really done after 100 trials of 10 flips, for example, is conduct a single trial of 1,000 flips, and if we added up all the heads and tails across those trials, the totals would almost certainly point to our coin being fair rather than to the unfairness suggested by the one trial we cherry-picked from the series. But this is precisely what people with a vested interest in seeing a certain result do when conducting studies. When we publish the Coin Flip Study that “proves” our coin is not a fair coin, we simply don’t discuss all the other trials that were conducted, and point only to the one trial.
Perhaps you are a skeptic and immediately see through this trickery by way of our small sample size, because you believe a 10-flip trial is too small to mean anything. In fact, size doesn’t matter when it comes to statistical deception. We could make each trial a 1,000-flip trial and adjust our parameters. In the case of a 1,000-flip trial, a result of 537 or more heads will happen about 1.05% of the time, similar to the odds of getting nine or more heads in a 10-flip trial. If we conduct our 1,000-flip trial 100 times, it is only about 35% likely that none of those trials will turn up a result of 537 or more heads. For any sample size you can imagine, you can always target a certain probability threshold and look for the results that fit the outcome you want.
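Python’s arbitrary-precision integers make even the 1,000-flip case easy to compute exactly, with no normal approximation needed. A minimal sketch:

```python
from math import comb

def tail(n: int, k: int) -> float:
    """P(at least k heads in n fair flips), computed exactly with big integers."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

p = tail(1000, 537)
print(p)               # about 0.0105 — roughly as rare as 9+ heads in 10 flips
print((1 - p) ** 100)  # about 0.35 — chance that 100 such trials all miss
```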
We could go about obtaining the results in several different ways, and we could package how we do it in different ways as well. We could conduct all the trials and have the scientists scrap the ones we don’t want, which would require unethical researchers to go along with our plan in their coin flip laboratories. Or we could commission coin flip studies in many different laboratories, where scientists perform the trials we ask for and report back the results they obtain. After we have commissioned enough studies, we can discard all the ones we don’t like and use the ones we do like as the basis for an argument. We can supply pretty graphs and charts with probabilities to make it look legitimate. And this is exactly what funders of studies do when they have a vested interest in seeing the results swing one way.
Consider, for example, if we weren’t just flipping coins, but we were doing something like fracking. Suppose we wanted to show that our practices were safe, that a sophisticated computer simulation showed that our method of fracking was not likely to result in an undesirable environmental outcome. Instead of commissioning laboratories to flip coins, now we are commissioning them to create and run computer simulations. If our practices really result in about a 50% chance of an undesirable outcome, but we need our study to say it is safer than that, we need look no further than the fair coin example. In the case of the fair coin, we know that it is 50% likely to have either outcome, but we were able to concoct a bogus study that said it came up with nine or more heads on a 10 flip trial, and was therefore very unlikely to be a fair coin. In the case of the frackers, if a hypothetical study cost them $10,000, then 100 studies would cost $1,000,000, and would be likely to result in a dataset that they could use to show their fracking practices were safer than 50/50. This is a small investment in an industry where the largest companies measure their revenue in the hundreds of billions of dollars. In other words, if scientists can be bought off cheaply, or if studies can be conducted for tiny percentages of expected profits, then it should not surprise us when it happens, regardless of the industry in question.
Unfortunately for everyone, this sort of statistical manipulation goes on all the time. A clever propagandist can devise a study to back almost any claim and, sadly, media outlets generally don’t care or are unable to discern when statistical trickery has been employed. Statistical logic and critical thinking in general are not taught well in the American school system and, I imagine, are somewhat lacking in primary education universally, so it is unrealistic to expect the consumers or producers of media to be aware of when they are being fooled. I routinely hear phrases such as, “100,000 is a lot of people, so this is a statistically significant trial,” or, “an MIT graduate led this study,” as justifications for why we should believe the results of any given study. Statistical significance is intrinsic to each trial and to what is being tested, and what constitutes statistical significance in one case does not constitute it in another. A collection of 100,000 data points may or may not be statistically significant in any given experiment and, even if it were, someone willing to conduct enough studies can get past barriers of statistical significance by cherry-picking data and results anyway.
We see examples of this, I would surmise, almost every day. One recent kerfuffle that has produced a barrage of studies back and forth is the issue of NFL players and life expectancy. For years, studies had pointed toward NFL players having a lower life expectancy than the general population, and the numbers were looking dire, but then some new studies began to emerge. I encountered one that claimed that NFL players had lower mortality rates than the general populace, so I decided to take a look at its findings. Right away, in the abstract, I saw that the authors had compared the mortality rates of the general population to those of NFL players from the 1970 and 1994 seasons. I will admit that I didn’t read much further, because this study was meant to counter studies with far more robust datasets, yet the authors had decided to draw their data from just two of the dozens of available seasons. Why would they discard, say, the 23 other seasons between those two? I can think of some reasons why, and it’s the same type of thinking that might have you discard 23 out of 24 coin flip trials if you were looking for a specific set of results.
I am aware of this type of manipulation largely because of my background in gambling. In the world of sports betting, there is something called a tout: someone who is allegedly able to beat the odds and goes into business selling picks to recreational bettors who are looking for an edge. A tout will point to previous, documented records that show a winning record. The scam these touts run is exactly the same scam as commissioning voluminous studies and reporting only one; they omit all the results that don’t conform to a certain narrative. The tout may release several different sets of picks over the course of a season, either under different identities or just under different packages available to customers. Suppose the tout is really just an average predictor of game outcomes, or about 50/50, but needs to pick closer to 55/45 in order to claim to be a winning sports bettor. In a 16-game football season, instead of eight correct and eight incorrect picks, the tout then needs to produce at least nine correct and seven incorrect picks. According to the binomial distribution, this will happen about 40% of the time. So, if a tout releases five different sets of picks, by the end of the season he or she is over 90% likely to have at least one set of picks that shows nine or more wins. If the tout releases 10 different sets of picks, the tout is over 99% likely to be able to point to a winning record. The tout chooses the best set of results, points to those, and can go forward charging people for expert picks. Real touts obviously release far more than one pick per week, but the concept still holds; by releasing enough different sets of picks, they end up with at least one set of results that they can use to back their claims of being able to pick winners.
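The tout’s arithmetic is the coin flip arithmetic all over again, and you can check it the same way; a minimal sketch:

```python
from math import comb

def p_at_least(k: int, n: int) -> float:
    """P(at least k wins in n picks if the tout is really a 50/50 guesser)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

lucky_season = p_at_least(9, 16)  # about 0.402: a "winning record" by pure luck
for sets in (1, 5, 10):
    # Chance that at least one of the released sets of picks shows 9+ wins
    print(sets, round(1 - (1 - lucky_season) ** sets, 3))
# prints: 1 0.402 / 5 0.923 / 10 0.994
```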
It is easy for me to suggest that you never buy sports betting picks from a tout. Touts are obviously confidence artists who prey on compulsive sports bettors looking for a quick buck who don’t have the time or ability to figure out how to maximize their betting odds on their own. It is harder to explain how to look and think critically about scientific studies and the people who point to them as evidence to support an argument. My only suggestion is to be skeptical, and to try to figure out who commissioned the study in question and why. Some cases of cherry-picking are obvious, such as the NFL study I cited above, where no amount of technical jargon or sophisticated-looking equations will mask the fact that the data was so obviously rigged, but other studies are more subtle. In other cases, the fudging, selective reporting, and cherry-picking are almost impossible to detect. Ultimately, you only have your own powers of critical thinking and reasoning to fall back on. Do you think casinos are built on people coming in with picks from touts and beating them every week? Do you think the NFL owners have the best long-term interests of the players in mind? Do you think the oil industry puts public safety over profits? You don’t need to be a conspiracy theorist to protect yourself from disinformation — personal gain through unethical behavior rarely requires anything as elaborate as a conspiracy — but you do need to think critically.