If there is anything I love more than statistics and methods, it’s stories about stats and methods. Bloomberg has a story from two professors at the University of Michigan about separating the lies from the statistics. This isn’t the first look at the complicated role that statistics plays in our research. University of Delaware professor Joel Best published Damned Lies and Statistics in 2001 and followed it up with More Damned Lies and Statistics in 2004. In the Bloomberg piece, the authors, Stevenson and Wolfers, offer six tips for evaluating statistics that you encounter in scholarly articles. My favorite point they make is:
“Data mavens often make a big deal of their results being statistically significant, which is a statement that it’s unlikely their findings simply reflect chance. Don’t confuse this with something actually mattering. With huge data sets, almost everything is statistically significant. On the flip side, tests of statistical significance sometimes tell us that the evidence is weak, rather than that an effect is nonexistent. Remember, results can be useful even if they don’t meet significance tests. Sometimes questions are so important that we need to glean whatever meaning we can from available data. The best bad evidence is still more informative than no evidence.”
This really cuts to the bone of what is wrong with academic publishing. I believe there is a tendency to be interested only in articles that establish significant relationships, not “non-significant” ones. The result, I think, is a bias in academic publishing driven by the decisions researchers make about which projects to pursue: when exploratory analyses show trivial relationships that aren’t statistically significant, researchers don’t pursue publication of those findings because that would feel like wasted time and effort.
Further, Stevenson and Wolfers’ point about findings that matter is an important one to consider, particularly for those of us who work with big data sets. We need to constantly remind ourselves to ask tough questions of our data and analyses, like “So what?” and “Is this a real relationship that matters, or is it an artifact of my large data set?” When we do that, we’re far more likely to be using our statistical tools properly.
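The “almost everything is statistically significant with huge data sets” point is easy to demonstrate for yourself. Here is a small simulation sketch (using only Python’s standard library, with a hand-rolled large-sample z-test rather than any particular stats package): the same negligible true difference of 0.01 standard deviations is invisible at n = 200 per group, but becomes wildly “significant” at n = 1,000,000 per group, even though it matters just as little.

```python
import math
import random

def two_sample_z_test(a, b):
    """Large-sample two-sided z-test for a difference in means.

    Returns (observed difference, p-value). A z-test is a reasonable
    stand-in for a t-test at the sample sizes used here.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)      # standard error of the difference
    z = (ma - mb) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return ma - mb, p

random.seed(42)

# True effect: a practically meaningless 0.01-standard-deviation shift.
def sample_pair(n):
    return ([random.gauss(0.0, 1.0) for _ in range(n)],
            [random.gauss(0.01, 1.0) for _ in range(n)])

d_small, p_small = two_sample_z_test(*sample_pair(200))
d_big, p_big = two_sample_z_test(*sample_pair(1_000_000))

print(f"n=200 per group:       diff={d_small:+.4f}, p={p_small:.3f}")
print(f"n=1,000,000 per group: diff={d_big:+.4f}, p={p_big:.2e}")
```

The observed difference is tiny in both cases; only the p-value changes. That is exactly why “Is this a real relationship that matters?” has to be asked separately from “Is this statistically significant?”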