Lying with Statistics

If there is anything that I love more than statistics and methods, it’s stories about stats and methods. Bloomberg has a piece by two University of Michigan professors about separating lies from statistics. It isn’t the first to take up the complicated role that statistics plays in our research: University of Delaware professor Joel Best published Damned Lies and Statistics in 2001 and followed it up with More Damned Lies and Statistics in 2004. In the Bloomberg piece, the authors (Stevenson and Wolfers) offer six tips for considering the statistics that you encounter in scholarly articles. My favorite point they make is:

“Data mavens often make a big deal of their results being statistically significant, which is a statement that it’s unlikely their findings simply reflect chance. Don’t confuse this with something actually mattering. With huge data sets, almost everything is statistically significant. On the flip side, tests of statistical significance sometimes tell us that the evidence is weak, rather than that an effect is nonexistent. Remember, results can be useful even if they don’t meet significance tests. Sometimes questions are so important that we need to glean whatever meaning we can from available data. The best bad evidence is still more informative than no evidence.”

This really cuts to the bone of what is wrong with academic publishing. I believe there is a tendency to be interested only in articles that establish significant relationships, not “non-significant” ones. I think the result is a bias in academic publishing, driven by the decisions researchers make about which projects to pursue: when exploratory analyses show trivial relationships that aren’t statistically significant, researchers don’t pursue publication of those findings because doing so seems like an exercise in wasted time and effort.

Further, Stevenson and Wolfers’ point about findings that matter is an important one to consider, particularly for those of us who work with big data sets. We need to constantly remind ourselves to ask tough questions of our data and analyses, like “So what?” and “Is this a real relationship that matters, or is it an artifact of my large data set?” When we do that, we can be more confident that we’re using statistical tools properly.
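The “artifact of my large data set” question is easy to see in a quick simulation. The sketch below is only an illustration and does not come from the Bloomberg piece: the function names, the true difference of 0.05 standard deviations, and the sample sizes are all assumptions chosen for the example. It draws two groups whose means differ by a trivially small amount and runs a large-sample z-test at increasing sample sizes.

```python
import math
import random
from statistics import NormalDist, fmean


def sample_variance(xs, mean):
    """Unbiased sample variance, computed directly to stay fast on big lists."""
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def two_sample_z_test(a, b):
    """Return (difference in means, two-sided p-value) from a large-sample z-test."""
    mean_a, mean_b = fmean(a), fmean(b)
    diff = mean_b - mean_a
    se = math.sqrt(sample_variance(a, mean_a) / len(a)
                   + sample_variance(b, mean_b) / len(b))
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p


def simulate(n, true_diff=0.05, seed=0):
    """Hypothetical setup: two normal groups, SD 1, means differing by true_diff."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
    return two_sample_z_test(a, b)


# The true effect is the same tiny 0.05 SD at every sample size;
# only the number of observations per group changes.
for n in (100, 2_000, 200_000):
    diff, p = simulate(n)
    print(f"n per group = {n:>7,}  observed diff = {diff:+.4f}  p = {p:.4g}")
```

Because both groups have a standard deviation of 1, the observed difference can be read directly as an effect size in standard-deviation units. At the largest sample size the p-value becomes very small even though the difference itself stays tiny, which is exactly the situation where “So what?” is the right follow-up question.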

Author: Jessica Bishop-Royse

Jessica Bishop-Royse is the SSRC’s Senior Research Methodologist. Her areas of interest include: health disparities, demography, crime, methods, and statistics. She often finds herself navigating the fields of sociology, demography, epidemiology, medicine, public health, and policy. She was broadly trained in data collection, Stata, quantitative research methodology, as well as statistics. She has experience with multi-level analyses, survival analyses, and multivariate regression. Outside of the work context, Jessi is interested in writing, reading, travel, photography, and sport.
