Researchers across disciplines ‘fail to understand’ p-values

Even those in maths and statistics misunderstood the method blamed for the reproducibility crisis, research in China finds

February 28, 2020

Many scholars, even those working in disciplines such as mathematics, do not properly understand commonly used statistical methods in science, a study suggests.

About 90 per cent of researchers and students surveyed for the study in China failed to correctly interpret p-values and confidence intervals, two of the most common statistical tools used to analyse scientific results.

Almost 1,500 people, from undergraduates to postdoctoral researchers, were given a series of false statements about the interpretation of p-values and confidence intervals and asked to judge if any were correct.

A total of 89 per cent of the participants made at least one error on p-values, and 93 per cent made at least one mistake when considering the correct interpretation of confidence intervals.
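The article does not list the study's statements, but a minimal simulation sketch may help show what the two tools do and do not say. It assumes normally distributed data with a known standard deviation, so that a simple z interval and z test apply. The function name `simulate` and all parameter values are arbitrary illustrative choices and are not taken from the study. Because the null hypothesis is true by construction, about 95 per cent of the intervals contain the true mean, and about 5 per cent of experiments still give p < 0.05. The "95 per cent" describes the long-run behaviour of the procedure, not the probability that one particular interval contains the true value. A p-value is the probability, assuming the null hypothesis is true, of a result at least as extreme as the one observed; it is not the probability that the null hypothesis is true.

```python
import math
import random
from statistics import NormalDist


def simulate(n_experiments=20_000, n=30, true_mean=0.0, sigma=1.0, alpha=0.05, seed=1):
    """Repeat a simple experiment many times with the null hypothesis true.

    Illustrative assumptions: normal data, known sigma, z-based interval and
    two-sided z test of H0: mean == true_mean. Returns the share of intervals
    that contain the true mean and the share of experiments with p < alpha.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma / math.sqrt(n)                   # standard error of the mean
    covered = 0
    rejected = 0
    for _ in range(n_experiments):
        xs = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = sum(xs) / n
        # Each single interval either contains the true mean or it does not.
        if mean - z_crit * se <= true_mean <= mean + z_crit * se:
            covered += 1
        # p-value: chance, if H0 were true, of a z statistic at least this extreme.
        z = (mean - true_mean) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejected += 1
    return covered / n_experiments, rejected / n_experiments


coverage, false_positive_rate = simulate()
print(f"share of intervals containing the true mean: {coverage:.3f}")  # should be near 0.95
print(f"share of experiments with p < 0.05: {false_positive_rate:.3f}")  # should be near 0.05
```

In this z-based setup the interval misses the true mean precisely when p < 0.05, so the two shares are complements of each other.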

The proportion misinterpreting the two methods did not vary much across disciplines: even among those working in maths and statistics, 85 to 90 per cent failed to spot that all the statements were wrong.

Even when looking at only postgraduates and researchers, the proportion misunderstanding the methods remained high, although a slightly smaller share of those with a PhD made an error interpreting p-values.

Respondents to the survey were also asked to indicate how confident they were in making their decisions on a scale of one to five. Based on the results, the researchers and students were “generally confident about their (incorrect) judgements”.

“These results suggest that researchers generally do not have a good understanding of these common statistical indices,” the paper says. That, it goes on, might indicate that the embedded “ritual” of using such methods wrongly “is not limited to psychology or social science but also [extends] to the entire scientific community”.

The paper, published in the Journal of Pacific Rim Psychology, adds to the growing evidence about the problems of using tools such as p-values. Last year, statisticians made a high-profile call to stop using them to declare results “statistically significant”.

Chuan-Peng Hu, a postdoctoral researcher at the Leibniz Institute for Resilience Research in Germany, and co-author of the new study, said giving undergraduates better training in statistical inference would help to counter the problem, but there also needed to be “constant learning” among scholars at “all levels”.

In addition, he warned, incentives for researchers had to change. “The current system doesn’t care so much about being correct; instead, we are rewarded [for being] productive,” he said. “Changing the culture would be the long-term goal.”

The analysis did find that those whose highest degree had been obtained outside mainland China had a slightly lower error rate on interpreting p-values.

“The only available explanation for this scenario might be that the replication crisis was discussed more in the English media than in the Chinese media. Therefore, students who had studied overseas were more familiar with this topic than their local counterparts,” the paper says.


Reader's comment

This is completely unsurprising. I would hazard a guess that if you were to perform the same study in the US or UK you would get a similar response. We are failing our students and ourselves. In numerous serious stats reference books, confidence intervals get nary a mention: a couple of pages at most. According to Robert Newcombe, the vast majority of mentions of Binomial intervals (simple choice probabilities) employ the incorrect 'Wald' interval anyway, which I think explains why many stats books have avoided them. With the Wald interval, the results from significance testing and confidence intervals diverge, giving different and inconsistent results. Unsurprisingly, statisticians avoid confidence intervals. The correct approach involves inverting the Gaussian or Binomial function, and was pointed out by EB Wilson in 1927 (so much for impact factors). I have also developed a range of new test methods from this perspective. I run a blog for corpus linguists which translates this into a particular applied domain, linguistics, but colleagues may find many of the methods useful in other fields.
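The comment's point about Wald and Wilson intervals can be illustrated with a short sketch. It assumes a two-sided 95 per cent level and takes "significance testing" to mean the textbook one-sample z test for a proportion, which computes the standard error from the hypothesised value p0. The function names and the example of 1 success in 10 trials with p0 = 0.35 are illustrative choices, not taken from the comment or from Newcombe's work. The Wilson (1927) score interval comes from inverting that Gaussian z test, so it excludes p0 exactly when the test rejects p0. The Wald interval instead uses the observed proportion for its standard error, and here it excludes a value the test does not reject. In this example it also gives a negative lower bound.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95 per cent, about 1.96


def wald_interval(x, n, z=Z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(x, n, z=Z):
    """Wilson score interval, obtained by inverting the Gaussian z test."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half


def z_test_rejects(x, n, p0, z=Z):
    """One-sample z test of H0: P = p0, standard error taken at p0."""
    se0 = math.sqrt(p0 * (1 - p0) / n)
    return abs(x / n - p0) / se0 > z


# Hypothetical data: 1 success in 10 trials, hypothesised proportion 0.35
x, n, p0 = 1, 10, 0.35
wald = wald_interval(x, n)
wilson = wilson_interval(x, n)
print("Wald:  ", [round(b, 4) for b in wald])    # lower bound is negative
print("Wilson:", [round(b, 4) for b in wilson])
print("z test rejects p0?", z_test_rejects(x, n, p0))            # False
print("p0 outside Wald?  ", not (wald[0] <= p0 <= wald[1]))      # True
print("p0 outside Wilson?", not (wilson[0] <= p0 <= wilson[1]))  # False
```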