End use of ‘statistical significance’ in results, scholars urge

Editorial in journal special issue says ‘arbitrary’ p-value threshold for analysing results has become ‘meaningless’

March 21, 2019

Researchers should stop describing results as “statistically significant” simply because they pass an “arbitrary” probability threshold, an influential journal has urged.

An editorial in a special issue of The American Statistician says there should be an end to the practice of using “p-values” to validate the significance of results.

P-values measure the probability of obtaining a result at least as extreme as the one observed if the “null hypothesis” – the assumption that the hypothesised effect does not exist – were true. If that probability is less than 5 per cent – a p-value below 0.05 – the result is often deemed statistically significant and sometimes taken as strong evidence that the original hypothesis is true.
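A small worked case can make the 0.05 threshold concrete. The following is a minimal sketch and is not drawn from the editorial: it assumes a hypothetical experiment of 100 coin flips, a null hypothesis that the coin is fair, and an illustrative function name, `binomial_p_value`.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a binomial experiment (illustrative helper).

    Sums the probability, assuming the null hypothesis is true, of every
    outcome that is no more likely than the one actually observed.
    """
    def pmf(k: int) -> float:
        # Probability of exactly k successes if the null hypothesis holds.
        return comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so symmetric outcomes are not lost to rounding.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical data: 60 or 61 heads out of 100 flips of a supposedly fair coin.
    for heads in (60, 61):
        p = binomial_p_value(heads, 100)
        print(f"{heads} heads in 100 flips: p = {p:.4f}")
```

Under these assumptions, 60 heads gives a p-value just above 0.05 and 61 heads gives one just below it, so a single extra head changes the label from “non-significant” to “significant” even though the evidence has barely changed.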

However, critics have increasingly been taking issue with such an approach, arguing that statistical significance is not the same as conclusive proof.

The issue goes to the heart of the debate on the reproducibility of research, with concerns that as well as statistical significance being misinterpreted, some scholars are even using p-values to essentially trawl for any results that pass the threshold.
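The trawling concern can be illustrated with a short simulation. This is a sketch under stated assumptions rather than anything taken from the special issue: it supposes that every test examines an effect that does not exist, that the tests are independent, and that each p-value is therefore uniformly distributed between 0 and 1. The function names are hypothetical.

```python
import random


def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate_trawl(n_tests: int, alpha: float = 0.05,
                   n_studies: int = 20_000, seed: int = 0) -> float:
    """Fraction of simulated studies that find at least one 'significant' result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Assumption: with no real effect, each p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(f"{n:>2} tests: exact = {chance_of_false_positive(n):.3f}, "
              f"simulated = {simulate_trawl(n):.3f}")
```

With 20 independent tests and no real effects, the chance that at least one passes the 0.05 threshold is roughly 64 per cent, which is why reporting only the results that pass can mislead.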

In the special issue of The American Statistician – “Statistical Inference in the 21st Century: A World Beyond P<0.05” – dozens of academics explore the issues surrounding the use of p-values and how researchers should properly interpret scientific results.

Writing in the issue’s editorial, statisticians including Ronald Wasserstein, executive director of the American Statistical Association, and Nicole Lazar, professor of statistics at the University of Georgia, say they have concluded that “it is time to stop using the term ‘statistically significant’ entirely” because it has become “meaningless”.

“No p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical non-significance lead to the association or effect being improbable, absent, false, or unimportant,” they write.

“For the integrity of scientific publishing and research dissemination, therefore, whether a p-value passes any arbitrary threshold should not be considered at all when deciding which results to present or highlight.”

Instead, the authors point to approaches suggested in many of the 43 papers published in the special issue, including properly setting out the context of any research, being honest about the limitations of a statistical analysis and using other methods that can be “complementary” to p-values.

The authors of the editorial accept that the scientific community may be unlikely to converge on one “simple paradigm” for testing statistics and indeed “may never do so”, but they add that “solid principles for the use of statistics do exist, and they are well explained in this special issue”.

As well as examining the use of p-values, articles in the issue criticise the incentives embedded in current scientific culture – such as the assessment of academics’ performance using metrics – which many scholars believe are behind the incorrect use of statistics.

In one paper, David Colquhoun, emeritus professor of pharmacology at UCL, says that “in the end, the only way to solve the problem of reproducibility is to do more replication and to reduce the incentives that are imposed on scientists to produce unreliable work. The publish-or-perish culture has damaged science, as has the judgment of their work by silly metrics.”

simon.baker@timeshighereducation.com

Reader's comments (1)

#1 Submitted by David Colquhoun on April 2, 2019 - 4:48pm

It is a great shame that Nature says they intend to ignore the recommendations. The problem is that both authors and editors have incentives to publish results without worrying too much about whether or not they are true.
