End use of ‘statistical significance’ in results, scholars urge

Editorial in journal special issue says ‘arbitrary’ p-value threshold for analysing results has become ‘meaningless’

March 21, 2019

Researchers should stop describing results as “statistically significant” simply because they pass an “arbitrary” probability threshold, an influential journal has urged.

An editorial in a special issue of The American Statistician says there should be an end to the practice of using “p-value” thresholds to declare results significant.

P-values express the probability of obtaining a result at least as extreme as the one observed if the “null hypothesis” – that there is no real effect and the result arose by chance alone – were true. If that probability is less than 5 per cent – a p-value below 0.05 – the result is often deemed statistically significant and sometimes taken as strong evidence that the original hypothesis is true.
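To make the definition concrete, the following is a minimal sketch, not taken from the editorial, of how a p-value can be computed for a simple case. It assumes a hypothetical coin-flipping experiment in which the null hypothesis is that the coin is fair; the function names and the example counts are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k: int, trials: int, prob: float) -> float:
    """Probability of exactly k successes in `trials` independent tries."""
    return comb(trials, k) * prob**k * (1 - prob) ** (trials - k)


def two_sided_p_value(successes: int, trials: int, null_prob: float = 0.5) -> float:
    """Exact two-sided binomial p-value (illustrative helper).

    Sums the probability, computed under the null hypothesis, of every
    outcome that is no more likely than the one actually observed.
    """
    observed = binomial_pmf(successes, trials, null_prob)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    cutoff = observed * (1 + 1e-9)
    total = 0.0
    for k in range(trials + 1):
        p = binomial_pmf(k, trials, null_prob)
        if p <= cutoff:
            total += p
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical data: 60 or 61 heads in 100 flips of a supposedly fair coin.
    for heads in (60, 61):
        p = two_sided_p_value(heads, 100)
        label = "below" if p < 0.05 else "not below"
        print(f"{heads} heads in 100 flips: p = {p:.4f} ({label} 0.05)")
```

Under these assumptions, 60 heads in 100 flips gives a p-value slightly above 0.05 and 61 heads gives one below it, so a single extra head moves the result across the conventional threshold. In both cases the number is the probability of the data given the null hypothesis, not the probability that the null hypothesis is true.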

However, critics have increasingly been taking issue with such an approach, arguing that statistical significance is not the same as conclusive proof.

The issue goes to the heart of the debate on the reproducibility of research, with concerns that as well as statistical significance being misinterpreted, some scholars are even using p-values to essentially trawl for any results that pass the threshold.
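The trawling concern can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything drawn from the special issue: it supposes that a researcher runs several independent tests, that no real effect exists in any of them, and that each p-value is therefore uniformly distributed between 0 and 1. The function names and parameters are hypothetical.

```python
import random


def analytic_false_positive_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulated_false_positive_rate(
    n_tests: int,
    alpha: float = 0.05,
    n_studies: int = 10_000,
    seed: int = 0,
) -> float:
    """Estimate the same chance by simulating many hypothetical studies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_studies):
        # Assumption: every null hypothesis is true, so each p-value is
        # drawn uniformly from [0, 1).
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for n in (1, 5, 20):
        exact = analytic_false_positive_rate(n)
        approx = simulated_false_positive_rate(n)
        print(f"{n:>2} tests: analytic {exact:.3f}, simulated {approx:.3f}")
```

With 20 independent tests and no real effects, the chance that at least one passes the 0.05 threshold is about 64 per cent, which is why reporting only the results that clear the bar can mislead.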

In the special issue of The American Statistician – “Statistical Inference in the 21st Century: A World Beyond P<0.05” – dozens of academics explore the issues surrounding the use of p-values and how researchers should properly interpret scientific results.

Writing in the issue’s editorial, statisticians including Ronald Wasserstein, executive director of the American Statistical Association, and Nicole Lazar, professor of statistics at the University of Georgia, say they have concluded that “it is time to stop using the term ‘statistically significant’ entirely” because it has become “meaningless”.

“No p-value can reveal the plausibility, presence, truth, or importance of an association or effect. Therefore, a label of statistical significance does not mean or imply that an association or effect is highly probable, real, true, or important. Nor does a label of statistical non-significance lead to the association or effect being improbable, absent, false, or unimportant,” they write.

“For the integrity of scientific publishing and research dissemination, therefore, whether a p-value passes any arbitrary threshold should not be considered at all when deciding which results to present or highlight.”

Instead, the authors point to approaches suggested in many of the 43 papers published in the special issue, including properly setting out the context of any research, being honest about the limitations of a statistical analysis and using other methods that can be “complementary” to p-values.

The authors of the editorial accept that the scientific community may be unlikely to converge on one “simple paradigm” for testing statistics and indeed “may never do so”, but they add that “solid principles for the use of statistics do exist, and they are well explained in this special issue”.

As well as focusing on the use of p-values, articles in the issue criticise the incentives embedded in current scientific culture – such as the assessment of academics’ performance using metrics – which many scholars believe are behind the incorrect use of statistics.

In one paper, David Colquhoun, emeritus professor of pharmacology at UCL, says that “in the end, the only way to solve the problem of reproducibility is to do more replication and to reduce the incentives that are imposed on scientists to produce unreliable work. The publish-or-perish culture has damaged science, as has the judgment of their work by silly metrics.”

simon.baker@timeshighereducation.com

Reader's comments (1)

#1 Submitted by David Colquhoun on April 2, 2019 - 4:48pm

It is a great shame that Nature says they intend to ignore the recommendations. The problem is that both authors and editors have incentives to publish results without worrying too much about whether or not they are true.
