A landmark initiative to reproduce the findings of 100 prominent psychology papers has succeeded in fewer than half of cases.
The Reproducibility Project: Psychology was launched in 2011 by the Center for Open Science, a US non-profit organisation, in the wake of a number of fraud scandals in psychology, such as that involving Diederik Stapel, who admitted in 2011 to faking more than 50 papers.
One hundred papers were chosen from 2008 issues of three important journals in psychology, and about 350 scientists were involved in meticulous attempts to reproduce them.
Replication attempts were designed with high statistical power, and the protocols were independently reviewed and posted on a central repository. The authors of the original studies were also consulted regularly.
The results of the project are reported in a paper, “Estimating the reproducibility of psychological science”, published in the journal Science.
Based on five different measures of reproducibility, the paper concludes that replication succeeded in fewer than half of cases. For instance, while 97 per cent of the original studies reported statistically significant results, only 36 per cent of the replications did so.
Replication rates were particularly low in social psychology, compared with cognitive psychology.
The paper also reports that even where the original effects were replicated, the size of the reproduced effect was typically less than half that of the original.
Brian Nosek, professor of psychology at the University of Virginia and executive director of the Center for Open Science, cautioned against concluding either that the reproduced studies were thereby proved, or that those that could not be reproduced must have reported “false positives”. It was also possible, he said, that a replication could be a false negative, or that the original study and its replication had important methodological differences.
“One reason for the latter would be if the replication team did not implement the procedure correctly or just did a terrible job conducting the study,” he said – although he noted that every effort had been made to maximise rigour.
Gilbert Chin, senior editor at Science, described the findings as “somewhat disappointing”, but emphasised that “the outcome does not speak directly to the validity or the falsity of the theories” underlying the experiments.
“What it does say is that we should be less confident about many of the original experimental results that were provided as empirical evidence in support of those theories,” he said.
Alan Kraut, executive director of the Association for Psychological Science and a board member of the Center for Open Science, noted that even statistically significant “real findings” would “not be expected to replicate over and over again … The only finding that will replicate 100 per cent of the time is likely to be trite, boring, and probably already known.”
Professor Nosek conceded that there was “a lot of room to improve reproducibility”, but he did not see the paper as telling a “pessimistic” story.
“The project is a demonstration of science demonstrating an essential quality: self-correction,” he said. “A community of researchers volunteered their time to contribute to a large project for which they would receive little individual credit … It shows that many scientists embrace scientific ideals and demonstrate their commitment to those by reflecting them in daily practice.”
Moreover, he cautioned, the project was “a drop in the bucket in terms of trying to get a precise estimate of the overall reproducibility of a discipline, let alone of science more generally”.