The US government is funding a $7.6 million (£5.9 million) project that aims to use algorithms to create “confidence scores” for the reliability of social science research papers.
The Defense Advanced Research Projects Agency has launched the initiative in partnership with the Virginia-based Center for Open Science with the goal of giving Department of Defense policymakers a straightforward and quantitative indication of the likely reproducibility of published scholarship.
The move is a sign of concern among politicians and civil servants about the reliability of research findings in the social sciences and potentially a wider range of disciplines. An analysis of major social science studies published in Science and Nature between 2010 and 2015, released by the COS last year, suggested that the main findings of more than one in three papers could not be replicated.
Under the Systematizing Confidence in Open Research and Evidence (Score) project, the COS and colleagues at the University of Pennsylvania and Syracuse University will build a database of about 30,000 claims made in published papers, extracting information – both automatically and manually – about how these claims are communicated in each article, as well as data from external sources, such as how often the work has been cited, whether the data are openly accessible, and whether the research was preregistered.
Academics will review and score around 3,000 of these claims through surveys, panels or prediction markets – effectively betting on the likelihood of their reproducibility. Meanwhile, the information in the larger database will be used to create artificial intelligence tools that will score the same claims as the experts.
Finally, hundreds of researchers will run replication tests on a sample of the claims to test the academics’ and the algorithms’ ability to predict reproducibility.
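As a rough sketch of how such an algorithmic confidence score might be assembled, consider a toy model that combines a few of the signals mentioned above – citation counts, open data, preregistration – into a single 0-to-1 score via logistic weighting. This is purely illustrative: the feature names and weights here are invented for the example, and the Score project's actual models will be trained against expert ratings and replication outcomes rather than hand-set.

```python
import math

def confidence_score(citations, open_data, preregistered, weights=None):
    """Combine claim-level signals into a 0-1 reproducibility confidence score.

    All weights below are hypothetical; a real system would learn them
    from expert judgements and replication results.
    """
    if weights is None:
        weights = {"bias": -1.0, "citations": 0.01,
                   "open_data": 0.8, "preregistered": 1.2}
    z = (weights["bias"]
         + weights["citations"] * citations
         + weights["open_data"] * open_data
         + weights["preregistered"] * preregistered)
    # Logistic function squashes the weighted sum into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-z))

# Under these invented weights, a preregistered claim with open data
# scores higher than an otherwise identical claim without either.
high = confidence_score(citations=50, open_data=1, preregistered=1)
low = confidence_score(citations=50, open_data=0, preregistered=0)
```

The point of such a model is interpretability: each signal contributes a visible, adjustable amount to the final score, which is what would let policymakers see why a claim was rated as likely or unlikely to replicate.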
Tim Errington, director of research at the COS, told Times Higher Education that, if the algorithms proved reliable, they could potentially be applied to other subject areas, with the aim of making confidence scoring a reliable metric with which the scientific community could evaluate a paper’s value.
The project with Darpa offered an opportunity to “scale up” last year’s COS analysis of reproducibility in the social sciences “to a really valuable level”, Dr Errington said.
That study found that academics were easily able to predict whether experiments would turn out to be reproducible.
But Brian Nosek, professor of psychology at the University of Virginia and executive director of the COS, said that it remained to be seen whether algorithms would be reliable enough to influence policymakers’ assessment of the value of published research.
“There is a long way to go to test the extent to which algorithms provide evidence that is precise, reliable, generalisable, and valid,” he said. “The extent of appropriate use depends heavily on those things [but] what seems most likely to me is that algorithms will ultimately provide a relatively cheap and quick heuristic for initial review of the quality and credibility of evidence.”
One longer-term possibility, then, is that such metrics could one day provide a more favourable alternative to the journal impact factor, which assesses the likely worth of research based only on which periodical it appears in. Dr Errington cautioned, however, that "whether any metric can really replace impact factor is dependent on the community that surrounds it".