In many scientific papers, the core of the analysis is computational. Researchers spend months – sometimes years – collecting and cleaning data, writing and debugging computer code, and then running and rerunning their work. Yet those data and code never enter the peer-review process. No wonder, you might argue, that reproducibility is not the norm in modern science.
When reviewing a manuscript, journal editors and referees have traditionally had to assume that the results outlined are the genuine output of running the researchers’ computer code on their data. Over the past decade, some journals have begun to instruct authors to upload their code and data to dedicated online repositories after a paper is accepted, so that, in principle, other researchers can download everything needed to redo the analysis. However, such initiatives have been only partially successful in improving transparency.
There are two main reasons for this. First, the posted code and data are not checked systematically. Their quality, therefore, is sometimes low – particularly because researchers lack the time and incentives to prepare them properly. This makes it hard even for specialists to redo the analysis and fully reproduce an original study.
Second, an increasing number of academic papers rely on confidential data relating to individuals; examples include data on income, employment, taxes and health. These are available only to accredited users within a secure computing environment and cannot be shared. In some cases, an anonymised version of the data can be made public, but recent evidence suggests that this approach does not yet guarantee that privacy is preserved. A paper recently published in Nature Communications shows that 99.98 per cent of Americans could be correctly re-identified in any anonymised dataset using as few as 15 attributes, such as gender, zip code or marital status.
That well-trained researchers are sometimes unable to replicate the results of papers published in their field is a serious concern and calls for action. Some academic journals take the issue very seriously and rerun authors’ code on their data to check for reproducibility. The journal Biostatistics has been implementing such a verification process for several years, and the American Economic Review recently announced that it is about to do the same. Many journals, however, lack the time or specialised staff to deal with numerous software and data sources.
As an alternative, we advocate an external solution provided by a specialised certification agency, acting as a trusted third party. To this end, we recently launched cascad, the Certification Agency for Scientific Code and Data, as a non-profit academic initiative.
When a researcher requests a reproducibility certificate, a cascad reviewer runs their code on their data to verify that the output corresponds to the results presented in the tables and figures in their manuscript. The certificate can then be submitted to journals alongside the manuscript, giving the editor and reviewers confidence that the paper is all that it seems.
Another key advantage of a trusted third party is its ability to certify the reproducibility of research based on confidential data. For instance, as described in a recent publication in Science, cascad recently partnered with France’s Secure Data Access Centre, a public body that allows researchers to access and work with confidential governmental data under secure conditions. The centre creates a virtual machine through which researchers remotely access the specific datasets needed for their projects, as well as the required statistical software. The cascad reproducibility reviewer then accesses a virtual machine that is a clone of the one used by the author (same data, same code), so the whole verification is conducted within the secure computing environment.
Making research reproducible calls for more joint efforts such as this between academic journals, researchers and data providers. Given researchers’ relatively low reproducibility literacy, it is also vital to train them – especially the next generation – to understand and comply with the main principles of reproducible research.
Taking reproducibility seriously is a prerequisite for making science trustworthy and useful to society.
Christophe Pérignon is professor of finance and associate dean for research at HEC Paris, and Christophe Hurlin is professor of economics at the University of Orléans, France. They are co-founders of cascad, the Certification Agency for Scientific Code and Data.
Print headline: A badge that gives assurance