In a 2016 editorial on data sharing in research, the editors of The New England Journal of Medicine wrote: "There is concern among some front-line researchers that the system will be taken over by what some have characterised as 'research parasites'."
The words sparked a major furore online, in the open research community and beyond, that ultimately led to a "clarification" from the journal. As a result there are now awards for being a "research parasite", which I, for one, hope will go from strength to strength.
The story illustrates perfectly the challenge of encouraging data-driven research. Although most acknowledge the potential of open research data in moving forward the research endeavour, we need to reconfigure the system to really grasp these opportunities. Ultimately the biggest issues, as illustrated by the "research parasites" debate, are the attitudes, behaviours and cultures of researchers themselves, but there are other important building blocks that need to be put into place too.
First, we need an appropriate infrastructure for the storage, preservation and sharing of research data. Researchers need services that are easily available and simple to use for their data, and that allow data to be deposited, with appropriate metadata, with as little friction as possible.
A healthy ecosystem is developing to support this. Services such as Figshare and Zenodo, investment by universities, and, in the UK, a major initiative from Jisc, all contribute to a rich landscape. There is no shortage of places for researchers to share data, although challenges remain for the discovery of that data.
Second, there is the policy environment, where the UK has many strengths. Most notable is the Concordat on Open Research Data, published in 2016, a statement of shared principles to guide the development of a coherent policy framework for open research data.
There are also data policies already in force from many funders, although there is still work to be done in getting better alignment between those policies and the Concordat. The forthcoming creation of UK Research and Innovation will certainly help with this coordination.
In support of building a policy environment that encourages the sharing of research data, the Higher Education Funding Council for England proposed in the consultation for the next research excellence framework that policy and practice in this area should be considered as part of assessment of the research environment.
There are also complex policy questions to be addressed in the use of data in research from sources outside the research process. Administrative data, health records and the extensive "data footprints" we all now leave online have the potential to be valuable sources for research. The recent report from the Royal Society and British Academy provides in-depth analysis and makes sound recommendations.
So, progress is being made to provide the frameworks, systems and policy to support the sharing of research data. The real challenge is to get that data used by researchers.
In principle, there is no reason at all why existing funding mechanisms could not be used for research that draws on existing data. However, the "research parasites" debate demonstrates that there are potentially deep-seated cultural norms within research that favour the collectors of data. It's not unreasonable to think these norms might influence peer review, leading to difficulties in funding data-driven research in practice.
This effect could well be exacerbated by an overly narrow focus by researchers or institutions on metrics related to research grant income. Work on existing data is likely to be less costly than work that requires new data to be collected, and so is likely to bring in less grant income.
One solution is for funders to target funding specifically at research that uses existing datasets, such as the ESRC's Secondary Data Analysis Initiative. This might overcome biases in peer review, but does risk normalising the problem. What we need to do is tackle the cultural issues head on, and accept that we all – researchers, institutions and funders – own the challenge collectively.
Building on the progress so far, what could we do to make a difference? Here are four suggestions:
- We need to be realistic about funding. Properly preserving and making research data available will inevitably incur costs, but these should be seen as part of the costs of research, not as an add-on. In the short term this might mean funding less research, but this needs to be offset against the potential for achieving new insights at much lower cost from the re-use of data. The additional cost also buys benefits in terms of the extra reliability that comes from the potential re-analysis of data.
- We need to have a broad definition of "data" that encompasses all the resources that support research findings. In some disciplines these resources will be things we might not consider to be data, such as annotated texts or interview transcripts. The key question for the publication of any research findings should be: what additional resources has the research generated, and how can they be made available?
- We need to reward researchers (in national and local assessments of their work) who publish the data they generate and who use existing datasets in new research or for re-analysis.
- We need to require researchers, as part of applications for funding, to explore whether existing datasets could be used to answer their research questions. Funding for collection of new data should only be given where there is a sound argument that no existing data can be used.
Sharing and re-using data has the potential to transform the research process. The building blocks for a new approach to research are in place; are we all ready to rise to the challenge?
Dr Steven Hill is Head of Research Policy at Hefce.
This article was commissioned by Times Higher Education in partnership with Jisc as part of the Jisc Futures series. Jisc is the UK's expert body for digital technology and resources in higher education, further education, skills and research.