Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.
What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.
Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.
Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.
The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.
Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.
New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.
Data-centric science is emerging in concert with calls for increased openness in research. In response, the European Commission is leading efforts to assess the practical and regulatory changes needed to foster the dissemination and re-use of all research outputs, including data, software, protocols and materials.
The EC's research and innovation commissioner, Carlos Moedas, has made open science one of his top priorities. The many EU initiatives in this area include the development of a European Open Science Cloud and the creation of the Open Science Policy Platform, a group assembled by the Commission in late 2016 to inform the implementation of open science policies.
The OSPP includes representatives from universities, businesses and publishing, as well as networks of experts, science academies and learned societies. I am on the panel on behalf of the Global Young Academy, the voice of young scientists around the world, comprising 200 top-ranking researchers from all disciplines and 70 countries. All groups represented on the OSPP recognise the challenges of adopting the principles of open data and making them work in different research contexts and across disciplines.
The legal, cultural, economic and institutional landscapes that will best support data-centric open science are not yet clear, nor is there agreement over the future shape of scientific publishing and what metrics should be used to encourage best practice.
Such a significant transformation of scientific governance creates risks as well as opportunities. Implementing open data too quickly and in a top-down manner could compromise scientific excellence by overriding the traditions of different fields and methods built up over centuries. Each discipline’s sampling techniques, for example, are finely tuned to match the characteristics of the objects it studies.
Failure to make provisions for the different research environments found across Europe could also deepen existing inequalities in access to funding, training, equipment and international visibility. And, of course, the commodification of research data parallels a broader "datification" of society, with attendant concerns around privacy, confidentiality and ownership, particularly when data relate to health and other personal information, or to the environment.
The very existence of groups such as the OSPP is an important step towards fostering dialogue over these issues. In the immediate future, it is crucial for all those concerned with research and development – and particularly researchers themselves – to engage with the activities of this and other groups addressing science policy and governance, and to be willing to identify and address the issues raised by data-centric science.
Sabina Leonelli is associate professor in the philosophy of science at the University of Exeter, and the author of Data-centric Biology: A philosophical study (University of Chicago Press, 2016). This week she is presenting the ideas in this blog at the 2017 meeting of the American Association for the Advancement of Science (AAAS) in Boston.