Seize the data

The information deluge and the promise of open enterprise offer UK universities an opportunity not to be missed, says Geoffrey Boulton

June 21, 2012

Science is at a crossroads. Although open enquiry remains at its heart - based on the publication of scientific theories together with the experimental and observational data on which they are based - the "data deluge" has challenged that process, and computing and communications technologies are changing the game in other ways.

The vast data volumes that we now create, store and manipulate can no longer be accommodated by the published articles based upon them, and this breaks the link between concept and information. Yet many citizens are no longer content to accept the pronouncements of scientists in areas that affect their lives without first scrutinising the evidence: after all, they are often the ones who have paid for it. Meanwhile, data-led approaches are becoming integral to science as powerful means of identifying previously unanticipated relationships and are creating new tools of discovery. So, although open-access publishing has captured the headlines in recent months, open data is a much deeper issue.

It is for these reasons that the Royal Society chose to conduct a year-long study into how science can adapt to these imperatives, resulting in the publication of its report, Science as an Open Enterprise, this week. At its heart are 10 recommendations designed to facilitate new ways of working. They address scientists, learned societies, funders of research, journal publishers, business, the government and regulators, all of whom have a part to play.

Open approaches have been pioneered in bioinformatics, in part inspired by the 1996 "Bermuda Principles" - an agreement that primary genomic sequence data should be rapidly released into the public domain "in order to encourage research and development and to maximise its benefit to society". Success has depended on sharing, with groups contributing to open databases that are accessible to the whole community in the realisation that much more can be gained by exploiting a massive common resource than by going it alone.

Many areas of other disciplines (for example, astronomy, nanotechnology, geoscience, social and public health sciences) are following a similar trajectory, not because there is a top-down edict that they should, but because openness has paid off in enhancing the productivity and creativity of their science.

An inspiring example of the novel use of technology was a 2009 blog by Timothy Gowers, Fields medallist and Royal Society 2010 anniversary research professor in mathematics at the University of Cambridge - and just knighted. In it he posted a serious unsolved maths problem and invited others to contribute to its solution. In just over a month and after people had made more than 800 comments, some rapidly developed, some discarded, the problem was solved. This utilised the collective intelligence of a community: as Gowers commented, "it was like driving a car whilst normal research is like pushing it". Some have argued that the openness and sharing in these examples and in other novel modes of interaction that communications technologies have stimulated place us on the verge of a second open-science revolution, three centuries after the introduction of scientific journals ushered in the first.

Given the central role of universities in the scientific endeavour, it is important that they adapt to these changes and, where appropriate, play a role in leading them. Part of that role should be to help to remove constraints on researchers' behaviour and introduce incentives, for example by giving credit for data generation on a par with that given to conventional publication. The skill and creativity required to acquire significant datasets represent a high level of scientific excellence and should be rewarded. Citable data are typically referenced more frequently than the initial papers based on them.

Other issues include the need to train scientists in the use of data as well as to train "data scientists"; the role of the science library as the printed page loses its centrality in scientific communication; support for the data management needs of researchers; ensuring that the data created are made accessible; and how to balance the demands of openness against those of commercial relationships. That final balance will be difficult to strike. It has been said that "data are the new raw material for business". One analysis of UK data equity estimated it to be worth £25.1 billion to British business in 2011. This is predicted to increase to £216 billion or 2.3 per cent of cumulative gross domestic product between 2012 and 2017. Although most of this is forecast to come from greater business efficiency in data use, £24 billion will stem from an increase in commercial data-driven R&D.

The economic context alone draws attention to the huge importance of the issue, and in normal times would justify serious further investment in the science base. There is, among other things, an urgent need to develop software tools for data management. In announcing a major boost for such development recently, a US White House spokesman commented that "the future of computing is not just big iron. It's big data." But this is not just about the future of computing: it is about the future of science itself. The UK is a genuine world leader in research and, as the tectonic plates of science shift, we must be nimble in adapting. The transition will not be simple, but we cannot afford to miss the moment.
