Crystal clear for all to see

May 5, 2006

Information jams and long waits to publish research evidence could be a thing of the past thanks to new open-data sources that can make results available online to the scientific community, reports Becky McCall

If the periodic table is anything to go by, the lifeblood of chemistry is to be found in its reams of data featuring atomic numbers, configurations or diagrams of three-dimensional molecular structures. But these data are only as good as a user's ability to manage and interpret them. To the relief of researchers, in chemistry and elsewhere in academe, help is at hand. The Joint Information Systems Committee is leading a nationwide initiative to develop an infrastructure that will make all the primary data that underpin scientific papers easily accessible online.

Mike Hursthouse, professor of structural chemistry at Southampton University, and his team of crystallographers and software specialists generate vast amounts of data. To filter and manage these data, they are piloting a dataset called e-bank, available online and based on the principle of open data. "We've set up a website that allows molecular data, for example fine structural details, to be easily and rapidly disseminated without having to go through the whole long-winded publication process," Hursthouse says. At the moment, the dissemination of all these chemical data gets stuck in a bottleneck whereby only about 25 per cent of data generated ever reach the public domain. "We often want to easily compare different molecular structures and properties to arrive at new conclusions that we would then write up as a paper, but if only a small proportion of that data is available, it severely restricts the scope of that activity," Hursthouse says.

Southampton is further developing the network in a project known as R4L, which explores ways of connecting the crystal data repository directly to the instruments of analysis, as well as methods of ensuring that researchers secure rights over the data.

In June 2005, Jisc launched its £4 million Digital Repositories programme of 25 pilot projects designed to enable the free sharing of information and thus to encourage collaboration and widespread communication between individuals across research and education. The projects explore the technical, cultural and managerial issues around the open-data programme.

Neil Jacobs, who manages the programme, says: "It's based on the 'wiki' principle, which means that information is posted on a website and people are free to do as they choose."

Five of the pilot projects involve data derived from scientific research. By its nature, science easily lends itself to the open-data principle, given that it is often based on replicating or analysing someone else's data.

"If you're a researcher, then it's all about generating results and raising the profile of your work. At the moment, the pilot projects are determining how to accommodate researcher rights and examining how user-friendly the material is. The benefits come once the data are interoperable, so if the data were genetic information, then this material could readily be compared to social data or whatever to arrive at interesting results and trends," Jacobs says.

When the system is fully operational, researchers will not only file a research publication in a journal repository but also deposit their original dataset in their institutional repository and perhaps in a discipline-related archive. Any researchers doing follow-on work elsewhere could access these data through the original researcher's publication. Open data will allow the scientist to click from a publication through to the original dataset and even cited datasets through the network.

Of course, any researcher will testify that data are generated through weeks, months and even years of hard work, and certainty about data-usage rights is fundamental to the smooth running of the infrastructure. Claddier, designed to track and log the use and flow of data, is a project in the Jisc stable that is examining this issue.

Bryan Lawrence, who leads the project for the Council for the Central Laboratory of the Research Councils alongside the universities of Reading and Southampton, says: "We want to link datasets to documents physically. So if as a researcher you want to source and use someone else's data, you can use a system known as track-back to ensure that the dataset is annotated with a citation and even potentially an excerpt of your new work. Such citations add to the provenance of the data and increase the confidence of other users."

Digital Repositories is not the first open-data project, but Jisc's programme expands the concept on an unprecedented scale.