How do you manage a store of information as vast as the sky? Olga Wojtas finds out from astronomers at Edinburgh, who are pioneering ways to archive data from large astronomical surveys as part of an effort to build a global ‘virtual observatory’.
The vastness of the night sky may be overwhelming, but equally overwhelming are the logistics of coping with the ever-increasing data arising from our study of it. Bob Mann of Edinburgh University’s Institute for Astronomy says the pace of change has been dramatic over the past five to ten years.
Edinburgh astronomers have been data curators of sky surveys for more than 30 years. The institute houses a major collection of photographic plates. Astronomers who wished to view them had to visit Edinburgh until its scientists developed machines to scan the plates and send digitised images via the internet.
The institute’s decades-long SuperCOSMOS Sky Survey (SSS) of the southern sky already occupies about three terabytes of data (one terabyte is 1,000 gigabytes). Now Edinburgh’s Wide Field Astronomy Unit (WFAU) is preparing to deal with data from the next major UK-led sky study, the UK Infrared Deep Sky Survey (UKIDSS), which will dwarf the SSS.
“It will generate at least 10 terabytes of data each year for the next seven years or so,” Mann says. The next big survey - infrared observation of the southern sky - will generate data at five times the rate of UKIDSS.
Astronomy has historically been very individualistic, Mann says - researchers would make observations on a particular telescope, then analyse the data by themselves.
Now, much more effort is spent on large surveys of regions of the sky in all the bands of the spectrum - including radio, X-ray and infrared - and the observations go into an archive that everyone can access. This allows much more research to be conducted from a given night of observations. However, it also presents astronomers with data management issues beyond their previous experience.
While individual astronomers have traditionally used data from one particular band of the spectrum, the aim now is to build a global “virtual observatory” that integrates all the archives. Then, if an astronomer finds an interesting object in data from an optical telescope, he or she can check it against X-ray or radio data.
However, there are billions of objects to be searched at any one time, and alongside the massive increase in data is the need to preserve and manage it so that it is searchable.
The WFAU is working with Edinburgh’s recently established Data Curation Centre (DCC). The centre is funded by the Joint Information Systems Committee and the research councils’ e-science core programme to help solve challenges that are too big or tough for a single institution or discipline.
Chris Rusbridge, the DCC director, says: “Disciplines differ in their data requirements. Astronomy seems to me a prime example of a discipline where pretty much everything observational should be kept because later hypotheses can be validated by earlier observations, which clearly cannot be made again once their moment has passed.”
That said, he adds that astronomy may be one of the easier areas for digital curation because astronomers are already convinced of the need for, and value of, sharing their observational information and have been taking steps to do so.
Peter Buneman, the DCC’s research director, agrees: “One must not confuse complexity with size.” But he does not minimise the challenges. “In the past ten years, most sciences have embraced digital storage of data. This has completely changed the way scientific investigation is conducted. However, we are now faced with problems of archiving, preserving, communicating and annotating data. These areas are not well understood, and they are crucial for scientific research.”
Peredur Williams, manager of WFAU, believes astronomy may help pioneer data handling techniques that could be used by other disciplines that must cope with huge amounts of information. “The big advantage with astronomy is that you do not run into any problems of patient confidentiality or commercial sensitivity. The whole tradition of astronomy is for data to be publicly available, so it is a very useful testbed.”
Astronomers are building the AstroGrid as the UK’s contribution to the virtual observatory. This could, for example, allow users to combine X-ray data from Leicester University with infrared data from Edinburgh, Williams says. Given the huge volumes of data, it would be impossible for astronomers to access them at their desktops, so they are likely to send their questions to the unit that holds the data.
Mann says this puts another layer of responsibility on whoever runs the archive, an issue that is set to occupy the WFAU and DCC over the next 18 months.
“Moving to having astronomers analysing data in the data centre involves a lot of issues about security,” he says. “If I’m operating an archive and allowing external people to run code analysing data, I have to be happy that their code is not able to do anything nasty in our computers. I do not want to risk having our archive overwritten.”