The data deluge in the arts and humanities challenges e-Science, but the developments will benefit everyone, say Peter Halfpenny, Rob Procter and David Robey
The needs of the physical and biological sciences have largely driven the development of e-Science. Now, however, social scientists and researchers in the arts and humanities are asking how the programme can help them address their own versions of the "data deluge".
Among future research priorities identified in a recent review of UK social science are globalisation, immigration and ageing populations. The scale and complexity of these problems call for collaboration across disciplinary boundaries and demand more powerful research tools.
Computer-based statistical modelling and simulation are established social science research tools. But new models will need to be more complex and will need more computing power. Running a statistical model of the UK population at the level of individuals and households is impractical on the resources available to most researchers; e-Science gives them access to computing power well beyond their own institutions.
As well as more power, researchers need to find and access the right kinds of data. Practices and infrastructure for sharing and reusing data are well established in the social sciences. Data centres such as the UK Data Archive, Mimas and Edina host a wide range of information and make it easy for researchers to access individual datasets. But differences in databases and data formats mean that linking disparate datasets together to answer complex research questions can still be tricky. E-Science offers solutions to these problems by providing richer mechanisms for data description and ways to hide the heterogeneity.
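To make the idea of "hiding the heterogeneity" concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how any e-Science service or data centre works: the two datasets, their field names and codes, and the helper functions harmonise and link are all invented for the example. Each source is given a small machine-readable description that maps its own fields onto a common schema, after which the two can be linked as if they had been collected in the same format.

```python
# Two hypothetical sources describing the same people with different
# field names and coding schemes. All values are invented.
SURVEY = [
    {"person_id": "A1", "sex": "F", "age_years": 34},
    {"person_id": "B2", "sex": "M", "age_years": 61},
]
ADMIN = [
    {"ID": "A1", "gender_code": 2, "employed": True},
    {"ID": "B2", "gender_code": 1, "employed": False},
]

# A "data description" for each source (an assumption of this sketch):
# common field name -> (source field name, conversion function).
SURVEY_DESCRIPTION = {
    "id": ("person_id", str),
    "sex": ("sex", lambda v: {"F": "female", "M": "male"}[v]),
    "age": ("age_years", int),
}
ADMIN_DESCRIPTION = {
    "id": ("ID", str),
    "sex": ("gender_code", lambda v: {1: "male", 2: "female"}[v]),
    "employed": ("employed", bool),
}


def harmonise(records, description):
    """Translate source records into the common schema."""
    return [
        {
            common: convert(record[source])
            for common, (source, convert) in description.items()
        }
        for record in records
    ]


def link(left, right, key="id"):
    """Join two harmonised datasets on a shared key."""
    index = {record[key]: record for record in right}
    return [
        {**record, **index[record[key]]}
        for record in left
        if record[key] in index
    ]


linked = link(
    harmonise(SURVEY, SURVEY_DESCRIPTION),
    harmonise(ADMIN, ADMIN_DESCRIPTION),
)
for row in linked:
    print(row)
```

The researcher works only with the common field names; the differences between the sources are confined to the descriptions. Real datasets raise harder problems that this sketch ignores, such as conflicting values, missing identifiers and the need to protect the privacy of data subjects.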
Meanwhile, the social sciences are witnessing the beginnings of what may be a fundamental and decisive shift in data collection away from traditional survey-based methods. In future, research data are increasingly likely to be derived from administrative records (education, employment and so on), the world wide web (including news and corporate sites, wikis and blogs), digital communications (e-mail, newsgroups, speech and SMS) and transactional records (purchases). Exploiting these requires more sophisticated techniques for multimedia data fusion and management, more powerful computational tools for data annotation and analysis and, critically, new mechanisms to ensure the privacy of data subjects.
E-Science offers access to data, data management and analysis services via a standard web browser.
The arts and humanities have come to e-Science relatively late, in part because the Arts and Humanities Research Council acquired its research council status only last year and missed out on the substantial e-Science funding that the other councils received.
But there can be no doubt about the potentially transforming impact of e-Science on these disciplines. Google's recently announced facility for downloading the full text of out-of-copyright books is the most conspicuous addition to a huge wealth of digitisation and data-creation projects across the arts and humanities domain. Virtually all the major and a great many of the minor literary texts in the major European languages can be found in digital form somewhere, alongside an enormous variety of databases and electronic archives of images, sound, historical records, linguistic corpora and much else.
These data exist in a large variety of technical forms and each collection, sometimes each item, can typically be accessed only on its own. Grid technology will progressively allow more comprehensive access to these data.
The benefits do not flow just one way. The arts and humanities present new challenges to e-Science developers. Data in these disciplines are not only highly varied in content and encoded in a wide variety of forms; they are also typically quite "fuzzy", lacking the kind of structure that makes most scientific data easier to process. New technologies developed to deal with this kind of data will eventually go on to help the sciences.
The arts in particular open up fundamentally new kinds of use for grid technologies. The Access Grid is a good example: while arts researchers can use its videoconferencing facilities in much the same way as their colleagues in other disciplines, theatre studies specialists are developing imaginative ways of using it for artistic performances as well.
Having missed out on the earlier rounds of funding, the AHRC is now taking its e-Science agenda forward in a joint initiative with the Engineering and Physical Sciences Research Council and the Joint Information Systems Committee. A series of workshops and demonstrators has already been funded in a wide range of subjects, an Arts and Humanities e-Science Support Centre has been established at King's College London, and a call for bids has recently been announced for research grants to a total value of up to £2 million and six four-year postgraduate studentships.
Peter Halfpenny is executive director and Rob Procter is research director of the Economic and Social Research Council's National Centre for e-Social Science. David Robey is director of the Arts and Humanities Research Council ICT in Arts and Humanities Research programme. Details: www.ncess.ac.uk