Climate research data release does not go far enough: more is needed

One way to resolve the dispute at UEA would be for climate-change scientists to enable others to replicate their work, says Darrel Ince

December 3, 2009

The hacking into the Climatic Research Unit computer files at the University of East Anglia (UEA) threatens to become a cause célèbre. At the time of going to press, the number of Google hits for what the climate sceptics have dubbed "Climategate" was still increasing dramatically.

It is clear that if something is not done soon, this incident will harm research into global warming; in addition, the collateral damage may affect us all.

There have been calls from both the climate-change community and the climate sceptics for an independent investigation into the leaked documents. This obviously needs to happen, but it is not very clear what this will achieve.

I do, however, have a suggestion for an inquiry that would cover much of this investigation and also, perhaps, lay to rest some of the criticisms of the sceptics. Before I explain, it is worth saying that I believe climate change is happening, and that my belief is based on confidence in the peer-review process used by academic journals and conferences. That confidence has, however, been a bit shaken by the events of last week.

A number of bloggers with technical expertise spent a couple of days poring over a chronicle describing software development at the Climatic Research Unit. It forms part of the hacked document set. The bloggers have suggested that there may be some major problems with the databases used and the program code.

I have only these blogs to rely on, so I do not know whether what they say is correct.

What I do know is that developing the type of computer programs used at UEA can be an error-prone process in two ways. First, there is a generic problem - if you make one small mistake in a program it could easily invalidate your results, even if the code was thousands of lines long.

Second, scientific computing is bedevilled by the floating-point problem. A computer stores numbers such as 1.44 (known as floating-point numbers) only approximately, to a limited degree of accuracy. Once a program processes these numbers, say by repeatedly multiplying them together, small inaccuracies known as rounding errors will accumulate if the programmer has not been very careful.
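To make the floating-point point concrete, here is a minimal sketch in Python. It is purely illustrative and has nothing to do with the UEA code, which I have not seen; the function names repeated_product and relative_error are my own inventions. It multiplies 1.44 by itself repeatedly in ordinary floating-point arithmetic and compares the outcome with the same calculation done in exact fractions.

```python
from fractions import Fraction


def repeated_product(x: float, n: int) -> float:
    """Multiply x by itself n times using ordinary floating-point arithmetic."""
    result = 1.0
    for _ in range(n):
        result *= x  # each multiplication may round the result slightly
    return result


def relative_error(x_text: str, n: int) -> float:
    """Relative gap between the floating-point product and the exact one.

    x_text is the decimal number written as text (for example "1.44"),
    so that the exact calculation starts from the intended value rather
    than from the computer's approximation of it.
    """
    exact = Fraction(x_text) ** n                            # exact rational arithmetic
    approx = Fraction(repeated_product(float(x_text), n))    # exact value of the float result
    return float(abs(approx - exact) / exact)


if __name__ == "__main__":
    # The stored value of 1.44 is already not exactly 1.44.
    print(Fraction(1.44) == Fraction("1.44"))
    for n in (1, 10, 50, 200):
        print(n, relative_error("1.44", n))
```

The discrepancies printed are tiny, but they are not zero. Whether they matter depends on what the program does next: long chains of operations, or subtracting nearly equal numbers, can magnify them considerably.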

My suggestion is that UEA should release all the documentation, data and program code for independent review. Computer programs are unambiguous: they are like mathematical documents - they tell no lies. An independent examination of these programs will determine: whether there are any errors in the code; whether the statistics used were correct; whether the statistics were implemented correctly in the program code; and whether the data were used correctly.

Science relies on verification through repeatability, ie, the ability of a scientist to read an account of some research in an academic article and repeat the work. Articles that report on work where a computer analyses data or simulates some scenario should, as a matter of course, describe the code and also make it available to colleagues.

From such an examination, there are three possible scenarios.

First, the code may be cleared and much of the debate could be closed. It would not die completely because if the content of the emails reproduced in the media is correct, there are some embarrassing details that I am sure the writers regret sending.

Second, it may not be possible to find some or all of the code; this would mean that the work reported in any articles that use the code could not be verified and the work in its current state would be non-reproducible. The results could be relied on only if the code were rewritten ab initio.

Third, there may be errors in the code, gaps in the data may invalidate the statistics used, or the statistics may not have been implemented properly. This is obviously the nightmare scenario: it would mean that key work published on global warming, which is being used by policymakers to spend millions, if not billions, of pounds, is based on erroneous results.

A future step we need to take is to persuade journals and conferences to insist that if an article that is based on computer codes is to be published, the authors must make the codes and their data publicly available. Clearly, practical details will prevent them from being published conventionally (many programs run to thousands of lines of code), but it's easy to print a link to a website.

I look forward to the analysis of the code to confirm my view about global warming. It is, of course, not quite an ill wind: there are plenty of publications for statisticians, computer scientists and sociologists of science that may be generated from my suggestion and also from the release of the hacked files.
