Whose data is it anyway?

A PhD student's run-in with a senior researcher sets Tim Birkhead thinking

June 18, 2009

A colleague recently told me about one of his PhD students publishing his first paper in a well-respected scientific journal. The student was proud of his accomplishment, but his sense of achievement quickly evaporated when he received a rather blunt email from a senior figure in the field, demanding to know why he had not made all his data available in the paper's supplementary material.

Being chastised by a senior researcher didn't seem to mark a great start to the student's academic career, and he spent the entire night crafting a reply in an attempt to dig himself out of what he perceived to be a slippery academic hole.

The critic curtly pointed out that since the data had been obtained using public funds, all funding bodies required that the student make it freely and instantly available. Wondering whether this was true, I examined a few research council websites to see what they had to say.

The Organisation for Economic Co-operation and Development, the Biotechnology and Biological Sciences Research Council and the Natural Environment Research Council all state that publicly funded research should be openly available to the maximum extent possible.

I agree, but there are circumstances when it isn't appropriate for data to be made immediately available to all and sundry. Science is competitive and being scooped is a constant threat. There are no prizes for coming second. Allowing yourself to be beaten by someone using data that you placed in the public domain would be plain stupid.

Researchers typically spend a huge amount of time and effort collecting original data and it seems perfectly reasonable to me that they should - within reason - be able to publish it as and when they see fit.

As it happened, the doctoral student had several other papers in the pipeline that would also use some or all of the same data as in his first paper. He also fully intended to present the information in the last of the papers from his PhD.

Research students clearly should have priority in using their own data to establish themselves in the field. Only once all the papers from their PhDs are published is it appropriate for them to make the information freely available.

In fact, the research councils acknowledge this. Nerc, for example, says: "Individual scientists, principal-investigator teams and programmes will be permitted a reasonable period of exclusive access to datasets they have collected."

Similarly, the BBSRC says that "researchers have a legitimate interest in benefiting from their own time and effort in producing the data, but not in prolonged exclusive use".

What seems to be missing right now is the opportunity for researchers to include a statement in their publications making it clear that their data will be made available once all the papers from their theses are published. I have never seen such a statement, but when I put the suggestion to half a dozen journal editors in my field, they all thought it entirely appropriate.

So why don't they do it? Scientific journals seem to be inconsistent in asking their authors about the availability of their data: they should ask, and if there are more analyses to come from it, then a brief statement to that effect would help to avoid any misunderstanding. If there are no further analyses, then making the information available is fine.

Who is responsible for making it available? Researchers are the main custodians of their data, but actual ownership is less clear. Nerc says that, "despite behaviour that might suggest the contrary, datasets frequently do not belong to those who have collected them. They generally belong to the employers of such data collectors."

The BBSRC, however, says that ownership lies with the investigators.

I have heard rumours about one scheme for making data available, in which journals would require datasets when papers were accepted for publication. They would then hold them for a fixed period of time before making them available on request.

A year may be an appropriate length of time, since this would allow researchers to use the data for their own purposes and get it published. Two years may be a possibility in extenuating circumstances.

The disadvantage of such an approach is that it would increase the already considerable administrative burden journals face. It would also be essential for researchers to submit data in a standardised form and provide the necessary accompanying notes to make it intelligible.

What did my colleague's PhD student do? After discussion with his supervisor, he scrapped the two-page reply to his critic and sent a one-line message pointing out that he was using the data for other work, and that as soon as he had published everything from his PhD he would make the information freely available.

In the meantime, the editors of scientific journals should perhaps think about drafting guidelines and adding appropriate statements about the availability of data.
