With a never-ending stream of emails, tweets, Facebook videos and (at least for academics) research papers, we are producing an increasing volume of data. Even the most diligent of researchers struggles to keep up with all the publications in their field; at the same time they have access to an unprecedented amount of data from which to glean new insights.
To cope with this flood of new information, researchers are increasingly turning to text and data mining. Academics use algorithms to sift through the existing literature in their discipline, vast databases of experimental data, or the leftovers from our online lives. From this they can understand, for example, how certain vocabulary has risen and fallen in popularity over the decades, or build new databases of microbes by automatically reading new research.
But in the European Union, there is a problem: research organisations say that publishers, which own the rights to many of these crucial databases, prevent them from taking full advantage of this new technology, and the organisations are fighting for a change in the law.
Even if researchers have permission to access published papers, through a subscription to a journal, for example, it is often unclear whether they are also allowed to mine them. The current situation “leaves authors in a vacuum not knowing whether what they are doing is legal or illegal”, says Lidia Borrell-Damián, director of research and innovation at the European University Association (EUA).
When some researchers come to publish, they suddenly find that they lack the right copyright permissions, the EUA has discovered through surveys of members. Publishers can put specific clauses in their licences that rule out mining, and gaining permission to mine content from lots of different publishers can be hugely complex.
Some researchers even fear being sued after publication for having violated copyright rules, says Marie Timmermann, EU legislation and regulatory affairs officer at Science Europe, which represents research organisations across the continent. "It's not clear that the legal access they have gives them the rights they need" to mine content, she adds.
This matters for European research because it arguably puts academics at a disadvantage compared with their counterparts in the US, where the situation is seen as less restrictive. For example, the US has no equivalent of the EU database regulations, which give database owners special protections against having their data reused.
Perhaps understandably, publishers have sought to hang on to control over their data. "Data is the new fuel right now," says Natalia Mileszyk, a public policy expert at Communia, which lobbies for a reduction in copyright protections.
Meanwhile, an important legal precedent was set in a landmark US court case dating from 2005 that gave Google the right to mine the content of millions of books without the permission of authors, in order to create a vast searchable library. “If you hinder research in Europe, you risk losing research to other parts of the world,” Timmermann says.
But a new EU directive, currently being scrutinised by the European Parliament, could soon free academics in Europe from the need to worry about infringing copyright. As well as making life easier for scientists, the European Commission hopes a blanket exemption across the EU will smooth cross-border collaboration. At the moment, EU member states are free to enact their own exemptions, but so far only the UK has done so.
But campaigners say the exemption as it stands is much too narrow. It would apply to "research organisations" – defined as acting on a not-for-profit basis, and with a "public interest mission" – carrying out "scientific research".
If only not-for-profit universities are exempt, campaigners fear that problems could arise when universities want to work with established companies, or help students and researchers to spin out startups that use data mining to create new digital products. In its current form, the directive would "create a grey zone for public-private partnerships", Timmermann says. It would mean "we end up with partners with different rights – that causes too much uncertainty", she adds.
Another concern is that an exemption for academics might not stop publishers restricting how researchers use their databases. Some big data owners have stipulated that researchers mine their data using the publishers' own software and servers.
"This is limiting...if a researcher wants to access the data in a new way," says Borrell-Damián. "What is very important is that researchers need to be able to mine using their own algorithms," she continues. But she believes it is unclear whether the new exemption would allow this kind of flexible access.
Nor is it clear that it will stop publishers charging "an excessive price" for access, Borrell-Damián argues; a database might technically be "open" for researchers to use, but at a prohibitive price.
For now, observers hope that the directive will come into force next year – or possibly at the end of 2017 if European legislators act quickly. It may not be perfect in the eyes of research organisations, but it should make managing the overwhelming flow of new data with the aid of machines that little bit easier.