Harnessing open access publishing with artificial intelligence


In the era of information overflow, new tools are creating opportunities for academia

The question of how intellectual property rights should be treated in the age of open access publishing and artificial intelligence has sparked a debate among stakeholders, including scientists, publishers and funders. But to resist the transformative power of AI-assisted technologies, such as Microsoft Academic, would be to ignore the fact that new platforms are helping to increase exposure and aid research.

Kuansan Wang (pictured), director of Microsoft Research Outreach Academic Services, says that democratising the world’s access to AI and bleeding-edge technology could enable profound scientific advancements. The convergence of big data, cloud computing and AI has created a research environment where the cost of premier access need no longer be prohibitive to all but large corporations and academic institutions.

However, much of academic publishing is still wedded to a centuries-old business model.

“It used to be that the authors, the researchers, had to pay a hefty fee to get their research published,” says Dr Wang. “And the readers have to pay fees again just to read the papers. For whatever reason, the fees are increasing dramatically. That doesn’t seem to be in line with the digital transformation’s ability to lower the costs of production. Since the open access movement started, decades ago, we have seen more and more open access publishers and they are doing very well…More funding agencies and research institutions are increasing their budget to encourage their researchers, scientists and technologists to make their research published as open access media.”

Advocates of open access publishing believe that international initiatives such as Plan S will help enforce global compliance. Supported by an international consortium of research funding agencies, Plan S aims to have all scientific publications funded by public grants published on open access platforms by 2020.

The success of open access, however, depends on technological solutions that maximise its potential. While green open access models let researchers choose their own archive mechanisms, they are of little use if research gets lost in the noise of the web.

Microsoft Academic can help, says Dr Wang, via the forensic scope of Bing, which, in addition to its search engine functionality, sifts the internet to find self-archived academic journals.

This brings up the issue of whether search engine optimisation might be used to skew the relevance of academic research towards papers that are readily searchable. For Microsoft, the way to avoid this lies in more openness. To support its methodology, it publishes its analytics algorithm and makes its datasets available, clearly showing how Microsoft Academic catalogues research.

“Everyone can take a look at how we ended up with the ranking decision,” says Dr Wang, “how we actually identify what is the most incredible work that we recommend to our users.”

He argues that evaluation metrics such as the journal impact factor are flawed and may become obsolete. “There is plenty of scientific evidence to show that using journal impact factor is actually not the right thing to do to evaluate a research outcome,” he says, adding that, with machine learning having built huge datasets and Microsoft making public both the data and the algorithm that sorts it, the AI can be trained to evaluate research outcomes more effectively.

This training is what really excites Dr Wang. It occurs through human interactions with Microsoft’s tools and enables the AI to become more efficient when handling a “humongous” dataset.

“In 1974, we see the world’s annual new publications exceeding one million papers,” he says. “Nowadays we are adding more than one million per month. Who can read it all!? My field, deep learning, which is a small sub-field of artificial intelligence, last year published more than 10,000 papers – 10,000! Which means I have to at least read 30 papers a day in order to catch up, without doing anything else. Fortunately, now I have intelligent machines to go over them and bring the most relevant ones to my attention.”

The power of Microsoft Academic’s search tool enables users to interrogate data to find the research that is relevant to them. The curation of these datasets – and of Microsoft Academic Graph, which informs researchers of new scientific discoveries – can only be facilitated by AI agents with an understanding of natural language. Mere mortals cannot keep up.
