Funders mull robot reviewers for Research Excellence Framework

Research England examining study on how AI might be used to predict quality of research outputs

September 15, 2022

Research England has commissioned a study of whether artificial intelligence could be used to predict the quality of research outputs based on analysis of journal abstracts, in a move that could potentially remove the need for peer review from the Research Excellence Framework (REF).

The study, part of a broader review of the REF called the Future Research Assessment Programme, investigates whether certain words contained in the summary of research papers submitted for the latest exercise correlate with assessments by independent evaluators, and whether the scores could be predicted in future using an algorithm.

Existing research suggests quality can be predicted in this way, which may pave the way for future assessments to at least partially substitute computer analysis for REF peer reviewers, reducing the burden on the hundreds of experts who in the latest exercise scrutinised 185,594 outputs, of which 41 per cent were deemed world-leading and 43 per cent internationally excellent. In the 2014 REF the cost of panellists’ time amounted to £19 million, part of the £246 million overall cost, according to official estimates.

Mike Thelwall, professor of data science at the University of Wolverhampton, who has already completed the study into AI’s potential use in the REF, told Times Higher Education that his algorithm had more success predicting evaluations in some units of assessment than others, but that there were also other challenges associated with this kind of automated review.

“Even a system which is 100 per cent accurate in its predictions needs to think about the incentives it generates,” said Professor Thelwall. “If a predictor of quality is the mention of ‘randomised controlled trials’, you will probably find more people mentioning this in their abstracts.

“Technology-based solutions may not always be best for the sector, even if the predictive success is high.”

Earlier this year Professor Thelwall published a study in Quantitative Science Studies which affirmed that machine learning analysis of abstracts and author data could be used to predict articles’ citation scores.
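The word-based prediction described above can be illustrated with a toy sketch. This is not the method used in Professor Thelwall’s study; the word-averaging model and the example abstracts and scores below are invented purely to show the idea that terms correlating with high ratings (such as “randomised controlled trial”) can drive an automated predictor, and why such terms could then be gamed:

```python
from collections import defaultdict

def train_word_scores(abstracts, scores):
    """Average the quality score of every abstract each word appears in."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in zip(abstracts, scores):
        for word in set(text.lower().split()):
            totals[word] += score
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}

def predict(abstract, word_scores, default=2.5):
    """Score a new abstract as the mean of its known words' scores."""
    known = [word_scores[w] for w in set(abstract.lower().split())
             if w in word_scores]
    return sum(known) / len(known) if known else default

# Invented training data: abstracts with REF-style star ratings (1-4).
abstracts = [
    "randomised controlled trial of a novel intervention",
    "exploratory case study of local practice",
]
scores = [4.0, 2.0]
model = train_word_scores(abstracts, scores)
print(round(predict("a randomised controlled trial", model), 2))  # → 4.0
```

Even this toy model shows the incentive problem Professor Thelwall describes: once authors learn which words the model rewards, simply inserting them into an abstract raises the predicted score without any change in research quality.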

As well as Professor Thelwall’s study – due to be published in November – Research England has also commissioned a review of the potential use of metrics in the REF more broadly. This is likely to focus on whether bibliometric data such as citations could be used instead of peer review, something which has previously been rejected.

As part of his Research England work, Professor Thelwall also quizzed academics about their views on using AI in their areas of assessment. While there was “general scepticism” about the experiment, some researchers were also enthusiastic about the use of technology, Professor Thelwall told THE.

“Some people are really gung-ho on automated peer review because they hate the amount of time they spend doing these assessments, though others say it is ridiculous to try to replicate peer review, which, they believe, can never be done by a machine in the same way as a subject expert,” said Professor Thelwall.



Reader's comments (2)

A ranking of the journals and then using that to predict the REF outcomes will work much better. Just agree the journal ranking and then award so much credit for each article in each journal. The problem with "reading" each article is that if the submission is from Oxford, Cambridge, Warwick, Imperial etc then it will get a 4* rating while for other "lesser" Universities it will be given a 3* ranking even if it is in the same journal. The REF has always favoured the already advantaged.
Funny time of year for an April fool. Has the prof not heard of the axiom that as soon as a characteristic is used as a performance indicator it becomes useless as a measure? And won't we have to retain a control sample for peer review, and have an appeals system, and invest in updates, and AI countermeasures, ad infinitum?

