By switching to quantitative markers to assess research we risk missing out on key discoveries, argues Ole Petersen
There is no substitute for expert judgment in assessing the research of a university department. So, while the new assessment framework announced in last week's Pre-Budget Report has merits, by failing to identify peer review as core to evaluating quality, it also holds major risks.
The basic tool for assessing quality - to determine distribution of funding - has long been subject-based judgment by peers.
This is about to change, at least for science, engineering, technology and medicine. The 2008 research assessment exercise will be the last.
Thereafter, a metrics-based system will be used, based on quantitative indicators, including research grant income and a "bibliometric indicator of quality". The only remnant of peer judgment will be advisory groups to "oversee" the process.
The Royal Society, responding to the consultation on the reform of research assessment and funding, recommended that "the primary assessment method for all subjects must be peer judgment" and "research indicators... should do no more than inform peer review panels". Clearly, this advice has not been heeded.
There are problems with peer review. It is time-consuming and thus relatively expensive. Furthermore, it is not always easy to exclude biased judgments. Nevertheless, scientists have not yet found a better way. But last week, Education Secretary Alan Johnson indicated that the Higher Education Funding Council for England was "confident that it should be possible to move quickly to a fully bibliometric method of measuring quality in science, engineering, technology and medicine". What could this bibliometric quality indicator be?
The Institute for Scientific Information's Web of Knowledge contains a vast amount of bibliometric data, including any publishing scientist's "citation report" with graphs, statistics and the h-index - a quantitative measure of citation impact. Citations of an individual's work by other scientists might signal peer approval, so could this approach solve the old problem of how to measure quality objectively and cheaply?
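The h-index mentioned above has a deceptively simple definition (due to Jorge Hirsch): a scientist has index h if h of his or her papers have each been cited at least h times. A minimal sketch, using invented citation counts, shows how mechanically such a "quality" score is derived from a publication record:

```python
def h_index(citations):
    """Hirsch's h-index: the largest h such that h papers
    have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank  # at least `rank` papers with >= `rank` citations
        else:
            break
    return h

# Illustrative (invented) per-paper citation counts:
print(h_index([10, 8, 5, 4, 3, 2]))  # -> 4
```

Nothing in this arithmetic distinguishes a seminal discovery from a fashionable review article; it simply counts.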
The basic problem is that it takes time for citations to accumulate to a degree that makes quantitative analysis useful. Comparing scientists on this basis is not easy, since citation patterns and numbers vary widely between subjects and even subfields. Furthermore, the most innovative research may take a long time to be recognised. If assessments are made over relatively short periods, serious misjudgments can result, as the following example illustrates.
When German scientists Bert Sakmann and Erwin Neher invented a way to directly measure the tiny electric currents that pass through the channels in cell membranes - defects in which are linked to diseases such as cystic fibrosis - their 1976 Nature paper was recognised instantly by peers to be of great importance. But, five years later, Sakmann's citation record was no more than modestly respectable. It took another decade before his now spectacular citation impact became apparent, by which time he had received a Nobel prize. In general, citation analysis is useless for assessing young scientists. It also has difficulty dealing with multi-author and/or multi-institutional papers and separating positive from negative citations.
Real scientists know that the only way to assess a colleague's research performance is to read their original papers, judging their importance, reliability and novelty - which is exactly what the RAE does.
There is something intrinsically right about it, particularly in the simpler and more elegant form originally devised by Sir Peter Swinnerton-Dyer. There is something intrinsically wrong with the new system and this might have serious consequences.
Universities could, for example, be induced to make appointments principally in fields where research is most expensive, grants are therefore larger, and citations can be accumulated rapidly. Scientists might be tempted to write provocative single-author review articles, which often achieve a short-lived fame accompanied by a transient citation burst. Some highly cited scientists achieved their record almost exclusively via review articles. Real discoveries, with development potential, will not necessarily be seen as important, simply because they cannot be evaluated quantitatively within convenient assessment periods.
The present RAE could be greatly simplified without losing the laudable goal of directly assessing what obviously needs to be assessed, namely actual scientific output. Unless Hefce modifies markedly the new assessment plans, the baby will be thrown out with the bath water.
Ole H. Petersen is president of the Physiological Society and chaired the Royal Society's working group on the research assessment exercise.