Source: Patrick Welham
Citations, whatever their faults, are observable in a way that is not true of the nodded approval of quietly self-selecting scientific communities
A little-remarked feature of the forthcoming research excellence framework is that subject review panels are forbidden from using information on journal impact factors or other journal rankings.
When one of us argued in these pages a few years ago that such a rule was illogical and foolish, it sparked debate. Yet everyone knows that different journals have different quality standards for publication, so what is the point of pretending otherwise?
Although peer review by journals also necessarily involves a level of subjectivity, specialist journals employ specialist referees, and REF panel members can learn a lot from their considered decisions. Editors and referees are simply far more knowledgeable about the average published paper on Martian termites than are members of the REF panel on exobiology.
In addition, we are being untruthful to ourselves and to the world if we suggest that REF panel members have enough time to assess properly submitted papers in areas miles from their own specialisms. Asked whether they do, the panellists are forced to dissemble.
So how might the panels sensibly use information on journal impact factors? In a new paper in The Economic Journal, “How Should Peer-Review Panels Behave?”, we argue that evaluation panels should blend citations and publications data, and then throw in a dash of oversight. The way to do this is to turn to Thomas Bayes, the clergyman who 300 years ago developed what are today known as Bayesian methods.
The idea is straightforward. Panellists would take a weighted average of the journal’s impact factor and the accrued citations to the article, with the weighting adjusted over time according to Bayesian principles. More and more weight would be put on citations in proportion to the length of time since the paper was published. In the early years, almost all the weight would be put on the journal impact factor, since citations would not have had time to accrue. The panellists could shade up or down the resulting quality assessment based on their own judgement.
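The weighting the authors describe can be sketched as a gamma-Poisson update, in which the journal impact factor serves as the prior mean for an article's annual citation rate and accruing citations pull the estimate toward the article's own record. This is only an illustration of the principle, not the model in the paper; the prior-strength parameter `prior_years` is a hypothetical choice.

```python
def quality_estimate(impact_factor, citations, years, prior_years=2.0):
    """Posterior-mean citation rate: a weighted average of the journal
    impact factor and the article's own citation rate, with the weight
    on citations growing as `years` since publication increases.

    Illustrative sketch only; `prior_years` (how many years of data the
    journal-level prior is "worth") is an assumed parameter.
    """
    if years <= 0:
        return impact_factor  # no citation data yet: rely on the journal
    weight_on_journal = prior_years / (prior_years + years)
    own_rate = citations / years
    return weight_on_journal * impact_factor + (1 - weight_on_journal) * own_rate
```

With `prior_years=2.0`, an article one year old still takes two-thirds of its score from the journal's impact factor; by year ten, the journal contributes only one-sixth of the weight, with the rest carried by the article's own citations.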
We are familiar with the weaknesses of journal impact factors, such as the fact that they do not necessarily reflect the citation rate or quality of every paper a journal contains. But it is not rational to conclude that no weight should therefore be put on them, particularly right at the start of an article’s life. In the first few months after publication, the impact factor is the best guess we have about the likely importance of an article. Several decades later, a paper’s citation count plays that role. In between, weightings on each have to be chosen – and we believe that our paper merely formalises a way of doing so that is already carried out by many experienced researchers.
To give a practical example, consider an article published in the not-so-fancy journal NITS (Notional Insights on Termite Science). Imagine that a REF panel discovers that, after a small number of years, a specific article published in NITS happens to have a significantly better citation record than one in fancy journal HDILPOP (Huge Discoveries by Ivy League Professors Only Please). How should the panel react?
In the language of our paper, the citations record of the particular NITS article constitutes a series of good Bayesian signals, whereas the citations record of the particular HDILPOP article does not. A reasonable question for the panel is: how long should we persist in downgrading the NITS article on the basis of the journal’s impact factor if the high relative citations to it continue? In one illustrative calculation, we find that Bayes’ rule would suggest that roughly four years of conflicting citation data are needed before the original opinion should be reversed.
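Under a simple weighted-average scheme of the kind described above, one can ask when the panel's ranking of the two articles would flip. The sketch below is hedged: the impact factors, citation rates and prior strength are invented for the example, not taken from the paper's calibration.

```python
def quality_estimate(impact_factor, citations_per_year, years, prior_years=2.0):
    # Weighted average of the journal-level prior and the article's own
    # citation rate; weight shifts toward citations as time passes.
    w = prior_years / (prior_years + years)
    return w * impact_factor + (1 - w) * citations_per_year

def first_crossover(nits_if=1.0, nits_rate=4.0, hdi_if=6.0, hdi_rate=1.0):
    """First year in which the well-cited NITS article overtakes the
    poorly cited HDILPOP article. All figures are hypothetical: NITS has
    a low impact factor (1.0) but draws 4 citations a year; HDILPOP has
    a high impact factor (6.0) but draws only 1 a year."""
    for year in range(1, 50):
        if quality_estimate(nits_if, nits_rate, year) > quality_estimate(hdi_if, hdi_rate, year):
            return year
    return None

print(first_crossover())  # → 4 with these hypothetical numbers
```

With these invented figures the crossover happens after four years of conflicting citation data, in the same ballpark as the illustrative calculation reported in the article; different priors or rates would shift the answer.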
Journal articles are the main raw material of modern science – and arguably have the advantage, whatever their faults, of having been through a form of refereeing. Citations to them are the main marker of those articles’ influence – and arguably have the advantage, whatever their faults, of being observable in a way that is not true of the nodded approval of quietly self-selecting scientific communities. The REF panels should use that information.
This certainly does not mean that mature overview by experienced human beings ought to have no role. Purely mechanical procedures should never be used in REF-like evaluations of universities, scholars, departments or disciplines. Nevertheless, a weighted average of impact factor and article citations is the natural starting point for a sensible REF panel. And that goes for the social sciences and humanities, as well as for the field of Martian termite studies.