The real challenge around “responsible metrics” is only partly about the metrics themselves; it is also, and perhaps more, about how people use them.
First, I must repeat a mantra that I have used widely: we mean indicators, not metrics. A metric is something like a citation count; it doesn’t tell you a great deal, and it usually depends on the data source. An indicator is what you get when you try to assess something else, like research performance. You can’t measure research performance directly; assessing it requires expert and experienced interpretation. But you can create an indicator of something associated with performance, such as the average number of citations per paper after accounting for publication date and research field.
One widely used indicator is the h-index, created by Jorge Hirsch in 2005. A scholar with an h-index of n has published at least n papers, each of which has been cited at least n times. This indicator is not very responsible because it does not reflect the age of papers or the rate at which citations accumulate in different fields. Older biomedical researchers almost invariably have a higher h-index than younger quantum technologists, so their relative h-indices not only tell us very little but may hide real differences in achievement.
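The definition above is simple to state, and that simplicity is part of the h-index’s appeal. As an illustration only (the citation counts are invented), a minimal sketch of the calculation:

```python
def h_index(citations):
    """Return the h-index: the largest h such that at least h papers
    have been cited at least h times each."""
    cited = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(cited, start=1):
        if count >= rank:
            h = rank  # this paper's rank still qualifies
        else:
            break     # further papers have too few citations
    return h

# Hypothetical career: five papers with these citation counts
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Note what the number discards: a twenty-year-old paper and a two-year-old paper count identically, which is exactly why the index flatters long-established fields and careers.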
The list of papers someone produces over their career, their full CV, is a much better source of information. The problem for research managers is that they have too little time to pore over those lists, let alone to look at even the summary sections of the papers. Much the same challenge faces peer reviewers when they consider grant proposals with a publication history attached. The challenge escalates for members of UK Research and Innovation grant programme panels or Research Excellence Framework assessment panels, who may be looking at large and diverse portfolios.
What do people do with an overwhelming task? They look for proxies to fill the information gaps that they won’t have time to close through their own analysis. And that is where the problem of responsibility lies, because if those proxies are not as good as the user likes to think, the result may be weak judgments, poor decisions and bad investments.
To make the grade as a popular proxy indicator, and to become a “great metric”, the index needs to be very simple: preferably a one-dimensional number, so a big value means good and a small value means not so good. It also needs to appear to be transparent and comprehensible; in other words, it needs to look as if it represents the thing you really want to know. The h-index fulfils these needs because it’s about more papers with more citations (must be a good thing – right?) and it’s a simple, counted metric. It is just too appealing to ignore, yet, as noted, the value you get depends on other things around research culture, and it requires detailed interpretation.
At the Institute for Scientific Information, we worry a lot about research indicators and how they are interpreted. In particular, we worry about another widely used “great metric”: the journal impact factor (JIF). Citation counts take time to build, so reviewers sometimes choose to look at recent papers in the context of the journal that published them. The problem is that the JIF is a journal impact factor, not an index of the papers or their authors. It’s great to get a paper in Nature (in the 2008 research assessment exercise, every UK-authored Nature paper that could be submitted for assessment was submitted), but Nature’s high JIF value averages over a wide range of article citation counts, with the median value of article citations being lower than the journal-level score.
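Why the median sits below the journal-level score is straightforward arithmetic: citation distributions are highly skewed, so a few heavily cited papers pull a JIF-style average well above what a typical article receives. A toy illustration with invented citation counts for one journal’s articles:

```python
import statistics

# Hypothetical citation counts for a journal's articles over a
# two-year window; one highly cited paper skews the distribution.
citations = [0, 1, 1, 2, 2, 3, 3, 4, 5, 120]

jif_like_mean = sum(citations) / len(citations)  # what a JIF-style average reflects
median_cites = statistics.median(citations)      # what the typical article receives

print(jif_like_mean)  # 14.1
print(median_cites)   # 2.5
```

A reviewer who reads “14.1” as the expected citation performance of any one paper in this journal would be wrong for nine articles out of ten.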
What can the ISI do to mitigate this problem? JIF is a great metric for publishers and librarians, but we think that other research folk turn to the JIF as a decision-making shortcut because they lack a choice of more relevant data. So, this year’s Journal Citation Reports unpacks the data to enable researchers and managers to see where each JIF fits into the overall distribution of citations for a journal. We’ve separated the median (mid-point, not average) citation values for articles and for the more frequently cited reviews, and split out “non-citable” items such as commentaries and editorials, with the full calculation of each JIF shown against the data display. Anyone looking at the Journal Citation Reports will now see the full context, and they can also see where all the papers that contributed the citations came from.
JIF is a great metric when used for the purpose intended. Responsibility in its use requires choice, and we have made a firm move to increase every research assessor’s ability to make the responsible choices for their specific requirements. Other metrics require similar development. We and other data organisations will need to work to support the responsible use of “metrics” that world-class research deserves.
Jonathan Adams is a director at Clarivate Analytics’ Institute for Scientific Information.