World University Rankings blog: dealing with freak research papers

Phil Baty explains why hundreds of research papers will not be considered when compiling the next Times Higher Education rankings

August 19, 2015

More than 11 million research papers, published between 2009 and 2014 and drawn from Elsevier’s Scopus citation database, have been analysed as part of the global research project that underpins the Times Higher Education World University Rankings, to be published on 30 September.

But around 600 papers published during that period will be excluded from the calculations. Why? Because we consider them to be so freakish that they have the potential to distort the global scientific landscape.

One such paper, in physics, is snappily titled “Charged-particle multiplicities in pp interactions at sqrt(s) = 900 GeV measured with the ATLAS detector at the LHC”. It has clearly made a significant contribution to scholarship, based on groundbreaking research at the Large Hadron Collider, and that is reflected in its high number of citations. But the problem arises from the fact that it has 3,222 authors (another paper from the LHC published this year hit 5,154 authors, meaning that only nine pages of the 33-page paper were actually concerned with the science, the rest dedicated to a list of authors).

A similarly unusual paper, this time from biology, appeared this year in the journal G3 Genes, Genomes, Genetics and examined the genomics of the fruit fly.

“Drosophila Muller F Elements Maintain a Distinct Set of Genomic Properties Over 40 Million Years of Evolution” has a more modest 1,014 authors, but it includes 900 undergraduates who helped edit draft genome sequences as part of a training exercise.

In the ensuing debate about how to properly credit academic research, neuroethologist Zen Faulkes, from the University of Texas Rio Grande Valley, wrote on his blog, Neurodojo: “I was curious what you had to have done to be listed as an author. With that many, it seemed like the criteria of authorship might have been, ‘Have you ever seen a fruit fly?’… Papers like this render the concept of authorship of a scientific paper meaningless.”

Under THE’s previous rankings methodology, using data and analysis provided by Thomson Reuters, each and every one of the authors on both of these papers and others like it (which also tend to attract unusually high volumes of citations), would be given equal credit for the work when it came to calculating a university’s research impact (which counts citations per paper, normalised against global citation levels for each discipline).

While this approach may not have had a statistically significant effect on large, comprehensive institutions like Harvard University, which typically publish around 25,000 papers a year, for smaller institutions with much lower overall volumes of research (our threshold for inclusion in the rankings is 200 papers a year over five years), it could have a distorting effect. It could not just artificially inflate a university’s research impact score, but given that research impact is worth a total of 30 per cent of the overall ranking score, it could unfairly push a small institution up the overall ranking table.
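The size effect described above can be illustrated with a rough sketch. It assumes a simple unnormalised citations-per-paper average (the real indicator is normalised by discipline, which is omitted here), and the baseline of 5 citations per paper and the 1,500-citation paper are hypothetical figures chosen only for illustration. The paper counts follow the text: 200 papers a year over five years for a small institution, and around 25,000 a year for a large one.

```python
def citations_per_paper(papers: int, citations: float) -> float:
    """Plain average of citations per paper (no discipline normalisation)."""
    return citations / papers


def relative_uplift(papers: int, baseline_avg: float, outlier_citations: float) -> float:
    """Fractional rise in the average when one highly cited paper is added."""
    before = citations_per_paper(papers, papers * baseline_avg)
    after = citations_per_paper(papers + 1, papers * baseline_avg + outlier_citations)
    return (after - before) / before


# Hypothetical inputs: baseline average and outlier citation count are assumptions.
BASELINE_AVG = 5.0
OUTLIER_CITATIONS = 1_500

small_uplift = relative_uplift(200 * 5, BASELINE_AVG, OUTLIER_CITATIONS)
large_uplift = relative_uplift(25_000 * 5, BASELINE_AVG, OUTLIER_CITATIONS)

print(f"small institution uplift: {small_uplift:.1%}")
print(f"large institution uplift: {large_uplift:.2%}")
```

Under these assumptions, a single such paper moves the small institution's average by a large fraction while barely registering for the large one.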

After extensive discussion with external experts, our new bibliometric data supplier, Elsevier, and among our burgeoning internal team of data experts (THE’s data and analytics director, Duncan Ross, blogs on the subject here), we have agreed that this approach is not appropriate.

So for the 2015-16 World University Rankings, we have decided to exclude from the analysis all papers with more than 1,000 authors. This amounts to 649 papers from a total of 11,260,961 papers – or 0.006 per cent of the total. It also adds up to 19,627 citations excluded from a total pool of 51,404,506 citations used to calculate the rankings – or 0.04 per cent of the total.
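The percentages quoted above can be checked with a few lines of arithmetic. This is only a sketch using the four totals given in the text; the variable names are illustrative. It also prints the average citations per paper for the excluded set and for the full pool, which follows directly from the same figures.

```python
# Figures taken from the text.
excluded_papers = 649
total_papers = 11_260_961
excluded_citations = 19_627
total_citations = 51_404_506

# Share of papers and citations removed, as percentages.
paper_share = 100 * excluded_papers / total_papers
citation_share = 100 * excluded_citations / total_citations

# Average citations per paper, excluded set versus the whole pool.
excluded_avg = excluded_citations / excluded_papers
overall_avg = total_citations / total_papers

print(f"papers excluded:    {paper_share:.3f} per cent")
print(f"citations excluded: {citation_share:.2f} per cent")
print(f"citations per excluded paper: {excluded_avg:.1f}")
print(f"citations per paper overall:  {overall_avg:.1f}")
```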

This might not be a perfect solution to a nuanced challenge, and it will cause some unwelcome volatility in the rankings this year.

It will no doubt frustrate a small number of institutions which have benefited from the previous practice and who will see themselves ranked lower in this year’s rankings compared to last year.

But until the global higher education sector can agree a fair and robust way to properly attribute author credit in such freak circumstances - and while THE’s data team take the time to examine proposals to use some other potential solutions such as fractional counting for all authors on all papers - we believe we have taken the transparent and fair approach.
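As a rough illustration of what fractional counting could look like, the sketch below splits a paper's citations equally among its authors and credits an institution in proportion to how many of those authors it employs. This is one possible scheme under stated assumptions, not a method THE has adopted; the function name, the citation count and the institution's author count are all hypothetical.

```python
def fractional_credit(citations: int, institution_authors: int, total_authors: int) -> float:
    """Credit an institution with its authors' equal share of a paper's citations."""
    if total_authors <= 0:
        raise ValueError("total_authors must be positive")
    if not 0 <= institution_authors <= total_authors:
        raise ValueError("institution_authors must be between 0 and total_authors")
    return citations * institution_authors / total_authors


# Hypothetical example: a 3,222-author paper with an assumed 300 citations,
# of which an assumed 3 authors belong to the institution being scored.
full_credit = 300
shared_credit = fractional_credit(300, institution_authors=3, total_authors=3_222)

print(f"whole counting:      {full_credit} citations")
print(f"fractional counting: {shared_credit:.3f} citations")
```

Under whole counting the institution would receive all 300 citations; under this equal-split assumption it receives well under one.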

Phil Baty is editor of the THE World University Rankings.

Reader's comments (21)

Fractional counting is the ultimate solution. I wish you could have worked it out to use fractional counting for the 2015-16 rankings. The current interim approach you came up with is objectionable. Why 1,000 authors? How was the limit set? What about 999-author articles? Although the institution I work for will probably benefit from this interim approach, I think you should have kept the same old methodology until you come up with an ultimate solution. This year's interim fluctuation will adversely affect the image of university rankings.
Thanks for your comments, Mete. We accept that the 1,000-author cut-off is somewhat arbitrary, but our data and analytics director, Duncan Ross, explains a little more about the decision here. A look at his graph does suggest it is a sensible separation point. We have to accept a degree of fluctuation this year, due to a number of factors, not least the fact that we are ranking 800 universities this year compared to 400 last year, and have moved from Web of Science to Elsevier for our bibliometric data.
Leaving aside how and whether to rank universities, authorship on one of these "freak" papers clearly means something different from more normal co-authorship. It is simply not possible for all 5,154 authors to contribute to drafting the paper (it has fewer words than authors) or even, I suspect, to approve final submission. Fractional authorship will mean that the few people who actually played a major role will be under-credited; but perhaps that's their choice in helping drive a new meaning for "authorship"? Further discussion (by coincidence) on my blog this week: "Does mega-authorship matter?"
As a member of the particle physics community, I have strong objections to excluding articles with 1000+ authors from the data used for the next THE rankings. Obviously, the THE editors and particularly Mr. Duncan Ross do not properly understand why so many authors are indicated in papers from the LHC and other big High Energy Physics (HEP) experiments. It is true that the 3k+ authors do not all draft the paper together; on the contrary, only a small part of them are involved in this very final step of a giant research work leading to a sound result. It is also true that making the research performed public and disseminating the knowledge obtained is a crucial step of the whole project. But what you probably missed is that this key stage would not be possible at all without a unique setup which was built and operated by profoundly more physicists and engineers than those who processed raw data and wrote a paper. Without that "hidden part of the iceberg" there would be no results at all. And it would be completely wrong to assume that the authors who did the data analysis and wrote the paper should be given the highest credit in the paper. This is specific to the experimental HEP field, which has gone far beyond the situation that was still common in the first half of the 20th century, when one scientist or a small group of them might produce some interesting results. The "insignificant" right tail in your distribution of papers by number of coauthors contains the hot part of modern physics, with high-impact results topped by the discovery of the Higgs boson. And in your next rankings you are going to dishonour those universities that contributed to this discovery. FYI, almost every large collaboration in particle physics adopts a quite strict policy concerning publication and authorship rules.
Typically, before becoming a legitimate author of collaborative manuscripts, a new member of a particular collaboration must work for more than one year and make a significant contribution to the experiment, including such tasks as taking shifts during experiment operations, maintaining or upgrading some part of the detector setup, developing important software tools, etc. My colleagues and I believe that you should first become familiar with the policies of the most renowned and respected HEP collaborations (e.g., ATLAS, CMS, LHCb, Belle, etc.) before calling their papers freaky. Whatever the case, the way you changed the methodology as compared to the one used before and resorted to this crude solution is very doubtful. It would probably be more reasonable to follow the established methodology until you come up with a final approach: the point is that frequent fluctuations of the ranking methodology might damage the credibility of the THE. Certainly, I do not mean here large and well-esteemed universities like Harvard or MIT. I believe their high ranking positions would not be affected by nearly any reasonable change in the methodology. However, the highest attention to the rankings comes from numerous ordinary institutions across the world and their potential applicants and employees. In my opinion, these are the most concerned customers of the THE product. As I already pointed out above, it's very questionable whether participation in large HEP experiments (or genome studies) should be considered "unfair" for those institutions.
As a particle theorist at a UK university that does not participate in any of the large LHC experiments, I am writing to express my dismay at your proposal to zero-out the research contributions of my experimental colleagues, and wonder whether you have thought through the implications. Your announced policy is unfair, since it discounts all research in big science: particle physics and biology today, astronomy tomorrow with the advent of new, large and complex facilities that require the expertise of many contributing scientists, and who else the day after. Also, as one commenter has already remarked, it is arbitrary: why remove papers with more than 1000 authors and keep papers with 999? This would not matter if there were no negative consequences of your policy. But how many university administrators will now draw the obvious conclusion that they should withdraw support from their experimental particle physicists, maybe fire them, because their efforts do not count towards their THES rankings? Such effects have already been seen in South Africa when the Department of Higher Education zeroed-out research credit for papers with over 100 authors. I am all in favour of finding the best way to assign credit among large numbers of scientific authors, but this requires more thought and broader consultation than is evident from your blog post. Finally, I really must protest about your adding insult to injury by calling multi-author research papers 'freaky'. Is that an appropriate adjective for the experimental discovery of the Higgs boson, widely acknowledged to have been one of the major advances in physics during the past few years, which you would zero out according to your new policy?
As Head of the Cavendish Laboratory, and a particle physicist, I have to agree with John Ellis that this is a very bad decision which will damage the credibility of your rankings. Ruling out entire disciplines because you don't approve of the author list is not a minor methodological issue as it is portrayed in the article. Neither is it a minor correction to a few institutions. The argument that it affects a tiny proportion of papers is specious since they are all in a few areas. Many of the best Departments in the world are leaders in big science enterprises and this decision will have undesirable implications for them all. I also agree with John that the jokey use of the word "freaky" to describe some of the best science in the world today does you no credit. I would urge you to reconsider this decision.
I'm sorry that my use of the word "freaky" has not been well received. Clearly I was not seeking in any way to cast aspersions on the research, which is clearly of huge value, but to highlight the fact that these papers are extraordinary in the context of research evaluation. They are truly "freaks" in terms of bibliometric analysis - making up 0.006 per cent of the papers we're examining and 0.04 per cent of citations. It is important to stress that this relatively tiny group of papers has the potential to very seriously distort the overall ranking performance of small universities with very low research output, while having very little impact, if any, on the position of larger comprehensive institutions. It is true that the majority of these papers are in physics, so we will consider including them in the physical sciences ranking we will publish later this year.
All that this shows is the totally arbitrary nature of university rankings. Change the method and get almost any answer you want. It beats me how anyone thinks it possible (or sensible) to describe all the complexity of a university by a single number. It's simply statistical illiteracy. If everyone ignored these silly rankings, it would no longer be profitable to produce them. They would vanish and nobody would notice because they have no discernible usefulness.
In response to Phil Baty, I would repeat that the fact the number of papers is a small part of the total is a specious argument. It represents the majority of the output of some disciplines, and may be the main source of income in some physics departments. THE is sending the message that these disciplines are worthless in ranking terms, and they will damage that research, especially in smaller institutions. People could lose their jobs because of this, and I do not think that the THE is taking this sufficiently seriously. The rankings are, as David Colquhoun points out, of dubious merit, but many funders do take notice of them. If excellent science creates a problem for rankings, then the issue is with the methodology, not the science. PS: I have just noticed that the Cavendish HoD account with THE is in James Stirling's name. I took over from him and take responsibility for these comments. Andy Parker
Andy, We are very happy to include these extraordinary physics papers in our physical sciences ranking, published on 12th November, as they clearly do make a tremendous contribution to science and we are keen to recognise that. But in terms of the overall world university ranking of largely comprehensive research universities, a tiny proportion of papers in one narrow field have a hugely distorting effect on smaller institutions with very low overall research output. We can help ensure that outstanding research in physics is represented in the overall rankings through our academic reputation survey, which had 17 per cent of its respondents from the physical sciences.
Phil - With this decision, your rankings are no longer a neutral look at the status, but an active policy intervention. You are telling people that investing in big science will no longer benefit their ranking. We know that Universities (especially small ones) strive to move up the rankings. So this will put pressure to move funding elsewhere. It is indefensible that some bibliometric anomaly, which is easily cured in many ways, is used in this way. Andy
The irony of passing judgment on the quality of science at a university (which presumably is what you're trying to measure) using such unscientific methodology -- the physicists & biologists whose work you're discounting will chuckle, then cry as their funding is cut. If the world does not conform to your expectation, change the method until it does. The LHC would have found so many SUSY particles this way. Then you'd really have a hard time ignoring them.
Since physics and biology now have different authorship norms from other disciplines, perhaps each discipline should be weighted and then have their authorship contributions measured in a manner appropriate to that discipline.
Phil - your "solution" of including these papers in the Physical Sciences rankings but not in the overall ones really makes no sense. If these papers really skew your rankings so badly, as you claim, why is it ok to screw up Physical Sciences? How can the rankings make any sense between disciplines if you do this? How can the overall rankings be analysed when the disciplines use different methodologies? You will create an almighty mess. Your position before was, in my view, indefensible. Now it also lacks any internal logic! I would urge you to hold back on changing your methodology until you have consulted sufficiently widely to get a credible consensus. Andy
Ah! The danger of subjectivity sans agreed boundaries. Beauty is in the eyes of the ...
I am the national contact physicist for South Africa in the ATLAS Collaboration and chair of the South Africa ATLAS group. The group currently consists of 5 academics in 3 universities and has a total ATLAS membership of approximately 50 people, 10 of whom are included on the ATLAS author list. John Ellis mentioned the South African Department of Higher Education and Training (DHET) policy on not awarding credit to papers with over 100 authors. In South Africa, this credit is a funding subsidy to the university which contributes a significant fraction of the university's research income. The naive metric used in this policy has already had an impact on the number of universities interested in participating in the ATLAS Collaboration. Those universities who have chosen to invest in the ATLAS Experiment, despite the absence of this funding stream, will now have one more hurdle to overcome in their support for large collaborative science. The ATLAS groups will no longer contribute to the university's research ranking. Just to point out the non-sensical use of "1000" as the cut off - the ALICE Collaboration currently has 980 authors, while LHCb is sitting around the 800 figure. For the readers who are not in high energy physics, the LHC has four large experiments ATLAS, CMS, LHCb, and ALICE - all of whom operate with similar authorship rules. Perhaps ALICE and LHCb should put a cap of 999 authors? While I understand that these large author papers do skew your rankings in some way, the elimination of these papers will also skew your rankings. Whether we like it or not the THE rankings affect research funding. This move will have a negative impact on large collaborative science. Please consider a more nuanced approach to including large collaborative research in the THE rankings. I am confident that the large collaborations would engage with you on finding a solution. Phil: Please contact me offline if you would like any assistance in finding a more sensible solution.
Phil - you say that you consulted external experts on this decision. Can you please tell us who they are and how they were selected? Andy
Considering my institution is involved with a large international collaboration with roughly 900 members, this will benefit us. However, as mentioned by others, it is a completely flawed and arbitrary methodology for THE to use. Having access to these sorts of collaborations is hugely beneficial both to the institution and to the students and researchers involved. Membership of these collaborations shows that world-leading research is being carried out and that outward engagement is occurring (including internationally), and it allows researchers and students to develop their careers through these vast networks. These are some of the KPIs that correspond with what THE is looking for in a "good" university, so it makes no sense to disregard them. Maybe it is necessary to ensure that they don't skew rankings; however, the appropriate methodology would include fractional counting, probably along with a new section under your "International outlook" ranking which considers the number of members of an institution or funding (as percentages) which relate to domestic and international collaborations (scaled by collaboration size perhaps).
A small analysis team worked very hard to produce the paper “Charged-particle multiplicities in pp interactions at sqrt(s) = 900 GeV measured with the ATLAS detector at the LHC”. When the paper was ready for publication, the list of authors decided by the international collaboration was appended. Many other LHC papers are similar in that they were produced by a small analysis team working inside a large collaboration. The author list includes the analysis team, as well as all of those who were involved in building and running the experiment. As others have said, putting a cut at 1000 will remove all ATLAS and CMS papers. These collaborations are at the forefront of the highest energy collider physics studies, looking for answers to the big questions about our understanding of the physical Universe.
Hi Phil, I am a member of the CMS Collaboration. THE rankings obviously affect the funding strategies of governments and institutes. However, this effect on funding will create another big problem in a different way. Students! They are a major part of the workforce in this kind of forefront science, and they are now getting the impression that a career in the fields you ruled out will not be a promising one in the coming years. So they are simply changing their routes to the fields you are indirectly blessing with your current methodology. Moreover, this is going to be a cumulative effect, and the human resource gap in these fields will grow every year if you insist on this kind of methodology, which is almost blind to an entire field of research. Bora