Peer review: not as old as you might think

Peer review is often thought of as ancient and unchanging, but it is neither – and it shouldn’t be treated as a sacred cow, argues Aileen Fyfe

June 25, 2015
Is peer review broken? That was one of the major questions addressed at the Royal Society’s conference series on the future of scholarly scientific communication, which took place earlier this year.

The meetings were held to coincide with the 350th anniversary of the establishment of the world’s oldest scientific journal, Philosophical Transactions. There was a common assumption among those participating – even those such as Richard Smith, former editor-in-chief of the British Medical Journal, who thought peer review should be swept away – that peer review also began in 1665, and that it had always been used to ensure quality control in science. When you look at the history, however, both those beliefs are questionable.

The assumption that peer review is as old as journal publishing – as also implied by a recent report, Scholarly Communication and Peer Review, commissioned by the Wellcome Trust – is based on a misunderstanding of Philosophical Transactions’ editorial practice. Recent investigation by my own team and by Alex Csiszar at Harvard University has revealed that selection, reviewing and (to some extent) evaluation did happen, but not at all in the way we would now recognise as “peer review”, in which an editor requests independently written reports from experts in the field for his or her (mostly) private use.

Henry Oldenburg, who founded Philosophical Transactions, searched out material for his monthly periodical from his extensive correspondence with scholars around Europe, his participation in the weekly meetings of the Royal Society and recently published treatises and pamphlets. He was a very active editor, soliciting contributions and extracting, excerpting and translating from his other sources.

Indeed, for most of the history of scientific journals, it has been editors – not referees – who have been the key decision-makers and gatekeepers, to the extent that journals were often known colloquially as “Editor X’s journal”. At Nature, for example, editors were firmly in charge until the mid-20th century. Whether this practice could be described as peer review depends on whether we adjudge the editor to be a “peer”. For many scholarly editors, this has, of course, been true. But it’s not really what we mean by peer review now.

At early Royal Society meetings, research findings were presented, often demonstrated and frequently discussed. But while it is possible to say that this means they had undergone scrutiny by well-informed scholars, that could be deemed to be peer review only to the extent that material presented nowadays at workshops and conferences (or on preprint servers) can be said to have been peer-reviewed. A modern journal editor might, as Oldenburg was in effect doing, scout for potential submissions at a conference and take heed of the tenor of the discussions; but those discussions are part of the oral culture of scientific communication, helping a researcher firm up their analysis and interpretation before seeking publication. They serve a purpose of their own, distinct from any editorial process.

It is possible that Oldenburg specifically sought an opinion on some potential contributions. But if he did, he probably did so in person and, unfortunately, it’s rather difficult to find historical evidence of what past scholars chatted about in the coffee houses and taverns. Two centuries later – as a forthcoming book from Melinda Baldwin of Harvard shows – Norman Lockyer, founding editor of Nature, took a broadly similar approach. He made most of the decisions himself, but sometimes sought additional opinions from his extensive network of connections in the London scientific community.

That Lockyer, an astronomer, felt the need to seek advice on papers dealing with the life sciences hints at one of the limitations of relying on the judgement of a single scholar. This had already been recognised in the 1750s, when Denis Diderot said: “A journal embraces such a large variety of matters that it is impossible for a single editor to issue even a mediocre journal…A journal must be the work of a society of scholars.”

The Journal des Sçavans – which covered a wide range of humanistic fields – had an editorial team as early as 1701; and in Britain, although the Philosophical Magazine began in 1798 with a single editor, by the 1850s it was run by a five-man editorial team bringing expertise in physics and chemistry, as well as connections in London, Dublin and Edinburgh.

Having a team of editors provided breadth of expertise, but if each operated as sole decision-maker in his (or her) field, those decisions were still potentially vulnerable to the biases and prejudices of the editor. Editors themselves seem not to have been too worried about this, but it was something that very much did concern those learned societies that operated journals.

In the 18th century, the leading French and British societies both developed practices for evaluating research collectively. The Royal Society took formal control of Philosophical Transactions in 1752 and introduced new editorial regulations that sought to ensure that the society as a collective body would control what appeared in its pages, thus limiting the potential damage from any one individual’s incompetence, bias or prejudice. A Committee of Papers was created to evaluate contributions presented at the society’s meetings for possible publication. In contrast to the editorial teams mentioned above, this committee had to reach its decisions collectively, which it did by taking a vote.

In Paris, the Académie Royale des Sciences was a different type of organisation, with paid academicians appointed by the Crown. One of their roles was to assess the merits of inventions and discoveries by non-academicians, which they did by appointing small committees to investigate and report back in writing. Securing a positive judgement from the Académie’s reporters (rapporteurs) could be very useful: it could help inventors get a patent; it could be quoted in commercial marketing materials; or it could persuade either the Académie’s publication committee or another journal editor to publish the findings (perhaps accompanied by an extract from the report).

Both these systems ensured the involvement of more than one person in the decision-making process, and both made some provision for expert judgement: in the French case, by careful selection of the rapporteurs; and in the British case, by having enough committee members to cover all fields and inviting additional members if necessary.

Just as the mechanisms established by these various organisations differed, so too did their claims about the quality or certainty of what they published. In the late 18th century, the French Académie’s committees sought to replicate and test the research findings. This was far more than just a careful reading of a text – it could involve a lengthy experimental investigation. By the 1830s, the practice had been abandoned as too time-consuming.

In contrast, the Royal Society printed an “advertisement” at the start of each issue of Philosophical Transactions explaining that the selection process did not pretend “to answer for the certainty of the facts, or propriety of the reasonings…which must still rest on the credit or judgment of their respective authors”. Rather than making decisions based on the “certainty of the facts”, the committee focused on their “importance and singularity” and the quality of their communication.

The practice that we now recognise as “peer review” (but not the term itself) emerged in the early 19th century. The Royal Society was one of several learned societies in London that started to seek referees’ reports around this time, as a way of ensuring that more expertise was involved in editorial decision-making. A Royal Society committee in 1827, whose members included computer science pioneer Charles Babbage, suggested that small committees be appointed, somewhat similar to the French model, but the report was ignored. In 1831, there were some experiments with jointly authored reports, but from 1832 on, the Royal Society sought independently written reports, which informed the decision by the Committee of Papers. Thereafter, refereeing quickly became a normal part of the publication process at the learned societies. Charles Darwin experienced it from both sides – as author and as referee – at the Geological Society in the late 1830s; he would later referee for both the Royal Society and the Linnean Society.

George Gabriel Stokes, secretary of the Royal Society from 1854 to 1885, relished his editorial role and devoted significant amounts of time to corresponding with authors and referees (he was an early adopter of the typewriter, in 1878). He developed the practice of sharing referees’ suggestions with authors and guiding authors on how to respond. Refereeing thus came to play a variety of roles: in the process of selecting papers for publication, it entailed an evaluation both of the worth or originality of a paper and of its suitability for the particular journal; and it involved a semi-conversation between authors and referees, mediated by the secretary, about improvements to the text.

The editorial process at learned societies in the mid 19th century thus drew on the expertise of referees, combined with a committee decision-making process to balance against possible accusations of bias or favouritism. One of the consequences of the reliance on referees and committees was that the learned societies could not publish research as quickly as the independent journals, which were managed directly by their editors. Publications in society journals appear to have been highly valued as markers of prestige, but they were not the best means for rapid communication of new research.

It was only in the late 20th century that refereeing was rebranded as “peer review” and acquired (or reacquired) its modern connotation of proof beyond reasonable doubt. The Oxford English Dictionary says that it was not until 1967 that “peer review” was first used – in the US – to describe “a form of review of competence by others in the same occupation”; the dictionary lists various quotations from the 1970s for the term’s more specialised uses in scientific grant-making and publication.

A Google Ngram chart – which plots the yearly frequency of a phrase across a large corpus of printed books – makes the point starkly visible: it was in the 1970s that the term “peer review” became widely used in English. This coincides with a more widespread use of the refereeing process outside the learned societies, being adopted, for instance, at Nature and at grant-making bodies such as the US National Science Foundation. But the various research teams looking into the history of peer review, including my own, do not yet know enough about why the post-war expansion of scientific research, on both sides of the Atlantic, led to the transformation of refereeing into “peer review”, or why it then came to dominate the evaluation of scholarly research.

Some of the participants at the Future of Scholarly Scientific Communication meeting suggested that, as the internet era progresses, we will increasingly move away from journals as the key means of communicating science. It is, therefore, worth considering whether a process that developed for print journals at learned societies will still be fit for purpose in that brave new world.

“Peer review” should not be treated as a shibboleth or – as one meeting participant suggested – a “sacred cow”. Rather, it should be seen for what it is: the currently dominant practice in a long and varied history of reviewing practices.

Aileen Fyfe is reader in modern British history at the University of St Andrews, and leads the Arts and Humanities Research Council-funded project called Publishing the Philosophical Transactions, 1665-2015.

Article originally published as: Peering into the past (25 June 2015)

