Source: Getty (edited)
Good career moves
These days, I’m asked to talk about reproducibility and replication in science at least as often as I’m asked to talk about my own research. And I’ve noticed a repeated pattern in the responses from early career researchers. “Working in an open and reproducible way all sounds very nice, but my boss would not approve,” they say. Or, in a similar vein, “If I take time doing things carefully and transparently, I will miss out on publications, and my career will suffer.”
Looking through the written responses to the UK Science and Technology Select Committee’s recent call for evidence on science reproducibility, it is remarkable how many early career researchers make similar points. And their impressions are backed up by the Wellcome Trust’s survey of research culture, 43 per cent of whose respondents thought that metrics are valued over research quality. Nearly a quarter (23 per cent) of early career researchers had felt pressured by a supervisor to produce a particular result.
In December, I was privileged to have the opportunity to give oral evidence to the committee, alongside Marcus Munafò, chair of the UK Reproducibility Network (UKRN). We had to dispense first with some basic questions of definition. Reproducibility is often used quite broadly to indicate the extent to which you feel a result is solid and can be built on. However, technically, we can distinguish literal reproducibility – ability to arrive at the same result, given the same dataset – and replicability, which is obtaining broadly compatible results when an experiment is repeated with a new sample.
Literal reproducibility may seem like a pretty low bar for research to achieve, but studies can fail this criterion if methods are only vaguely specified, data are unavailable, and/or if there are errors in data processing. Lack of replicability, meanwhile, does not mean a study was badly done: there are many reasons why results may differ, including random variability. But if a high proportion of findings don’t replicate, this suggests there is something wrong with the way we are doing science, given that our methods are supposed to guard against biases and error.
Publication bias – the non-reporting of null findings – is ubiquitous. This distorts the field because the published body of work is not representative of typical results. In many subjects, there is further scope for bias caused by researchers selecting post hoc from within a study the particular analyses or variables that give the most impressive-looking finding. Researchers often underestimate how dramatically the rate of false positives can increase if they use such a flexible approach to analysis.
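To give a sense of scale for that last point, here is a minimal sketch of how the chance of at least one false positive grows with the number of analyses tried. It assumes the analyses are statistically independent and that no real effect exists, both of which are simplifications; the 0.05 threshold and the function name are illustrative choices, not drawn from any particular study.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    analyses when there is no real effect in any of them (assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative only: a conventional 0.05 significance threshold.
for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} analyses tried: {rate:.0%} chance of a spurious 'finding'")
```

Under these assumptions, a researcher who tries 20 different analyses and reports the best one has roughly a 64 per cent chance of a "significant" result arising from noise alone, against the nominal 5 per cent.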
There is no single cause of problems and no single solution, but there are some relatively easy fixes that can be adopted by funders and institutions. First, there is a need for more training in research integrity, as well as better grounding in methods and statistics – areas that have moved so fast that many senior scientists struggle to keep up, so are unable to train others.
Second, data and analysis code should in most circumstances be available to other scientists, so that they can check the work.
Third, criteria for hiring and firing need to be modified. We must stop using proxy measures of quality, such as numbers of publications in high-impact journals and amount of grant funding, and reward work that is conducted in a reproducible and rigorous fashion. Marcus – whose mushrooming organisation interacts with funders, publishers and learned societies across the disciplines to coordinate efforts to improve research quality – drew an analogy with the Japanese car industry. In its early days, it had a poor reputation, but then transformed itself to become a byword for quality and efficiency by focusing on rigorous quality control at each step in the manufacturing process.
The committee was interested in whether peer review was a source of problems. Certainly, the system is under great stress, but in my view, the problem is not with peer review itself, so much as with the point at which it occurs. Registered Reports is a new model of doing research, which makes peer review far more useful by requiring it before any data have been gathered. The reviewers evaluate a protocol that specifies the problem to be addressed and the methods that will be adopted. This approach also provides a transparent record of what was planned, which guards against the problems of biased selection of results.
In addition, our funding and reward systems still tend to implicitly envisage a single scientist working alone. But times have moved on and we need to recognise that bringing together groups with complementary skills, possibly distributed across several centres, is a good way of fostering research that is both reproducible and replicable.
Clearly, something is wrong in a system where so many young researchers feel there’s a mismatch between doing good science and having a successful career. The written evidence presented to the select committee contains many more good ideas of how to address the problems. Let’s hope that we are now on the road towards self-correction of a research process that has, in recent years, been veering off course.
Dorothy Bishop is professor of developmental neuropsychology at the University of Oxford.
For at least 80 years, it has been clear that our systems for improving knowledge have dramatic flaws. These flaws are ubiquitous, spanning the disciplines, and have devastating costs. Vast amounts have been written about them in the past decade, but they remain largely unresolved.
Much of what is taught as true is known to be false. Many of the phenomena upon which widely cited theories are built or that underpin policies and legislation have been inadequately tested or not replicated even once. Yet theory is rejected very infrequently and is often “not even wrong” – unable to be refuted because it is imprecisely specified, adapted post hoc and otherwise protected. And researchers’ responses to failures to replicate their results are as likely to be rebuke as humble acceptance.
Luckily, researchers have all the skills needed to determine which phenomena are reliable and to conduct the debate needed to ensure that only robust theories thrive. The problem is that, today, only a tiny handful of researchers undertake this work. What is needed is a change of incentives. I have three rapidly achievable proposals.
First, funders such as UK Research and Innovation should set aside 10 per cent of their budgets for testing the validity of phenomena, protocols, code and analyses. To begin with, researchers (identified according to some low bar, such as entry to the research excellence framework) should be invited to suggest and anonymously rank the papers/claims they think are unlikely to replicate or whose erroneousness would make a lot of difference. Prediction markets show we are very good at identifying such studies: much bad research is an open secret. A side-effect is that groups will develop expertise in replication, which will render future attempts at it both cheaper and more rigorous.
Second, we should trial giving researchers an automatic and flexible research budget – say £10,000 – targeted at replication. No reviews, no university overheads: just pre-registration of the hypotheses to be tested, and open science. Flexibility would allow researchers to pool their grants to undertake a larger piece of work in voluntary, uncontrived collaborations. Crucially, researchers would switch from asking “Is this fundable?” to “Is this true?” And it could be highly efficient: key outputs of a £300,000 grant might be tested by a single researcher using their £10,000 personal research budget.
Of course, some research fields are more expensive than others, and an alternative would be a grant fixed at, say, 5 per cent of the median field grant. But the flat grant is simple and would focus researchers on efficiency. Often labs with millions of pounds of equipment still lack the means to fund the marginal cost of work not requested on a specific grant line.
Third, we should address how large project funding is organised. The current system should not be upended, but we should confidently experiment with it, applying science to the task of improving science. A great example of novel funding enhancing research integrity and productivity is UK Biobank. This took an area ripe for discovery (genomics), created a sample 10 times bigger than what existed previously, and gave it away to any researcher who could describe a scientific use for it. It generated thousands of research studies, many of which consist of collaborative replications – within which corner-cutting is less likely.
At the level of applications, funders have raised minimum grant sizes, universities pre-screen grants, and only those with uniformly stellar reviews are funded. This reduces administration but favours orthodoxy. Critiquing the work of others is avoided, lest this attract a negative review, and the successful groups are then funded to test their own idea.
A portion of funding should be allocated to disrupting this sequence. In each area, researchers would collaborate on an initial “Wikipedia of aims”. From mRNA applications curing cancer to improving mathematics instruction, academics would curate the competing approaches, identifying key predictions, and specifying the studies required to test each theory. In a second phase, researchers would rank these proposals. In phase three, research teams emerging from phase one would compete for grants to execute the highest-ranked projects.
In this way, hundreds of smart, disinterested eyes would be trained on each grant, making capture by what Lakatos called degenerate research programmes less likely. Devoting 5 per cent of public research funding to the grants that garner adequate support could launch this on a meaningful scale. Research teams would propose budgets to execute the top-ranked projects, decided conventionally.
Much more is possible. The pay-offs can be transformative, from medical technology to new understandings of human behaviour. In the long run, nothing is more important.
Tim Bates is a professor in the School of Philosophy, Psychology and Language Sciences at the University of Edinburgh.
Research reproducibility is a serious problem all around the world, so I was heartened to see UK parliamentarians taking a closer look at what can be done. But they already undertook two similar investigations in the previous decade, in 2011 and 2018, and I fear the new inquiry will similarly fail to make an impact.
The core problem is that while genuine cheating is rare, honest technical errors, equipment miscalibration and the contamination or degradation of key materials, which may lead to erroneous data, are common. Uncovering unreproducible research data is also difficult and slow. Years can be spent trying to learn newly reported techniques and then seeing if matching findings arise. Even then, misalignment of data may be subject to genuine (and lengthy) academic debate. And senior faculty have protections around their employment and can be resistant to outside influences, such as committees on research integrity. All the while, there is relentless pressure to publish.
So how to improve research reproducibility?
Pouring millions of pounds into costly replication studies is often suggested, but there is an alternative solution that would be far more effective, realistic and politically appealing. That solution is to expand the uptake of university-derived technologies into new start-up companies.
Why might this work? Consider the hypothetical case of a PhD student, Sam, working in the lab of Professor Smith on research to improve solar cell energy. He co-invents a device that can boost efficiency by 5 per cent across a range of climates, which is commercially attractive. Sam decides to start a new company to develop the patent-pending technology commercially.
A local investor is intrigued and offers additional funding for a share in the company – which requires a contract with the Smith lab. The funding includes routine servicing for a key piece of equipment: the “hot and cold incubator”. This is how businesses conduct research.
It is discovered that the equipment is miscalibrated because it hadn’t been serviced in years. The low-temperature data are incorrect. The investor is notified, but this isn’t a problem: low-temperature use was not a business priority. The Smith lab ultimately publishes a paper accurately discussing the solar energy unit’s operation across a range of temperatures.
This is an example of how the forces of commercial scrutiny can help support research outputs by creating a direct connection between investors and the laboratories of senior faculty. Companies have very different research practices than academic labs. Practical reliability and consistency are commercial priorities, and investors may even insist on seeing whether key data can be recreated within an independent laboratory. It seems unlikely that an academic committee on research integrity could ever induce such scrutiny.
Commercialisation is often a two-way street, however. Academics can act as scientific advisers to these companies and, in the process, learn about commercial practices. These may open new avenues of research, but they also help improve quality. In Sam’s case, equipment maintenance doesn’t just affect the Smith lab but all the other labs using that piece of equipment. Colleagues may wonder, “What other equipment of ours is out of specification?” Hence, a few more pieces of equipment are calibrated. Funding is available to replace obsolete equipment. One or two more companies arising from a university each year may provide lasting cumulative benefits.
One more thing: academics gossip. News that a piece of equipment has been recalibrated may be discussed in lab briefings, but faculty getting some rapid funding from a former student’s start-up will be a popular topic, along with the kinds of data and standards expected by investors. More negatively, news that a company has folded due to poor research reproducibility (perhaps with accompanying scandal) is likely to remain in the minds of senior faculty for a very long time.
What if encouraging doctoral candidates to found start-ups still doesn’t improve research reproducibility? The economy would still have a substantial increase in new high-tech businesses that can grow and thrive – another government priority.
Not every research finding lends itself to commercialisation, of course. But for those that do, there is little to lose and everything to gain by trying.
Chris Loryman is a senior innovation and commercialisation manager at the University of California, San Diego, and has previously managed intellectual property and technology transfer at several London universities.
The pandemic has again emphasised that good science can literally be a lifesaver, but bad science has the potential to lead to awful consequences. From 2020, false claims that ivermectin or hydroxychloroquine could cure Covid-19 led to several poisoning-related admissions to US hospitals, and even suspected deaths. Likewise, preprints reporting unverified and misleading results have contributed to a significant level of public confusion on a range of pandemic-related issues.
While it is clear that reliable scientific data are critical for good decision-making, science is not easy. Alongside the practical challenges of designing and conducting good experiments, there are also complex social contexts that apply to researchers. Like it or not, money is key, and science is influenced by the quest for funding. It is therefore naive to interpret scientific data, or any scientific finding, without understanding the incentives that lead to its production – and their potential to distort results.
A well-known problem is the intense competition to win research grants, publish high-profile papers and report “impact” – which for many scientists represent the benchmarks of prestige. But linking rewards to such proxy indicators of success, rather than to creating reliable and reproducible results, is problematic because it distorts science in favour of people who can play the funding, publishing or impact game better than others. This problem is further compounded by the many well-known and serious problems in the main quality-checking part of the system – peer review – which is largely voluntary and conducted in people’s spare time.
The result, as Sir Iain Chalmers and Paul Glasziou first calculated in 2009 and explained in subsequent BMJ editorials, is that “85 per cent of research funding [is] wasted because it asks the wrong questions, is badly designed, not published or poorly reported”. This astonishing figure – representing $170 billion of waste globally per year – has stirred many to action, with researchers, publishers and funders committing to a number of initiatives focused on improving research culture and creating better, reproducible and transparent science.
While it is heartening that the problem of research waste is being recognised and action is being taken, there does not seem to be any single, easy solution given the current incentive structure. However, one approach is to make better use of governance and ethics processes. The need for such processes is famously laid out in the World Medical Association’s authoritative Declaration of Helsinki, first adopted in 1964 in response to various 20th-century abuses perpetrated in the name of science. Alongside enshrining important principles, such as consent, this declaration has been expanded to include other important aspects of research quality, such as the role of ethics committees and greater transparency through mandated trial registration and reporting.
Unfortunately, if done badly, the implementation of these safeguards can lead to significant bureaucracy that can itself create waste by slowing down or even preventing important research. The challenge is therefore to design systems that can detect and prevent research waste without becoming part of the problem. However, such systems should also be difficult to subvert or avoid entirely – something that may, unfortunately, have happened because of the pressures of the pandemic.
Researchers have clearly made an enormous contribution to fighting Covid-19. But the sheer scale of the funding given out, the rapid change of focus by many scientists and institutes, and the watering-down of some research governance processes, mean that it is very unlikely that all the effort and investment has been put to good use. The fear is that the 85 per cent figure for research waste could be a lot higher for pandemic-related efforts.
As the dust starts to settle on what science got right and wrong during the pandemic, expect growing scrutiny of where all the research funding went. While certain allowances may be acceptable due to the rush to tackle Covid-19, the fear is that if it does turn out that a significant amount of effort and funding has failed to produce any tangible outcomes, this will severely dent public trust and possibly public appetite for future research funding. It therefore remains to be seen whether, despite the clear successes, the pandemic will turn out to be an overall positive or negative for science.
Simon Kolstoe is reader in bioethics at the University of Portsmouth.
UK biomedicine rose to the challenge of Covid-19 with some impressive achievements. The Oxford-AstraZeneca collaboration launched a cheap vaccine for worldwide use in only 10 months. NHS research networks ran large trials rapidly to test effective treatments. But these successes should not cause complacency. Why? Because many aspects of research culture – in the UK, as elsewhere – still limit our ability to combat global disease outbreaks.
The World Health Organisation built a system of clinical trial registries as part of the international response to severe acute respiratory syndrome (Sars) and Ebola. One of the founders was the UK’s primary registry, ISRCTN, which I chair. The WHO registry network operates to a common standard, enabling information from trials worldwide to form a body of evidence. Systematic reviews consolidate reported results, allowing the strengths of some studies to compensate for the weaknesses of others. The balance of evidence about effective interventions can then be a firm platform for policy, professional guidelines and licensing.
However, for decades, reporting to WHO registries lagged behind. What will it take to make academic triallists report their results routinely, as most commercial health researchers do? US and EU law require both public registration and timely reporting of trials, and last year the US began to enforce the law on reporting results, while from this year, EU countries will impose penalties for non-compliance. Among other measures in its impressive transparency strategy, Make It Public, the UK’s Health Research Authority requires a summary report a year after a clinical trial ends. From 2022, it will register all UK trials with the ISRCTN registry on approval. But UK law does not sanction non-reporting, despite continued lobbying for clinical trials transparency.
And why just clinical trials? The HRA’s remit should lead it to promote transparency across all health research. For example, research using personal data is increasingly important and contentious. Some research may well not be reproducible. But by working with other sectors on applying consistent duties and reporting standards that normalise transparency and reproducibility across all data-driven research, the HRA and equivalent agencies around the world could play an important role.
In response to political concern about research integrity, UK Research and Innovation is establishing a UK Committee on Research Integrity. Can it address the reproducibility crisis? It is not a watchdog: it will be another forum to discuss how standards and expectations can be set across the sector. The UK Research Integrity Office and the UK Reproducibility Network already discuss standards and generate practical guidance. Can UK CORI prompt concerted action from research funders, national academies, publishers and governments, as well as university and scientific leaders, to transform damaging elements of research culture?
The reproducibility crisis owes less to misconduct than to persistent incentives that inhibit collegiate scientific effort, and these are not new. Competitive funding has strengths, but it can shape a mindset in which bidding takes priority over verifying results or reporting contestable outcomes. Many researchers believe that publishers prefer novel findings. That belief can incentivise data manipulation. Insecurity may make researchers feel vulnerable to having their ideas stolen. All these factors inhibit the teamwork needed for self-critical science and well-documented, reproducible findings. The Concordat on Research Integrity includes accountability for the research environment. Making that a reality could be a valuable focus for CORI.
Amid a deluge of misinformation, the pandemic is stimulating stronger public engagement in the quality of scientific evidence. To build faith in governments’ ability to apply sound science across sectors, from climate and energy to public health and behaviour, science should consider how researchers can collaborate and pool findings so as to focus on gaps in the evidence supporting important decisions.
For example, impactful clinical research depends on achieving the statistical power to demonstrate clear findings, as well as on faithful reporting. One lesson from the pandemic could be that priority-setting and careful preparation enable science to do less but better. In 2021, the French parliament noted “a proliferation of trials” in a critical report on its research response to Covid-19. France had “conducted 365 clinical trials (compared with 415 by the US, 164 by Germany and 140 by the UK)”, but “most have not been able to reach reliable conclusions, in particular because of too few patients, thus wasting available resources.”
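The link between patient numbers and reliable conclusions can be illustrated with a rough power calculation. The sketch below uses a standard normal approximation for comparing two proportions; the event rates (25 per cent in the control arm, 20 per cent in the treated arm) and the arm sizes are hypothetical values chosen for illustration, not figures from any trial mentioned here.

```python
from statistics import NormalDist


def approx_power(p_control: float, p_treated: float,
                 n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions,
    using the normal approximation with equal arm sizes."""
    std_normal = NormalDist()
    # Standard error of the difference between the two observed proportions.
    se = (p_control * (1 - p_control) / n_per_arm
          + p_treated * (1 - p_treated) / n_per_arm) ** 0.5
    z_effect = abs(p_control - p_treated) / se
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of detecting the effect; the negligible chance of a
    # significant result in the wrong direction is ignored.
    return std_normal.cdf(z_effect - z_crit)


# Hypothetical event rates: 25% (control) versus 20% (treated).
for n_per_arm in (100, 500, 2000):
    print(f"{n_per_arm:>4} patients per arm: power ~ {approx_power(0.25, 0.20, n_per_arm):.0%}")
```

With these assumed rates, a trial of 100 patients per arm has well under a 50 per cent chance of detecting the effect, whereas a single trial of 2,000 per arm would detect it most of the time. That is the arithmetic behind preferring a few large trials to many small ones.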
The UK chancellor’s 2021 budget confirmed billions of pounds to establish the UK as a global life sciences superpower. One test of success will be whether the nation discovers and licenses more game-changing treatments. But reproducibility will be another test of leadership in the international scientific response to public health emergencies. It might not gather headlines, but UK lawmakers and funders could spur collaboration towards scientific outputs beyond the capability of any one nation.
Marc Taylor is chair of the ISRCTN registry, a curated database that works to improve the publicly available information about clinical trials and related health research.