The Covid-19 pandemic has brought many scientific issues to wide public attention, but even in these extraordinary times, the way computer coding is used in research is not a topic many would have predicted for mainstream discourse.
Nonetheless, the subject has burst into the open, mainly because of scrutiny of the code used in epidemiological modelling – in particular, the highly influential Imperial College London paper, led by Neil Ferguson, published just as the UK started going into lockdown.
The code underlying the modelling came in for criticism after it was posted to the public repository for programming, GitHub, although, according to a report in Nature this month, scientists who have tested the code have found that its results can be reproduced.
Bill Mitchell, director of policy at the British Computer Society (BCS), said that although it agreed that there was “no credible evidence” of major problems with the Imperial code, the episode had shone a light on the issue of how programming was performed and reviewed in academia.
The BCS released a position paper last month in which it said “the quality of the software implementations of scientific models appear to rely too much on the individual coding practices of the scientists” and called for professional software engineering standards to be used where scientific code formed the basis of policy.
Dr Mitchell, a former computing lecturer at the universities of Manchester and Surrey, said there were “lots of very, very standard things that you would expect in the software world” that are not always being done in science.
These included code being readily shared on public repositories such as GitHub; code being written in such a way that it can be easily understood and tested by others; and tests being published so that reviewers can easily try to replicate the results.
“It goes to the heart of doing science. You tell people what experiments you’ve done; you allow them to look at your working,” he said.
Dr Mitchell said his “very personal” view was that scientists might sometimes view coding as just a “mechanical way of generating data” and might not fully appreciate “just how much innovation and ingenuity and cleverness is embedded in their own code and how valuable that is to other people”.
Changing this culture – especially given the “intense” publish or perish pressures in academia – might require incentives similar to those seen in the open access movement, he said.
The “simplest thing” would be to say that all scientific software developed with public money must be made openly available. “I think suddenly when people realise that, ‘Oh my gosh, people are going to be looking at my code’, the standard will instantly improve,” Dr Mitchell said.
Others said that the direction of travel was towards more openness, but that there was a debate to be had about how to speed up progress.
“In my field, there has been a movement towards transparency for quite a number of years, and it is becoming more and more common for journals, reviewers and the community to require code to be made available with papers,” said Rosalind Eggo, assistant professor in infectious disease modelling at the London School of Hygiene and Tropical Medicine.
She added that one longer-term solution would be to invest more in employing research software engineers “who are experts in writing and translating scientific code and making it more efficient, shareable and, ultimately, more useful”.
“Making sure we have the resources that allow the hiring and long-term funding of software specialists would improve the quality of scientific code and hopefully make it easier to build efficient analysis, and to reuse and repurpose code,” she said.
Konrad Hinsen, a biophysicist at France’s National Centre for Scientific Research (CNRS) and an expert in scientific computing who often blogs on the issue, suggested that employing more research software engineers was a good idea.
However, he added, using them to help write code might be difficult for “small, exploratory projects that are done in informal collaborations”.
“You can’t just add a software expert with a very different working style to such a team. But you can still do after-the-fact code review before accepting results for publication,” he said.
This is where research software engineers could have a key role more generally, including through the traditional publishing process, he said, pointing out that some “pioneering journals” were already including code review as an “integral part” of the peer review process.
More broadly, Dr Hinsen added, the issue was one of “training enough people, and then employing them in appropriate jobs”. However, he was somewhat sceptical about whether progress could be sped up across all disciplines in science.
“Much scientific code is long-lived, and habits are even more subject to inertia. Faster improvement is not possible for scientific code in general, though it is in specific, well-defined subjects where motivation is high. Epidemiology might be in that situation right now,” he said.
Postscript
Print headline: Pandemic models spark calls to reveal more code