Open Science Means Open Source--Or, at Least, It Should

Why open source was actually invented in 1665.

When did open source begin? In February 1998, when the term was coined by Christine Peterson? Or in 1989, when Richard Stallman drew up the "subroutinized" GNU GPL? Or perhaps a little earlier, in 1985, when he created the GNU Emacs license? How about on March 6, 1665? On that day, the following paragraph appeared:

Whereas there is nothing more necessary for promoting the improvement of Philosophical Matters, than the communicating to such, as apply their Studies and Endeavours that way, such things as are discovered or put in practise by others; it is therefore thought fit to employ the Press, as the most proper way to gratifie those, whose engagement in such Studies, and delight in the advancement of Learning and profitable Discoveries, doth entitle them to the knowledge of what this Kingdom, or other parts of the World, do, from time to time, afford, as well of the progress of the Studies, Labours, and attempts of the Curious and learned in things of this kind, as of their compleat Discoveries and performances: To the end, that such Productions being clearly and truly communicated, desires after solid and usefull knowledge may be further entertained, ingenious Endeavours and Undertakings cherished, and those, addicted to and conversant in such matters, may be invited and encouraged to search, try, and find out new things, impart their knowledge to one another, and contribute what they can to the Grand design of improving Natural knowledge, and perfecting all Philosophical Arts, and Sciences.

Those words are to be found in the very first issue of the Royal Society's Philosophical Transactions, the oldest scientific journal in continuous publication in the world, which published key results by Newton and others. Just as important is the fact that it established key principles of science that we take for granted today, including the routine public sharing of techniques and results so that others can build on them—open source, in other words.

Given that science pretty much invented what we now call the open-source approach, it's rather ironic that the scientific community is currently rediscovering openness, in what is known as open science. The movement is being driven by a growing awareness that the passage from traditional, analog scientific methods to ones permeated by digital technology is no minor evolution. Instead, it brings fundamental changes to how science can, and should, be conducted.

The open science revolution can be said to have begun with open access: the idea that academic papers should be freely available as digital documents. It takes the original idea behind the Royal Society's Philosophical Transactions, that news about discoveries should be set down and published, to the next level by making that information freely accessible to all. Open access neatly illustrates the leap between analog and digital worlds. Whereas it would have been impossible to make the printed versions of the Royal Society's Philosophical Transactions generally available, the internet can potentially give everyone with an online connection cost-free access to every article posted online.

The same can be said of another important aspect of open science: open data. Before the internet, handling data was a tedious and time-consuming process. But once digitized, even the most capacious databases can be transmitted, combined, compared and analyzed very rapidly. For science, this is transformational, since it means that, in principle, other researchers can check experimental results by downloading complete datasets and carrying out their own, independent analysis and evaluation. Just as important, they can conduct new analyses to obtain results that go beyond the initial discoveries. The development of tools and techniques to mine data for new information, and to combine it with other datasets, has led to the spread of open data ideas and practices far beyond science.

The final leg of the open science tripod, and arguably the most radical one, is open source. One of the most important developments in science in the last few decades is the use of digital tools for research. These might be programs that gather data, or analyze it, or store it. But however it is used, software is indispensable for modern science. The problem is, much of the code is specifically written for each scientific investigation. Despite all the effort that goes into writing it, the fruits of that work are rarely shared with other scientists afterward.

Indeed, even as the open science movement gathers momentum, open source is conspicuous by its absence. For example, in 2016, the Council of the European Union issued its important policy statement titled "The transition towards an Open Science system", in which open source is not mentioned once. Nor is it mentioned in the 2017 European Open Science Cloud declaration. The 2018 Advancing Open Science in the EU and the US workshop also seems to have overlooked this aspect. More recently, the National Academies of Sciences, Engineering, and Medicine published a "New Framework to Speed Progress Toward Open Science". In it, the power and success of open source is mentioned no fewer than 20 times, which is great. Unfortunately, the final recommendations do not include promoting open source as part of open science.

A major new initiative in Europe, which has been hitting the headlines in scientific circles, is also silent on open source. With the support of the European Commission and the European Research Council, 11 national research funding organizations recently announced the launch of Plan S by the weirdly named cOAlition S. This is "an initiative to make full and immediate Open Access to research publications a reality". Open source could play an important role here, through the use of high-quality free software applications that make publishing easier and cheaper than current approaches. Instead, the plan simply says: "The importance of open archives and repositories for hosting research outputs is acknowledged because of their long-term archiving function and their potential for editorial innovation"—open archives, but not open-source archives, that is. Fortunately, influential figures are calling out this serious oversight. Commenting on Plan S, Peter Suber, widely recognized as one of the leaders in the open access world, writes:

The plan promises "support...for Open Access infrastructures where necessary." So far, so good. But the plan is silent on the importance of open infrastructure, that is, platforms running on open-source software, under open standards, with open APIs for interoperability, and preferably owned or hosted by non-profit organizations.

As the above indicates, governmental bodies and the top science organizations show a regrettable lack of interest in working with open source in order to boost open science. That's surprising and unacceptable, since much of the code written by researchers has been funded by the public. There is, therefore, a compelling case that all such software must be released under an open-source license to allow anyone—including the people who paid for it with their taxes—to re-use it however they wish.

In the face of that indifference from the big funding bodies, grass-roots activists are doing what they can with their limited resources, and there are some hopeful signs of progress. For example, OPERAS, a European research infrastructure, has published a white paper exploring what open-source solutions are available for creating an open science scholarly communication infrastructure. Similarly, a recent post by Lettie Y. Conrad provides a useful survey of what "open" tools are available for open science:

For purposes of this project, we zeroed in on those tools provided by non-profit or community-based organizations using open source software, offering open data, via an open license, leveraging open standards where possible—basically, as open as humanly and technologically possible.

Conrad presented her work at a workshop on producing a Joint Roadmap for Open Science Tools. What's striking is that among the participants in the workshop, the only mainstream name from the Open Source world is Mozilla. This shows that alongside the massive failure on the part of research funding bodies to embrace open source as part of the solution, there is a similar failure of open-source projects to become active in this important area.

That's a real shame, because open science offers a huge opportunity for free software coders to take on new challenges and create some exciting and innovative programs. As well as enriching the Open Source community and its projects, such a move also would help accelerate the open science revolution. It's surely what the founders of the Royal Society's Philosophical Transactions would have wanted.

Glyn Moody has been writing about the internet since 1994, and about free software since 1995. In 1997, he wrote the first mainstream feature about GNU/Linux and free software, which appeared in Wired. In 2001, his book Rebel Code: Linux And The Open Source Revolution was published. Since then, he has written widely about free software and digital rights. He has a blog, and he is active on social media: @glynmoody on Twitter.
