Archiving, Referencing and Citing Software Artefacts Made Easy

ACM SIGSOFT Blog
Oct 24, 2022

By Pierre Alliez, Roberto Di Cosmo, Alain Girault, Benjamin Guedj,
Mohand-Said Hacid, Arnaud Legrand, Xavier Leroy, Nicolas Rougier, Manuel Serrano

Software is a significant part of our scientific production, and we believe it must be put on par with articles and data in our daily practice. This obviously includes our writing and publishing activities, but also our reviewing, editing, and application evaluation activities.

It is our responsibility, as computer scientists, to ensure that the specificities of software development are taken into full account in academic publishing.

For this, an essential first step is to ensure that the source code of software artefacts associated with research publications is properly archived, referenced, described, and cited, and not handled as just a generic bundle of data.

This is now possible by leveraging Software Heritage, a non-profit initiative launched by Inria in 2016 with support from UNESCO and a broad panel of sponsors worldwide, with the explicit mission to collect, preserve, and share the source code of all publicly available software. Designed specifically for software source code, today Software Heritage provides the best infrastructure for addressing these needs, for a variety of reasons:

  • as a universal archive, it collects all publicly accessible source code, with all its development history (commits, releases, etc.);
  • it lets users seamlessly trigger the immediate archival of publicly available repositories, in particular through the Updateswh browser extension, available for Chrome and Firefox;
  • it uses intrinsic cryptographic persistent identifiers (known as SWHIDs) that enable traceability and guarantee integrity for all the source code, without relying on any third party;
  • it is entirely free of charge for the users, both individuals and institutions;
  • it allows browsing and referencing source code at all levels of granularity (releases, commits, directories, files, and even code snippets);
  • the biblatex-software citation style extension (available in CTAN and TeXLive) allows LaTeX users to properly cite and reference source code in their articles. It also works with Overleaf;
  • a simple HOWTO web page details all the steps.
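
To illustrate what "intrinsic" means for the identifiers listed above, here is a minimal Python sketch. It assumes the published SWHID scheme, in which the identifier of a file's content is the Git-style SHA-1 of its bytes. The function names content_swhid and with_lines are hypothetical, and only the "cnt" (file content) object type is covered; directories, revisions, releases and snapshots are hashed differently.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (object type `cnt`)."""
    # Assumption: Git blob hashing, i.e. SHA-1 over "blob <size>\0" + bytes.
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def with_lines(swhid: str, start: int, end: int) -> str:
    """Append a `lines` qualifier so the identifier points at a code snippet."""
    return f"{swhid};lines={start}-{end}"


if __name__ == "__main__":
    # Identifier for line 1 of a one-line file containing "hello world".
    print(with_lines(content_swhid(b"hello world\n"), 1, 1))
```

Because the identifier is derived from the bytes alone, anyone holding a copy of the file can recompute it and verify integrity without contacting a third party.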

For all these reasons, Software Heritage is now identified as a key infrastructure in the Report on Scholarly Infrastructures for Research Software of the European Open Science Cloud, and recommended for the long-term archival of research software in the recently announced French National Plan for Open Science.

Using Software Heritage for archiving and correctly citing software is extremely simple, as can be seen in the dedicated HOWTO page, and a number of journals are adopting it for the software artefacts associated with their publications (see for example this statement from eLife).
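
Archival can also be requested programmatically rather than through the browser extension. The following Python sketch only builds the HTTP request and does not send it. It assumes the "save origin" endpoint of the Software Heritage Web API as we understand it (a POST to /api/1/origin/save/<visit type>/url/<origin URL>/); the helper name save_request and the example.org repository URL are hypothetical, and the HOWTO page and API documentation remain the authoritative references.

```python
import urllib.request

# Assumption: public Web API root of the Software Heritage archive.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def save_request(origin_url: str, visit_type: str = "git") -> urllib.request.Request:
    """Build (but do not send) a request asking the archive to save a repository."""
    url = f"{API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    # Placeholder repository URL, for illustration only.
    req = save_request("https://example.org/some/repo.git")
    print(req.get_method(), req.full_url)
    # To actually submit the request: urllib.request.urlopen(req)
```

Keeping request construction separate from submission makes the sketch safe to run as-is and easy to adapt to a different HTTP client.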

We are writing today to ask for your help in raising awareness and fostering adoption of these best practices in your community, as well as in ACM as a whole, ensuring that:
  • Software Heritage is (one of) the recommended archives for obtaining the appropriate ACM Artifact Available badge;
  • Software Heritage is (one of) the recommended archives for Artifact Evaluation Committees;
  • Conference proceedings and journal bibliographic styles support proper citation of software with links into the Software Heritage archive, and accept manuscripts that use the biblatex-software citation style extension.

We will be happy to provide more details and help to move forward in this direction.

Disclaimer: The posts in the SIGSOFT Blog are written by individual contributors and any views or opinions represented in their posts are personal, belong solely to the blog authors and do not necessarily represent those of ACM SIGSOFT or ACM.

Call for Contributions: Please consider contributing to the SIGSOFT Blog.

SIGSOFT is the ACM Special Interest Group on Software Engineering