Tue 14 July 2020

Bringing Lorry into the 2020s

In the mid-2010s, Codethink developed the Baserock software suite for specifying, developing, and building Linux-based systems. The work has been used by several long-standing customers and Codethink even uses it for some of our own infrastructure. This blog will take you through work done to modernise one of the components of Baserock - Lorry.

Some of the components of Baserock are useful in their own right, and one of those is Lorry: a program for maintaining mirrors of source code. When building systems that include many third-party source components, it is important to make sure you keep copies of all the source code within your organisation. Otherwise, if an upstream repository or file server becomes temporarily or permanently unavailable, you might be unable to rebuild or deploy your system. You might also be unable to comply with copyleft license requirements to provide source to your users.

While some Git hosting services can maintain a mirror of an upstream Git repository, Lorry can mirror to Git from many other version control systems and from file-based upstream releases. The Lorry Controller (LC) component implements the services needed to run Lorry regularly for each mirror, and pulls the mirror configuration from another Git repository. One of my projects during lockdown has been modernising Lorry: making these components compatible with current versions of their dependencies and with various Git hosting services, documenting how to install and configure them, and building container images.

The earlier Trovekube project began the work of getting Lorry and LC running on Python 3, and added GitLab CI configurations which have helped me a lot with understanding their dependencies. As I tested out all their features, I found and fixed a few more issues with Python 3 compatibility, mostly involving distinctions between binary and text I/O.

As part of a Baserock Trove appliance, LC was originally co-installed with the Gitano hosting service (downstream host), and had specific support for mirroring the whole of another Trove (upstream host). Later on, support for Gerrit as downstream host, and for GitLab as downstream or upstream host, was added, but I found that this support had bit-rotted. There were also no internal abstractions that would allow adding another type of host cleanly. I have refactored support for each host type into classes, updated the Gerrit and GitLab support to work with current versions of those programs, and added support for Gitea as a downstream host. It turned out to be possible to run a test instance of each of these hosting services in a VM or system container on my development laptop, though it wasn't always the most comfortable experience!

The documentation for Lorry and LC now lists all their direct dependencies and the versions I've tested. For LC I documented all the installation steps and how to integrate it with each of the supported downstream host types.

Finally I needed to build container images using BuildStream. This part was probably the most challenging, since I had no prior experience with either technology. BuildStream is also a fast-moving project that hasn't been kept up-to-date in Debian, my usual development environment. I used the experimental "oci" plugin that can build container images in the OCI or Docker v1.2 format.

I found BuildStream's tutorial and documentation generally excellent, and was impressed at how easy it was to write definitions for building a component. Still, defining how to build all of Lorry and LC's dependencies would have required a huge amount of work. However, the Freedesktop-SDK already has BuildStream definitions for many common open source components, even including Git, OpenSSH, and Perl (used in some of Lorry's file importers). My colleague, Valentin David, helped me to understand how to reuse those, so that I only had to write definitions for the more obscure dependencies.

While my initial testing of Lorry and LC was done on Debian 9 "stretch", the last version where I could easily test all the supported version control systems, Freedesktop-SDK has much more recent versions of some of the dependencies including Git. The current version of Git creates additional files under .git that weren't expected by Lorry's test suite, but it was easy enough to fix the affected tests.

After I built the first container image, I still needed to learn how to use container managers and to add configuration for various components to work properly in the container. This was not a typical application container, as LC depends on systemd to run multiple services together. I used podman initially, since I understood that it had better support for containers with systemd. BuildStream's build environments are sandboxes with only a (virtual) root user, so I had to defer creation of the lorry service user and its files to first boot of the container using systemd-sysusers and systemd-tmpfiles. Container images should not contain any security credentials, so I also arranged to create SSH key pairs at first boot and to propagate access tokens from the container environment into a LC configuration file. The biggest stumbling block was in logging. Container managers normally run a single application process and expect it to log through its stdout or stderr, but this isn't possible in a multi-service container. However, both podman and Docker can optionally create a /dev/console device node that will also feed into their logs, and systemd-journald can be configured to log through this instead of to files.

This project was an interesting break from my usual work on kernel development and maintenance. I had the opportunity to explore and learn about various unfamiliar but important technologies. My colleagues helped a lot in getting me started and in reviewing my work toward the end. All of this work is now public under the CodethinkLabs group on GitLab.

Other Content

Get in touch to find out how Codethink can help you

sales@codethink.co.uk +44 161 660 9930

Contact us