Building and testing at scale often involves codebases with millions of lines of code, worked upon by thousands of developers. In this environment, it is a challenge to ensure that:
- Every change made by every developer is built and tested, through a continuous integration process.
- The work performed by one team is seamlessly integrated with work from all other teams in a given organisation.
- The time each developer waits for feedback on a change they have made is as short as possible.
Creating a distributed build and test framework
One solution to the problem of building as quickly as possible is simply delegating that responsibility — not to the machine that the developer is working on, or even to a server that the CI job has been dispatched to, but to a server farm. These machines all pool their resources to collectively perform build tasks faster than a single machine could, whilst caching their results so that compiler actions do not need to be repeated.
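The caching idea can be illustrated with a minimal sketch. It assumes that a build step is fully described by its command line and the content of its input files, so that identical steps share a cache key. The names `action_key`, `ActionCache` and `fake_compile` are hypothetical and exist only for this illustration; a real server farm would share the cache over the network rather than hold it in memory.

```python
import hashlib


def action_key(command, input_files):
    """Derive a cache key from the command line and the content of every input."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(input_files):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(input_files[path]).digest())
    return h.hexdigest()


class ActionCache:
    """Hypothetical in-memory stand-in for a cache shared by the build farm."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # how many times real work was performed

    def run(self, command, input_files, execute):
        key = action_key(command, input_files)
        if key not in self._results:
            # Cache miss: do the work once and remember the result.
            self.executions += 1
            self._results[key] = execute(command, input_files)
        return self._results[key]


def fake_compile(command, input_files):
    # Stand-in for invoking a real compiler (an assumption for this sketch).
    return b"object code for " + input_files["hello.c"]


cache = ActionCache()
cmd = ["cc", "-c", "hello.c", "-o", "hello.o"]
src = {"hello.c": b"int main(void) { return 0; }"}

first = cache.run(cmd, src, fake_compile)
second = cache.run(cmd, src, fake_compile)   # identical action: served from the cache
changed = {"hello.c": b"int main(void) { return 1; }"}
third = cache.run(cmd, changed, fake_compile)  # input changed: executed again

print(cache.executions)
```

Because the key depends on file content rather than on timestamps or on which machine asked, any developer or CI job requesting the same action can reuse the stored result.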
Such an idea is not new: tools such as ccache and distcc have existed for over 15 years. However, these tools have a fairly narrow focus, restricted to dispatching compilation tasks from clang or gcc to a battery of servers and caching the results. This is an important part of the build, but it is by no means the only consideration. For this to be useful to developers on large projects, there needs to be:
- Consistency in environment: The build and test environment must be well defined and trusted. There must be no ambiguity in the results of a change caused by differences between developer and CI environments.
- Consistency in performance: The distributed build and test environment must scale elastically, so that the time to build and test a change is predictable and bounded, irrespective of where that change sits in the dependency stack.
- Consistency in infrastructure: As the complexity of the codebase increases, the testing requirements increase, as does the amount of infrastructure required. Maintaining separate build and test harnesses is challenging and often results in duplication of infrastructure. Ideally, a distributed build and test environment should be a single environment, maintained by a single team.
To fulfil the above requirements, we require a toolkit, not just a single tool.
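To see why the performance requirement is demanding, consider how much work a single change can trigger. The sketch below assumes a hypothetical dependency graph (`DEPS`) and a helper, `affected_targets`, invented for this illustration; it walks the reverse dependencies of a changed target to find everything that must be rebuilt and retested.

```python
from collections import deque


def affected_targets(deps, changed):
    """Return every target that must be rebuilt when `changed` is modified."""
    # Invert the graph: for each target, record which targets depend on it.
    rdeps = {}
    for target, requires in deps.items():
        for dep in requires:
            rdeps.setdefault(dep, set()).add(target)

    # Breadth-first walk over the reverse dependencies.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in rdeps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical project: each target maps to the targets it depends on.
DEPS = {
    "libbase": [],
    "libnet": ["libbase"],
    "libui": ["libbase"],
    "server": ["libnet"],
    "app": ["libnet", "libui"],
}

print(sorted(affected_targets(DEPS, "app")))
print(sorted(affected_targets(DEPS, "libbase")))
```

In this example a change to `app` affects only `app`, whereas a change to `libbase` affects all five targets. A single machine takes far longer in the second case; an elastic farm can spread the extra work across more workers so that feedback time stays bounded.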
Creating a toolkit with the remote execution API
The remote execution API (REAPI) is a protobuf-based, open source API that provides a consistent way to manage the execution of binaries on a remote system. Its community of contributors has created an ecosystem of clients and servers that can serve as a toolkit for a variety of build and test problems. Notably, these clients focus on different use cases.
If your use case is relatively simple, and your only concern is making your C/C++ compilation faster, then tools such as recc and Goma occupy that problem space, similar to ccache and distcc. However, where the application being developed is large and complex, a tool that supports fast, incremental builds may be important to provide consistency of performance. Bazel is the best-established and most mature tool in this space, but there are a variety of others, including Pants, Buck, and Please. Finally, if you are concerned about the consistent construction of environments, then an integration tool such as BuildStream is useful: it supports multiple build systems and creates systems in multiple output formats, all in a sandboxed environment.
To serve these clients and their different use cases, we want a single, consistent infrastructure. It should scale elastically and be flexible in deployment, whether to cloud services or on-premises. Several implementations exist, including Buildbarn, Buildfarm and BuildGrid.