Building and testing at scale often involves codebases with millions of lines of code, worked upon by thousands of developers. In this environment, it is a challenge to ensure that:
- Every change made by every developer is built and tested through a continuous integration process.
- The work performed by one team is seamlessly integrated with work from all other teams in a given organisation.
- The time taken for every developer to obtain feedback on their change is as short as possible.
Creating a distributed build and test framework
One solution to the problem of building as quickly as possible is to delegate that responsibility — not to the machine that the developer is working on, or even to a server that the CI job has been dispatched to, but to a server farm. These machines pool their resources to collectively perform build tasks faster than a single machine could, whilst caching their results so that compiler actions do not need to be repeated.
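The caching half of this idea rests on deriving a deterministic key from everything that affects a compilation's output. As a minimal sketch (the function name and inputs here are illustrative; a real tool such as ccache also hashes the preprocessed source, the compiler binary itself, and relevant environment variables):

```python
import hashlib

def cache_key(compiler: str, flags: list[str], source: str) -> str:
    """Derive a deterministic cache key from the inputs to a compilation.

    Illustrative only: real tools hash far more state (preprocessed
    source, compiler version, environment) to avoid false cache hits.
    """
    h = hashlib.sha256()
    for part in [compiler, *flags, source]:
        h.update(part.encode())
        h.update(b"\0")  # separator, so ["ab", "c"] hashes differently to ["a", "bc"]
    return h.hexdigest()

# Identical inputs always map to the same key, so any machine in the
# farm can reuse a cached object file instead of recompiling.
key = cache_key("gcc", ["-O2", "-Wall"], "int main(void) { return 0; }")
```

Because the key depends only on the inputs, the cache can be shared safely across the whole server farm: any worker that has already built these exact inputs can serve the result.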
Such an idea is not new: tools such as ccache and distcc have existed for over 15 years. However, these tools have a fairly narrow focus, restricted to dispatching compilation tasks from clang or gcc to a battery of servers and caching the results. This is an important part of the build process, but is by no means the only consideration. For this to be useful for developers in large projects there needs to be:
- Consistency in environment: The build and test environment must be well defined and trusted. There must be no ambiguity in the results of a change caused by differences in developer and/or CI environments.
- Consistency in performance: The distributed build and test environment must be elastic enough in scale, such that, irrespective of the location of the change in the dependency stack, the time to build and test that change is predictable and bounded.
- Consistency in infrastructure: As the complexity of the codebase increases, the testing requirements increase, as does the amount of infrastructure required. Maintaining separate build and test harnesses is challenging and often results in duplication of infrastructure. Ideally, a distributed build and test environment should be a single environment, maintained by a single team.
Meeting these requirements calls for a toolkit, not just a single tool.
Creating a toolkit with the remote execution API
The remote execution API (REAPI) is a protobuf-based, open source API that provides a consistent way to manage the execution of binaries on a remote system. With a community of contributors, the remote execution API has created an ecosystem of clients and servers that can serve as a toolkit for a variety of build and test problems. Notably, these clients focus on different use cases.
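At the heart of REAPI is content-addressable storage (CAS): every blob — source files, serialised Command and Action messages, output artefacts — is identified by a Digest, which pairs a hash of its contents (SHA-256 by default) with its size in bytes. A minimal sketch of that identity scheme (the `command` value below is a stand-in, not a real serialised protobuf message):

```python
import hashlib

def digest(blob: bytes) -> tuple[str, int]:
    """Compute an REAPI-style Digest for a blob: the SHA-256 of its
    contents plus its size in bytes. Anything stored in the CAS --
    inputs, commands, outputs -- is addressed this way."""
    return hashlib.sha256(blob).hexdigest(), len(blob)

# Stand-in for a serialised Command message describing a build step.
command = b"/usr/bin/gcc -c main.c"
command_hash, command_size = digest(command)
```

Because identical content always yields an identical digest, servers can deduplicate storage and skip re-executing an action whose inputs they have already seen — the same property that makes the shared caching described above work.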
If your use case is relatively simple, and you are simply concerned with making your C/C++ compilation faster, then tools such as recc and goma occupy that problem space, similar to ccache and distcc. However, where the application being developed is large and complex, a tool that supports fast, incremental builds may be important to provide consistency of performance. Bazel is the most well-established and mature tool in this space, but there are a variety of others, including Pants, Buck, and Please. Finally, if you are concerned about the consistent construction of environments, then an integration tool such as BuildStream is useful, with support for multiple build systems, to create systems in multiple output formats, in a sandboxed environment.
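As an illustration of how little a client needs to change, Bazel can be pointed at any REAPI-compatible service via its standard remote flags (the endpoint address below is hypothetical):

```shell
# Hypothetical endpoint; --remote_executor and --remote_cache are
# Bazel's flags for delegating execution and caching to an
# REAPI-compatible service.
bazel build //... \
  --remote_executor=grpc://reapi.example.com:8980 \
  --remote_cache=grpc://reapi.example.com:8980
```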
To serve these clients with their different use cases, we want a single, consistent infrastructure. It should scale elastically and be flexible in deployment, whether to cloud services or on-premises. Several server implementations exist, including Buildbarn, Buildfarm and BuildGrid.