Part of creating software involves translating source code into instructions that can be executed by hardware and packaging the results in a way that can be consumed by users. That process is known as building (although colloquially it sometimes can be referred to as compiling).
For small programs, build time can be negligible: a simple Hello World might compile in less than a second on a laptop. However, as a software project gets bigger and more complex, that time can become significant. For example, building a Linux kernel (millions of lines of code) might take hours. Libraries and other dependencies that a project uses need to be built too, so they add to the total build time.
During development, it is common to make some changes to the code, build the software to test those changes, decide that we need to make some more changes, and repeat. That means that the time that it takes to build becomes crucial, because it gets in the way of getting immediate feedback. That not only makes developers waste time; it also makes them lose their focus, and it has a huge impact on productivity.
Strategies to avoid duplicating work
There are some tools (like make, CMake or ccache) that run locally and provide ways of reducing the time that it takes to build a piece of software. They try to do as little work as possible: they rebuild only the parts that were modified since the last build and reuse the output of the previous build for everything else. But even with those optimizations, some build times can still prove too long.
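The idea of rebuilding only what changed can be illustrated with a minimal sketch. It is not how make, CMake or ccache are implemented (make, for instance, compares file timestamps rather than content); it assumes a hypothetical `fake_compile` function standing in for a real compiler and decides whether to rebuild by comparing content hashes. All names here are made up for the example.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return a hex digest that identifies the given content."""
    return hashlib.sha256(content).hexdigest()


class IncrementalBuilder:
    """Rebuilds a source only when its content changed since the last build."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # hypothetical stand-in for a real compiler
        self.seen = {}                # source name -> (content digest, output)
        self.compile_count = 0        # how many times we actually compiled

    def build(self, sources: dict) -> dict:
        outputs = {}
        for name, content in sources.items():
            key = digest(content)
            cached = self.seen.get(name)
            if cached is not None and cached[0] == key:
                # Unchanged since the previous build: reuse the old output.
                outputs[name] = cached[1]
            else:
                # New or modified source: compile and remember the result.
                output = self.compile_fn(content)
                self.compile_count += 1
                self.seen[name] = (key, output)
                outputs[name] = output
        return outputs


def fake_compile(content: bytes) -> bytes:
    """Placeholder 'compiler' used only for illustration."""
    return b"OBJ:" + content.upper()
```

Building two sources compiles both; changing one of them and building again compiles only the modified one, while the other output is reused.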
Going one step further
Another way of speeding up the build might be to get more powerful hardware (making all operations run faster). But getting more machines can also help: because there might be parts of the software system that can be built in parallel and then put together, we can hand those parts to different machines and work on them at the same time. That way the overall time for building our project will drop.
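As a rough illustration of building independent parts at the same time and then putting them together, here is a small sketch in which a local thread pool stands in for a set of machines. The `compile_unit` and `link` functions are hypothetical placeholders, not real tools.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_unit(source: str) -> str:
    """Placeholder for compiling one independent part of the project."""
    return f"obj({source})"


def link(objects: list) -> str:
    """Placeholder for putting the separately built parts together."""
    return "bin[" + ",".join(objects) + "]"


def build_in_parallel(sources: list, max_workers: int = 4) -> str:
    # Each pool thread plays the role of one machine in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        objects = list(pool.map(compile_unit, sources))
    return link(objects)
```

The final step still has to wait for every part to finish, so the gain depends on how much of the project can really be built independently.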
An approach to solve it
Here is where the Remote Execution solution comes into play. It consists of having a set of machines (let's call them workers) that are available to build anything that we give them. We can send them some source code, they build it, and they return the results to us. In addition, they keep track of all the work they have done, remembering, for example, that when a certain command is executed on a file with a certain content, a given output is produced. That allows them to avoid doing the same work twice when asked to build something they have already encountered (like the tools mentioned earlier).
Furthermore, if we share those workers among a team or a whole company, the following might happen. Developer A builds the latest version of the project as soon as she gets to the office. Developer B comes in a bit later, planning to make changes on top of that same version. He instructs the remote execution server to start building and... it finishes almost instantly, because the workers already built those exact inputs for Developer A and the results can simply be reused.
The effect is that we not only get to use more powerful hardware (at least more powerful than a typical developer's laptop or workstation), but also get a shared cache. That cache could be shared at a team level or at a company level; or, in the case of open-source projects, globally.
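The way a shared cache lets a second developer benefit from the first one's build can be sketched as follows. This is a simplified model under stated assumptions, not the actual BuildGrid implementation or the Remote Execution API message format: the cache key is a hash of the command plus the content of a single input, and `fake_run` is a hypothetical stand-in for running the command on a worker.

```python
import hashlib


def action_key(command: list, input_content: bytes) -> str:
    """Derive a cache key from the command and the content it runs on."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] differs from ["a", "bc"]
    h.update(hashlib.sha256(input_content).digest())
    return h.hexdigest()


class RemoteExecutor:
    """Toy server: runs an action once, then serves it from a shared cache."""

    def __init__(self, run_fn):
        self.run_fn = run_fn  # hypothetical: executes the command on a worker
        self.cache = {}       # action key -> output, shared by all clients
        self.executions = 0   # how many actions were really executed

    def execute(self, command: list, input_content: bytes):
        """Return (output, cache_hit)."""
        key = action_key(command, input_content)
        if key in self.cache:
            return self.cache[key], True
        output = self.run_fn(command, input_content)
        self.executions += 1
        self.cache[key] = output
        return output, False


def fake_run(command: list, input_content: bytes) -> bytes:
    """Placeholder for the real work a worker would do."""
    return b"built:" + input_content
```

If Developer A and Developer B both submit the same command on identical content to the same `RemoteExecutor`, only the first request triggers real work; the second one is a cache hit. Changing a single byte of the input produces a different key and therefore a fresh execution.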
This scheme also has the advantage of allowing developers to build software for a platform other than the one running on their computers without making any changes to their own systems. It could also be used to guarantee that everybody builds in the same environment, which helps with reproducibility.
BuildGrid is a server (written in Python) that receives source code together with the instructions to run on it, and coordinates a set of workers to build that code. To interact with BuildGrid, developers use a build tool such as BuildStream, RECC, or Bazel.
Among the factors that BuildGrid must deal with are authentication, scalability, and handling potentially unresponsive workers.
BuildBox provides other important components:
- The code that is executed by the workers: the machines that do the actual work, which might span different hardware platforms running different operating systems.
- Tools that prepare the environment on the workers, that is, that make sure suitable compilers and other tools are available.
- Mechanisms to ensure that the commands that are executed in a worker are isolated from the worker's own system (sandboxing).
There are also other projects outside of Codethink that aim to solve the problem of building software in a distributed fashion. Some of them, like BuildGrid and BuildBox, communicate using two standard APIs designed by Google: the Remote Execution API (REAPI) and the Remote Worker API (RWAPI). This standardization gives users a great deal of flexibility, because tools that speak the same APIs can be combined. It also means that we are part of a continuous collaborative effort with other people and organizations that use those standard APIs in their own solutions to this problem. Some of those discussions take place in the BuildTeam Slack as well as on the Remote Execution APIs Working Group mailing list, and everybody is welcome to join.
Working on these projects can prove very interesting due to the challenges that they present, especially considering that solutions should scale to really big environments, where they will provide the most dramatic gains for developers. And because things are still at an early, experimental stage, there is a lot of room to come up with new designs, think about implementation details, and play around with different approaches to problems.
Another big plus is that all the work we are doing is open source. So once BuildGrid and BuildBox are mature enough, anyone interested in improving the build times of their projects will be able to adopt them easily.