Wed 14 August 2019

Using the REAPI for Distributed Builds

by Santiago Gil, 2019. Tags: BuildGrid, BuildBox, Distributed Builds, Remote Execution, Bazel, API, REAPI

"computing" is licensed under CC0 1.0

Intro

As mentioned in the first part of this article, An Introduction to Distributed Builds and Remote Execution, there is no single solution to the problem of building software in a distributed manner. This description will focus on a solution based on the Remote Execution API.

Remote Execution API

The REAPI defines a standardized way of implementing a distributed builds solution. It employs Google's Protocol Buffers (or protobufs) to define messages and gRPC for remote procedure calls. Its specification is published in this .proto file: remote_execution.proto.

Build client

A build client is a tool that assists with the building of a software project. It keeps track of the sources of the project and its dependencies as well as the work that should be performed in order to build the software component and package it for use.

There are several clients with different approaches. Some clients, like Bazel, require the user to write a description of what the project consists of and what its dependencies are. They then parse those descriptions and, using rules, infer how the project should be built. That approach has the benefit of allowing the build tool to construct a dependency graph and make smart use of that information to minimize work. Other clients, for example recc, take a simpler approach: they forgo that detailed information but spare users from having to write configuration files describing their projects. That makes the transition from local building to remote execution simpler.

In all cases, clients need to determine what files should be taken as inputs, what commands should be performed on them and what outputs are to be expected.

Content-Addressable Storage (CAS)

The CAS is a crucial element of the remote execution/distributed builds architecture and is shared across the different components. It is essentially a database that stores arbitrary binary data. Each entry, a binary blob, is indexed by a Digest: a value containing a hash of the data (typically SHA-256) and its size in bytes. While the hash alone would be enough to uniquely identify a blob, having the size allows implementations to easily predict the resources that would be needed to service a particular request.
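The digest scheme can be sketched in a few lines. In the snippet below a plain dict stands in for the REAPI Digest protobuf message; the hash/size pairing is the same:

```python
import hashlib

def make_digest(data: bytes) -> dict:
    """Compute a REAPI-style digest for a blob: a SHA-256 hash plus the size in bytes."""
    return {
        "hash": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }

blob = b'int main() { return 0; }'
digest = make_digest(blob)
# The same bytes always produce the same digest, which is what
# makes content-addressing (and therefore caching) possible.
```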

The CAS supports two elementary operations: creating a new entry and fetching an existing one. For the latter it is necessary to know the hash of the blob that is to be fetched, thus it is said to be content-addressable. Digests are used all across the REAPI to refer to specific pieces of data.

Because there are no limitations on the actual contents, the CAS server not only stores source files and build results, but also serves as a cache for protobuf messages of the Remote Execution API. (For example, Directory messages that represent a tree structure are stored alongside the files they contain.) Therefore we refer to entries in the CAS as blobs instead of files.

The REAPI does not mandate or limit which storage back end a CAS server can use, so implementations range from storing data in memory, to using a local filesystem, to relying on a cloud solution like S3.

The methods that the CAS server implements are:

  • BatchReadBlobs()/BatchUpdateBlobs(): store or retrieve multiple blobs at once
  • GetTree(): fetch a whole directory tree
  • FindMissingBlobs(): given a list of digests, returns which ones are not present in the CAS, allowing a client to avoid unnecessary transfers
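The behavior of those methods can be illustrated with a toy in-memory store. The class and method names below are simplifications of the REAPI calls, not the real gRPC service:

```python
import hashlib

class InMemoryCAS:
    """A toy content-addressable store keyed by (hash, size) digests."""

    def __init__(self):
        self._blobs = {}

    @staticmethod
    def digest_of(data: bytes):
        return (hashlib.sha256(data).hexdigest(), len(data))

    def update_blob(self, data: bytes):
        """Store a blob under its own digest (cf. BatchUpdateBlobs())."""
        digest = self.digest_of(data)
        self._blobs[digest] = data
        return digest

    def read_blob(self, digest):
        """Fetch a blob by digest (cf. BatchReadBlobs())."""
        return self._blobs[digest]

    def find_missing_blobs(self, digests):
        """Return the digests not yet stored, so a client can skip redundant uploads."""
        return [d for d in digests if d not in self._blobs]

cas = InMemoryCAS()
d1 = cas.update_blob(b"hello.cpp contents")
d2 = InMemoryCAS.digest_of(b"never uploaded")
missing = cas.find_missing_blobs([d1, d2])  # only d2 needs uploading
```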

Remote Execution server

The remote execution server acts as a coordinator between a pool of workers offering computing power and clients requesting actions to be performed. Once again, there are different approaches to designing and implementing this server, although they all share the same basic functionality: a build client sends a request for work and waits for it to be carried out, receiving status updates along the way; the server looks at the available workers and tries to schedule the work in the most efficient way.

Before submitting a request to the remote execution server, the build client first needs to make sure that all the input files are available in the CAS, where the remote execution server and the worker that will end up picking up the task will be able to access them.

Once all the required input files are stored in the CAS server, the client can then proceed to create an Action message. That message contains:

  • The digest of a Command to run (which is represented as another REAPI message)
  • The digest of the root directory
  • A list of the files and directories that are expected as output

The Command message in turn contains:

  • The name of the command to execute and its arguments
  • The directory where that command is to be run
  • A list of environment variables
  • The list of files and directories that are expected to be produced as outputs
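The relationship between those two messages can be sketched as follows. JSON serialization and the "hash/size" digest string used elsewhere in this article stand in for the real protobuf encoding:

```python
import hashlib
import json

def store(cas: dict, data: bytes) -> str:
    """Store a blob in a toy CAS and return its digest in 'hash/size' form."""
    digest = f"{hashlib.sha256(data).hexdigest()}/{len(data)}"
    cas[digest] = data
    return digest

cas = {}

# The Command: what to run and which outputs to expect.
command = {
    "arguments": ["g++", "-c", "hello.cpp", "-o", "hello.o"],
    "output_files": ["hello.o"],
}
command_digest = store(cas, json.dumps(command).encode())

# The input root would be a serialized Directory message describing
# the source tree; here it is just a placeholder blob.
input_root_digest = store(cas, b"<serialized Directory message>")

# The Action refers to the Command and the input tree only by digest,
# so it stays small no matter how large the inputs are.
action = {
    "command_digest": command_digest,
    "input_root_digest": input_root_digest,
}
action_digest = store(cas, json.dumps(action).encode())
```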

After those messages are stored in CAS, the client can invoke the Execute() method on the remote execution server, passing as an argument the digest of the Action message.

Workers

At that point the execution server must carry out the actual work. The command will not be executed by the execution server itself; rather, it is delegated to a machine called a worker. An execution server handles multiple remote workers and is in charge of distributing work across them in an efficient manner.

Workers might have different properties: for example, one worker might be a Linux system running on x86-64 while another runs FreeBSD on ARM. To account for this, the client can use the platform properties section of the Command to note any specific requirements that the operation might have, and the server will accommodate them.
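The matching the server performs can be sketched as a simple property check; the property names below mirror the style of REAPI platform properties but the matching logic is a simplified assumption:

```python
def worker_matches(required: dict, worker: dict) -> bool:
    """A worker satisfies a request if it exposes every required platform property."""
    return all(worker.get(key) == value for key, value in required.items())

workers = [
    {"OSFamily": "Linux", "ISA": "x86-64"},
    {"OSFamily": "FreeBSD", "ISA": "ARM"},
]

# A request that only needs a Linux system, whatever the architecture.
required = {"OSFamily": "Linux"}
eligible = [w for w in workers if worker_matches(required, w)]
```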

Worker machines can go off-line without notice due to network errors or crashes, so one of the challenges faced by the remote execution server is to ensure that in those scenarios the work is handed to another worker.

After the remote execution server finds an available worker that meets the requirements, it sends the necessary information to the worker so that it can start running. The worker will parse that information, download the necessary files from the CAS, set up an environment and launch the command. Once it finishes executing (either successfully or after aborting due to an error), it will produce an ActionResult message, store it in the CAS, and return its digest to the remote execution server.
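The worker's steps (stage inputs, run the command, collect outputs) can be sketched as a toy function. The dict it returns mimics the shape of an ActionResult; everything else, including the helper's name, is an illustration rather than a real worker implementation:

```python
import hashlib
import os
import subprocess
import sys
import tempfile

def run_action(arguments, input_files, output_paths):
    """Toy worker: stage inputs in a scratch directory, run the command,
    and collect the outputs into an ActionResult-like dict."""
    with tempfile.TemporaryDirectory() as workdir:
        # Download step: in reality these bytes come from the CAS.
        for path, contents in input_files.items():
            with open(os.path.join(workdir, path), "wb") as f:
                f.write(contents)
        proc = subprocess.run(arguments, cwd=workdir, capture_output=True)
        outputs = {}
        for path in output_paths:
            full = os.path.join(workdir, path)
            if os.path.exists(full):
                with open(full, "rb") as f:
                    data = f.read()
                # A real worker would upload the blob and record its digest.
                outputs[path] = f"{hashlib.sha256(data).hexdigest()}/{len(data)}"
        return {
            "exit_code": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "output_files": outputs,
        }

# A trivial "build" that copies an input file to an output file.
result = run_action(
    [sys.executable, "-c",
     "open('output.txt','wb').write(open('input.txt','rb').read())"],
    {"input.txt": b"hello"},
    ["output.txt"],
)
```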

Execution results

An ActionResult message contains all the information that the command execution produced:

  • A list of output files, directories and symbolic links (consisting of the digest of the actual contents plus metadata like where in the working directory they were written and, in the case of files, whether they are executable)
  • The exit code of the command
  • The data that the command wrote to stdout and stderr

The remote execution server will forward that message to the client so that it can then fetch all the outputs directly from the CAS. Once it does, the results are the same as if the command had been executed locally.

Caching and avoiding work

A side effect of that procedure is that the ActionResult message and all the outputs generated by the executed command are stored in the CAS, where they are accessible to anybody with access to it. Before sending an Action to the remote execution server, clients should issue a GetActionResult() call with that Action's digest; if the action has already been performed, the results can be fetched directly, without doing any computation. That is the essence of what makes remote execution so powerful.
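The check-before-execute flow amounts to a few lines. Here a plain dict stands in for the action cache, and `run_remotely` is a hypothetical stand-in for the Execute() call:

```python
def execute_with_cache(action_digest, action_cache, run_remotely):
    """Consult the action cache first; fall back to remote execution only on a miss."""
    result = action_cache.get(action_digest)   # cf. GetActionResult()
    if result is not None:
        return result, True                    # cache hit: no work performed
    result = run_remotely(action_digest)       # cf. Execute()
    action_cache[action_digest] = result       # the result now benefits everyone
    return result, False

# Count how many times the (fake) remote execution actually runs.
calls = []
def fake_remote(digest):
    calls.append(digest)
    return {"exit_code": 0}

cache = {}
r1, hit1 = execute_with_cache("abc/42", cache, fake_remote)  # miss: executes
r2, hit2 = execute_with_cache("abc/42", cache, fake_remote)  # hit: reuses result
```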

Expiry

Because storage is a limited resource, an important functionality of CAS implementations is providing a mechanism to clean data that is no longer needed.

A simple approach is to adopt a LRU (Least Recently Used) policy, deleting entries based on when they were last accessed. However, because blobs are generally consumed in sets (for example, some output files and an associated ActionResult, or an Action and a Command), more efficient strategies consider not only blobs in isolation but also what role they play in relation to others. That prevents the efficiency of the cache from suffering when a client finds that only a portion of the data is cached and the rest needs to be fetched.
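The simple LRU policy mentioned above might be sketched with an OrderedDict and a byte budget; real CAS servers track far more state, so this is only the core eviction idea:

```python
from collections import OrderedDict

class LRUCas:
    """A toy CAS that evicts least-recently-used blobs once a byte budget is exceeded."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.blobs = OrderedDict()  # oldest entries first
        self.size = 0

    def put(self, digest, data: bytes):
        self.blobs[digest] = data
        self.blobs.move_to_end(digest)
        self.size += len(data)
        while self.size > self.capacity:
            _, evicted = self.blobs.popitem(last=False)  # drop the oldest blob
            self.size -= len(evicted)

    def get(self, digest):
        data = self.blobs[digest]
        self.blobs.move_to_end(digest)  # mark as recently used
        return data

cas = LRUCas(capacity_bytes=10)
cas.put("a/4", b"aaaa")
cas.put("b/4", b"bbbb")
cas.get("a/4")           # touch "a", so "b" is now least recently used
cas.put("c/4", b"cccc")  # over budget: "b" gets evicted
```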

Worker implementation details

There are multiple considerations when designing and implementing a worker.

Communication

There are different ways of implementing the communication between the remote execution server and the workers.

A standardized protocol for that interaction is defined by Google's Remote Worker API (RWAPI). That protocol uses a pull model: the workers contact the remote execution service asking for work to do. They also constantly send messages to the server advertising their current status.

Tooling

A worker needs to have the tools that clients might expect, for example a certain version of a compiler. The simplest way is installing those tools directly on the worker, advertising or agreeing with clients beforehand on what will be available, and using the worker's system to execute client commands (that is how buildbox-run-hosttools is implemented).

However, that can prove inadequate if there is a huge number of tool and version combinations that need to be provided. Also, in some scenarios, for example when reproducibility is required, there is a need for absolute control of the execution environment. One approach is to allow clients to submit a root filesystem or image that contains a complete environment, and to execute the command using that image instead of the tools provided by the worker's system.

Sandboxing

Another challenge, related to security, is isolating a client's command from the worker's system and limiting the resources that commands can access (for example, it might be dangerous to allow network access). In order to protect the worker system, some mechanism of sandboxing needs to be implemented by the worker.

A real-world example

To illustrate the different steps in remote execution, we are going to build this very simple C++ program using Remote Execution.

#include <iostream>

int main() {
  std::cout << "Hello, Remote Execution World!" << std::endl;
}

recc

For this example we will use recc as a client. The Remote Execution Caching Compiler, an open source project started by Bloomberg L.P., is a wrapper around a compiler like GCC. Each time the compiler is called, recc looks at the arguments and, if possible, replaces the invocation with a remote execution call.

If we were to build our small hello.cpp file locally, we could simply execute g++ hello.cpp -o hello. But, because recc currently supports compile commands only, we will need to separate the compiling and linking steps. That is:

> recc g++ -c hello.cpp -o hello.o
> g++ hello.o -o hello

The first step will invoke recc with a compile command that will take hello.cpp and write the object code to hello.o. recc will determine that the call to g++ is a compile command, and thus can be replaced with a remote execution call. The second step will run locally and link that object code to obtain a binary that we can run.

Step 1: building the messages

Command

arguments: ["g++", "-c", "hello.cpp", "-o", "hello.o"]
output_files: ["hello.o"]

Action

command_digest: "22c2f5014459c3f9c262b5ddc5f7acac40929eca4f9438d6c2f2331c5d8e108b/42"
input_root_digest: "a305bea9aacce4b741b6a783ea24e6b49163d9de4db3a86c5f2888445aa1734d/83"

The value of command_digest is the digest of the Command message described above, and input_root_digest refers to a Directory message that contains the input files. All of those messages need to be available in the CAS.

Step 2: checking the cache

Before uploading those messages to the CAS, recc will issue a GetActionResult() call with the Action's digest. If no ActionResult is cached for that Action, it will proceed to upload both protobuf messages and the contents of the directory containing hello.cpp to the CAS.

In the best case, the CAS will have an ActionResult stored, allowing recc to simply download the outputs without invoking the remote execution server and waiting for the command to run.

Step 3: Calling Execute()

After finding that the work is not cached and uploading the necessary data, recc will call Execute() on the server. The ExecuteRequest sent in that call will contain the digest of the Action uploaded to the CAS in the previous step.

At this point the remote execution server will fetch the Action from the CAS, proceed to find a worker and assign it to run our command.

When the worker is done, it will store an ActionResult in the CAS. The digest of that ActionResult will then be returned to the server, which in turn will forward it to the client.

Step 4: Fetching the results

Once Execute() returns, recc gets the ActionResult message containing the information about the command's execution.

ActionResult

output_files {
  path: "hello.o"
  digest: "9e1b02f45d705408bf7a8e48945afd1e9cb4f94cf69deb2f36b381fe880dc27f/2792"
}
output_directories {
}

That allows recc to iterate through the digests in the output_files and output_directories fields, fetch those blobs from the CAS, and finally write them to disk.

> ls
hello*  hello.cpp  hello.o
> ./hello
Hello, Remote Execution World!
> sha256sum *
a47684c46c5e9e5f64b465fc6790a56ae2321ff7b86c87545ca33c56284de616  hello
d4fb49aea82924340a1e5875214f072f2b8f6a0a511d17853b5ad2ed14667178  hello.cpp
9e1b02f45d705408bf7a8e48945afd1e9cb4f94cf69deb2f36b381fe880dc27f  hello.o

At that point the intermediate result, hello.o, is available to be linked before completing the process of building hello.cpp. For users, the fact that the operation took place remotely is transparent: other than the time it took, they see no difference from having compiled and linked with the local GCC.

