Tue 06 April 2021

What is Remote Asset API?

You may be familiar with the Remote Execution API (REAPI); Codethink's Santiago Gil wrote an excellent article about it. But have you heard of the Remote Asset API (RAAPI)?

The Remote Asset API is related to the REAPI and exists to enhance solutions that leverage remote execution using this API. The API is split into two parts, Fetch and Push.

In this blog post we will talk through the Remote Asset API, describing its components and going into detail about the server-side implementation which has been worked on here at Codethink.

Before we begin, let's start with a quick definition of what we mean by the term "client" in this post. In the context of RAAPI, a client is any piece of software which uses the API calls to request fetching of data or to push data. It will likely be a build client which uses the REAPI, such as Bazel or BuildStream.

Tom Coldrick talks about "bb-remote-asset: A Remote Asset API Server Implementation" at the Build Meetup 2021

What is RAAPI good for?

Before we go into the technical side, we'll quickly look at what this API is for. There are several reasons this API is desirable.

Firstly, it allows a server to download assets, such as source files for builds, directly into the Content Addressable Storage (CAS) used by the REAPI. This can reduce the client's network usage for a build. Instead of the client needing to download the blob, determine whether it is present in the CAS and upload it if necessary, the client can send a request to the server, which will download the file to the CAS if needed. This is also highly beneficial if the client’s own network connection is slow and the server is located somewhere with a faster connection.

It can also act as a cache of sources or even results of remotely executed actions. For example, if a client needs to use a particular git repo, it can take advantage of another client having already requested it from the asset server. Again, this will result in a significant reduction in time and network usage.

Fetch

A Remote Asset client can make a Fetch request to the server, consisting of a set of URIs and additional metadata. For example, there are fields relating to the age of the data and timeouts, so the client may state that it doesn’t want data that is more than a day old. The client is also given the option to define Qualifiers in the Fetch request, which are arbitrary key-value pairs that give the server further detail about the nature of the request. The client will expect a response from the server that provides either the Digest of the requested data, which is now in the CAS, or an indication of failure. We’ll look at the Qualifiers later. For now, we’ll focus on what the server will do with such a request.
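To make the shape of a Fetch request more concrete, here is a minimal Python sketch that models it with plain dataclasses. It is not the generated gRPC code: the class names, field names and the `is_fresh_enough` helper are illustrative assumptions that loosely follow the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Qualifier:
    """An arbitrary key-value pair describing the request."""
    name: str
    value: str


@dataclass(frozen=True)
class Digest:
    """Identifies a blob in the CAS by content hash and size."""
    hash: str
    size_bytes: int


@dataclass
class FetchRequest:
    """Simplified, hypothetical model of a Fetch request."""
    uris: List[str]
    qualifiers: List[Qualifier] = field(default_factory=list)
    timeout: Optional[timedelta] = None
    # Content fetched before this moment is too old for the client.
    oldest_content_accepted: Optional[datetime] = None


@dataclass
class FetchResponse:
    """Either a Digest of data now in the CAS, or an error message."""
    digest: Optional[Digest] = None
    error: str = ""


def is_fresh_enough(fetched_at: datetime, request: FetchRequest) -> bool:
    """Return True if content fetched at `fetched_at` satisfies the request."""
    if request.oldest_content_accepted is None:
        return True
    return fetched_at >= request.oldest_content_accepted


# A client that does not want data more than a day old:
now = datetime.now(timezone.utc)
request = FetchRequest(
    uris=["https://example.com/src.tar.gz"],
    oldest_content_accepted=now - timedelta(days=1),
)
```

A server modelled this way would compare the time it last fetched an asset against `oldest_content_accepted` before deciding whether a cached Digest can be returned.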

There is no single method that the server must use to generate a response to an incoming request. The only hard requirement imposed by the API is the format of the response.

The server may attempt to download the data that the client has requested to the CAS using the URIs it has been provided in the request. A simple HTTP-based fetcher could do this, for example, or some other method like a git clone could be used.

Alternatively, it may instead check if the requested data is already cached from a previous Push request. As is easy to imagine, some combination of checking caches and attempting to download data when it is requested seems to be a sensible choice.
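The combined strategy described above, checking a cache first and downloading only on a miss, can be sketched as follows. This is a toy in-memory model, not how any real server is written: `ToyAssetServer`, its dictionaries and the injected `downloader` callable are all assumptions made for illustration, and a SHA-256 hex string stands in for a full Digest.

```python
import hashlib
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# An asset is identified by a URI plus its set of qualifiers.
AssetKey = Tuple[str, FrozenSet[Tuple[str, str]]]


class ToyAssetServer:
    def __init__(self, downloader: Callable[[str], Optional[bytes]]):
        # `downloader` returns the bytes behind a URI, or None on failure.
        self._downloader = downloader
        self.cas: Dict[str, bytes] = {}        # digest hash -> blob
        self.assets: Dict[AssetKey, str] = {}  # asset key -> digest hash

    def fetch(self, uris: List[str],
              qualifiers: Optional[Dict[str, str]] = None) -> Optional[str]:
        quals = frozenset((qualifiers or {}).items())

        # 1. Prefer an association that is already known and still in the CAS.
        for uri in uris:
            digest = self.assets.get((uri, quals))
            if digest is not None and digest in self.cas:
                return digest

        # 2. Otherwise try each URI in turn until one downloads successfully.
        for uri in uris:
            data = self._downloader(uri)
            if data is None:
                continue
            digest = hashlib.sha256(data).hexdigest()
            self.cas[digest] = data
            self.assets[(uri, quals)] = digest
            return digest

        # 3. Nothing cached and nothing downloadable: report failure.
        return None
```

Injecting the downloader keeps the sketch independent of any particular transport, so an HTTP fetcher or a git-based one could be plugged in without changing the lookup logic.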

Push

If a client makes a Push request to the server, it must provide a set of URIs and optional Qualifiers, as it would for a Fetch request. This time, however, it also gives the Digest of the blob to associate with this identifying data.

The data that the Digest corresponds to must be in the CAS, and the asset server must store the association of URIs/Qualifiers to Digest in the same way that it would if it had fetched the data.
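As a rough illustration of the Push side, the sketch below records the association between URIs/Qualifiers and a Digest, refusing the Push if the Digest is not in the CAS. The `AssetStore` class is hypothetical, the CAS is reduced to a set of known digest strings, and matching qualifiers exactly on lookup is an assumption rather than something the API mandates.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Optional, Tuple

AssetKey = Tuple[str, FrozenSet[Tuple[str, str]]]


class AssetStore:
    def __init__(self, cas_digests: Iterable[str]):
        # Stand-in for the CAS: the set of digests whose data is present.
        self._cas_digests = set(cas_digests)
        self._assets: Dict[AssetKey, str] = {}

    @staticmethod
    def _key(uri: str, qualifiers: Mapping[str, str]) -> AssetKey:
        return (uri, frozenset(qualifiers.items()))

    def push(self, uris: Iterable[str], qualifiers: Mapping[str, str],
             digest: str) -> None:
        # The pushed Digest must refer to data that is already in the CAS.
        if digest not in self._cas_digests:
            raise ValueError(f"digest {digest!r} is not present in the CAS")
        # Every URI in the request becomes an alias for the same Digest.
        for uri in uris:
            self._assets[self._key(uri, qualifiers)] = digest

    def lookup(self, uri: str, qualifiers: Mapping[str, str]) -> Optional[str]:
        # A later Fetch for the same URI and qualifiers finds the pushed Digest.
        return self._assets.get(self._key(uri, qualifiers))
```

Because Push and Fetch share the same key, an asset pushed by one client can later be served to another client without the server ever downloading it.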

Qualifiers

So far, we’ve mentioned Qualifiers without explaining what they are or what their role is in the API. Qualifiers exist to give specific metadata about the data being fetched and stored to ensure it is what the client needs. They can be used to specify a commit or branch of a version-controlled repo or provide a checksum for the desired data to ensure that the correct thing has been downloaded. Another use is to give the server a hint on how to fetch the data. For example, if it is a git repo, git clone may be useful.

Qualifiers can be any Key-Value pair of strings. However, there are a few standard qualifiers mentioned in the API at present (although they are still optional to support for the server):

* resource_type: a description of the type of resource. This is where the client may specify that the resource is a git repo for example. The API states that values should be an existing media type as defined by IANA.
* checksum.sri: a checksum to verify the fetched data against, as described above.
* directory: a relative path of a subdirectory of the resource. It allows the client to get the Digest of only the subdirectory it is interested in.
* vcs.branch: a version control branch to checkout before calculating and returning the Digest.
* vcs.commit: a version control commit to checkout before calculating and returning the Digest.
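As an example of how a server might act on the checksum.sri qualifier, the following sketch verifies downloaded bytes against a Subresource Integrity style value of the form `<algorithm>-<base64 digest>`. The helper names `sri_for` and `matches_sri` are made up for this post, and restricting the algorithms to the SHA-2 family is an assumption.

```python
import base64
import hashlib

# Assumption: only these hash algorithms are accepted.
_SUPPORTED = {"sha256", "sha384", "sha512"}


def sri_for(data: bytes, algorithm: str = "sha256") -> str:
    """Compute an SRI-style checksum string for `data`."""
    if algorithm not in _SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_sri(data: bytes, sri: str) -> bool:
    """Return True if `data` hashes to the value given in `sri`."""
    algorithm, separator, expected_b64 = sri.partition("-")
    if not separator or algorithm not in _SUPPORTED:
        raise ValueError(f"unsupported SRI value: {sri!r}")
    actual = hashlib.new(algorithm, data).digest()
    return base64.b64encode(actual).decode("ascii") == expected_b64


# A client would send something like this alongside the URIs:
qualifiers = {"checksum.sri": sri_for(b"example source archive")}
```

A server following this approach would reject a download whose bytes do not match, rather than storing an unexpected blob in the CAS and returning its Digest.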

Optional Features

Many parts of the API are stated to be optional. This even includes the existence of a Push server.

An optional feature worth mentioning is the server's option to transform the assets that it fetches. If a directory is requested from a URL that points to a tarball, the server may unpack the tarball and return the directory's Digest. The reverse also applies: if a blob is requested from a URL that points to a directory, the server may pack the directory into a single blob.

What implementations exist?

Currently, there are only a few implementations of the Remote Asset API. The two we know of are bazel-remote and bb-remote-asset, a solution developed by engineers here at Codethink.

The implementation provided by bazel-remote, as the name suggests, is specific to the needs of Bazel. Currently, Bazel only uses the Fetch side of the API, specifically only FetchBlob. Thus this is the only service from the API implemented in bazel-remote, and it is "very experimental".

As for bb-remote-asset, it is a much more complete implementation of the API, as it is intended to be far more client-agnostic than bazel-remote. We will discuss what bb-remote-asset implements and look at recent additions and potential plans in the next section.

bb-remote-asset

What features does bb-remote-asset implement?

This implementation aims to be versatile and client-agnostic. That is to say that bb-remote-asset can run in different setups depending on what the client may need.

So far, the project implements Pushing to and Fetching from the server. This is done by keeping a record of the assets which have been pushed to it and allowing them to be fetched. It also has limited support for downloading blobs using an HTTP fetcher. The support is limited in the sense that it only works for blobs; directories cannot currently be fetched in this way.

It is also possible to fetch git repositories as directories if the server is configured correctly. This will be covered in a bit more detail in the following section.

Of the five standard qualifiers mentioned previously, bb-remote-asset supports four to some extent. The only one not currently used at all is directory.

Recent additions

Recently there have been a few changes made to the project. Firstly, it has become possible to cache assets using the action cache of a Buildbarn remote execution server. This requires conversion from a representation of an asset to an ActionResult from the REAPI. The benefit of this is that the overall amount of storage being used can be reduced as there is no need for separate storage to be set up.

Hand-in-hand with the previous change, another adjustment allows the use of remote execution workers to fetch blobs. This is where the previously mentioned git repositories come in. Remote execution can be leveraged for two values of the resource_type qualifier: 'application/octet-stream' and 'application/x-git'. The former is handled by wget. This can be combined with some authorisation if required and a checksum to ensure the data's validity. The latter is handled by a call to git clone and can be combined with a branch or commit qualifier so that the correct revision is checked out before the Digest is returned to the client.
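The dispatch on resource_type can be pictured with the small sketch below, which turns a URI and its qualifiers into the commands a worker might run. It is only an illustration: the function name, the output paths `out` and `repo`, and the exact command lines are assumptions, not the commands bb-remote-asset actually generates.

```python
from typing import Dict, List


def fetch_commands(uri: str, qualifiers: Dict[str, str]) -> List[List[str]]:
    """Return the argv lists a worker could run to fetch `uri`."""
    resource_type = qualifiers.get("resource_type")

    if resource_type == "application/octet-stream":
        # Plain blob: download it to a single output file.
        return [["wget", "-O", "out", uri]]

    if resource_type == "application/x-git":
        # Git repository: clone it, then optionally check out a revision.
        commands = [["git", "clone", uri, "repo"]]
        # Assumption: a commit is more specific than a branch, so it wins.
        revision = qualifiers.get("vcs.commit") or qualifiers.get("vcs.branch")
        if revision:
            commands.append(["git", "-C", "repo", "checkout", revision])
        return commands

    raise ValueError(f"unsupported resource_type: {resource_type!r}")
```

In a setup like this, the resulting commands would be wrapped in a remote execution Action, so the Digest of the fetched output ends up in the CAS as a by-product of running it.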

What might be added in future?

One feature that would be nice to add in the future is the unpacking and packing of assets mentioned earlier in this post. This would make HTTP-based fetching of directories possible, as the archive could be unpacked and the Digest of the directory returned. Currently, an attempt to fetch a directory using HTTP will fail. A mismatching request, that is to say a fetch blob request which causes a directory to be fetched, is also currently an outright failure.

How to contribute to bb-remote-asset

The Remote Asset API is still in its infancy. As mentioned, there are only two implementations to our knowledge at this time. There is a lot of potential for change and improvement in both the API and the clients and servers that use it.

As for bb-remote-asset, it is currently actively maintained. Contributions are welcome from anybody, be it opening issues for bugs or feature requests or writing pull requests and being involved in the development. Some familiarity with REAPI, Golang and using Bazel will certainly help if you wish to contribute to the code, but they are by no means required, so don't be discouraged from getting involved if you don't have experience with these things. We'd also love to hear from anyone who is trying out bb-remote-asset and would like to encourage more people to give it a go. We are in #buildbarn in the BuildTeam Slack group and are happy to offer support and answer questions there.
