You may be familiar with the Remote Execution API (REAPI) - Codethink's Santiago Gil wrote an excellent article about it. But have you heard about RAAPI?
The Remote Asset API is related to the REAPI and exists to enhance solutions that leverage remote execution using this API. The API is split into two parts, Fetch and Push.
In this blog post we will talk through the Remote Asset API, describing its components and going into detail about the server-side implementation which has been worked on here at Codethink.
Before we begin, let's start with a quick definition of what we mean by the term "client" in this post. In the context of RAAPI, a client is any piece of software which uses the API calls to request fetching of data or to push data. It will likely be a build client which uses the REAPI, such as Bazel or BuildStream.
Tom Coldrick talks about "bb-remote-asset: A Remote Asset API Server Implementation" at the Build Meetup 2021
What is RAAPI good for?
Before we go into the technical side, we'll quickly look at what this API is for. There are several reasons this API is desirable.
Firstly, it allows a server to download assets, such as source files for builds. This can reduce the client's network usage for a build. Instead of the client needing to download the blob, determine whether it is present in the CAS (Content Addressable Storage) and upload it if necessary, the client can send a request to the server, which will download the file to the CAS if needed. This is also highly beneficial if the client’s own network connection is slow and the server is located somewhere with a faster connection.
It can also act as a cache of sources or even results of remotely executed actions. For example, if a client needs to use a particular git repo, it can take advantage of another client who may have already requested it from the asset server. Again this will result in a significant reduction in time and network usage.
A Remote Asset client can make a Fetch request to the server, consisting of a set of URIs and additional metadata. For example, there are fields relating to the age of the data and timeouts, so the client may state that it doesn’t want data that is more than a day old. The client is also given the option to define Qualifiers in the Fetch request, which are arbitrary key-value pairs that give the server further detail about the nature of the request. The client will expect a response from the server that provides either the Digest of the requested data, which is then in the CAS, or an indication of failure. We’ll look at the Qualifiers later; for now, we’ll focus on what the server will do with such a request.
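To make the shape of this exchange concrete, here is a minimal sketch in Python. The field names follow the request and response messages in the Remote Asset API's protobuf definition as we understand them, but these are plain dataclasses written for illustration, not the generated gRPC types. The `error` field is a simplification of the status the real response carries, and the example URI is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Digest:
    """Content address of a blob in the CAS: its hash and its size in bytes."""
    hash: str
    size_bytes: int


@dataclass
class FetchBlobRequest:
    """Illustrative stand-in for a Fetch request (not the generated protobuf class)."""
    uris: list[str]
    qualifiers: dict[str, str] = field(default_factory=dict)
    timeout: Optional[timedelta] = None
    oldest_content_accepted: Optional[datetime] = None


@dataclass
class FetchBlobResponse:
    """Either the Digest of data now present in the CAS, or an error message."""
    blob_digest: Optional[Digest] = None
    error: Optional[str] = None  # simplification of the real status field

    @property
    def ok(self) -> bool:
        return self.blob_digest is not None


# A client that does not want data that is more than a day old.
now = datetime(2021, 6, 1, tzinfo=timezone.utc)
request = FetchBlobRequest(
    uris=["https://example.com/sources/foo-1.0.tar.gz"],  # hypothetical URI
    qualifiers={"resource_type": "application/octet-stream"},
    timeout=timedelta(seconds=30),
    oldest_content_accepted=now - timedelta(days=1),
)
```

Note that the response never carries the data itself: the client receives a Digest and reads the blob from the CAS as it would for any other REAPI content.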
There is no single method that the server must use to generate a response to an incoming request. The only hard requirement imposed by the API is the format of the response.
The server may attempt to download the data that the client has requested to the CAS using the URIs it has been provided in the request. A simple HTTP-based fetcher could do this, for example, or some other method such as git clone could be used.
Alternatively, it may instead check if the requested data is already cached from a previous Push request. As is easy to imagine, some combination of checking caches and attempting to download data when it is requested seems to be a sensible choice.
If a client makes a Push request to the server, it must provide a set of URIs and optional Qualifiers, as it would for a Fetch request. This time, however, it will also give the Digest of the blob to associate with this identifying data.
The data that the Digest corresponds to must be in the CAS, and the asset server must store the association of URIs/Qualifiers to Digest in the same way that it would if it had fetched the data.
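The combination of Push and Fetch described above can be sketched as a toy in-memory server. This is only an illustration under stated assumptions, not how any real implementation stores its data: the `AssetStore` class, the `asset_key` helper, the exact-match lookup on qualifiers and the injected `downloader` callable are all hypothetical.

```python
import hashlib
from typing import Callable, Optional

# (hash, size_bytes), standing in for a REAPI Digest.
Digest = tuple[str, int]
# One URI plus its qualifiers, sorted so that key order does not matter.
AssetKey = tuple[str, tuple[tuple[str, str], ...]]


def asset_key(uri: str, qualifiers: dict[str, str]) -> AssetKey:
    return (uri, tuple(sorted(qualifiers.items())))


class AssetStore:
    """Toy asset server: one dict for the CAS, one for URI/Qualifier -> Digest records."""

    def __init__(self, downloader: Optional[Callable[[str], bytes]] = None):
        self.cas: dict[Digest, bytes] = {}
        self.assets: dict[AssetKey, Digest] = {}
        self.downloader = downloader

    def put_blob(self, data: bytes) -> Digest:
        digest = (hashlib.sha256(data).hexdigest(), len(data))
        self.cas[digest] = data
        return digest

    def push(self, uris: list[str], qualifiers: dict[str, str], digest: Digest) -> None:
        # The blob must already be in the CAS before it can be associated.
        if digest not in self.cas:
            raise KeyError("digest is not present in the CAS")
        for uri in uris:
            self.assets[asset_key(uri, qualifiers)] = digest

    def fetch(self, uris: list[str], qualifiers: dict[str, str]) -> Optional[Digest]:
        # 1. Prefer an association recorded by an earlier push or fetch.
        for uri in uris:
            cached = self.assets.get(asset_key(uri, qualifiers))
            if cached is not None:
                return cached
        # 2. Otherwise try to download, then record the result as a push would.
        if self.downloader is not None:
            for uri in uris:
                try:
                    data = self.downloader(uri)
                except OSError:
                    continue  # try the next URI
                digest = self.put_blob(data)
                self.push([uri], qualifiers, digest)
                return digest
        return None  # a real server would report a failure status here


store = AssetStore(downloader=lambda uri: b"example payload")  # stub downloader
first = store.fetch(["https://example.com/a.bin"], {})   # downloaded and recorded
second = store.fetch(["https://example.com/a.bin"], {})  # served from the record
assert first == second
```

Recording a downloaded blob through the same `push` path is what makes the stored association identical whether the data arrived by Push or by Fetch.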
So far, we’ve mentioned Qualifiers without explaining what they are or what their role is in the API. Qualifiers exist to give specific metadata about the data being fetched and stored to ensure it is what the client needs. They can be used to specify a commit or branch of a version-controlled repo or provide a checksum for the desired data to ensure that the correct thing has been downloaded. Another use is to give the server a hint on how to fetch the data. For example, if it is a git repo, git clone may be useful.
Qualifiers can be any key-value pair of strings. However, there are a few standard qualifiers mentioned in the API at present (although they are still optional for the server to support):
- resource_type: a description of the type of resource. This is where the client may specify that the resource is a git repo, for example. The API states that values should be an existing media type as defined by IANA.
- checksum.sri: a checksum to verify the fetched data against, as described above.
- directory: a relative path of a subdirectory of the resource. It allows the client to get the Digest of only the subdirectory it is interested in.
- vcs.branch: a version control branch to check out before calculating and returning the Digest.
- vcs.commit: a version control commit to check out before calculating and returning the Digest.
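As an example of how a server might act on one of these, here is a minimal sketch of verifying data against a checksum.sri value. It assumes the value holds a single Subresource Integrity string of the form `<algorithm>-<base64 digest>`; the helper names `sri_for` and `matches_sri` are hypothetical, and a real server may handle more cases than this.

```python
import base64
import hashlib
import hmac

SUPPORTED_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_for(data: bytes, algorithm: str = "sha256") -> str:
    """Build an SRI string of the form '<algorithm>-<base64 digest>' for data."""
    raw = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(raw).decode('ascii')}"


def matches_sri(data: bytes, sri: str) -> bool:
    """Check data against a single-hash SRI value, as a checksum.sri qualifier might hold."""
    algorithm, separator, _ = sri.partition("-")
    if not separator or algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported or malformed SRI value: {sri!r}")
    # Constant-time comparison of the expected and the recomputed SRI strings.
    return hmac.compare_digest(sri_for(data, algorithm), sri)


blob = b"contents of the fetched file"
qualifiers = {"checksum.sri": sri_for(blob)}
assert matches_sri(blob, qualifiers["checksum.sri"])
assert not matches_sri(b"something else", qualifiers["checksum.sri"])
```

A server that finds a mismatch would report a failure rather than return the Digest, since the data is not what the client asked for.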
Many parts of the API are stated to be optional. This even includes the existence of a Push server.
An optional feature worth mentioning is the server's option to transform assets that it fetches. If a directory is requested from a URL that points to a tarball, the server may unpack the tarball and return the directory's Digest. The reverse may also apply: if a blob is requested from a URL that points to a directory, the server may pack the directory and return the Digest of the resulting blob.
What implementations exist?
Currently, there are only a few implementations of the Remote Asset API. Two examples are bazel-remote and bb-remote-asset, a solution developed by engineers here at Codethink.
The implementation provided by bazel-remote, as the name suggests, is specific to the needs of Bazel. Currently, Bazel only uses the Fetch side of the API, FetchBlob. Thus this is the only service from the API implemented in bazel-remote, and it is "very experimental".
As for bb-remote-asset, it is a much more complete implementation of the API, as it is intended to be far more client-agnostic than bazel-remote. We will discuss what bb-remote-asset implements and look at recent additions and potential plans in the next section.
What features does bb-remote-asset implement?
This implementation aims to be versatile and client-agnostic. That is to say that bb-remote-asset can run in different setups depending on what the client may need.
So far, the project implements Pushing to and Fetching from the server. This is done by keeping a record of the assets which have been pushed to it and allowing them to be fetched. It also has limited support for downloading blobs using an HTTP fetcher. The support is limited in the sense that it only works for blobs; directories cannot currently be fetched in this way.
It is also possible to fetch git repositories as directories if the server is configured correctly. This will be covered in a bit more detail in the following section.
Of the five standard qualifiers mentioned previously, bb-remote-asset supports four to some extent. The only one not currently used at all is directory.
Recently there have been a few changes made to the project. Firstly, it has become possible to cache assets using the action cache of a Buildbarn remote execution server. This requires conversion from a representation of an asset to an ActionResult from the REAPI. The benefit of this is that the overall amount of storage being used can be reduced, as there is no need for separate storage to be set up.
Hand-in-hand with the previous change, another adjustment allows the use of remote execution workers to fetch blobs. This is where the previously mentioned git repositories come in. Remote execution can be leveraged for two values of the resource_type qualifier: 'application/octet-stream' and 'application/x-git'. The former is handled by wget. This can be combined with some authorisation if required and a checksum to ensure the data's validity. The latter value of resource_type will be handled by a call to git clone and can be combined with a branch or commit qualifier to cause the correct revision to be checked out before the Digest is returned to the client.
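The dispatch on resource_type can be pictured with the following sketch, which only builds the command lines and does not run them. It is not bb-remote-asset's actual code: the `fetch_commands` function, the `dest` output path and the choice to prefer vcs.commit over vcs.branch when both are given are assumptions made for illustration.

```python
def fetch_commands(uri: str, qualifiers: dict[str, str], dest: str = "out") -> list[list[str]]:
    """Return the commands a worker could run to fetch uri, based on resource_type."""
    resource_type = qualifiers.get("resource_type")
    if resource_type == "application/octet-stream":
        # Plain blob: download it to a single output file.
        return [["wget", "-O", dest, uri]]
    if resource_type == "application/x-git":
        # Git repository: clone it, then check out the requested revision, if any.
        commands = [["git", "clone", uri, dest]]
        revision = qualifiers.get("vcs.commit") or qualifiers.get("vcs.branch")
        if revision:
            commands.append(["git", "-C", dest, "checkout", revision])
        return commands
    raise ValueError(f"unsupported resource_type: {resource_type!r}")


commands = fetch_commands(
    "https://example.com/project.git",  # hypothetical repository URL
    {"resource_type": "application/x-git", "vcs.branch": "main"},
)
for argv in commands:
    print(" ".join(argv))
```

Expressing the fetch as ordinary commands is what lets it be handed to a remote execution worker like any other action, with the output file or directory ending up in the CAS.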
What might be added in future?
One feature that would be nice to add in the future is the unpacking and packing of assets mentioned earlier in this post. This would make HTTP-based fetching of directories possible, as the archive could be unpacked and the Digest of the directory returned. Currently, an attempt to fetch a directory using HTTP will fail, and a mismatching request, which is to say a fetch blob request which causes a directory to be fetched, is also currently an outright failure.
How to contribute to bb-remote-asset
The Remote Asset API is still in its infancy. As mentioned, there are only two implementations to our knowledge at this time. There is a lot of potential for change and improvement in both the API and the clients and servers that use it.
As for bb-remote-asset, it is currently actively maintained. Contributions are welcome from anybody, be it opening issues for bugs or feature requests or writing pull requests and being involved in the development. Some familiarity with REAPI, Golang and using Bazel will certainly help if you wish to contribute to the code, but they are by no means required, so don't be discouraged from getting involved if you don't have experience with these things. We'd also love to hear from anyone who is trying out bb-remote-asset and would like to encourage more people to give it a go. We are in #buildbarn in the BuildTeam Slack group and are happy to offer support and answer questions there.