Introducing the Remote Execution API Testing Project
When building software, we know that we need to be fast and we know we need to minimise any wasted time in the process. We need to get critical updates and features out to users as quickly as possible and we need to increase the productivity of teams who rely on an efficient edit and compile cycle. Introducing Remote Execution to builds will help achieve this, and for this, there is Bazel's Remote Execution API, which gives us the ability to parallelise the construction and validation phases of the software development cycle, and thus massively reduce elapsed time.
The API specifies protocols for both client and server. Assuming your client is Bazel, there are several options to choose from for the server implementation: Buildfarm, Buildbarn, BuildGrid or Google's Remote Build Execution (RBE). The first three are open source projects, and need to be set up on-prem and backed by the appropriate computing resource, whereas RBE is a service and comes with Google Cloud Platform included. But how do the different solutions compare? That's where the Remote Execution API Testing Project comes in. This blog post will introduce the project and it's goals, progress so far, what's up next, and how you can get involved.
Testing Bazel's Remote Execution API
The project is a community-driven initiative born out of discussions between folks working on Buildfarm, Buildbarn and BuildGrid, who thought it'd be a good idea to collaborate on creating an independent 'acid test' for all of the implementations. We are really interested in knowing which implementations make the best use of resources and improve execution times. To do this we aim to construct reproducible metrics of the behaviors of each of the systems we are comparing and see what their trend is as they are executed, allowing us to make judgements based on the data.
Progress so far
The initial goal to get the project off the ground was to put some basic Remote Execution testing in place. For this, we used Gitlab's CI pipelines - with Terraform, Kubernetes, and AWS - to spin up a Bazel build of Abseil that executes against each of the server implementations once a week. Upon completion, a PASS/FAIL badge is displayed. The pipeline works by pulling a pre-built version of Bazel, then building the latest Docker images for each server implementation and using Terraform to deploy a small Kubernetes cluster with EKS. After that, we use kubectl to deploy the individual services required for each of the server implementations and then set off the Abseil build job using Bazel and post the results to the README. This verifies that Remote Execution works with Bazel across all server implementations and we can see if anything falls over after one of the components has been updated. This has so far allowed us to catch a few bugs and raise these with the appropriate projects upstream.
Once these pipelines were up and running we focused on adding tests to record performance, capturing end to end build times. For this, we needed a bigger project than Abseil, so we chose to build Bazel itself. Whilst this was a good first step and highlighted some basic speed comparisons between each project, we wanted to dig deeper and be able to see more finer grained metrics such as CPU cost and memory usage. To do this we added a monitoring stack to the Kubernetes cluster, comprised of Prometheus, metrics servers (cAdvisor and node-exporter) and Grafana. The metrics servers collect data from the cluster at the pod and node level and provide an endpoint for Prometheus to retrieve. The data is presented using Grafana, which connects to Prometheus to query the metrics using Prometheus PromQL, and for each test run a dashboard is created which shows CPU and memory usage for each of the pods. The results are then stored as Gitlab artifacts in pdf format and also pushed to the Grafana dashboard hosting site, allowing for interaction with the data. See here for examples.
So far the project has created an overview of the compatibility status between different Remote Execution build clients and server implementations. We’d like to continue to enhance this and include the status for all projects in this space. For example, Bazel is not the only build client to work with the Remote Execution API; there are others which we've added tests for, such as BuildStream and RECC. These tests don't yet work with all of the server implementations, but we're working on adding these as well.
We'd also like to enhance performance analysis further by enabling gRPC tracing in the server implementations using open-tracing. This should produce huge amounts of data which can be analysed to identify where inefficiencies lie, which should prove invaluable in steering each server implementation towards performance improvements.
How can you get involved?
We would very much welcome any contributions that help us move closer to our goals. If you’re interested in the issues Remote Execution is trying to solve, reducing elapsed time and wasted developer time via parallelisation of builds, then we’d like to hear from you.
Are you working with another build client or server implementation that conforms to the Remote Execution API that you would like to see tested as part of this framework? Have you chosen Bazel as your build tool and now need to consider which Remote Execution solution to choose? Do you have any test projects that could be contributed?
If so, come and chat to us on Slack, tweet me @LaurenceUrhegyi or email me directly at firstname.lastname@example.org.
Follow our news about Build Engineering
Complete the form and receive in your inbox our latest updates about Build Engineering, BuildStream and Open Source.
- Using Git LFS and fast-import together
- Testing in a Box: Streamlining Embedded Systems Testing
- SDV Europe: What Codethink has planned
- How do Hardware Security Modules impact the automotive sector? The final blog in a three part discussion
- How do Hardware Security Modules impact the automotive sector? Part two of a three part discussion
- How do Hardware Security Modules impact the automotive sector? Part one of a three part discussion
- Automated Kernel Testing on RISC-V Hardware
- Automated end-to-end testing for Android Automotive on Hardware
- GUADEC 2023
- Embedded Open Source Summit 2023
- RISC-V: exploring a bug in stack unwinding
- Adding RISC-V Vector Cryptography Extension support to QEMU
- Introducing Our New Open-Source Tool: Quality Assurance Daemon
- Long Term Maintainability
- FOSDEM 2023
- Think before you Pip
- BuildStream 2.0 is here, just in time for the holidays!
- A Valuable & Comprehensive Firmware Code Review by Codethink
- GNOME OS & Atomic Upgrades on the PinePhone
- Flathub-Codethink Collaboration
- Codethink proudly sponsors GUADEC 2022
- Tracking Down an Obscure Reproducibility Bug in glibc
- Web app test automation with `cdt`
- FOSDEM Testing and Automation talk
- Protecting your project from dependency access problems
- Porting GNOME OS to Microchip's PolarFire Icicle Kit
- YAML Schemas: Validating Data without Writing Code
- Deterministic Construction Service
- Codethink becomes a Microchip Design Partner
- Hamsa: Using an NVIDIA Jetson Development Kit to create a fully open-source Robot Nano Hand
- Using STPA with software-intensive systems
- Codethink achieves ISO 26262 ASIL D Tool Certification
- RISC-V: running GNOME OS on SiFive hardware for the first time
- Automated Linux kernel testing
- Native compilation on Arm servers is so much faster now
- Higher quality of FOSS: How we are helping GNOME to improve their test pipeline
- RISC-V: A Small Hardware Project
- Why aligning with open source mainline is the way to go
- Build Meetup 2021: The BuildTeam Community Event
- A new approach to software safety
- Does the "Hypocrite Commits" incident prove that Linux is unsafe?
- ABI Stability in freedesktop-sdk
- Why your organisation needs to embrace working in the open-source ecosystem
- RISC-V User space access Oops
- Tracking Players at the Edge: An Overview
- What is Remote Asset API?
- Running a devroom at FOSDEM: Safety and Open Source
- Meet the codethings: Understanding BuildGrid and BuildBox with Beth White
- Streamlining Terraform configuration with Jsonnet
- Bloodlight: Designing a Heart Rate Sensor with STM32, LEDs and Photodiode
- Making the tech industry more inclusive for women
- Bloodlight Case Design: Lessons Learned
- Safety is a system property, not a software property
- RISC-V: Codethink's first research about the open instruction set
- Meet the Codethings: Safety-critical systems and the benefits of STPA with Shaun Mooney
- Why Project Managers are essential in an effective software consultancy
- FOSDEM 2021: Devroom for Safety and Open Source
- Meet the Codethings: Ben Dooks talks about Linux kernel and RISC-V
- Here we go 2021: 4 open source events for software engineers and project leaders
- Xmas Greetings from Codethink
- Call for Papers: FOSDEM 2021 Dev Room Safety and Open Source Software
- Building the abseil-hello Bazel project for a different architecture using a dynamically generated toolchain
- Advent of Code: programming puzzle challenges
- Full archive