Introducing the Remote Execution API Testing Project
When building software, we know that we need to be fast and we know we need to minimise any wasted time in the process. We need to get critical updates and features out to users as quickly as possible and we need to increase the productivity of teams who rely on an efficient edit and compile cycle. Introducing Remote Execution to builds will help achieve this, and for this, there is Bazel's Remote Execution API, which gives us the ability to parallelise the construction and validation phases of the software development cycle, and thus massively reduce elapsed time.
The API specifies protocols for both client and server. Assuming your client is Bazel, there are several options to choose from for the server implementation: Buildfarm, Buildbarn, BuildGrid or Google's Remote Build Execution (RBE). The first three are open source projects, and need to be set up on-prem and backed by the appropriate computing resource, whereas RBE is a service and comes with Google Cloud Platform included. But how do the different solutions compare? That's where the Remote Execution API Testing Project comes in. This blog post will introduce the project and it's goals, progress so far, what's up next, and how you can get involved.
Testing Bazel's Remote Execution API
The project is a community-driven initiative born out of discussions between folks working on Buildfarm, Buildbarn and BuildGrid, who thought it'd be a good idea to collaborate on creating an independent 'acid test' for all of the implementations. We are really interested in knowing which implementations make the best use of resources and improve execution times. To do this we aim to construct reproducible metrics of the behaviors of each of the systems we are comparing and see what their trend is as they are executed, allowing us to make judgements based on the data.
Progress so far
The initial goal to get the project off the ground was to put some basic Remote Execution testing in place. For this, we used Gitlab's CI pipelines - with Terraform, Kubernetes, and AWS - to spin up a Bazel build of Abseil that executes against each of the server implementations once a week. Upon completion, a PASS/FAIL badge is displayed. The pipeline works by pulling a pre-built version of Bazel, then building the latest Docker images for each server implementation and using Terraform to deploy a small Kubernetes cluster with EKS. After that, we use kubectl to deploy the individual services required for each of the server implementations and then set off the Abseil build job using Bazel and post the results to the README. This verifies that Remote Execution works with Bazel across all server implementations and we can see if anything falls over after one of the components has been updated. This has so far allowed us to catch a few bugs and raise these with the appropriate projects upstream.
Once these pipelines were up and running we focused on adding tests to record performance, capturing end to end build times. For this, we needed a bigger project than Abseil, so we chose to build Bazel itself. Whilst this was a good first step and highlighted some basic speed comparisons between each project, we wanted to dig deeper and be able to see more finer grained metrics such as CPU cost and memory usage. To do this we added a monitoring stack to the Kubernetes cluster, comprised of Prometheus, metrics servers (cAdvisor and node-exporter) and Grafana. The metrics servers collect data from the cluster at the pod and node level and provide an endpoint for Prometheus to retrieve. The data is presented using Grafana, which connects to Prometheus to query the metrics using Prometheus PromQL, and for each test run a dashboard is created which shows CPU and memory usage for each of the pods. The results are then stored as Gitlab artifacts in pdf format and also pushed to the Grafana dashboard hosting site, allowing for interaction with the data. See here for examples.
So far the project has created an overview of the compatibility status between different Remote Execution build clients and server implementations. We’d like to continue to enhance this and include the status for all projects in this space. For example, Bazel is not the only build client to work with the Remote Execution API; there are others which we've added tests for, such as BuildStream and RECC. These tests don't yet work with all of the server implementations, but we're working on adding these as well.
We'd also like to enhance performance analysis further by enabling gRPC tracing in the server implementations using open-tracing. This should produce huge amounts of data which can be analysed to identify where inefficiencies lie, which should prove invaluable in steering each server implementation towards performance improvements.
How can you get involved?
We would very much welcome any contributions that help us move closer to our goals. If you’re interested in the issues Remote Execution is trying to solve, reducing elapsed time and wasted developer time via parallelisation of builds, then we’d like to hear from you.
Are you working with another build client or server implementation that conforms to the Remote Execution API that you would like to see tested as part of this framework? Have you chosen Bazel as your build tool and now need to consider which Remote Execution solution to choose? Do you have any test projects that could be contributed?
If so, come and chat to us on Slack, tweet me @LaurenceUrhegyi or email me directly at email@example.com.
- Automated Linux kernel testing
- Native compilation on Arm servers is so much faster now
- Higher quality of FOSS: How we are helping GNOME to improve their test pipeline
- RISC-V: A Small Hardware Project
- Why aligning with open source mainline is the way to go
- Build Meetup 2021: The BuildTeam Community Event
- A new approach to software safety
- Does the "Hypocrite Commits" incident prove that Linux is unsafe?
- ABI Stability in freedesktop-sdk
- Why your organisation needs to embrace working in the open-source ecosystem
- RISC-V User space access Oops
- Tracking Players at the Edge: An Overview
- What is Remote Asset API?
- Running a devroom: FOSDEM 2021 Safety and Open Source
- Meet the codethings: Understanding BuildGrid and BuildBox with Beth White
- Streamlining Terraform configuration with Jsonnet
- Bloodlight: Designing a Heart Rate Sensor with STM32, LEDs and Photodiode
- Making the tech industry more inclusive for women
- Bloodlight Case Design: Lessons Learned
- Safety is a system property, not a software property
- RISC-V: Codethink's first research about the open instruction set
- Meet the Codethings: Safety-critical systems and the benefits of STPA with Shaun Mooney
- Why Project Managers are essential in an effective software consultancy
- FOSDEM 2021: Devroom for Safety and Open Source
- Meet the Codethings: Ben Dooks talks about Linux kernel and RISC-V
- Here we go 2021: 4 open source events for software engineers and project leaders
- Xmas Greetings from Codethink
- Call for Papers: FOSDEM 2021 Dev Room Safety and Open Source Software
- Building the abseil-hello Bazel project for a different architecture using a dynamically generated toolchain
- Advent of Code: programming puzzle challenges
- Improving performance on Interrogizer with the stm32
- Introducing Interrogizer: providing affordable troubleshooting
- Improving software security through input validation
- More time on top: My latest work improving Topplot
- Cycling around the world
- Orchestrating applications by (ab)using Ansible's Network XML Parser
- My experience of the MIT STAMP workshop 2020
- Red Hat announces new Flatpak Runtime for RHEL
- How to keep your staff healthy in lockdown
- Bloodlight: A Medical PPG Testbed
- Bringing Lorry into the 2020s
- How to use Tracecompass to analyse kernel traces from LTTng
- Fixing Rust's test suite on RISC-V
- The challenges behind electric vehicle infrastructure
- Investigating kernel user-space access
- Consuming BuildStream projects in Bazel: the bazelize plugin
- Improving RISC-V Linux support in Rust
- Creating a Build toolkit using the Remote Execution API
- Trusting software in a pandemic
- The Case For Open Source Software In The Medical Industry
- My experiences moving to remote working
- Impact of COVID-19 on the Medical Devices Industry
- COVID-19 (Coronavirus) and Codethink
- Codethink develops Open Source drivers for Microsoft Azure Sphere MediaTek MT3620
- Codethink partners with Wirepas
- Passing the age of retirement: our work with Fortran and its compilers
- Sharing technical knowledge at Codethink
- Using the REAPI for Distributed Builds
- An Introduction to Remote Execution and Distributed Builds
- Gluing hardware and software: Board Support Packages (BSPs)
- Engineering's jack of all trades: an intro to FPGAs
- Bust out your pendrives: Debian 10 is out!
- Why you should attend local open source meet-ups
- Acceptance, strife, and progress in the LGBTIQ+ and open source communities
- Codethink helps York Instruments to deliver world-beating medical brain-scanner
- Codethink open sources part of staff onboarding - 'How To Git Going In FOSS'
- Getting into open source
- How to put GitOps to work for your software delivery
- Open Source Safety Requirements Analysis for Autonomous Vehicles based on STPA
- Codethink engineers develop custom debug solution for customer project
- Codethink contributes to CIP Super Long Term Kernel maintenance
- Codethink creates custom USB 3 switch to support customer's CI/CD pipeline requirements
- Codethink unlocks data analysis potential for British Cycling
- MIT Doctor delivers Manchester masterclass on innovative safety methodology
- Balance for Better: Women in Technology Codethink Interviews
- Introducing BuildGrid
- Configuring Linux to stabilise latency
- Full archive