Tue 04 May 2021

ABI Stability in freedesktop-sdk

One of the greatest challenges in maintaining a stack of software (such as an SDK or a Linux Distribution) is in ensuring that users can update this base without breaking their own tooling. freedesktop-sdk is the base runtime used for applications distributed via flatpak, which means that any application from flathub needs to be able to trust that freedesktop-sdk is stable enough that an update to the runtime won't cause unexpected breakages.

The responsibility for stability in this context is not unlike the great maxim of kernel developers not to break userspace - when the job is done right, the end user shouldn't need to worry about the runtime at all.

One of the chief causes of unexpected breakages is an Application Binary Interface (ABI) break.

What is ABI?

Application Binary Interface (or, more snappily, ABI) is a similar idea to its more famous cousin, API (or Application Programming Interface, to use its Sunday name). An API is a mechanism by which a programmer can have one program talk to another, allowing the reuse of other people's code. For example, one could write code to implement TLS oneself, but it would be much easier to simply use one of the existing TLS libraries via their public API.

If API is the way a programmer can call one program from another, then ABI is how the computer can call one program from another. This covers a whole lot of intricacies such as calling conventions and the object format. Much of this is beyond the scope of this article to talk about in depth, and largely irrelevant to the use case we have at hand. Let's consider that use case now.

freedesktop-sdk provides many, many dynamically linked binary libraries, which are depended upon by applications distributed via flatpak. During a stable release of freedesktop-sdk, we need to be careful to make sure that these binary libraries provide a consistent interface for application binaries to communicate with. If this interface should change during a runtime update, an application built against the old version may miscommunicate and crash!

What happens if ABI is broken?

Let's see what actually happens if we break ABI. For this example, I'll use functions to demonstrate the type of things that can go wrong, but bear in mind that ABI is a much larger surface than just the signatures of the functions. Let's write a simple C library to investigate what can go wrong:

int square (int x)
{
  return x * x;
}

int cube (int x)
{
  return x * x * x;
}

Here we define a pair of functions - one to square an integer, and the other to cube an integer. We can compile this into a dynamically linked library by running:

gcc -o example.o -fpic -c example.c
gcc -shared -o libexample.so example.o

# Let's put this in the /lib1 directory
mkdir -p /lib1
mv libexample.so /lib1

This gives us a library like those shipped in the freedesktop-sdk SDK. Now let's write an "application" that uses this library to offload the heavy lifting of squaring or cubing a number. Here we have a header, example.h:

#ifndef EXAMPLE_H
#define EXAMPLE_H

extern int square (int);
extern int cube (int);

#endif

And a main file, main.c:

#include <stdio.h>
#include "example.h"

int main () {
  printf("%d\n", square(3));
  printf("%d\n", cube(2));
  return 0;
}

We can compile this to an executable by running gcc as follows:

$ gcc -o main main.c -L/lib1 -lexample

$ # Let's run it too
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib1 ./main
9
8

So, we now have an application and a library working together. What could go wrong?

To investigate, let's make a couple of changes to the library. We'll make these simple changes to the API as these will give dramatic effects, but bear in mind that ABI changes may be much more subtle, especially when a library has some symbols which aren't exposed in its public API.

We'll make two changes. First, let's remove both of these functions and replace them with a generic power function:

int power (int x, unsigned int n)
{
  if (n == 0) {
    return 1;
  }
  return x * power(x, n - 1);
}

If we recompile the library but not the application, then run the application, we will discover that we can't:

$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib1 ./main
./main: symbol lookup error: ./main: undefined symbol: square

This is an ABI break! When our library was recompiled we removed some symbols that we depended on, and so now we cannot even run the application! This sort of ABI break is breaking backwards compatibility - updating the library means that we can no longer run applications built against an old version. For the most part, people only care about backwards compatibility breaks, but forwards compatibility breaks, where new symbols are added, may also be a problem when an application is distributed without updating the runtime.
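
One common way to avoid this kind of break is to add the new function while keeping the old symbols around. The following is only a sketch of that idea applied to our toy library, not something freedesktop-sdk or any particular upstream does: square and cube stay exported as thin wrappers around power, so an application linked against the old version still finds the symbols it needs.

```c
/* Generic implementation added in the new version of the library. */
int power (int x, unsigned int n)
{
  if (n == 0) {
    return 1;
  }
  return x * power (x, n - 1);
}

/* The old entry points are kept as thin wrappers, so binaries built
   against the previous version can still resolve these symbols. */
int square (int x)
{
  return power (x, 2);
}

int cube (int x)
{
  return power (x, 3);
}
```

With this version of the library installed, the unmodified main binary from earlier should keep printing 9 and 8, while new applications are free to call power directly.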

Modifying a symbol can cause more subtle issues. Instead of replacing square with power, what if we modify it to take a different type?

double square (double x)
{
  return x * x;
}

If we recompile the library and run the application again, this time we get:

$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib1 ./main
0
8

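Uh oh, that's giving us 0 instead of 9! The application still passes an int and expects an int back, but the library now expects a double and returns a double. On x86-64, for example, integers and floating-point values travel in different registers, so each side reads a value the other never wrote. Strictly speaking this is undefined behaviour, so the number printed may differ on another machine. It's not hard to see how this could cause a multitude of problems for an application. This sort of change can be particularly hard to spot when it comes from a change to a custom type rather than to the function signature itself.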
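
Custom types are where this gets hard to spot. As a rough sketch (the struct names and fields here are made up for illustration, not taken from any real library), suppose a library inserts a field at the front of a public struct between releases:

```c
#include <stddef.h>

/* Hypothetical public struct as shipped in the old release. */
struct point_v1 {
  int x;
  int y;
};

/* Hypothetical new release: a field was inserted at the front. */
struct point_v2 {
  int id;
  int x;
  int y;
};

/* Offset at which an application built against the old release reads y. */
size_t y_offset_v1 (void)
{
  return offsetof (struct point_v1, y);
}

/* Offset at which the new library actually stores y. */
size_t y_offset_v2 (void)
{
  return offsetof (struct point_v2, y);
}

/* Offset at which the new library stores x. */
size_t x_offset_v2 (void)
{
  return offsetof (struct point_v2, x);
}
```

An application compiled against the first layout keeps reading y at the old offset, which in the second layout holds x. No function signature has changed and every symbol still resolves, yet the two sides disagree about the data.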

How does freedesktop-sdk mitigate this?

Now we've seen the kinds of catastrophic failure breaking ABI can cause, let's take a look at how freedesktop-sdk mitigates the risk of such dreadful things happening.

freedesktop-sdk incorporates hundreds of components, all of which need regular updates with the latest bug fixes and security patches. Clearly, checking all of these updates for ABI stability manually would be too much work, even for a much larger team than freedesktop-sdk has access to. To avoid this, the project has an automatic ABI checker, developed by contributor Mathieu Bridon (bochecha), that compares the binaries of two revisions.

The ABI checker leverages an open source project called libabigail to do the heavy lifting: it can output a summary of the differences between the ABIs of two binaries. This is wrapped in a simple Python script, which reports whether there are any differences that require attention.
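
To give a flavour of what such a wrapper has to decide, here is a minimal sketch in C (not the project's actual script, which is written in Python). It assumes the checker runs libabigail's abidiff tool and inspects its exit status, which libabigail documents as a bit field; the helper names are hypothetical.

```c
#include <stdbool.h>

/* Exit status bits as documented for libabigail's abidiff tool. */
enum abidiff_status_bits {
  ABIDIFF_ERROR                   = 1 << 0, /* the tool itself hit an error */
  ABIDIFF_USAGE_ERROR             = 1 << 1, /* bad command line */
  ABIDIFF_ABI_CHANGE              = 1 << 2, /* the two ABIs differ */
  ABIDIFF_ABI_INCOMPATIBLE_CHANGE = 1 << 3  /* the difference is known to be incompatible */
};

/* Hypothetical helper: did the comparison itself fail to run? */
bool tool_failed (int status)
{
  return (status & (ABIDIFF_ERROR | ABIDIFF_USAGE_ERROR)) != 0;
}

/* Hypothetical helper: should a human look at this update? */
bool needs_attention (int status)
{
  return (status & (ABIDIFF_ABI_CHANGE | ABIDIFF_ABI_INCOMPATIBLE_CHANGE)) != 0;
}
```

A status of 0 means the two binaries expose the same ABI, so an update that yields it passes the check without anyone needing to look.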

In order to minimise human involvement, the update process is almost entirely automated. BuildStream's track feature is used to check for new git tags in the upstream repositories, and a bot automatically creates branches and merge requests for any updates. In the GitLab CI pipeline for freedesktop-sdk, the ABI checker (and several other checks) are run on the updated version, which tells the team whether it's safe to update. If a break does happen, then a human can inspect the change in more detail, and make an informed decision on what to do.

Of course, this isn't a complete solution and there are issues. Firstly, some libraries in the SDK cannot be checked. For example, the LLVM compiler toolkit can't be checked as doing so causes the GitLab runners to run out of memory! As such, some libraries must be skipped by adding them to a configuration file. The checker also doesn't currently cover interpreted languages, which don't have an ABI to speak of - for example, Python API breaks cannot be detected.

All in all, this tooling allows a small team to keep hundreds of components up to date in two stable releases, without fear of unexpected ABI breakages while updating.
