RISC-V is an open source instruction set architecture (ISA) based on reduced instruction set computer (RISC) principles. Codethink has been working with the RISC-V CPU architecture for several years. We've done some internal projects around hardware design, toolchain support and porting a desktop environment. We also do commercial work in this area, and a project team recently added support to QEMU for an extension to the RISC-V instruction set that provides Vector Cryptography. Read on for details of how they did this work.
Our task, sponsored by SiFive, was to add full support to QEMU (a generic system emulator capable of simulating different architectures) for RISC-V's vector cryptography (vcrypto) extension set. This extension set provides instructions for implementing various cryptographic algorithms, including AES, SHA-2 and the ShangMi suites. Adding support to QEMU is one of the required steps for getting the extension ratified.
Unlike its scalar equivalent (which had already been implemented in QEMU), the vector cryptography extension leverages vector registers to increase the throughput of cryptographic operations. Such registers can be of varying bit lengths and are divided into element groups of some smaller length. A cryptographic operation can then be applied to all the element groups at once.
Vector processing has a similar goal to SIMD, but purports to have some subtle advantages. With vector processing the CPU is given maximal information regarding the data it is operating on, which may in principle allow it to implement optimisations such as "vector chaining". The RISC-V implementation is also more flexible: for example, the instructions are independent of vector register length.
QEMU
For each instruction in the vcrypto extension we had to add support for:
- decoding the instruction's bit pattern
- checking the instruction was valid
- translating it into the host's instruction set.
To show how this was done, let's work through vaesz as an example, as it is one of the simplest vector crypto instructions. The specification has an entry for this instruction (current at the time of writing).
This instruction is part of the Zvkned extension, which implements the AES block cipher.
As you can see in the specification document, this instruction takes as arguments two vector registers, labelled vd and vs2 (only the first element group from vs2 is used, hence the term scalar). vd is then overwritten by the output from the instruction.
In order to support this instruction in QEMU, the first step is to add its bitwise encoding to target/riscv/insn32.decode, allowing QEMU to recognise the instruction when it appears in a binary. The encoding is given by the table in the "Encoding (Vector-Scalar)" section, which is shown graphically above. Note that OP-P and OPMVV correspond to 1110111 and 010 respectively.
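As a rough illustration of what recognising the bit pattern involves, here is a minimal stand-alone C sketch. It is not QEMU's decoder (QEMU generates its decoder from the patterns in insn32.decode). The OP-P and OPMVV values come from the text above; the funct6 and vs1 field values are assumptions taken from a reading of the specification's encoding table and should be checked against it. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values stated in the article. */
#define OPCODE_OP_P      0x77u  /* 1110111 */
#define FUNCT3_OPMVV     0x2u   /* 010 */

/* Assumed from the spec's encoding table; verify before relying on them. */
#define FUNCT6_VAESZ_VS  0x29u  /* 101001 */
#define VS1_VAESZ        0x07u  /* 00111 */

/* Hypothetical container for the decoded register operands. */
typedef struct {
    uint32_t vd;
    uint32_t vs2;
} VaeszArgs;

/* Extract bits [hi:lo] of a 32-bit instruction word (fields under 32 bits). */
uint32_t insn_bits(uint32_t insn, unsigned hi, unsigned lo)
{
    assert(hi >= lo && hi - lo < 31);
    return (insn >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Returns true and fills *args if insn matches the assumed vaesz pattern. */
bool decode_vaesz_vs(uint32_t insn, VaeszArgs *args)
{
    if (insn_bits(insn, 6, 0) != OPCODE_OP_P ||
        insn_bits(insn, 14, 12) != FUNCT3_OPMVV ||
        insn_bits(insn, 19, 15) != VS1_VAESZ ||
        insn_bits(insn, 25, 25) != 1u ||          /* vm = 1: unmasked */
        insn_bits(insn, 31, 26) != FUNCT6_VAESZ_VS) {
        return false;
    }
    args->vd = insn_bits(insn, 11, 7);
    args->vs2 = insn_bits(insn, 24, 20);
    return true;
}

/* Build an instruction word, useful for exercising the decoder. */
uint32_t encode_vaesz_vs(uint32_t vd, uint32_t vs2)
{
    return (FUNCT6_VAESZ_VS << 26) | (1u << 25) | ((vs2 & 31u) << 20) |
           (VS1_VAESZ << 15) | (FUNCT3_OPMVV << 12) | ((vd & 31u) << 7) |
           OPCODE_OP_P;
}
```

Calling decode_vaesz_vs(encode_vaesz_vs(1, 2), &args) should succeed with args.vd equal to 1 and args.vs2 equal to 2, while an unrelated instruction word is rejected.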
The RISC-V vector spec defines several parameters that can be used to tune the behaviour of the vector instructions. Some settings of these parameters can render an instruction illegal, so the next step is to have QEMU check this. For our instruction vaesz, the requirements are contained within the if clause of the pseudo code. The instruction is illegal if (vstart%EGS)<>0, meaning the starting location of the data within the register does not fall on an element group boundary, or if LMUL*VLEN < EGW, meaning there is not enough register space for at least one element group.
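The two conditions from the pseudo code can be sketched as a small C predicate. This is an illustration under stated assumptions rather than QEMU's check: the function name is hypothetical, and because LMUL may be fractional, the sketch passes it as lmul_x8 (LMUL multiplied by 8), a representation chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the configuration passes both checks from the spec's
 * if clause. lmul_x8 is LMUL * 8, so LMUL = 1/2 is passed as 4 and
 * LMUL = 1 as 8 (an encoding assumed for this sketch only).
 */
bool vaesz_config_is_legal(uint32_t vstart, uint32_t egs,
                           uint32_t lmul_x8, uint32_t vlen, uint32_t egw)
{
    assert(egs != 0);

    /* (vstart % EGS) <> 0: start is not on an element group boundary. */
    if (vstart % egs != 0) {
        return false;
    }

    /* LMUL * VLEN < EGW: no room for even one element group. */
    if ((uint64_t)lmul_x8 * vlen < (uint64_t)egw * 8u) {
        return false;
    }

    return true;
}
```

For AES, assuming 32-bit elements with EGS = 4 and EGW = 128, a call such as vaesz_config_is_legal(0, 4, 8, 128, 128) passes, whereas halving LMUL on the same 128-bit registers fails the second check.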
Next we need to handle translating the RISC-V instruction into the host's instruction set. As you might expect, QEMU already has quite a lot of tooling set up for this. We just needed to implement the pseudo code written in the spec as C code, and QEMU would then handle the dynamic translation to the host's instruction set using its backend, TCG. You may be able to decipher that the pseudo code iterates across the element groups in vd and applies a bitwise XOR operation against the scalar element group in vs2. QEMU treats vector registers as arrays, so this was relatively simple to implement.
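To make the loop concrete, here is a minimal C sketch of the operation: AES round zero XORs each 128-bit element group of vd with the round key held in the first element group of vs2. It assumes registers are plain arrays of 32-bit elements with four elements per group, and it ignores masking, tail handling and host byte order. The function name and signature are hypothetical and are not those of the QEMU helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EGS 4  /* assumed: four 32-bit elements per 128-bit element group */

/*
 * XOR every element group of vd, from vstart up to vl (both counted in
 * elements), with element group 0 of vs2.
 */
void vaesz_sketch(uint32_t *vd, const uint32_t *vs2, size_t vstart, size_t vl)
{
    assert(vstart % EGS == 0);

    for (size_t eg = vstart / EGS; eg < vl / EGS; eg++) {
        for (size_t j = 0; j < EGS; j++) {
            /* Only the first element group of vs2 is ever read. */
            vd[eg * EGS + j] ^= vs2[j];
        }
    }
}
```

With vl = 8, the same four key words are applied to both element groups of vd, which is the "scalar" behaviour described earlier.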
Implementing more sophisticated instructions follows the same principles as the simple one outlined here; it just involves more complicated pseudo code.
Testing
The vcrypto specification was in its early stages when our project began and had no prior implementation, so testing was paramount to ensure we had understood and implemented the specification as written.
Our sponsor provided a test suite that they had written internally, which we used to test our implementation. This suite consisted of auto-generated assembly code containing positive and negative test cases, which we could run within QEMU.
It was important to have rapid testing we could do ourselves to verify our work against the latest specifications, so we developed our own 'framework' tests. These ran in Linux userland and generated JIT instructions with random parameters. They covered weaknesses of the client's test suite, primarily that its assembly code was harder to debug than C and that it lagged behind the specification. However, they were limited in that they could only test positive cases.
Endianness
Running the above test suites on our x86-64 laptops gave us some confidence that our implementation was correct. However, it didn't provide the full picture. To understand why, we need to go off on a (hopefully) interesting tangent.
There are two (sane) standards for laying out data in memory when moving it in and out of CPU registers: little endian (LE) and big endian (BE). With LE, the data's least significant byte is stored at the lowest memory address and the most significant at the highest. BE is the reverse.
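The difference is easy to see by assembling a 32-bit value from the same four bytes under each convention. The sketch below uses only shifts, so it gives the same answer on any host; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interpret four bytes as little endian: p[0] is least significant. */
uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret four bytes as big endian: p[0] is most significant. */
uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

For the bytes 0x78, 0x56, 0x34, 0x12 in increasing address order, load_le32 yields 0x12345678 and load_be32 yields 0x78563412.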
Like almost all CPUs in personal computers, RISC-V CPUs are LE. QEMU, on the other hand, can run on LE and BE hosts. Hence someone may try emulating an LE RISC-V CPU on some BE host. If we weren't careful in our handling of vector registers we could've introduced some subtle bugs in this scenario, so we endeavoured to be as careful as possible.
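One way such a bug can creep in is by casting a register held as 64-bit host words to an array of 32-bit elements: the element order within each word then depends on the host. The sketch below shows one host-independent alternative, assuming for illustration that a register is stored as an array of host-native 64-bit words with lower-numbered elements in the less significant bits. It is not QEMU's actual mechanism, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read 32-bit element i using arithmetic on the 64-bit word, which gives
 * the same result on LE and BE hosts (unlike ((uint32_t *)reg)[i]).
 */
uint32_t vreg_get32(const uint64_t *reg, size_t i)
{
    assert(reg != NULL);
    return (uint32_t)(reg[i / 2] >> (32 * (i % 2)));
}

/* Write 32-bit element i, leaving its neighbour in the word untouched. */
void vreg_set32(uint64_t *reg, size_t i, uint32_t value)
{
    assert(reg != NULL);
    unsigned shift = 32 * (unsigned)(i % 2);
    uint64_t mask = (uint64_t)0xFFFFFFFFu << shift;
    reg[i / 2] = (reg[i / 2] & ~mask) | ((uint64_t)value << shift);
}
```

Because the accessors never reinterpret memory through a narrower pointer type, the same element numbering holds regardless of how the host lays out the bytes of each 64-bit word.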
Ideally we would've run our test suites on BE CPUs, but alas we had no access to such hardware: BE consumer hardware has largely been phased out in favour of LE CPUs that can also run in BE mode, such as some ARM chips. Somewhat bizarrely though, QEMU-in-QEMU is actually a supported use of the program, so we could run our QEMU tests within a QEMU emulation of BE hardware! Finding a pre-built BE Linux image proved tricky, so we opted for running FreeBSD in a 64-bit PowerPC emulation. As is typical of FreeBSD, the project provides helpful documentation for doing this.
With this in hand we could ensure that our tests passed on BE CPUs 🥳. The only drawback to this approach came from the inherent performance hit associated with emulation. Building QEMU and running tests went from taking a few minutes on a laptop to multiple hours when done within the PowerPC emulation.
Upstreaming
The upstreaming process has been a cycle of submitting patches by email and implementing feedback. This began with an RFC, followed by formal submissions, of which the 4th revision is currently being prepared: https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg05580.html. The vcrypto spec has ostensibly been frozen, so hopefully future changes to the patchset will be minimal!
Cover photo by Laura Ockel on Unsplash.