Adding big‑endian support to the CVA6 RISC‑V FPGA processor

Introduction

Big‑endian RISC‑V is an interesting area for experimentation, and Codethink has previously demonstrated big‑endian RISC‑V Linux booting in QEMU. Building on that, we have now booted big‑endian Linux on an FPGA board. To do this we used CVA6, an open-source processor that we investigated in one of our previous projects. CVA6 is written in SystemVerilog and its source code is available on GitHub. It implements the RISC‑V instruction set and can target a few FPGA boards. We modified its code to add support for runtime-configurable endianness. Read on to find out how we did it!

RISC‑V big‑endian specification

Let's start with the RISC‑V specification. The endianness of a RISC‑V processor is controlled by the {M/S/U}BE bits in the mstatus control and status register (CSR). For example, if the processor is running in M privilege mode and bit 37 of mstatus (MBE) is set, its data accesses are big‑endian. The endianness of each privilege mode (M/S/U) can be set individually. It’s also valid for SBE and UBE to simply mirror MBE, which is what we chose to implement. One interesting point of the specification is that instructions are always little‑endian, regardless of the current endianness mode.
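
As a rough illustration of these control bits, the following Python sketch models how the data endianness of a privilege mode could be derived from an mstatus value. Only the MBE position (bit 37) comes from this article; the SBE (36) and UBE (6) positions are taken from the RV64 privileged specification, the helper names are hypothetical, and effects such as MPRV are ignored.

```python
# Bit positions of the endianness controls in RV64 mstatus.
# MBE (bit 37) is mentioned in the text; the SBE and UBE positions are
# an assumption taken from the RISC-V privileged specification.
MBE_BIT = 37
SBE_BIT = 36
UBE_BIT = 6

BE_BIT_FOR_MODE = {"M": MBE_BIT, "S": SBE_BIT, "U": UBE_BIT}


def is_big_endian(mstatus: int, mode: str) -> bool:
    """Return True if data accesses in `mode` are big-endian (hypothetical helper)."""
    return bool((mstatus >> BE_BIT_FOR_MODE[mode]) & 1)


def mirror_mbe(mstatus: int) -> int:
    """Model the simplification described above: SBE and UBE mirror MBE."""
    mbe = (mstatus >> MBE_BIT) & 1
    mask = (1 << SBE_BIT) | (1 << UBE_BIT)
    return (mstatus | mask) if mbe else (mstatus & ~mask)


mstatus = mirror_mbe(1 << MBE_BIT)
print({mode: is_big_endian(mstatus, mode) for mode in "MSU"})
```

With mirroring, setting MBE alone is enough to make all three privilege modes report big-endian data accesses.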

What does the difference look like in practice? While CVA6 is in little‑endian mode, everything works as normal. Once the MBE bit is set in mstatus, the processor switches to big‑endian mode and all loads and stores treat memory as big‑endian. Because the CVA6 core still carries out its operations internally in little‑endian format, we need to byte-swap the data to convert between the two.

CVA6 diagram

Assembly example

This is a simple assembly program that demonstrates changing endianness at runtime:

.section .data
.balign 32
    /* Variable stored in LE format */
    var1: .word 0x11223344

.section .text
.global _start

_start:
    /* Some initial vars */
    li a1, (1 << 37)
    la a2, var1
    li a3, 0x1

    /* Add 1 to var1, keep the result in a4 (LE) */
    lwu a4, (a2)
    add a4, a3, a4

    /* Switch to BE */
    csrs mstatus, a1
    fence

    /* Add 1 to var1, keep the result in a5 (BE) */
    lwu a5, (a2)
    add a5, a3, a5

    /* Expected outcome:
       a4: 0x11223345
       a5: 0x44332212  */
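
To make the expected values easier to check, here is a small Python model of the two loads in the program above. It is only a software sketch of the memory semantics, not of the CVA6 hardware, and the `load_word` helper is a hypothetical name.

```python
# Memory image of var1: the assembler stores 0x11223344 in little-endian
# byte order, so memory holds the bytes 44 33 22 11.
memory = (0x11223344).to_bytes(4, "little")


def load_word(mem: bytes, big_endian: bool) -> int:
    """Model of lwu: read 4 bytes and interpret them in the current data endianness."""
    return int.from_bytes(mem[:4], "big" if big_endian else "little")


a4 = load_word(memory, big_endian=False) + 1  # load before MBE is set
a5 = load_word(memory, big_endian=True) + 1   # load after MBE is set

print(hex(a4))  # 0x11223345
print(hex(a5))  # 0x44332212
```

The bytes in memory never change. Only their interpretation does, which is why the second load sees 0x44332211 before the addition.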

Now let's see which CVA6 units we had to change.

Load/store unit and MMU

We started with the load/store unit, where we added an endianness flag and byte swapping. An important point is that load/store operations come in different sizes (8, 16, 32, and 64 bits), and each size has to be byte-swapped differently. For example, 8‑bit operations don't need to be modified at all. Memory accesses also happen in the page table walker of the memory management unit (MMU), so we added byte swapping there too. This was easier because we just replicated what we'd done for the load/store unit.
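
The size-dependent swap can be sketched in Python as follows. This is a software model under our own assumptions, not the SystemVerilog that went into CVA6, and the names `byte_swap` and `lsu_data` are hypothetical.

```python
def byte_swap(value: int, size_bits: int) -> int:
    """Reverse the byte order of a `size_bits`-wide value (8, 16, 32 or 64)."""
    if size_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported access size")
    n_bytes = size_bits // 8
    value &= (1 << size_bits) - 1  # keep only the bits of this access
    return int.from_bytes(value.to_bytes(n_bytes, "little"), "big")


def lsu_data(value: int, size_bits: int, big_endian: bool) -> int:
    """Hypothetical data-path model: swap only when big-endian mode is active."""
    return byte_swap(value, size_bits) if big_endian else value


for size, sample in [(8, 0xAB), (16, 0x1122), (32, 0x11223344)]:
    print(size, hex(lsu_data(sample, size, big_endian=True)))
```

Note that an 8-bit access passes through unchanged, while wider accesses reverse only the bytes that belong to that access.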

Atomic memory operations

Big‑endian support for atomic memory operations (AMOs) is more involved than for the load/store unit, because in CVA6 AMOs are encoded as memory bus transactions. These transactions are then processed in a separate module called axi_riscv_atomics. AXI is the specification of the memory bus that CVA6 uses. Conveniently for us, AXI specifies an endianness flag as part of the transaction, but CVA6 doesn't make use of it. Having learnt that, we modified the code that sends these transactions so that it sets the endianness field according to the current endianness mode, and we added code to the axi_riscv_atomics module to handle that flag correctly. The axi_riscv_atomics module performs AMO calculations in its own arithmetic logic unit (ALU), so our changes there were mostly byte-swapping the input and output of the ALU.
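
The idea of swapping around the ALU can be illustrated with a Python model of a 32-bit atomic add. This is a sketch only: it assumes that both the memory word and the operand reach the atomics unit in memory byte order, and the function names are hypothetical rather than taken from axi_riscv_atomics.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def swap32(value: int) -> int:
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


def amoadd_w(mem_word: int, operand: int, big_endian: bool) -> Tuple[int, int]:
    """Hypothetical 32-bit AMOADD model.

    `mem_word` and `operand` are raw values as a little-endian ALU would see them.
    Returns (new raw memory word, raw old value returned to the core).
    """
    if big_endian:
        # Swap the ALU inputs so the addition works on the intended numbers.
        lhs, rhs = swap32(mem_word), swap32(operand)
    else:
        lhs, rhs = mem_word, operand
    result = (lhs + rhs) & MASK32
    # Swap the ALU output back before it is written to memory.
    new_word = swap32(result) if big_endian else result
    return new_word, mem_word


# Memory bytes 00 00 00 FF hold 0xFF in big-endian; add 1 to it.
new_word, old_word = amoadd_w(0xFF000000, 0x01000000, big_endian=True)
print(hex(new_word))  # 0x10000, i.e. bytes 00 00 01 00 = 0x100 in big-endian
```

Without the swaps, the same raw inputs would be added as 0xFF000000 + 0x01000000, so the carry would propagate in the wrong direction and overflow.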

CVA6 AMOs diagram

Testing

CVA6 supports debugging with GDB, and the process is described in the documentation. To test our changes, we used GDB to run small assembly programs on the processor and validate their results. We discovered that GDB can't connect to the processor while it is in big‑endian mode, so our test programs switch back to little‑endian mode at the end to work around that.

In parallel, we modified the CVA6 SDK to build Linux for a big‑endian system. For that we mostly reused our work from a previous project, where we ran big‑endian Linux in QEMU. We published our CVA6 SDK modifications here. You can build the images with make BE=1 FS=1 images, and then flash them with sudo -E make BE=1 FS=1 flash-sdcard SDDEVICE=/dev/sdX, where sdX is your SD card.

After successful big‑endian Linux boot, we celebrated by playing Tetris there :)

CVA6 big‑endian boot log

Conclusion

We modified CVA6 to allow switching endianness at runtime and demonstrated big‑endian Linux running on it. This shows that big‑endian RISC‑V systems are feasible and makes experimenting with them easier.

You can read more RISC‑V articles from our blog here.

Are you looking for help with your RISC‑V projects? Get in touch with Codethink's team of experts by writing to us at connect@codethink.co.uk.