On one of our RISC-V projects, we were recently debugging a complex driver probing sequence in the U-Boot bootloader. Along the way we improved the debugging facilities available for U-Boot on RISC-V, specifically allowing developers and system integrators to see the callstack in the case that there is a crash during the early boot sequence. Read on to find out what we did.
Adding Stack Unwinding
The U-Boot exception code prints out the state of the system when an exception happens (for example pointer outside of usable memory) but for RISC-V this is just the current cpu registers. To add more information, we decided to add code to print the call stack and since this is a useful feature we submitted this patch.
There are several ways to do stack unwinding, either using table built by the compiler or using a frame pointer register which points to the current stack frame. We decided to go with the frame pointer as that would be the easiest to write code for (approx 23 lines) from this example. The only downside is the code is bigger so this feature is put behind a config option CONFIG_FRAMEPOINTER
For GCC, the default is not to use a frame pointer so in U-Boot when the configuration for unwinding is enabled the compiler flags need the -mno-omit-frame-pointer
option. As noted above this makes the U-Boot code larger as each function requires a few more instructions to save the frame pointer and to ensure it is updated. For RISC-V, the frame pointer is in register s0
.
The stack frame is always setup so that this pointer can be used to find the previous frame and caller's address and defined by the architecture and compiler options used. The frame pointer (fp) always points to the following data on the stack and thus can be used to follow back to the first frame. On main entry to U-Boot we set the fp to NULL or use the global data gd
pointer to avoid changing some of the other entry points:
fp[-1] = return address
fp[-2] = previous frame pointer
In the above example, the fp is a pointer type, so on 64bit systems it is stored as an 8-byte unsigned value.
The Bug Report
After the initial submission, we got a report our patch was not working and the unwinder was going off into places that could not be valid code. Since this was working for us for our build, we tried to dig into the report in case this was being caused by a difference in build config, compiler version or other unforseen issues. We asked the reporter to send some dumps from the code to aid investigation. When we saw the code it was clear there was something we had not seen going on:
0000002000817228 <board_init_r>:
2000817228: 7d9612ef jal t0,2000879200 <__riscv_save_2>
200081722c: 81aa mv gp,a0
There is a call to __riscv_save_2
here instead of the sequence we expected below which stores the registers onto the stack and updates the frame register to the new frame.
000000200081b3ce <board_init_r>:
200081b3ce: 1101 add sp,sp,-32
200081b3d0: e822 sd s0,16(sp)
200081b3d2: e04a sd s2,0(sp)
200081b3d4: ec06 sd ra,24(sp)
200081b3d6: e426 sd s1,8(sp)
200081b3d8: 1000 add s0,sp,32
200081b3da: 81aa mv gp,a0
Since we hadn't seen this before, we did a quick search for the __riscv_save_2
and this turns out to be a provided by the compiler support library for a feature -msave-restore
. The save-restore is a RISC-V compiler option to try and make function call start and end code smaller by using a library routine to do it.
The issue is obviously that the function the error was injected into looked like there was a valid stack frame pointer, but there was no actual frame unwinding data saved and therefore no way to follow this function back. It seems that no-one had noticed before that using the -msave-restore
option clashed with the -mno-omit-frame-pointer
and that the reporter's local change to either compiler or U-Boot build options caused an issue we had not seen.
How to Fix
The quick fix is to disable the save-restore code, but due to the reporter's local changes to U-Boot we can't change the configuration system to disable having both in at the same time. Better would be to update the compiler to warn, error or disable one of these clashing options. The last and probably the most difficult would be to add support for having both by updating both the compiler and compiler library to have versions of __riscv_save_%0
which would work with the frame pointer.
Since we are using GCC for our builds, the first action was to clone the code and look into the places that generate the code for function wrappers to see what would be required. During the initial checkout and test of GCC, it turns out that we could not repeat the issue. Looking through the gcc/config/riscv/riscv.cc
file it turns out this has been fixed recently in another update. In refactoring the test for whether to use the save/restore lib-calls a test function was added which also added a check for the use of frame pointer, and if that was set avoid using the save/restore calls as so:
+riscv_avoid_save_libcall (void)
+{
+ if (!TARGET_SAVE_RESTORE
+ || crtl->calls_eh_return
+ || frame_pointer_needed
+ || cfun->machine->interrupt_handler_p
+ || cfun->machine->varargs_size != 0
+ || crtl->args.pretend_args_size != 0)
+ return true;
+
+ return false;
+}
This means future GCC will not have this issue, which is an improvement. Either this patch could be backported to earlier versions or something similar could be added to avoid this.
LLVM is left as as a future exercise.
Improving GCC
It would be great if the two options could be used together, but when starting to modify gcc/config/riscv/riscv.cc
and the gcc/config/riscv/riscv.md
we ran into issues with not knowing how the internals of gcc well enough to decipher the compiler errors we got when trying to build some test code. Adding UNSPECV_GPR_SAVE_FP
as a version of UNSPECV_GPR_SAVE
just gave us an unrecognizable instruction error, as shown below:
test.c: In function ‘main’:
test.c:12:1: error: unrecognizable insn:
12 | }
| ^
(insn/f 39 11 40 3 (parallel [
(unspec_volatile [
(const_int 3 [0x3])
] UNSPECV_GPR_SAVE_FP)
If we want to go further with this, then we would have to work out what we missed in this addition to get gcc to output the relevant code to call the frame-pointer versions when frame_pointer_needed
is set.
Summary
When adding features it is difficult to work out all the permutations of configuration, build environment and where the code is running. Working with the community to iron out issues quickly is good, and hopefully if there are things you cannot fix then someone else can look at fixing.
Codethink are available to help with your RISC-V project, please contact us to find out more about our services.
The U-Boot logo thumbnail is used under CC-BY-SA-4 license.
Other Content
- Speed Up Embedded Software Testing with QEMU
- Open Source Summit Europe (OSSEU) 2024
- Watch: Real-time Scheduling Fault Simulation
- Improving systemd’s integration testing infrastructure (part 2)
- Meet the Team: Laurence Urhegyi
- A new way to develop on Linux - Part II
- Shaping the future of GNOME: GUADEC 2024
- Developing a cryptographically secure bootloader for RISC-V in Rust
- Meet the Team: Philip Martin
- Improving systemd’s integration testing infrastructure (part 1)
- A new way to develop on Linux
- RISC-V Summit Europe 2024
- Safety Frontier: A Retrospective on ELISA
- Codethink sponsors Outreachy
- The Linux kernel is a CNA - so what?
- GNOME OS + systemd-sysupdate
- Codethink has achieved ISO 9001:2015 accreditation
- Outreachy internship: Improving end-to-end testing for GNOME
- Lessons learnt from building a distributed system in Rust
- FOSDEM 2024
- QAnvas and QAD: Streamlining UI Testing for Embedded Systems
- Outreachy: Supporting the open source community through mentorship programmes
- Using Git LFS and fast-import together
- Testing in a Box: Streamlining Embedded Systems Testing
- SDV Europe: What Codethink has planned
- How do Hardware Security Modules impact the automotive sector? The final blog in a three part discussion
- How do Hardware Security Modules impact the automotive sector? Part two of a three part discussion
- How do Hardware Security Modules impact the automotive sector? Part one of a three part discussion
- Automated Kernel Testing on RISC-V Hardware
- Automated end-to-end testing for Android Automotive on Hardware
- GUADEC 2023
- Embedded Open Source Summit 2023
- Adding RISC-V Vector Cryptography Extension support to QEMU
- Introducing Our New Open-Source Tool: Quality Assurance Daemon
- Achieving Long-Term Maintainability with Open Source
- FOSDEM 2023
- Think before you Pip
- BuildStream 2.0 is here, just in time for the holidays!
- A Valuable & Comprehensive Firmware Code Review by Codethink
- GNOME OS & Atomic Upgrades on the PinePhone
- Flathub-Codethink Collaboration
- Codethink proudly sponsors GUADEC 2022
- Tracking Down an Obscure Reproducibility Bug in glibc
- Web app test automation with `cdt`
- FOSDEM Testing and Automation talk
- Protecting your project from dependency access problems
- Porting GNOME OS to Microchip's PolarFire Icicle Kit
- YAML Schemas: Validating Data without Writing Code
- Full archive