On one of our RISC-V projects, we were recently debugging a complex driver probing sequence in the U-Boot bootloader. Along the way we improved the debugging facilities available for U-Boot on RISC-V, specifically allowing developers and system integrators to see the callstack in the case that there is a crash during the early boot sequence. Read on to find out what we did.
Adding stack unwinding
The U-Boot exception code prints out the state of the system when an exception happens (for example pointer outside of usable memory) but for RISC-V this is just the current cpu registers. To add more information, we decided to add code to print the call stack and since this is a useful feature we submitted this patch.
There are several ways to do stack unwinding, either using table built by the compiler or using a frame pointer register which points to the current stack frame. We decided to go with the frame pointer as that would be the easiest to write code for (approx 23 lines) from this example. The only downside is the code is bigger so this feature is put behind a config option CONFIG_FRAMEPOINTER
For GCC, the default is not to use a frame pointer so in U-Boot when the configuration for unwinding is enabled the compiler flags need the -mno-omit-frame-pointer
option. As noted above this makes the U-Boot code larger as each function requires a few more instructions to save the frame pointer and to ensure it is updated. For RISC-V, the frame pointer is in register s0
.
The stack frame is always setup so that this pointer can be used to find the previous frame and caller's address and defined by the architecture and compiler options used. The frame pointer (fp) always points to the following data on the stack and thus can be used to follow back to the first frame. On main entry to U-Boot we set the fp to NULL or use the global data gd
pointer to avoid changing some of the other entry points:
fp[-1] = return address
fp[-2] = previous frame pointer
In the above example, the fp is a pointer type, so on 64bit systems it is stored as an 8-byte unsigned value.
The bug report
After the initial submission, we got a report our patch was not working and the unwinder was going off into places that could not be valid code. Since this was working for us for our build, we tried to dig into the report in case this was being caused by a difference in build config, compiler version or other unforseen issues. We asked the reporter to send some dumps from the code to aid investigation. When we saw the code it was clear there was something we had not seen going on:
0000002000817228 <board_init_r>:
2000817228: 7d9612ef jal t0,2000879200 <__riscv_save_2>
200081722c: 81aa mv gp,a0
There is a call to __riscv_save_2
here instead of the sequence we expected below which stores the registers onto the stack and updates the frame register to the new frame.
000000200081b3ce <board_init_r>:
200081b3ce: 1101 add sp,sp,-32
200081b3d0: e822 sd s0,16(sp)
200081b3d2: e04a sd s2,0(sp)
200081b3d4: ec06 sd ra,24(sp)
200081b3d6: e426 sd s1,8(sp)
200081b3d8: 1000 add s0,sp,32
200081b3da: 81aa mv gp,a0
Since we hadn't seen this before, we did a quick search for the __riscv_save_2
and this turns out to be a provided by the compiler support library for a feature -msave-restore
. The save-restore is a RISC-V compiler option to try and make function call start and end code smaller by using a library routine to do it.
The issue is obviously that the function the error was injected into looked like there was a valid stack frame pointer, but there was no actual frame unwinding data saved and therefore no way to follow this function back. It seems that no-one had noticed before that using the -msave-restore
option clashed with the -mno-omit-frame-pointer
and that the reporter's local change to either compiler or U-Boot build options caused an issue we had not seen.
How to fix
The quick fix is to disable the save-restore code, but due to the reporter's local changes to U-Boot we can't change the configuration system to disable having both in at the same time. Better would be to update the compiler to warn, error or disable one of these clashing options. The last and probably the most difficult would be to add support for having both by updating both the compiler and compiler library to have versions of __riscv_save_%0
which would work with the frame pointer.
Since we are using GCC for our builds, the first action was to clone the code and look into the places that generate the code for function wrappers to see what would be required. During the initial checkout and test of GCC, it turns out that we could not repeat the issue. Looking through the gcc/config/riscv/riscv.cc
file it turns out this has been fixed recently in another update. In refactoring the test for whether to use the save/restore lib-calls a test function was added which also added a check for the use of frame pointer, and if that was set avoid using the save/restore calls as so:
+riscv_avoid_save_libcall (void)
+{
+ if (!TARGET_SAVE_RESTORE
+ || crtl->calls_eh_return
+ || frame_pointer_needed
+ || cfun->machine->interrupt_handler_p
+ || cfun->machine->varargs_size != 0
+ || crtl->args.pretend_args_size != 0)
+ return true;
+
+ return false;
+}
This means future GCC will not have this issue, which is an improvement. Either this patch could be backported to earlier versions or something similar could be added to avoid this.
LLVM is left as as a future exercise.
Improving GCC
It would be great if the two options could be used together, but when starting to modify gcc/config/riscv/riscv.cc
and the gcc/config/riscv/riscv.md
we ran into issues with not knowing how the internals of gcc well enough to decipher the compiler errors we got when trying to build some test code. Adding UNSPECV_GPR_SAVE_FP
as a version of UNSPECV_GPR_SAVE
just gave us an unrecognizable instruction error, as shown below:
test.c: In function ‘main’:
test.c:12:1: error: unrecognizable insn:
12 | }
| ^
(insn/f 39 11 40 3 (parallel [
(unspec_volatile [
(const_int 3 [0x3])
] UNSPECV_GPR_SAVE_FP)
If we want to go further with this, then we would have to work out what we missed in this addition to get gcc to output the relevant code to call the frame-pointer versions when frame_pointer_needed
is set.
Summary
When adding features it is difficult to work out all the permutations of configuration, build environment and where the code is running. Working with the community to iron out issues quickly is good, and hopefully if there are things you cannot fix then someone else can look at fixing.
Codethink are available to help with your RISC-V project, please contact us to find out more about our services.
The U-Boot logo thumbnail is used under CC-BY-SA-4 license.
Other Content
- Using Git LFS and fast-import together
- Testing in a Box: Streamlining Embedded Systems Testing
- SDV Europe: What Codethink has planned
- How do Hardware Security Modules impact the automotive sector? The final blog in a three part discussion
- How do Hardware Security Modules impact the automotive sector? Part two of a three part discussion
- How do Hardware Security Modules impact the automotive sector? Part one of a three part discussion
- Automated Kernel Testing on RISC-V Hardware
- Automated end-to-end testing for Android Automotive on Hardware
- GUADEC 2023
- Embedded Open Source Summit 2023
- Adding RISC-V Vector Cryptography Extension support to QEMU
- Introducing Our New Open-Source Tool: Quality Assurance Daemon
- Long Term Maintainability
- FOSDEM 2023
- Think before you Pip
- BuildStream 2.0 is here, just in time for the holidays!
- A Valuable & Comprehensive Firmware Code Review by Codethink
- GNOME OS & Atomic Upgrades on the PinePhone
- Flathub-Codethink Collaboration
- Codethink proudly sponsors GUADEC 2022
- Tracking Down an Obscure Reproducibility Bug in glibc
- Web app test automation with `cdt`
- FOSDEM Testing and Automation talk
- Protecting your project from dependency access problems
- Porting GNOME OS to Microchip's PolarFire Icicle Kit
- YAML Schemas: Validating Data without Writing Code
- Deterministic Construction Service
- Codethink becomes a Microchip Design Partner
- Hamsa: Using an NVIDIA Jetson Development Kit to create a fully open-source Robot Nano Hand
- Using STPA with software-intensive systems
- Codethink achieves ISO 26262 ASIL D Tool Certification
- RISC-V: running GNOME OS on SiFive hardware for the first time
- Automated Linux kernel testing
- Native compilation on Arm servers is so much faster now
- Higher quality of FOSS: How we are helping GNOME to improve their test pipeline
- RISC-V: A Small Hardware Project
- Why aligning with open source mainline is the way to go
- Build Meetup 2021: The BuildTeam Community Event
- A new approach to software safety
- Does the "Hypocrite Commits" incident prove that Linux is unsafe?
- ABI Stability in freedesktop-sdk
- Why your organisation needs to embrace working in the open-source ecosystem
- RISC-V User space access Oops
- Tracking Players at the Edge: An Overview
- What is Remote Asset API?
- Running a devroom at FOSDEM: Safety and Open Source
- Meet the codethings: Understanding BuildGrid and BuildBox with Beth White
- Streamlining Terraform configuration with Jsonnet
- Bloodlight: Designing a Heart Rate Sensor with STM32, LEDs and Photodiode
- Making the tech industry more inclusive for women
- Bloodlight Case Design: Lessons Learned
- Safety is a system property, not a software property
- RISC-V: Codethink's first research about the open instruction set
- Meet the Codethings: Safety-critical systems and the benefits of STPA with Shaun Mooney
- Why Project Managers are essential in an effective software consultancy
- FOSDEM 2021: Devroom for Safety and Open Source
- Meet the Codethings: Ben Dooks talks about Linux kernel and RISC-V
- Here we go 2021: 4 open source events for software engineers and project leaders
- Xmas Greetings from Codethink
- Call for Papers: FOSDEM 2021 Dev Room Safety and Open Source Software
- Building the abseil-hello Bazel project for a different architecture using a dynamically generated toolchain
- Advent of Code: programming puzzle challenges
- Full archive