📝 19 Jan 2025
Earlier This Week: uname became unusually quiet on Apache NuttX RTOS…
## Hmmm something is missing
NuttShell (NSH) NuttX-12.8.0
nsh> uname -a
NuttX 12.8.0 risc-v rv-virt
See the subtle bug? The Commit Hash is missing!
## Commit Hash should always appear
nsh> uname -a
NuttX 12.8.0 5f4a15b690 Jan 13 2025 00:34:30 risc-v rv-virt
Can we ignore it? Maybe nobody will notice?
Noooooo! Commit Hash identifies the Exact Commit of NuttX that was used to produce the NuttX Build.
Watch as we stomp the seemingly simple bug… That turns out to be something seriously sinister! (Spoiler: Static Vars are broken)
uname on NuttX: How does it work?
Use the Source, Luke! First we peek inside the uname command.
Our bug happens in NuttX Shell. Thus we search NuttX Apps Repo for uname…
Searching for uname returns this code in NuttX Shell: nsh_syscmds.c
// Declare the uname() function
#include <sys/utsname.h>
// NuttX Shell: To execute the uname command...
// We call the uname() function
int cmd_uname(...) { ...
struct utsname info;
ret = uname(&info);
We see that uname command calls the uname function.
So we search the NuttX Kernel Repo for uname…
NuttX Kernel Search says that uname is defined here: lib_utsname.c
// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD; // Omitted: Date and Time
// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
strlcpy(
output->version, // Copy into the Output Version
g_version, // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
sizeof(output->version) // Making sure we don't overflow
);
(Is uname a Kernel Function? We’ll see soon)
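What does that strlcpy buy us? Here is a minimal sketch of the same bounded-copy behaviour, runnable on any host. bounded_copy and copy_was_truncated are hypothetical names of our own (not NuttX functions), written out by hand because not every C Library ships strlcpy. The size 51 is the size of the version field that shows up later in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy(): Copy at most dst_size - 1 bytes,
 * always NUL-terminate, and return the full length of src. */
size_t bounded_copy(char *dst, const char *src, size_t dst_size)
{
  assert(dst != NULL && src != NULL);

  size_t src_len = strlen(src);
  if (dst_size > 0)
    {
      /* Leave room for the terminating NUL */
      size_t n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
      memcpy(dst, src, n);
      dst[n] = '\0';
    }

  return src_len;
}

/* Hypothetical helper: Returns 1 if src was cut short, 0 if it fitted */
int copy_was_truncated(char *dst, const char *src, size_t dst_size)
{
  return bounded_copy(dst, src, dst_size) >= dst_size;
}
```

A Commit Hash plus Date and Time fits comfortably into 51 bytes. Copy it into a tiny 8-byte buffer and we get the first 7 characters plus a NUL, never an overflow.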
What’s this CONFIG_VERSION_BUILD?
Earlier we saw that uname function returns CONFIG_VERSION_BUILD: lib_utsname.c
// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD; // Omitted: Date and Time
// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
strlcpy(
output->version, // Copy into the Output Version
g_version, // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
sizeof(output->version) // Making sure we don't overflow
);
Let’s track the origin of CONFIG_VERSION_BUILD. We build NuttX for QEMU RISC-V 64-bit (Kernel Mode)
## Download the NuttX Kernel and NuttX Apps
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
## Configure NuttX for QEMU RISC-V 64-bit (Kernel Mode)
cd nuttx
tools/configure.sh rv-virt:knsh64
## Build the NuttX Kernel
make -j
## Build the NuttX Apps
make export
pushd ../apps
./tools/mkimport.sh -z -x ../nuttx/nuttx-export-*.tar.gz
make import
popd
Maybe CONFIG_VERSION_BUILD is in the NuttX Config File?
$ grep CONFIG_VERSION_BUILD .config
[ Nothing ]
## Nope it's not!
We head back to NuttX Kernel Repo and search for CONFIG_VERSION_BUILD…
The Version Number you are looking at comes from the Header File nuttx/include/nuttx/version.h.
That Header File was created at build time from a Hidden File that you can find in the top-level nuttx directory called .version.
Aha! CONFIG_VERSION_BUILD a.k.a. Commit Hash comes from version.h
$ cat include/nuttx/version.h
#define CONFIG_VERSION_BUILD "a2d4d74af7"
(Thanks to Ludovic Vanasse for porting the docs)
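How might a Commit Hash from version.h end up next to a Date and Time? The snippets above only say that Date and Time were omitted, so the following is a guess at the mechanism, not the actual NuttX code: Adjacent String Literals are joined at compile time, and the Standard C Macros __DATE__ and __TIME__ supply the timestamp. g_version_sketch, version_string and version_has_hash are names we made up for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Assumption: In a real NuttX Build this comes from
 * include/nuttx/version.h. We hardcode the value shown above. */
#ifndef CONFIG_VERSION_BUILD
#  define CONFIG_VERSION_BUILD "a2d4d74af7"
#endif

/* Adjacent String Literals are concatenated by the compiler:
 * "a2d4d74af7" " " "Jan 16 2025" " " "15:29:02" */
static const char g_version_sketch[] =
  CONFIG_VERSION_BUILD " " __DATE__ " " __TIME__;

/* Return the combined Version String */
const char *version_string(void)
{
  return g_version_sketch;
}

/* Return 1 if the Version String begins with the given Commit Hash */
int version_has_hash(const char *hash)
{
  assert(hash != NULL);
  return strncmp(version_string(), hash, strlen(hash)) == 0;
}
```

The result has the same shape as the uname output we expect: Commit Hash, then Build Date, then Build Time.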
Is CONFIG_VERSION_BUILD compiled correctly into our NuttX Image?
We snoop the NuttX Kernel Image to verify that CONFIG_VERSION_BUILD is correct.
Recall that CONFIG_VERSION_BUILD is stored in Static Variable g_version: lib_utsname.c
// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD; // Omitted: Date and Time
// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
strlcpy(
output->version, // Copy into the Output Version
g_version, // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
sizeof(output->version) // Making sure we don't overflow
);
According to NuttX Linker Map: Address of g_version is 0x8040_03B8
## Search for g_version in Linker Map, show 1 line after
$ grep \
--after-context=1 \
g_version \
nuttx.map
.data.g_version
0x804003b8 0x21 staging/libkc.a(lib_utsname.o)
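If we do this often, we could pull the Address and Size out of that Linker Map line with a few lines of C. This is only a sketch: parse_map_entry is a hypothetical helper, and it assumes the line looks exactly like the one printed above (Hex Address, Hex Size, then the Object File).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: Parse a Linker Map line like
 *   "0x804003b8 0x21 staging/libkc.a(lib_utsname.o)"
 * Returns 0 on success, -1 if the line has no Address and Size. */
int parse_map_entry(const char *line,
                    unsigned long long *addr,
                    unsigned long long *size)
{
  assert(line != NULL && addr != NULL && size != NULL);

  /* %llx accepts an optional "0x" prefix */
  return (sscanf(line, " %llx %llx", addr, size) == 2) ? 0 : -1;
}
```

Feeding it the second line of the grep output gives us the Address of g_version and its Size in bytes. The first line (the Section Name) is rejected.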
What’s the value inside g_version? We dump the Binary Image from NuttX Kernel ELF…
## Export the NuttX Binary Image to nuttx.bin
riscv-none-elf-objcopy \
-O binary \
nuttx \
nuttx.bin
Earlier we said g_version is at 0x8040_03B8.
We open nuttx.bin in VSCode Hex Editor, press Ctrl-G and jump to 0x2003B8…
(Because NuttX Kernel loads at 0x8020_0000, so the File Offset is 0x8040_03B8 - 0x8020_0000 = 0x2003B8)
And that’s our CONFIG_VERSION_BUILD with Commit Hash! Looks hunky dory, why wasn’t it returned correctly to uname and NuttX Shell?
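The Hex Editor jump is plain subtraction: Address of g_version minus the Kernel Load Address. Here is a small sketch of that arithmetic, plus a lookup that fetches the string from an image already loaded into memory. image_offset and string_at are hypothetical helpers for illustration, they are not part of NuttX or objcopy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert a Linked Address into an offset inside the Binary Image.
 * Returns -1 if the address lies below the Load Address. */
int64_t image_offset(uint64_t addr, uint64_t load_base)
{
  if (addr < load_base)
    {
      return -1;
    }

  return (int64_t)(addr - load_base);
}

/* Return the NUL-terminated string stored at addr inside the image,
 * or NULL if addr is out of range or the string is unterminated. */
const char *string_at(const uint8_t *image, size_t image_len,
                      uint64_t addr, uint64_t load_base)
{
  assert(image != NULL);

  int64_t offset = image_offset(addr, load_base);
  if (offset < 0 || (uint64_t)offset >= image_len)
    {
      return NULL;
    }

  size_t remaining = image_len - (size_t)offset;
  if (memchr(image + offset, '\0', remaining) == NULL)
    {
      return NULL;
    }

  return (const char *)(image + offset);
}
```

With Address 0x804003B8 and Load Address 0x80200000, image_offset returns 0x2003B8: The same offset we typed into the Hex Editor.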
Maybe NuttX Kernel got corrupted? Returning bad data for uname?
We tweak the NuttX Kernel and call uname at Kernel Startup: qemu_rv_appinit.c
// Declare the uname() function
#include <sys/utsname.h>
// When Kernel Boots:
// Call the uname() function
int board_app_initialize(uintptr_t arg) { ...
struct utsname info;
int ret2 = uname(&info);
// If uname() returns OK:
// Print the Commit Hash a.k.a. g_version
if (ret2 == 0) {
_info("version=%s\n", info.version);
}
Then inside the uname function, we dump the value of g_version: lib_utsname.c
// Inside the uname() function:
// Print g_version with _info() and printf()
int uname(FAR struct utsname *name) { ...
_info("From _info: g_version=%s\n", g_version); // Kernel Only
printf("From printf: g_version=%s\n", g_version); // Kernel and Apps
printf("Address of g_version=%p\n", g_version); // Kernel and Apps
(Why print twice? We’ll see soon)
We boot NuttX on QEMU RISC-V 64-bit…
## Start QEMU with NuttX
$ qemu-system-riscv64 \
-semihosting \
-M virt,aclint=on \
-cpu rv64 \
-kernel nuttx \
-nographic
## NuttX Kernel shows Commit Hash
From _info:
g_version=bd6e5995ef Jan 16 2025 15:29:02
From printf:
g_version=bd6e5995ef Jan 16 2025 15:29:02
Address of g_version=0x804003b8
board_app_initialize:
version=bd6e5995ef Jan 16 2025 15:29:02
NuttShell (NSH) NuttX-12.4.0
Yep NuttX Kernel correctly prints g_version a.k.a. CONFIG_VERSION_BUILD a.k.a. Commit Hash. No Kernel Corruption! (Phew)
Maybe something got corrupted in our NuttX App?
Wow that’s so diabolical, sure hope not. We mod the NuttX Hello App and call uname: hello_main.c
// Declare the uname() function
#include <sys/utsname.h>
// In Hello App: Call the uname() function
int main(int argc, FAR char *argv[]) {
struct utsname info;
int ret = uname(&info);
// If uname() returns OK:
// Print the Commit Hash a.k.a. g_version
if (ret >= 0) {
printf("version=%s\n", info.version);
}
Indeed something is messed up with g_version a.k.a. CONFIG_VERSION_BUILD a.k.a. Commit Hash…
## Why is Commit Hash empty?
NuttShell (NSH) NuttX-12.8.0
nsh> hello
version=
Inside our NuttX App: Why is g_version empty? Wasn’t it OK in NuttX Kernel?
Why did uname work differently: NuttX Kernel vs NuttX Apps?
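Since an empty version is so easy to miss, the Hello App check could be made a little louder. Below is a sketch that runs on any POSIX host (it is not the NuttX test itself). version_field_present and check_uname_version are names we invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Return 1 if the version field holds at least one character */
int version_field_present(const struct utsname *info)
{
  assert(info != NULL);
  return info->version[0] != '\0';
}

/* Call uname() and complain loudly if the version field is empty.
 * Returns 1 if version is present, 0 if empty, -1 if uname() failed. */
int check_uname_version(void)
{
  struct utsname info;

  if (uname(&info) < 0)
    {
      perror("uname");
      return -1;
    }

  if (!version_field_present(&info))
    {
      fprintf(stderr, "uname: version is empty!\n");
      return 0;
    }

  printf("version=%s\n", info.version);
  return 1;
}
```

On a healthy system check_uname_version prints the version and returns 1. With the bug in this article, a check like this would return 0 instead of quietly printing a blank.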
Now we chase the uname raving rabbid inside our NuttX App. Normally we’d dump the RISC-V Disassembly for our Hello App ELF…
## Dump the RISC-V Disassembly for apps/bin/hello
$ riscv-none-elf-objdump \
--syms --source --reloc --demangle --line-numbers --wide \
--debugging \
../apps/bin/hello \
>hello.S \
2>&1
## Impossible to read, without Debug Symbols
$ more hello.S
SYMBOL TABLE: no symbols
00000000c0000000 <.text>:
c0000000: 1141 add sp, sp, -16
c0000002: e006 sd ra, 0(sp)
c0000004: 82aa mv t0, a0
But ugh NuttX Build has unhelpfully Discarded the Debug Symbols from our Hello App ELF, making it hard to digest.
How to recover the Debug Symbols?
We sniff the NuttX Build…
## Update our Hello App
$ cd ../apps
$ touch examples/hello/hello_main.c
## Trace the NuttX Build for Hello App
$ make import V=1
LD: apps/bin/hello
riscv-none-elf-ld -e main --oformat elf64-littleriscv -T nuttx/libs/libc/modlib/gnu-elf.ld -e __start -Bstatic -Tapps/import/scripts/gnu-elf.ld -Lapps/import/libs -L "xpack-riscv-none-elf-gcc-13.2.0-2/bin/../lib/gcc/riscv-none-elf/13.2.0/rv64imafdc_zicsr/lp64d" apps/import/startup/crt0.o hello_main.c...apps.examples.hello.o --start-group -lmm -lc -lproxies -lgcc apps/libapps.a xpack-riscv-none-elf-gcc-13.2.0-2/bin/../lib/gcc/riscv-none-elf/13.2.0/rv64imafdc_zicsr/lp64d/libgcc.a --end-group -o apps/bin/hello
cp apps/bin/hello apps/bin_debug
riscv-none-elf-strip --strip-unneeded apps/bin/hello
## apps/bin/hello is missing the Debug Symbols
## apps/bin_debug/hello retains the Debug Symbols!
Ah NuttX Build has squirrelled away the Debug Version of Hello App into apps/bin_debug. We dump its RISC-V Disassembly…
## Dump the RISC-V Disassembly for apps/bin_debug/hello
cd ../nuttx
riscv-none-elf-objdump \
--syms --source --reloc --demangle --line-numbers --wide \
--debugging \
../apps/bin_debug/hello \
>hello.S \
2>&1
(See the RISC-V Disassembly hello.S)
Once Again: How is uname different in NuttX Kernel vs NuttX App?
Earlier we dumped the RISC-V Disassembly for our modded Hello App: hello.S
We browse the disassembly and search for uname. This appears: hello.S
// Inside Hello App: The RISC-V Disassembly of uname() function
int uname(FAR struct utsname *name) { ...
// Call _info() to print g_version
_info("From _info: g_version=%s\n", g_version);
auipc a3, 0x100
add a3, a3, 170 // Arg #3: g_version
auipc a2, 0x2
add a2, a2, -270 // Arg #2: Format String
auipc a1, 0x2
add a1, a1, -814 // Arg #1: VarArgs Size (I think)
li a0, 6 // Arg #0: Info Logging Priority
jal c00007c8 // Call syslog()
// Call printf() to print g_version
printf("From printf: g_version=%s\n", g_version);
auipc a1, 0x100
add a1, a1, 140 // Arg #1: g_version
auipc a0, 0x2
add a0, a0, -804 // Arg #0: Format String
jal c00001e6 // Call printf()
// Call printf() to print Address of g_version
printf("Address of g_version=%p\n", g_version);
auipc a1, 0x100
add a1, a1, 120 // Arg #1: g_version
auipc a0, 0x2
add a0, a0, -792 // Arg #0: Format String
jal c00001e6 // Call printf()
// Copy g_version into the uname() output
strlcpy(name->version, g_version, sizeof(name->version));
li a2, 51 // Arg #2: Size of name->version
auipc a1, 0x100
add a1, a1, 96 // Arg #1: g_version
add a0, s0, 74 // Arg #0: name->version
jal c0000748 // Call strlcpy()
Which does 4 things…
1. Call _info (a.k.a. syslog) to print g_version
2. Call printf to print g_version
3. Followed by Address of g_version
4. Copy g_version into the uname output
Huh? Isn’t this the exact same Kernel Code we saw earlier?
Precisely! We expected uname to be a System Call to NuttX Kernel…
But nope, uname is a Local Function. (Not a System Call)
Every NuttX App has a Local Copy of g_version and Commit Hash. (That’s potentially corruptible hmmm…)
Which explains why printf appears in the Hello Output but not _info…
## NuttX Kernel: Shows _info() and printf()
From _info:
g_version=bd6e5995ef Jan 16 2025 15:29:02
From printf:
g_version=bd6e5995ef Jan 16 2025 15:29:02
Address of g_version=0x804003b8
## NuttX Apps: Won't show _info()
NuttShell (NSH) NuttX-12.4.0
nsh> hello
From printf:
g_version=
Address of g_version=0xc0100218
(Because _info and syslog won’t work in NuttX Apps)
The Full Path of uname is a dead giveaway: It’s a Library Function. (Not a Kernel Function)
libs/libc/misc/lib_utsname.c
(uname is a System Call in Linux)
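How do those auipc + add pairs in the disassembly land on the Address of g_version? auipc adds its 20-bit immediate (shifted left by 12 bits) to the Program Counter, then add tacks on a signed 12-bit offset. A sketch of that arithmetic follows. pcrel_address is a hypothetical helper, and the Program Counter 0xc00001a0 is our assumption: The listing doesn't show instruction addresses, so we worked it backwards from the printed Address 0xc0100218.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the address produced by a RISC-V "auipc rd, hi20"
 * followed by "addi rd, rd, lo12", where pc is the address
 * of the auipc instruction. */
uint64_t pcrel_address(uint64_t pc, uint32_t hi20, int32_t lo12)
{
  assert(hi20 <= 0xFFFFF);                /* 20-bit immediate */
  assert(lo12 >= -2048 && lo12 <= 2047);  /* Signed 12-bit immediate */

  /* auipc: Shift the immediate left by 12 bits, sign-extend to 64 bits */
  int64_t upper = (int64_t)(int32_t)(hi20 << 12);

  /* addi: Add the sign-extended 12-bit offset */
  return pc + (uint64_t)upper + (uint64_t)(int64_t)lo12;
}
```

Under that assumption, pcrel_address(0xc00001a0, 0x100, 120) gives 0xc0100218, which matches "Address of g_version" in the Hello Output. The 0x100 in every auipc explains why g_version sits roughly 1 MB (0x10_0000) above the App Code at 0xc000_0000.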
Gasp! What if g_version a.k.a. Commit Hash got corrupted inside our app?
Earlier we saw that g_version is a Static Variable that contains our Commit Hash: lib_utsname.c
// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD; // Omitted: Date and Time
// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
strlcpy(
output->version, // Copy into the Output Version
g_version, // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
sizeof(output->version) // Making sure we don't overflow
);
We have a hefty hunch that Static Variables might be broken 😱. We test our hypothesis in Hello App: hello_main.c
// Define our Static Var
static char test_static[] =
"Testing Static Var";
// In Hello App: Print our Static Var
// "test_static=Testing Static Var"
int main(int argc, FAR char *argv[]) {
printf("test_static=%s\n", test_static);
printf("Address of test_static=%p\n", test_static);
Our hunch is 100% correct: Static Variables are Broken!
## Why is Static Var `test_static` empty???
NuttShell (NSH) NuttX-12.4.0
nsh> hello
test_static=
Address of test_static=0xc0100200
OK this goes waaaaay beyond our debugging capability. (NuttX App Data Section got mapped incorrectly into the Memory Space?)
We call in the NuttX Experts for help. And it’s awesomely fixed by anjiahao yay! 🎉
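To catch this kind of breakage sooner, the Static Var test above could be turned into a tiny self-check. What follows is only a sketch of the idea, not something from the NuttX codebase: g_sentinel and static_data_intact are made-up names. The volatile qualifier asks the compiler to really load the bytes from the Data Section, instead of folding the comparison away at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SENTINEL_TEXT "Testing Static Var"

/* Initialised Static Var: Lives in the Data Section.
 * volatile forces a real load from memory at runtime. */
static volatile char g_sentinel[] = SENTINEL_TEXT;

/* Return 1 if the Static Var still holds its initial value,
 * 0 if the Data Section looks wrong (e.g. all zeroes). */
int static_data_intact(void)
{
  char copy[sizeof(g_sentinel)];

  assert(sizeof(copy) == sizeof(SENTINEL_TEXT));

  /* Copy byte by byte, since memcpy() won't take a volatile source */
  for (size_t i = 0; i < sizeof(copy); i++)
    {
      copy[i] = g_sentinel[i];
    }

  return memcmp(copy, SENTINEL_TEXT, sizeof(copy)) == 0;
}
```

On a working system static_data_intact returns 1. In the broken Hello App above, where test_static printed as empty, a check like this should return 0.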
Lesson Learnt: Please pay attention to the slightest disturbance, like the uname output…
It might be a sign of something seriously sinister simmering under the surface!
Next Article: Why Sync-Build-Ingest is super important for NuttX Continuous Integration. And how we monitor it with our Magic Disco Light.
After That: Since we can Rewind NuttX Builds and automatically Git Bisect… Can we create a Bot that will fetch the Failed Builds from NuttX Dashboard, identify the Breaking PR, and escalate to the right folks?
Many Thanks to the awesome NuttX Admins and NuttX Devs! And My Sponsors, for sticking with me all these years.