Fixing a uname bug (Apache NuttX RTOS)

📝 19 Jan 2025

Earlier This Week: uname became unusually quiet on Apache NuttX RTOS

## Hmmm something is missing
NuttShell (NSH) NuttX-12.8.0
nsh> uname -a
NuttX 12.8.0  risc-v rv-virt

See the subtle bug? The Commit Hash is missing!

## Commit Hash should always appear
nsh> uname -a
NuttX 12.8.0 5f4a15b690 Jan 13 2025 00:34:30 risc-v rv-virt

Can we ignore it? Maybe nobody will notice?

Noooooo! Commit Hash identifies the Exact Commit of NuttX that was used to produce the NuttX Build. (Pic above)

Watch as we stomp the seemingly simple bug… That turns out to be something seriously sinister! (Spoiler: Static Vars are broken)

§1 Inside uname

uname on NuttX: How does it work?

Use the Source, Luke! First we peek inside the uname command.

Our bug happens in NuttX Shell. Thus we search NuttX Apps Repo for uname

Searching for uname returns this code in NuttX Shell: nsh_syscmds.c

// Declare the uname() function
#include <sys/utsname.h>

// NuttX Shell: To execute the uname command...
// We call the uname() function
int cmd_uname(...) { ...
  struct utsname info;
  ret = uname(&info);

We see that uname command calls the uname function.

So we search the NuttX Kernel Repo for uname

NuttX Kernel Search says that uname is defined here: lib_utsname.c

// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD;  // Omitted: Date and Time

// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
  strlcpy(
    output->version,         // Copy into the Output Version
    g_version,               // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
    sizeof(output->version)  // Making sure we don't overflow
  );

(Is uname a Kernel Function? We’ll see soon)
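
Here's a minimal sketch of that copy, runnable on a host machine. It's not the NuttX code: DEMO_VERSION_BUILD, demo_utsname, g_demo_version and demo_uname are hypothetical stand-ins, the Date and Time are hardcoded from the sample output above, and snprintf stands in for strlcpy (which some host C libraries lack).

```c
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_VERSION_BUILD */
#define DEMO_VERSION_BUILD "5f4a15b690"

/* Hypothetical, cut-down version of struct utsname */
struct demo_utsname {
  char version[51];  /* Assumed size, taken from the disassembly later in this article */
};

/* Adjacent string literals are joined at compile time:
 * Commit Hash + Date + Time (hardcoded here for the demo) */
static char g_demo_version[] =
  DEMO_VERSION_BUILD " " "Jan 13 2025" " " "00:34:30";

/* Copy the Static Var into the caller's struct.
 * snprintf never writes past the buffer and always NUL-terminates. */
int demo_uname(struct demo_utsname *output) {
  if (output == NULL) {
    return -1;
  }
  snprintf(output->version, sizeof(output->version), "%s", g_demo_version);
  return 0;
}
```

If the Static Var is intact, output->version holds the Commit Hash followed by the Date and Time. If the Static Var reads as empty, the copy still succeeds and the version comes back blank, which is what we saw in uname -a.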

§2 CONFIG_VERSION_BUILD

What’s this CONFIG_VERSION_BUILD?

Earlier we saw that uname function returns CONFIG_VERSION_BUILD: lib_utsname.c

// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD;  // Omitted: Date and Time

// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
  strlcpy(
    output->version,         // Copy into the Output Version
    g_version,               // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
    sizeof(output->version)  // Making sure we don't overflow
  );

Let’s track the origin of CONFIG_VERSION_BUILD. We build NuttX for QEMU RISC-V 64-bit (Kernel Mode)

## Download the NuttX Kernel and NuttX Apps
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps

## Configure NuttX for QEMU RISC-V 64-bit (Kernel Mode)
cd nuttx
tools/configure.sh rv-virt:knsh64

## Build the NuttX Kernel
make -j

## Build the NuttX Apps
make export
pushd ../apps
./tools/mkimport.sh -z -x ../nuttx/nuttx-export-*.tar.gz
make import
popd

(See the Build Log)

Maybe CONFIG_VERSION_BUILD is in the NuttX Config File?

$ grep CONFIG_VERSION_BUILD .config
[ Nothing ]
## Nope it's not!

We head back to NuttX Kernel Repo and search for CONFIG_VERSION_BUILD. The NuttX Docs explain…

The Version Number you are looking at comes from the Header File nuttx/include/nuttx/version.h.

That Header File was created at build time from a Hidden File that you can find in the top-level nuttx directory called .version.

Aha! CONFIG_VERSION_BUILD a.k.a. Commit Hash comes from version.h

$ cat include/nuttx/version.h 
#define CONFIG_VERSION_BUILD "a2d4d74af7"

(Thanks to Ludovic Vanasse for porting the docs)

§3 Static Variable g_version

Is CONFIG_VERSION_BUILD compiled correctly into our NuttX Image?

We snoop the NuttX Kernel Image to verify that CONFIG_VERSION_BUILD is correct.

Recall that CONFIG_VERSION_BUILD is stored in Static Variable g_version: lib_utsname.c

// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD;  // Omitted: Date and Time

// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
  strlcpy(
    output->version,         // Copy into the Output Version
    g_version,               // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
    sizeof(output->version)  // Making sure we don't overflow
  );

According to NuttX Linker Map: Address of g_version is 0x8040 03B8

## Search for g_version in Linker Map, show 1 line after
$ grep \
  --after-context=1 \
  g_version \
  nuttx.map

.data.g_version
  0x804003b8  0x21  staging/libkc.a(lib_utsname.o)
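
If we'd rather not read the Linker Map by eye, the Address and Size can be pulled out in code. This is only a sketch: parse_map_line is a hypothetical helper, and it assumes the second line looks exactly like the one above (Address, Size, then the Object File).

```c
#include <stdio.h>

/* Parse a Linker Map line like
 *   "  0x804003b8  0x21  staging/libkc.a(lib_utsname.o)"
 * into its Address and Size.
 * Returns 0 on success, -1 if the line doesn't start with two hex numbers. */
int parse_map_line(const char *line,
                   unsigned long long *addr,
                   unsigned long long *size) {
  if (line == NULL || addr == NULL || size == NULL) {
    return -1;
  }

  /* %llx skips leading spaces and accepts an optional "0x" prefix */
  if (sscanf(line, "%llx %llx", addr, size) != 2) {
    return -1;
  }
  return 0;
}
```

For the line above, this yields Address 0x804003b8 and Size 0x21 (33 bytes).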

What’s the value inside g_version? We dump the Binary Image from NuttX Kernel ELF…

## Export the NuttX Binary Image to nuttx.bin
riscv-none-elf-objcopy \
  -O binary \
  nuttx \
  nuttx.bin

Earlier we said g_version is at 0x8040 03B8.

We open nuttx.bin in VSCode Hex Editor, press Ctrl-G and jump to 0x2003B8

(Because NuttX Kernel loads at 0x8020 0000)
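
The jump offset is a plain subtraction. Here's a tiny sketch, assuming that nuttx.bin begins at the Kernel Load Address (image_offset is a hypothetical helper, not something in NuttX).

```c
#include <stdint.h>

/* Convert a Memory Address into an offset inside the Binary Image.
 * Assumes the image starts at load_addr.
 * Returns UINT64_MAX if the address lies before the image. */
uint64_t image_offset(uint64_t vaddr, uint64_t load_addr) {
  if (vaddr < load_addr) {
    return UINT64_MAX;
  }
  return vaddr - load_addr;
}
```

With g_version at 0x804003B8 and the Kernel loaded at 0x80200000, image_offset returns 0x2003B8, the offset we typed into the Hex Editor.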

And that’s our CONFIG_VERSION_BUILD with Commit Hash! Looks hunky dory. So why wasn’t it returned correctly to uname and NuttX Shell?

§4 Call uname in NuttX Kernel

Maybe NuttX Kernel got corrupted? Returning bad data for uname?

We tweak the NuttX Kernel and call uname at Kernel Startup: qemu_rv_appinit.c

// Declare the uname() function
#include <sys/utsname.h>

// When Kernel Boots:
// Call the uname() function
int board_app_initialize(uintptr_t arg) { ...
  struct utsname info;
  int ret2 = uname(&info);

  // If uname() returns OK:
  // Print the Commit Hash a.k.a. g_version
  if (ret2 == 0) {
    _info("version=%s\n", info.version);
  }

Then inside the uname function, we dump the value of g_version: lib_utsname.c

// Inside the uname() function:
// Print g_version with _info() and printf()
int uname(FAR struct utsname *name) { ...
  _info("From _info: g_version=%s\n",   g_version);  // Kernel Only
  printf("From printf: g_version=%s\n", g_version);  // Kernel and Apps
  printf("Address of g_version=%p\n",   g_version);  // Kernel and Apps

(Why print twice? We’ll see soon)

We boot NuttX on QEMU RISC-V 64-bit

## Start QEMU with NuttX
$ qemu-system-riscv64 \
  -semihosting \
  -M virt,aclint=on \
  -cpu rv64 \
  -kernel nuttx \
  -nographic

## NuttX Kernel shows Commit Hash
From _info:
  g_version=bd6e5995ef Jan 16 2025 15:29:02
From printf:
  g_version=bd6e5995ef Jan 16 2025 15:29:02
  Address of g_version=0x804003b8
board_app_initialize:
  version=bd6e5995ef Jan 16 2025 15:29:02
NuttShell (NSH) NuttX-12.4.0

(See the Complete Log)

Yep NuttX Kernel correctly prints g_version a.k.a. CONFIG_VERSION_BUILD a.k.a. Commit Hash. No Kernel Corruption! (Phew)

§5 Call uname in NuttX App

Maybe something got corrupted in our NuttX App?

Wow that’s so diabolical, sure hope not. We mod the NuttX Hello App and call uname: hello_main.c

// Declare printf() and the uname() function
#include <nuttx/config.h>
#include <stdio.h>
#include <sys/utsname.h>

// In Hello App: Call the uname() function
int main(int argc, FAR char *argv[]) {
  struct utsname info;
  int ret = uname(&info);

  // If uname() returns OK:
  // Print the Commit Hash a.k.a. g_version
  if (ret >= 0) {
    printf("version=%s\n", info.version);
  }
  return 0;
}

Indeed something is messed up with g_version a.k.a. CONFIG_VERSION_BUILD a.k.a. Commit Hash…

## Why is Commit Hash empty?
NuttShell (NSH) NuttX-12.8.0
nsh> hello
version=

(See the Complete Log)
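
A blank version is easy to miss by eye, so here's a sketch of a check that flags it. It uses only the POSIX uname function and the version field of struct utsname. check_uname_version is a hypothetical helper, not part of NuttX.

```c
#include <sys/utsname.h>

/* Returns  1 if uname() succeeds and the version is non-empty,
 *          0 if the version is empty (the symptom in this article),
 *         -1 if uname() itself fails. */
int check_uname_version(void) {
  struct utsname info;

  if (uname(&info) < 0) {
    return -1;
  }
  return (info.version[0] != '\0') ? 1 : 0;
}
```

Judging from the log above, a check like this would presumably return 0 inside our Hello App, since the version printed there is empty.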

Inside our NuttX App: Why is g_version empty? Wasn’t it OK in NuttX Kernel?

§6 Dump the NuttX App Disassembly

Why did uname work differently: NuttX Kernel vs NuttX Apps?

Now we chase the uname raving rabbid inside our NuttX App. Normally we’d dump the RISC-V Disassembly for our Hello App ELF…

## Dump the RISC-V Disassembly for apps/bin/hello
$ riscv-none-elf-objdump \
  --syms --source --reloc --demangle --line-numbers --wide \
  --debugging \
  ../apps/bin/hello \
  >hello.S \
  2>&1

## Impossible to read, without Debug Symbols
$ more hello.S
SYMBOL TABLE: no symbols
00000000c0000000 <.text>:
  c0000000: 1141  add sp, sp, -16
  c0000002: e006  sd  ra, 0(sp)
  c0000004: 82aa  mv  t0, a0

But ugh NuttX Build has unhelpfully Discarded the Debug Symbols from our Hello App ELF, making it hard to digest.

How to recover the Debug Symbols?

We sniff the NuttX Build

## Update our Hello App
$ cd ../apps
$ touch examples/hello/hello_main.c

## Trace the NuttX Build for Hello App
$ make import V=1
LD:  apps/bin/hello 
riscv-none-elf-ld -e main --oformat elf64-littleriscv -T nuttx/libs/libc/modlib/gnu-elf.ld -e __start -Bstatic -Tapps/import/scripts/gnu-elf.ld  -Lapps/import/libs -L "xpack-riscv-none-elf-gcc-13.2.0-2/bin/../lib/gcc/riscv-none-elf/13.2.0/rv64imafdc_zicsr/lp64d" apps/import/startup/crt0.o  hello_main.c...apps.examples.hello.o --start-group -lmm -lc -lproxies -lgcc apps/libapps.a xpack-riscv-none-elf-gcc-13.2.0-2/bin/../lib/gcc/riscv-none-elf/13.2.0/rv64imafdc_zicsr/lp64d/libgcc.a --end-group -o  apps/bin/hello
cp apps/bin/hello apps/bin_debug
riscv-none-elf-strip --strip-unneeded apps/bin/hello

## apps/bin/hello is missing the Debug Symbols
## apps/bin_debug/hello retains the Debug Symbols!

Ah NuttX Build has squirrelled away the Debug Version of Hello App into apps/bin_debug. We dump its RISC-V Disassembly

## Dump the RISC-V Disassembly for apps/bin_debug/hello
cd ../nuttx
riscv-none-elf-objdump \
  --syms --source --reloc --demangle --line-numbers --wide \
  --debugging \
  ../apps/bin_debug/hello \
  >hello.S \
  2>&1

(See the RISC-V Disassembly hello.S)

§7 Snoop uname in NuttX App

Once Again: How is uname different in NuttX Kernel vs NuttX App?

Earlier we dumped the RISC-V Disassembly for our modded Hello App: hello.S

We browse the disassembly and search for uname. This appears: hello.S

// Inside Hello App: The RISC-V Disassembly of uname() function
int uname(FAR struct utsname *name) { ...

// Call _info() to print g_version
_info("From _info: g_version=%s\n", g_version);
  auipc a3, 0x100
  add   a3, a3, 170  // Arg #3: g_version
  auipc a2, 0x2
  add   a2, a2, -270 // Arg #2: Function Name for the log prefix (I think)
  auipc a1, 0x2
  add   a1, a1, -814 // Arg #1: Format String
  li    a0, 6        // Arg #0: Info Logging Priority
  jal   c00007c8     // Call syslog()

// Call printf() to print g_version
printf("From printf: g_version=%s\n", g_version);
  auipc a1, 0x100
  add   a1, a1, 140  // Arg #1: g_version
  auipc a0, 0x2
  add   a0, a0, -804 // Arg #0: Format String
  jal   c00001e6     // Call printf()

// Call printf() to print Address of g_version
printf("Address of g_version=%p\n", g_version);
  auipc a1, 0x100
  add   a1, a1, 120  // Arg #1: g_version
  auipc a0, 0x2
  add   a0, a0, -792 // Arg #0: Format String
  jal   c00001e6     // Call printf()

// Copy g_version into the uname() output
strlcpy(name->version,  g_version, sizeof(name->version));
  li    a2, 51       // Arg #2: Size of name->version
  auipc a1, 0x100
  add   a1, a1, 96   // Arg #1: g_version
  add   a0, s0, 74   // Arg #0: name->version
  jal   c0000748     // Call strlcpy()

Which does 4 things…

  1. Call _info (a.k.a. syslog) to print g_version

  2. Call printf to print g_version

  3. Followed by Address of g_version

  4. Copy g_version into the uname output
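
How does each auipc + add pair reach g_version? auipc adds its immediate (shifted left by 12 bits) to the Program Counter, then add applies the signed low 12 bits. Here's a sketch of that arithmetic (pcrel_target is a hypothetical helper). The listing above doesn't show the Program Counter, so the PC values mentioned after the code are back-calculated from the g_version address that the app prints (0xc0100218). Treat them as assumptions.

```c
#include <stdint.h>

/* Compute the address produced by
 *   auipc rd, hi20
 *   add   rd, rd, lo12
 * where pc is the address of the auipc instruction. */
uint64_t pcrel_target(uint64_t pc, uint32_t hi20, int32_t lo12) {
  /* auipc: rd = pc + sign-extended (hi20 << 12) */
  int64_t upper = (int64_t)(int32_t)(hi20 << 12);

  /* add: rd = rd + sign-extended 12-bit immediate */
  return pc + (uint64_t)upper + (uint64_t)(int64_t)lo12;
}
```

Assuming the first auipc sits at 0xc000016e, pcrel_target(0xc000016e, 0x100, 170) gives 0xc0100218. The second pair (assumed PC 0xc000018c, offset 140) lands on the same address, so all four references point at one copy of g_version inside the app.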

§8 uname is Not a Kernel Call

Huh? Isn’t this the exact same Kernel Code we saw earlier?

Precisely! We expected uname to be a System Call to NuttX Kernel

But nope, uname is a Local Function. (Not a System Call)

Every NuttX App has a Local Copy of g_version and Commit Hash. (That’s potentially corruptible hmmm…)

Which explains why printf appears in the Hello Output but not _info

## NuttX Kernel: Shows _info() and printf()
From _info:
  g_version=bd6e5995ef Jan 16 2025 15:29:02
From printf:
  g_version=bd6e5995ef Jan 16 2025 15:29:02
  Address of g_version=0x804003b8

## NuttX Apps: Won't show _info()
NuttShell (NSH) NuttX-12.4.0
nsh> hello
From printf:
  g_version=
  Address of g_version=0xc0100218

(Because _info and syslog won’t work in NuttX Apps)

The Full Path of uname is a dead giveaway: It’s a Library Function. (Not a Kernel Function)

libs/libc/misc/lib_utsname.c

(uname is a System Call in Linux)

§9 Static Variables are Broken

Gasp! What if g_version a.k.a. Commit Hash got corrupted inside our app?

Earlier we saw that g_version is a Static Variable that contains our Commit Hash: lib_utsname.c

// CONFIG_VERSION_BUILD goes inside Static Var g_version
static char g_version[] = CONFIG_VERSION_BUILD;  // Omitted: Date and Time

// g_version goes into the uname output
int uname(FAR struct utsname *output) { ...
  strlcpy(
    output->version,         // Copy into the Output Version
    g_version,               // From our Static Var (CONFIG_VERSION_BUILD a.k.a Commit Hash)
    sizeof(output->version)  // Making sure we don't overflow
  );

We have a hefty hunch that Static Variables might be broken 😱. We test our hypothesis in Hello App: hello_main.c

// Declare printf()
#include <nuttx/config.h>
#include <stdio.h>

// Define our Static Var
static char test_static[] =
  "Testing Static Var";

// In Hello App: Print our Static Var
// "test_static=Testing Static Var"
int main(int argc, FAR char *argv[]) {
  printf("test_static=%s\n", test_static);
  printf("Address of test_static=%p\n", test_static);
  return 0;
}

Our hunch is 100% correct: Static Variables are Broken!

## Why is Static Var `test_static` empty???
NuttShell (NSH) NuttX-12.4.0
nsh> hello
test_static=
Address of test_static=0xc0100200

(See the Complete Log)
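
Instead of eyeballing the output, the app could test itself. Here's a sketch that compares the Static Var against a const copy. It assumes the const copy sits in Read-Only Data and that Read-Only Data is intact (the printf Format Strings print fine in the log above, which hints at that). static_var_intact is a hypothetical helper.

```c
#include <stddef.h>

/* Static Var with an initial value: lives in the Data Section.
 * volatile forces the compiler to really read the memory. */
static volatile char test_static[] = "Testing Static Var";

/* Returns 1 if the Static Var still holds its initial value, else 0 */
int static_var_intact(void) {
  /* Const copy: assumed to live in Read-Only Data */
  static const char expected[] = "Testing Static Var";
  size_t i;

  /* Compare every byte, including the terminating NUL */
  for (i = 0; i < sizeof(expected); i++) {
    if (test_static[i] != expected[i]) {
      return 0;
    }
  }
  return 1;
}
```

On a healthy build this returns 1. In the broken build shown above we'd expect 0, because test_static reads back as empty.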

OK this goes waaaaay beyond our debugging capability. (NuttX App Data Section got mapped incorrectly into the Memory Space?)

We call in the NuttX Experts for help. And it’s awesomely fixed by anjiahao yay! 🎉

Lesson Learnt: Please pay attention to the slightest disturbance, like the uname output…

It might be a sign of something seriously sinister simmering under the surface!

§10 What’s Next

Next Article: Why Sync-Build-Ingest is super important for NuttX Continuous Integration. And how we monitor it with our Magic Disco Light.

After That: Since we can Rewind NuttX Builds and automatically Git Bisect… Can we create a Bot that will fetch the Failed Builds from NuttX Dashboard, identify the Breaking PR, and escalate to the right folks?

Many Thanks to the awesome NuttX Admins and NuttX Devs! And My Sponsors, for sticking with me all these years.

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…

lupyuen.org/src/uname.md