Failing a Continuous Integration Test for Apache NuttX RTOS (QEMU RISC-V)

📝 22 Dec 2024

Failing a Continuous Integration Test for Apache NuttX RTOS (QEMU RISC-V)

Every Day: Our Dashboard for Apache NuttX RTOS will flag this bothersome bug, since a month ago

NuttX Dashboard

Which says that NuttX on QEMU RISC-V Emulator (32-bit) has failed our Continuous Integration Test. Again and again (like Kafka on Groundhog Day)

Configuration/Tool: rv-virt/citest
test_cmocka      PASSED
test_hello       PASSED
test_helloxx     FAILED
test_pipe        FAILED
test_usrsocktest FAILED
[...Failing all the way...]

The Bug Stops Here! In this article: We study the internals of the NuttX CI Test (for Continuous Integration) as we stomp the bug…

We begin with Docker…

Running the CI Test with Docker

§1 Run the CI Test

CI Test runs a bunch of Test Apps and fails. Why?

Thanks to Docker: We can run CI Test risc-v-05 on our Ubuntu PC. And figure out why it fails rv-virt:citest

(Steps for macOS Arm64)

## TODO: Install Docker Engine
## https://docs.docker.com/engine/install/ubuntu/

## TODO: For WSL, we may need to install Docker on Native Windows
## https://github.com/apache/nuttx/issues/14601#issuecomment-2453595402

## Start the NuttX Docker Image
## Name the Docker Container as `nuttx`
sudo docker run \
  -it \
  --name nuttx \
  ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
  /bin/bash

## Inside our Docker Container:
## Checkout the NuttX Repo and NuttX Apps
cd
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
pushd nuttx ; echo NuttX Source: https://github.com/apache/nuttx/tree/$(git rev-parse HEAD) ; popd
pushd apps  ; echo NuttX Apps: https://github.com/apache/nuttx-apps/tree/$(git rev-parse HEAD) ; popd

## Run CI Test risc-v-05 inside our Docker Container
cd nuttx/tools/ci
./cibuild.sh \
  -c -A -N -R \
  testlist/risc-v-05.dat 

(More about NuttX Docker Build)

Docker will build rv-virt:citest and start the CI Test

Configuration/Tool: rv-virt/citest
python3 -m pytest -m 'qemu or rv_virt' ./ -B rv-virt -P /root/nuttx -L /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu -R qemu -C --json=/root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/pytest.json
test_cmocka      PASSED
test_hello       PASSED
test_helloxx     FAILED
test_pipe        FAILED
test_usrsocktest FAILED
[...Failing all the way...]

Which is totally unhelpful. Why is it failing?

(See the Complete Log)

Inside a CI Test

§2 Snoop the CI Test

To understand what went wrong: We connect to the Docker Container. And snoop the Background Processes

## Connect to our Running Docker Container
sudo docker exec \
  -it \
  nuttx \
  /bin/bash

## What's running now?
## `more` will show the Entire Command-Line
ps aux | more

A-ha! We see NuttX running on (32-bit) QEMU RISC-V Emulator

## We started this...
## https://github.com/apache/nuttx/blob/master/tools/ci/cibuild.sh
/root/nuttx/tools/ci/cibuild.sh -c -A -N -R testlist/risc-v-05.dat

## Which calls testbuild.sh...
## https://github.com/apache/nuttx/blob/master/tools/testbuild.sh
/root/nuttx/tools/testbuild.sh -A -N -R -j 24 -e -W

## Which calls citest/run...
## https://github.com/apache/nuttx/blob/master/boards/risc-v/qemu-rv/rv-virt/configs/citest/run
/root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/run

## Which is sym-linked to cirun.sh...
## https://github.com/apache/nuttx/blob/master/tools/ci/cirun.sh
/root/nuttx/tools/ci/cirun.sh

## Which calls pytest...
## python3 -m pytest -m "${mark}" ./ -B ${BOARD} -P ${path} -L ${logs}/${BOARD}/${core} -R ${target} -C --json=${logs}/${BOARD}/${core}/pytest.json
python3 -m pytest -m 'qemu or rv_virt' ./ -B rv-virt -P /root/nuttx -L /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu -R qemu -C --json=/root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/pytest.json

## Which (probably) calls testrun...
## https://github.com/apache/nuttx/blob/master/tools/ci/testrun
/root/nuttx/tree/master/tools/ci/testrun

## Which boots NuttX on QEMU RISC-V...
qemu-system-riscv32 -M virt -bios ./nuttx -nographic -drive index=0,id=userdata,if=none,format=raw,file=./fatfs.img -device virtio-blk-device,bus=virtio-mmio-bus.0,drive=userdata
  | tee /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/rv-virt_20241211_063532.log

## And tees the output to the above Log File

Let’s inspect the Log File…

(See the Complete Log)

Dumping the CI Log File

§3 Dump the CI Log File

From the Background Processes: We see that everything goes into this CI Test Log File

## Dump the CI Test Log File (e.g. rv-virt_20241211_063532.log)
$ cat /root/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/*

NuttShell (NSH) NuttX-12.7.0
nsh> cmocka --skip test_case_posix_timer|test_case_oneshot|write_default|read_defaulCmocka Test Start.
Cmocka Test Completed.

nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK            STACK    USED FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 0001952 0000844  43.2%  Idle_Task
    1     0 224 RR       Kthread   - Waiting  Semaphore 0000000000000000 0001904 0000524  27.5%  hpwork 0x8014b1f4 0x8014b220
    2     0 100 RR       Kthread   - Waiting  Semaphore 0000000000000000 0001896 0000508  26.7%  lpwork 0x8014b1b0 0x8014b1dc
riscv_exception: EXCEPTION: Load access fault. MCAUSE: 00000005, EPC: 80008bfe, MTVAL: 01473e00
riscv_exception: PANIC!!! Exception = 00000005
dump_assert_info: Current Version: NuttX  12.7.0 5607eece84 Dec 11 2024 06:34:00 risc-v
dump_assert_info: Assertion failed panic: at file: common/riscv_exception.c:131 task: nsh_main process: nsh_main 0x8000a806
up_dump_register: EPC: 80008bfe

(See the Complete Log)

Hmmm this looks super interesting…

That’s why CI Test hangs, refuses to accept NSH Test Commands, and fails all subsequent tests!

Isolating the bug with QEMU RISC-V

§4 Test NuttX on QEMU RISC-V

What if we test rv-virt:citest on QEMU?

For simpler troubleshooting: This is how we compile rv-virt:citest inside Docker…

## Start Docker Container for NuttX
sudo docker run \
  -it \
  ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
  /bin/bash

## Inside Docker:
## We compile rv-virt:citest
cd
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
pushd nuttx ; echo NuttX Source: https://github.com/apache/nuttx/tree/$(git rev-parse HEAD) ; popd
pushd apps  ; echo NuttX Apps: https://github.com/apache/nuttx-apps/tree/$(git rev-parse HEAD) ; popd
cd nuttx
tools/configure.sh rv-virt:citest
make -j

Now we boot NuttX on QEMU

$ qemu-system-riscv32 \
    -M virt \
    -bios ./nuttx \
    -nographic

NuttShell (NSH) NuttX-12.7.0
nsh> uname -a
NuttX  12.7.0 5607eece84 Dec 11 2024 07:05:48 risc-v rv-virt

nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK            STACK    USED FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 0001952 0000908  46.5%  Idle_Task
    1     0 224 RR       Kthread   - Waiting  Semaphore 0000000000000000 0001904 0000508  26.6%  hpwork 0x8014b1e4 0x8014b210
    2     0 100 RR       Kthread   - Waiting  Semaphore 0000000000000000 0001896 0000508  26.7%  lpwork 0x8014b1a0 0x8014b1cc
riscv_exception: EXCEPTION: Load access fault. MCAUSE: 00000005, EPC: 80008bfe, MTVAL: 01473e00
riscv_exception: PANIC!!! Exception = 00000005
dump_assert_info: Current Version: NuttX  12.7.0 5607eece84 Dec 11 2024 07:05:48 risc-v
dump_assert_info: Assertion failed panic: at file: common/riscv_exception.c:131 task: nsh_main process: nsh_main 0x8000a806
up_dump_register: EPC: 80008bfe

(See the Complete Log)

Yep we can easily reproduce (and fix) the “ps” crash using plain old Make and QEMU. No need for CI Test Script!

How will we fix the bug?

We’ll probably Rewind The Build and retest rv-virt:citest on QEMU RISC-V.

UPDATE: Tiago Medicci Serrano has just fixed the “ps” crash yay!

(But another bug appears. Exactly like Kafka on Groundhog Day sigh)

Continuous Integration Test for Apache NuttX RTOS (QEMU RISC-V)

§5 Why So Difficult?

  1. What’s this Pytest we saw earlier?

    Filipe Cavalcanti wrote an excellent article on Pytest in NuttX…

    “Testing applications with Pytest and NuttX”

    Pytest is a Python Testing Framework. Though it gets messy because NuttX CI isn’t actually testing Python Code with Pytest…

    It’s testing External Programs in QEMU. (Because NuttX boots inside QEMU)

  2. How does Pytest control QEMU?

    Remember test_helloxx that failed earlier? It calls…

  3. Why Pexpect?

    That’s how our CI Test spawns the QEMU Process and controls it…

    Pexpect is a pure Python module for spawning child applications; controlling them; and responding to expected patterns in their output. Pexpect works like Don Libes’ Expect. Pexpect allows your script to spawn a child application and control it as if a human were typing commands.”

  4. What about Don Libes’ Expect?

    Oh yes we use Plain Old Expect for Daily-Testing SG2000 NuttX and Ox64 NuttX. It terminates reliably, and it won’t hang forever, unlike Pexpect.

    (Hence this Horrible Hack)

  5. Why so hard to extract the CI Test Log?

    Yeah we should probably add the CI Test Log to GitHub Actions Artifacts for easier downloading. Then NuttX Dashboard can ingest the CI Test Log and render it sensibly.

    (Also: Extract the Leftover Files that fail the defconfig check)

  6. Can we do more with Pytest?

    Someday we could call Pytest to do Test-Driven Development of NuttX Apps and NuttX Drivers. Which means we can write the Test Cases in Pytest, before writing a Single Line of NuttX Code!

No more CI Test errors yay!

No more CI Test errors yay!

§6 What’s Next

Next Article: We’ll chat about an Experimental Mastodon Server for NuttX Continuous Integration.

Then Later: Let’s talk about Git Bisect and how we auto-magically discover a Breaking Commit in NuttX.

After That: What would NuttX Life be like without GitHub? We try out (self-hosted open-source) Forgejo Git Forge with NuttX.

Many Thanks to the awesome NuttX Admins and NuttX Devs! And my GitHub Sponsors, for sticking with me all these years.

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…

lupyuen.github.io/src/ci7.md

Running the CI Test with macOS Arm64

§7 Appendix: CI Test on macOS Arm64

macOS won’t work with the NuttX Docker Image. How to run the CI Test on macOS?

In the previous article we spoke about Compiling Everything NuttX on macOS Arm64…

Today we apply the same steps to replicate the CI Test on macOS Apple Silicon

## From https://lupyuen.org/articles/ci5
git clone https://github.com/lupyuen/nuttx-build-farm
cd nuttx-build-farm

## Set the GitHub Token: (Should have Gist Permission)
## export GITHUB_TOKEN=...
. $HOME/github-token.sh
brew install neofetch gh glab

## Run One Single NuttX CI Job on macOS
./run-job-macos.sh risc-v-05

(Also works for GitLab Snippets)

(See the Complete Log)

We wait for the CI Test to fail. Then we snoop the Background Processes

$ ps aux | grep qemu

## citest/run calls pytest...
## python3 -m pytest -m "${mark}" ./ -B ${BOARD} -P ${path} -L ${logs}/${BOARD}/${core} -R ${target} -C --json=${logs}/${BOARD}/${core}/pytest.json
Python -m pytest -m 'qemu or rv_virt' ./ -B rv-virt -P /private/tmp/run-job-macos/nuttx -L /private/tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu -R qemu -C --json=/private/tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/pytest.json

## Which boots NuttX on QEMU RISC-V...
qemu-system-riscv32 -M virt -bios ./nuttx -nographic -drive index=0,id=userdata,if=none,format=raw,file=./fatfs.img -device virtio-blk-device,bus=virtio-mmio-bus.0,drive=userdata

## And tees the output to this Log File
tee /private/tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/rv-virt_20241210_184405.log

(See the Complete Log)

Which says that the CI Test Logs are here (revealing the NuttX ps crash)

$ cat /tmp/run-job-macos/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/* 

NuttShell (NSH) NuttX-12.7.0
nsh> cmocka --list
Cmocka Test Start ...

nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK            STACK    USED FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 0001952 0000828  42.4%  Idle_Task
    1     0 224 RR       Kthread   - Waiting  Semaphore 0000000000000000 0001896 0000524  27.6%  hpwork 0x8014b258 0x8014b27c
    2     0 100 RR       Kthread   - Waiting  Semaphore 0000000000000000 0001896 0000508  26.7%  lpwork 0x8014b220 0x8014b244
riscv_exception: EXCEPTION: Load access fault. MCAUSE: 00000005, EPC: 80022d50, MTVAL: 00000048
riscv_exception: PANIC!!! Exception = 00000005
dump_assert_info: Current Version: NuttX  12.7.0 1c7d81881c Dec 10 2024 19:01:06 risc-v
dump_assert_info: Assertion failed panic: at file: common/riscv_exception.c:131 task: nsh_main process: nsh_main 0x8000a7a0
up_dump_register: EPC: 80022d50

(See the CI Test Log)