Auto-Rewind for Daily Test (Apache NuttX RTOS)

📝 9 Feb 2025

Auto-Rewind for Daily Test (Apache NuttX RTOS)

If the Daily Test fails for Apache NuttX RTOS … Can we Auto-Rewind and discover the Breaking Commit? Let’s try this (pic above)

  1. Every Day at 00:00 UTC: Ubuntu Cron shall trigger a Daily Build and Test of NuttX for QEMU RISC-V (knsh64 / 64-bit Kernel Build)

  2. If The Test Fails: Our Machine will Backtrack The Commits, rebuilding and retesting each commit (on QEMU Emulator)

  3. When it discovers the Breaking Commit: Our Machine shall post a Mastodon Alert, that includes the (suspicious) Pull Request

  4. Bonus: The Machine will draft a Polite Note for our NuttX Colleague to investigate the Pull Request, please

    Mastodon Alert for Daily Test

Why are we doing this?

If NuttX Fails on QEMU RISC-V: High chance that NuttX will also fail on RISC-V SBCs like Ox64 BL808 and Oz64 SG2000.

Thus it’s important to Nip the Bud and Fix the Bug early, before it hurts our RISC-V Devs. (Be Kind, Rewind!)

Find the Breaking Commit

§1 Find the Breaking Commit

We wrote a script that will Rewind the NuttX Build and discover the Breaking Commit…

## Set the GitLab Token, check that it's OK
## export GITLAB_TOKEN=...
. $HOME/gitlab-token.sh
glab auth status

## Set the GitLab User and Repo for posting GitLab Snippets
export GITLAB_USER=lupyuen
export GITLAB_REPO=nuttx-build-log

## Download the NuttX Rewind Script
git clone https://github.com/lupyuen/nuttx-build-farm
cd nuttx-build-farm

## Find the Breaking Commit for QEMU RISC-V (64-bit Kernel Build)
nuttx_hash=  ## Optional: Begin with this NuttX Hash
apps_hash=   ## Optional: Begin with this Apps Hash
./rewind-build.sh \
  rv-virt:knsh64_test \
  $nuttx_hash \
  $apps_hash

Our Rewind Script runs 20 Iterations of Build + Test like so…

## Build and Test: Latest NuttX Commit
git reset --hard HEAD
tools/configure.sh rv-virt:knsh64
make -j
qemu-system-riscv64 -kernel nuttx

## Build and Test: Previous NuttX Commit
git reset --hard HEAD~1
tools/configure.sh rv-virt:knsh64
make -j
qemu-system-riscv64 -kernel nuttx
...
## Build and Test: 20th NuttX Commit
git reset --hard HEAD~19
tools/configure.sh rv-virt:knsh64
make -j
qemu-system-riscv64 -kernel nuttx

## Roughly One Hour for 20 Rewinds of Build + Test

(What about Git Bisect? We’ll come back to this)

Build and Test 20 times! Won’t it look mighty messy?

Ah that’s why we present neatly the 20 Outcomes (Build + Test) as the NuttX Build History, inside our NuttX Dashboard

NuttX Build History

What’s inside our script? We dive in…

(Which Apps Hash to use? NuttX Build History can help)

Testing One Commit

§2 Testing One Commit

How to find the Breaking Commit?

We zoom out and explain slowly, from Micro to Macro. This script will Build and Test NuttX for One Single Commit on QEMU: build-test-knsh64.sh

## Build and Test NuttX for QEMU RISC-V 64-bit (Kernel Build)
## Download NuttX and Apps
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps

## Switch to this NuttX Commit and Apps Commit
pushd nuttx ; git reset --hard $nuttx_hash ; popd
pushd apps  ; git reset --hard $apps_hash  ; popd

## Configure the NuttX Build
cd nuttx
tools/configure.sh rv-virt:knsh64

## Build the NuttX Kernel
make -j

## Build the NuttX Apps
make -j export
pushd ../apps
./tools/mkimport.sh -z -x ../nuttx/nuttx-export-*.tar.gz
make -j import
popd

## Boot NuttX on QEMU RISC-V 64-bit
## Run OSTest with our Expect Script
wget https://raw.githubusercontent.com/lupyuen/nuttx-riscv64/main/qemu-riscv-knsh64.exp
expect qemu-riscv-knsh64.exp

(Expect Script shall validate the QEMU Output)

The script above is called by build_nuttx below. Which will wrap the output in the Log Format that NuttX Dashboard expects: rewind-commit.sh

## Build and Test One Commit
function build_nuttx { ...

  ## NuttX Dashboard expects this Log Format
  echo "===================================================================================="
  echo "Configuration/Tool: rv-virt/knsh64_test,"
  echo "$timestamp"
  echo "------------------------------------------------------------------------------------"

  ## Build and Test Locally: QEMU RISC-V 64-bit Kernel Build
  $script_dir/build-test-knsh64.sh \
    $nuttx_commit \
    $apps_commit
  res=$?

  ## Omitted: Build and Test Other Targets
  echo "===================================================================================="
}

Our Build-Test Log becomes this GitLab Snippet

Build-Test Log in GitLab Snippet

What’s the other stuff inside the GitLab Snippet above?

For Every Commit, we bundle Three Commits into a single Log File: This Commit, Previous Commit, Next Commit: rewind-commit.sh

## Build and Test This Commit
build_nuttx $nuttx_hash $apps_hash

## If Build / Test Fails...
if [[ "$res" != "0" ]]; then
  echo "BUILD / TEST FAILED FOR THIS COMMIT: nuttx @ $nuttx_hash / nuttx-apps @ $apps_hash"

  ## Rebuild / Retest with the Previous Commit
  build_nuttx $prev_hash $apps_hash
  if [[ "$res" != "0" ]]; then
    echo "BUILD / TEST FAILED FOR PREVIOUS COMMIT: nuttx @ $prev_hash / nuttx-apps @ $apps_hash"
  fi

  ## Rebuild / Retest with the Next Commit
  build_nuttx $next_hash $apps_hash
  if [[ "$res" != "0" ]]; then
    echo "BUILD / TEST FAILED FOR NEXT COMMIT: nuttx @ $next_hash / nuttx-apps @ $apps_hash"
  fi
fi

## Why the Long Echoes? We'll ingest them later

Our Three-In-One Log becomes a little easier to read, less flipping back and forth. Let’s zoom out…

Testing 20 Commits

§3 Testing 20 Commits

Who calls the script above: rewind-commit.sh?

The script above is called by run_job below: rewind-build.sh

## Build and Test This Commit
## And capture the Build-Test Output
function run_job { ...
  script $log_file \
    $script_option \
    " \
      $script_dir/rewind-commit.sh \
        $target \
        $nuttx_hash $apps_hash \
        $timestamp \
        $prev_hash $next_hash \
    "
}

Which captures the Build-Test Output into a Log File.

What happens to the Log File? We upload and publish it as a GitLab Snippet: rewind-build.sh

## Build and Test One Commit for the NuttX Target
function build_commit { ...

  ## Build and Test This Commit
  ## And capture the Build-Test Output into a Log File
  run_job \
    $log $timestamp \
    $apps_hash $nuttx_hash \
    $prev_hash $next_hash
  clean_log $log
  find_messages $log

  ## Upload the Build-Test Log File
  ## As GitLab Snippet
  upload_log \
    $log unknown \
    $nuttx_hash $apps_hash \
    $timestamp
}

(upload_log creates the GitLab Snippet)

(GitHub Gists are supported too)

Remember we need to Build and Test 20 Commits? We call the script above 20 times: rewind-build.sh

## Build and Test the Latest 20 Commits
num_commits=20
num_success=0
count=1
for commit in $(
  TZ=UTC0 \
  git log \
  -$(( $num_commits + 1 )) \
  --date='format-local:%Y-%m-%dT%H:%M:%S' \
  --format="%cd,%H"
); do
  ## Extract the Commit Timestamp and Commit Hash
  ## Commit looks like 2024-11-24T09:52:42,9f9cc7ecebd97c1a6b511a1863b1528295f68cd7
  prev_timestamp=$(echo $commit | cut -d ',' -f 1)  ## 2024-11-24T09:52:42
  prev_hash=$(echo $commit | cut -d ',' -f 2)       ## 9f9cc7ecebd97c1a6b511a1863b1528295f68cd7

  ## Build and Test the NuttX Hash + Apps Hash
  ## If It Fails: Build and Test the Previous NuttX Hash + Previous Apps Hash
  build_commit \
    $tmp_dir/$nuttx_hash.log \
    $timestamp \
    $apps_hash $nuttx_hash \
    $prev_hash $next_hash

  ## Shift the Commits
  ## Omitted: Skip the First Commit (because we need a Previous Commit)
  next_hash=$nuttx_hash
  nuttx_hash=$prev_hash
  timestamp=$prev_timestamp
  ((count++)) || true

  ## Stop when we have reached the
  ## Minimum Number of Successful Commits
  if [[ "$num_success" == "$min_commits" ]]; then
    break
  fi
done

With the Test Logs: We have everything we need to identify the Breaking Commit. Just dig up the Earliest Commit that fails the NuttX Test…

$ nuttx_hash=657247bda89d60112d79bb9b8d223eca5f9641b5
$ qemu-system-riscv64 -semihosting -M virt,aclint=on -cpu rv64 -kernel nuttx -nographic
  riscv_exception: EXCEPTION: Instruction page fault. MCAUSE: 000000000000000c, EPC: 000000018000001a, MTVAL: 000000018000001a
  Error: Test Failed

Though there’s an easier way…

Ingest the Test Log

§4 Ingest the Test Log

Why publish the Test Log as a GitLab Snippet?

That’s because we’ll Ingest the Test Log into our NuttX Dashboard. (So we can present the logs neatly as NuttX Build History)

NuttX Build History

This is how we Ingest a Test Log into our Prometheus Time-Series Database (that powers our NuttX Dashboard)

# TYPE build_score gauge
# HELP build_score 1.0 for successful build, 0.0 for failed build
build_score{ 

  ## These fields shall be rendered in Grafana (NuttX Dashboard and Build History)
  timestamp="2025-01-11T10:54:36",
  user="rewind",
  board="rv-virt",
  config="knsh64_test",
  target="rv-virt:knsh64_test",
  url="https://gitlab.com/lupyuen/nuttx-build-log/-/snippets/4800059#L85",

  ## Here's the NuttX Hash and Apps Hash for This Commit
  nuttx_hash="657247bda89d60112d79bb9b8d223eca5f9641b5",
  apps_hash="a6b9e718460a56722205c2a84a9b07b94ca664aa",

  ## Previous Commit is OK (Score=1)
  nuttx_hash_prev="be40c01ddd6f43a527abeae31042ba7978aabb58",
  apps_hash_prev="a6b9e718460a56722205c2a84a9b07b94ca664aa",
  build_score_prev="1",

  ## Next Commit is Not OK (Score=0)
  nuttx_hash_next="48846954d8506e1c95089a8654787fdc42cc098c",
  apps_hash_next="a6b9e718460a56722205c2a84a9b07b94ca664aa",
  build_score_next="0"

} 0  ## Means This Commit Failed (Score=0)

Hello Prometheus: We’re sending you this Test Log at the Specified URL…

Which is transformed and transmitted by our Rust App, from GitLab Snippet to Prometheus: ingest-nuttx-builds/main.rs

// Post the Test Log to Prometheus Pushgateway:
// Compose the Pushgateway Metric containing the Test Log
let body = format!(r##"
build_score ... url="{url}" ... {build_score}
"##);

// Send the Metric to Pushgateway via HTTP POST
let client = reqwest::Client::new();
let pushgateway = format!("http://localhost:9091/metrics/job/{user}/instance/{target_rewind}");
let res = client
  .post(pushgateway)
  .body(body)
  .send()
  .await?;

(How we fetch GitLab Snippets)

(And Extract the Fields from Test Logs)

(See the Rust App Log)

We promised to find the Earliest Commit that fails the NuttX Test…

$ nuttx_hash=657247bda89d60112d79bb9b8d223eca5f9641b5
$ qemu-system-riscv64 -semihosting -M virt,aclint=on -cpu rv64 -kernel nuttx -nographic
  riscv_exception: EXCEPTION: Instruction page fault. MCAUSE: 000000000000000c, EPC: 000000018000001a, MTVAL: 000000018000001a
  Error: Test Failed

Finally we’re ready…

Query Prometheus for Breaking Commit

§5 Query Prometheus for Breaking Commit

Test Logs are now inside Prometheus Database. How will Prometheus tell us the Breaking Commit?

Recall that our Prometheus Database contains…

The Test Logs in Prometheus will look like this…

Test Log #1This Commit FAILED
Previous Commit FAILED
Test Log #2This Commit FAILED
Previous Commit FAILED
Test Log #6This Commit FAILED
Previous Commit is OK
Test Log #7This Commit is OK
Test Log #8This Commit is OK
Test Log #9This Commit is OK
 

Ding ding: Test Log #6 will reveal the Breaking Commit!

NuttX Build History

Inside Prometheus: How to find Test Log #6?

We fetch the Breaking Commit with this Prometheus Query

build_score{
  target="rv-virt:knsh64_test",
  build_score_prev="1"
} == 0

Dear Prometheus: Please find the Test Log that matches the following…

Prometheus returns the Breaking Commit that we seek…

Prometheus returns the Breaking Commit that we seek

Coded in our Rust App like so: nuttx-rewind-notify/main.rs

// Query Prometheus for the Breaking Commit
let query = format!(r##"
  build_score{ ...
    target="{TARGET}",
    build_score_prev="1"
  ... } == 0
"##);

// Send query to Prometheus via HTTP Form Post
let params = [("query", query)];
let client = reqwest::Client::new();
let prometheus = format!("http://{prometheus_server}/api/v1/query");
let res = client
  .post(prometheus)
  .form(&params)
  .send()
  .await?;

// Process the Query Results (Breaking Commit)
let body = res.text().await?;
let data: Value = serde_json::from_str(&body).unwrap();
let builds = &data["data"]["result"];

One more thing to do with our Breaking Commit…

Write a Polite Note

§6 Write a Polite Note

Great! Our Machine has auto-discovered the Breaking Commit. But Our Machine can’t fix it right?

Here comes the Human-Computer Interface: Our Machine (kinda) Escalates the Breaking Commit to a Human Expert for fixing, politely please…

Sorry @USERNAME: The above PR is failing for rv-virt:knsh64_test. Could you please take a look? Thanks! nuttx-build-log/snippets/4800059

    $ git clone https://github.com/apache/nuttx
    $ git clone https://github.com/apache/nuttx-apps apps
    $ pushd nuttx
    $ git reset --hard 657247bda89d60112d79bb9b8d223eca5f9641b5
    HEAD is now at 657247bda8 libc/modlib: preprocess gnu-elf.ld
    $ popd
    NuttX Source: https://github.com/apache/nuttx/tree/657247bda89d60112d79bb9b8d223eca5f9641b5
    NuttX Apps: https://github.com/apache/nuttx-apps/tree/a6b9e718460a56722205c2a84a9b07b94ca664aa
    $ cd nuttx
    $ tools/configure.sh rv-virt:knsh64
    $ make -j
    ...
    $ qemu-system-riscv64 -semihosting -M virt,aclint=on -cpu rv64 -kernel nuttx -nographic
    riscv_exception: EXCEPTION: Instruction page fault. MCAUSE: 000000000000000c, EPC: 000000018000001a, MTVAL: 000000018000001a
    riscv_exception: Segmentation fault in PID 2: /system/bin/init

(Earlier Commit is OK) (See the Build History)

This goes to our Mastodon Server for NuttX Continuous Integration. I’ll copy this and paste it as a PR Comment, after my vetting.

Our Machine writes this based on the Breaking Commit? From the Previous Section?

Exactly! We won’t explain the Dull Bits. They involve…

  1. Extracting the Test Log

    (Only the Important Parts)

  2. Fetching the Breaking PR from GitHub

    (Based on the Breaking Commit)

  3. Composing the Mastodon Post

    (And Posting to Mastodon)

  4. Without any AI or LLM

    (Because they ain’t cheap)

Mastodon Alert for Daily Test

But Mastodon Posts are limited to 500 chars?

Bummer. That’s why we Create a GitLab Snippet for our Polite Note. And embed the Hyperlink in our Mastodon Post.

How to get the Breaking PR from GitHub?

We call the GitHub API

## Fetch the Pull Request for this Commit
$ commit=be40c01ddd6f43a527abeae31042ba7978aabb58
$ curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/apache/nuttx/commits/$commit/pulls

[{ html_url: "https://github.com/apache/nuttx/pull/15444",
   title:    "modlib: preprocess gnu-elf.ld for executable ELF",
   user: { login: "GITHUB_USERID", ...

Which becomes this function in our Rust App.

(See the Complete Log)

(Searching for NuttX Commit in Prometheus)

Cron Job for Daily Test and Rewind

§7 Cron Everything

We coded plenty of goodies over the Lunar New Year. How will they be triggered?

Via Ubuntu Cron. Every Day it shall trigger the Daily Test and Rewind (pic above)

## Add a Cron Job
$ crontab -e

## Then insert this...
## Test and Rewind: Every Day at 00:00 UTC
0 0 * * * /home/luppy/nuttx-build-farm/cron.sh 2>&1 | logger -t nuttx-rewind-build

## Or For Testing...
## Test and Rewind: Every Hour at 00:16, 01:16, 12:16, ...
16 * * * * /home/luppy/nuttx-build-farm/cron.sh 2>&1 | logger -t nuttx-rewind-build

## Exit and Monitor our Cron Job
$ tail -f /var/log/syslog

(cron.sh will start rewind-build.sh)

We’ll see the Test and Rewind in action…

(luppy) CMD (/home/luppy/nuttx-build-farm/cron.sh 2>&1 | logger -t nuttx-rewind-build)
+ ./rewind-build.sh rv-virt:knsh64_test HEAD HEAD 1 20
  /tmp/rewind-build-rv-virt:knsh64_test/apps /tmp/rewind-build-rv-virt:knsh64_test
  #1 of 20: Building nuttx @ 8995e5a66e14819e2bfda467d4f9fb8719fd9134 / nuttx_apps @ 43439a6b16a435bce7d9ac85f05c3a6013f91348
+ build_commit /tmp/rewind-build-rv-virt:knsh64_test/8995e5a66e14819e2bfda467d4f9fb8719fd9134.log 2025-02-03T08:21:26 43439a6b16a435bce7d9ac85f05c3a6013f91348 8995e5a66e14819e2bfda467d4f9fb8719fd9134 dc5251f9c8db878ac9706586eb85ad7e201286b6 8995e5a66e14819e2bfda467d4f9fb8719fd9134
+ run_job /tmp/rewind-build-rv-virt:knsh64_test/8995e5a66e14819e2bfda467d4f9fb8719fd9134.log 2025-02-03T08:21:26 43439a6b16a435bce7d9ac85f05c3a6013f91348 8995e5a66e14819e2bfda467d4f9fb8719fd9134 dc5251f9c8db878ac9706586eb85ad7e201286b6 8995e5a66e14819e2bfda467d4f9fb8719fd9134
+ script /tmp/rewind-build-rv-virt:knsh64_test/8995e5a66e14819e2bfda467d4f9fb8719fd9134.log -c '       /home/luppy/nuttx-build-farm/rewind-commit.sh         rv-virt:knsh64_test         8995e5a66e14819e2bfda467d4f9fb8719fd9134         43439a6b16a435bce7d9ac85f05c3a6013f91348         2025-02-03T08:21:26         dc5251f9c8db878ac9706586eb85ad7e201286b6         8995e5a66e14819e2bfda467d4f9fb8719fd9134     '
+ /home/luppy/nuttx-build-farm/build-test-knsh64.sh 8995e5a66e14819e2bfda467d4f9fb8719fd9134 43439a6b16a435bce7d9ac85f05c3a6013f91348#015
...
+ glab snippet new --repo lupyuen/nuttx-build-log --visibility public --title '[unknown] CI Log for rv-virt:knsh64_test @ 2025-02-03T08:21:26 / nuttx @ 8995e5a66e14819e2bfda467d4f9fb8719fd9134 / nuttx-apps @ 43439a6b16a435bce7d9ac85f05c3a6013f91348' --filename ci-unknown.log
  Creating snippet in https://gitlab.com/lupyuen/nuttx-build-log/-/snippets/4802191
  Done!

(See the Complete Log)

(See the GitLab Snippets)

Cron Jab for Mastodon Notification

And the Polite Note? That goes to our Mastodon Server?

Every 15 Minutes: Ubuntu Cron shall trigger the Mastodon Notification (pic above)

## Add a Cron Job
$ crontab -e

## Then insert this...
## Notify Mastodon: Every 15 minutes
*/15 * * * * /home/luppy/nuttx-rewind-notify/cron.sh 2>&1 | logger -t nuttx-rewind-notify

## Or For Testing...
## Notify Mastodon: Every Hour at 00:16, 01:16, 12:16, ...
16 * * * * /home/luppy/nuttx-rewind-notify/cron.sh 2>&1 | logger -t nuttx-rewind-notify

## Exit and Monitor our Cron Job
$ tail -f /var/log/syslog

(cron.sh will start our Rust App)

The Mastodon Notification appears like so…

(luppy) CMD (/home/luppy/nuttx-rewind-notify/cron.sh 2>&1 | logger -t nuttx-rewind-notify)
+ cargo run
build_score{
  target="rv-virt:knsh64_test",
  build_score_prev="1"
} == 0
rv-virt : KNSH64_TEST - Build Failed (rewind)
  Breaking PR: https://github.com/apache/nuttx/pull/15444
  Build History: https://nuttx-dashboard.org/d/fe2q876wubc3kc/nuttx-build-history?var-board=rv-virt&var-config=knsh64_test6
  Sorry @USERNAME: The above PR is failing for rv-virt:knsh64_test. Could you please take a look? Thanks!

(See the Complete Log)

Auto-Rewind for Daily Test (Apache NuttX RTOS)

§8 Be Kind, Rewind!

  1. Wow this looks super complicated. Does it work?

    Dunno, we’re still testing? Hopefully the New System will make my Daily Routine a little less painful…

    That’s why we’re Fast Tracking the complicated new system: Right now it runs Every Hour. (Instead of every day)

  2. What if it’s a smashing success?

    We might extend the Daily Rewind to a Real Board: Oz64 SG2000 RISC-V SBC.

    Or maybe SG2000 Emulator and Ox64 Emulator, since they’re quicker and more consistent than Real Hardware. (Though less accurate)

    Plus other QEMU Targets: qemu-armv8a:nsh, rv-virt:nsh / nsh64 / knsh

  3. Suppose we wish to add Our Own Boards to the System?

    Let’s assume we have Automated Board Testing. Then we could upload the NuttX Test Logs (in the prescribed format) to GitLab Snippets or GitHub Gists. They’ll appear in NuttX Dashboard and Build History.

    (Rewinding the Build on Our Own Boards? Needs more work)

  4. Why Rewind every commit? Isn’t Git Bisect quicker?

    Ah remember that we’re fixing Runtime Bugs, not Compile Errors. Git Bisect won’t work if the Runtime Bug is Not Reliably Reproducible.

    When we Rewind 20 Commits, we’ll know if the bug is Reliably Reproducible.

    Our bug is NOT Reliably Reproducible

  5. Why aren’t we using Docker?

    Docker doesn’t run OSTest correctly on QEMU RISC-V 64-bit.

  6. Any more Grand Plans?

    Our new PR Test Bot will allow a PR Comment to trigger a Build + Test on QEMU. For example, this PR Comment…

    @nuttxpr test rv-virt:knsh64

    Will trigger our PR Test Bot to download the PR Code, and run Build + Test on QEMU RISC-V. Or on Real Hardware

    @nuttxpr test milkv_duos:nsh

    Super helpful for Testing Pull Requests before Merging. But might have Security Implications. We’ll chat in the Next Article!

NuttX Bot for Building and Testing Pull Requests

§9 What’s Next

Special Thanks to Mr Gregory Nutt for your guidance and kindness. I’m also grateful to My Sponsors, for supporting my writing.

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…

lupyuen.org/src/rewind.md

Daily Test + Rewind is hosted on this hefty Ubuntu Xeon Workstation

Daily Test + Rewind is hosted on this hefty Ubuntu Xeon Workstation