Commit Graph

4 Commits

Author SHA1 Message Date
c9e67d8b47 fix(linux/build): staged divergence diagnostic, avoid OOM (M1.1 iter26)
Some checks failed
Build SilverMetal Linux ISO (reproducibility-gated) / builder-image (push) Successful in 1s
Build SilverMetal Linux ISO (reproducibility-gated) / build-and-verify (push) Failing after 33m36s
Run #4273 confirmed two things:

1. The reproducibility gate works end-to-end. Both builds produced
   ISOs (1077194752 vs 1077202944 bytes — 8 KiB delta, exactly one
   squashfs block worth of compressed-payload drift) and the compare
   step caught it.

2. diffoscope, run on the whole 1 GB ISO inside the silvermetal-builder
   container, gets OOM-killed before producing any output:

       diagnose-divergence.sh: line 44:    13 Killed
         diffoscope --max-report-size 100000000 --html ... --text ... A.iso B.iso

   The host has 19 GiB free, but diffoscope's full recursion through
   ISO -> squashfs -> thousands of inner files needs more memory than
   that for a 1 GB image. --max-report-size caps only the size of the
   report, not diffoscope's working set.

Rewrite diagnose-divergence.sh to do staged, cheap-to-expensive
analysis:
  1. sha256 + sizes (always)
  2. xorriso TOC of both ISOs (every node: mode/size/mtime/path) -> diff
  3. Pull just live/filesystem.squashfs out of each ISO,
     sha256 it + `unsquashfs -ll` it, diff the listings — this is
     where the per-file-size signal lives.
  4. Targeted diffoscope on the squashfs payload only, with
     --max-container-depth 2 + --max-text-report-size 5MB + --no-html
     + a 10-minute timeout. Bounded enough to finish without the OOM.
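The four stages above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of diagnose-divergence.sh: the function names, the output-directory layout, and the A/B file names are hypothetical, and the 5 MB cap is written as 5242880 bytes on the assumption that diffoscope expects a byte count.

```shell
# Stage 1: sizes and hashes (cheap, always runs).
stage_hashes() {
  local a=$1 b=$2 out=$3
  mkdir -p "$out"
  {
    printf 'A %d\n' "$(wc -c < "$a")"
    printf 'B %d\n' "$(wc -c < "$b")"
  } > "$out/sizes.txt"
  sha256sum "$a" "$b" > "$out/sha256.txt"
}

# Stage 2: list every node of each ISO with xorriso and diff the listings.
stage_toc() {
  local a=$1 b=$2 out=$3
  xorriso -indev "$a" -find / -exec lsdl -- > "$out/toc-A.txt" 2>/dev/null
  xorriso -indev "$b" -find / -exec lsdl -- > "$out/toc-B.txt" 2>/dev/null
  diff -u "$out/toc-A.txt" "$out/toc-B.txt" > "$out/toc-diff.txt"
}

# Stage 3: extract only the squashfs payload, hash it, and diff the file listings.
stage_squashfs() {
  local a=$1 b=$2 out=$3
  xorriso -osirrox on -indev "$a" -extract /live/filesystem.squashfs "$out/A.squashfs"
  xorriso -osirrox on -indev "$b" -extract /live/filesystem.squashfs "$out/B.squashfs"
  sha256sum "$out/A.squashfs" "$out/B.squashfs" > "$out/squashfs-sha256.txt"
  unsquashfs -ll "$out/A.squashfs" > "$out/sqfs-ls-A.txt"
  unsquashfs -ll "$out/B.squashfs" > "$out/sqfs-ls-B.txt"
  diff -u "$out/sqfs-ls-A.txt" "$out/sqfs-ls-B.txt" > "$out/sqfs-ls-diff.txt"
}

# Stage 4: bounded diffoscope on the squashfs payload only (10-minute timeout).
stage_diffoscope() {
  local out=$1
  timeout 600 diffoscope \
    --max-container-depth 2 \
    --max-text-report-size 5242880 \
    --text "$out/sqfs-diff.txt" \
    "$out/A.squashfs" "$out/B.squashfs"
}
```

Ordering the stages from cheap to expensive means the size and listing diffs are already on disk if the final diffoscope stage times out or is killed.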

Drops `set -e` — every step `|| true`s itself so we get partial output
even when one stage fails.
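
One way to get the same "keep going, keep partial output" behaviour is a small runner that records failures instead of aborting. This is a sketch of the pattern only; `run_stage` and `failed_stages` are hypothetical names, and the commit itself describes appending `|| true` to each step rather than using a helper.

```shell
# No `set -e` here: a failing stage is noted and the next stage still runs.
failed_stages=""

run_stage() {
  local name=$1
  shift
  echo "== stage: $name =="
  if ! "$@"; then
    echo "stage '$name' failed (continuing)" >&2
    failed_stages="${failed_stages:+$failed_stages }$name"
  fi
  return 0
}
```

A caller might write `run_stage hashes sha256sum A.iso B.iso` followed by further stages, then print `$failed_stages` at the end so the log shows which outputs are incomplete.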

Workflow tail-into-log step now prints the new staged outputs:
  * toc-diff.txt   — what changed at the ISO level
  * sqfs-ls-diff.txt — which inner files have different sizes/mtimes
  * sqfs-diff.txt   — diffoscope on the squashfs only
  * squashfs-sha256.txt
  * iso-header-cmp.txt — first-8KB cmp -l for header-level drift
  * sizes.txt / sha256.txt / checklist.md as before

Should land us a focused list of "these N files inside the squashfs
have different bytes" — that's what we need to find what's leaking
non-determinism into the build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 19:54:35 +01:00
3f51b2fd7f feat(linux/build): run diffoscope inside silvermetal-builder + tail diff to log (M1.1 iter25)
Some checks failed
Build SilverMetal Linux ISO (reproducibility-gated) / builder-image (push) Successful in 1s
Build SilverMetal Linux ISO (reproducibility-gated) / build-and-verify (push) Failing after 35m22s
Run #4272 hit the M1.1 reproducibility gate as designed — both builds
completed, ISOs differed (A=ff2e7444…, B=9ec7f3da…), diagnose-divergence
fired. Two things stopped that diagnostic from being useful:

1. **diffoscope wasn't available.** diagnose-divergence.sh runs in the
   catthehacker job container, which has cmp but no diffoscope. The
   silvermetal-builder image we built two minutes earlier *does* have
   diffoscope-minimal (Dockerfile.builder line 109). Run the diagnostic
   inside that image via `docker run --volumes-from $self_cid`, using
   the digest the builder-image job passed in via BUILDER_IMAGE. This
   mounts the same /workspace path, so REPO_ROOT-relative resolution in
   diagnose-divergence.sh works unchanged.

2. **The artifact was unreachable.** actions/upload-artifact@v3 against
   Gitea 1.25.2 reports "successfully uploaded" but the
   /api/v1/repos/.../actions/runs/{id}/artifacts list comes back empty,
   and every download path probed returns 404. Known v3 incompatibility
   — v3 uses the legacy GitHub Services API endpoint that Gitea
   doesn't expose for retrieval.

   Workaround: tail the divergence content into the workflow log
   directly, so it shows up in `gitea actions logs` regardless of
   upload-artifact's behaviour. Specifically: sizes.txt, sha256.txt,
   checklist.md, head -n 400 of diff.txt (or cmp.txt as fallback).
   That's enough to see what's diverging without needing the artifact.
   Upload-artifact step kept in place for whenever Gitea's API gets
   sorted (fix-once-then-forget).
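
The log-tailing workaround might look something like the sketch below. The function name and the directory argument are hypothetical; the file names and the 400-line cap come from the description above.

```shell
# Print the small summary files in full, then the first 400 lines of the
# diff (diff.txt when it exists and is non-empty, otherwise cmp.txt).
print_divergence() {
  local dir=$1 f
  for f in sizes.txt sha256.txt checklist.md; do
    if [ -f "$dir/$f" ]; then
      echo "----- $f -----"
      cat "$dir/$f"
    fi
  done
  if [ -s "$dir/diff.txt" ]; then
    f=diff.txt
  else
    f=cmp.txt
  fi
  if [ -f "$dir/$f" ]; then
    echo "----- $f (first 400 lines) -----"
    head -n 400 "$dir/$f"
  fi
}
```

Capping the diff keeps the workflow log readable while still showing where the two builds first diverge.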

The self-discovery loop (docker ps + inspect filtering by
/workspace/SilverLABS/SilverMetal mount destination) is the same one
build.sh uses; concurrency: 1 in this workflow guarantees a single
match.
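
As a rough sketch of the self-discovery loop and the `docker run` it feeds: the function names and the script path passed to `bash` are hypothetical, and the real loop in build.sh may differ in detail.

```shell
# Print the ID of the first running container with a mount at the given destination.
find_self_cid() {
  local dest=$1 cid
  for cid in $(docker ps -q); do
    if docker inspect --format '{{range .Mounts}}{{println .Destination}}{{end}}' "$cid" \
        | grep -qxF "$dest"; then
      printf '%s\n' "$cid"
      return 0
    fi
  done
  return 1
}

# Run the diagnostic inside the builder image, reusing the job container's volumes.
run_diagnostic_in_builder() {
  local self_cid=$1 image=$2 workdir=$3
  docker run --rm \
    --volumes-from "$self_cid" \
    --workdir "$workdir" \
    "$image" \
    bash linux/build/diagnose-divergence.sh   # script path is hypothetical
}
```

A workflow step could then call `find_self_cid /workspace/SilverLABS/SilverMetal` and pass the result, together with `$BUILDER_IMAGE`, to `run_diagnostic_in_builder`.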

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 19:14:44 +01:00
e260fe1c81 ci(linux/build): self-host the builder image build + iter16 reprepro wrap (M1.1)
Some checks failed
Build SilverMetal Linux ISO (reproducibility-gated) / builder-image (push) Failing after 2s
Build SilverMetal Linux ISO (reproducibility-gated) / build-and-verify (push) Has been skipped
Two coupled changes that unblock the M1.1 iter loop. Both belong in CI;
iter1-15 were wrong to require human-in-the-loop steps to make progress.

1. **CI now builds Dockerfile.builder.**

   `.gitea/workflows/build-iso-linux.yaml` grows a `builder-image` job
   that runs ahead of `build-and-verify`. It rebuilds the
   silvermetal-builder image from `linux/build/docker/Dockerfile.builder`,
   pushes it to `docker-registry.silverlabs.uk/silvermetal-builder:m1.1-<sha>`
   (and `:latest`), reads the resulting digest off `docker inspect`, and
   feeds it forward as a job output. `build-and-verify` consumes that
   digest as the `BUILDER_IMAGE` env override that `build.sh` already
   honours (and validates to be in digest form, around line 37).

   That kills the old workflow where every Dockerfile.builder change
   required a human to `docker build` + `docker push` on 10.0.0.51 by
   hand and then bump the digest in `build.sh` in lockstep. The crash
   that triggered this (exit 126 mid-iter16 build run) was a symptom of
   that off-CI step still existing.

   Both jobs run on the existing `silvermetal-builder` runner; the host
   docker daemon is shared via DooD and is already authenticated to
   `docker-registry.silverlabs.uk` (linux/build/runner/docker-compose.yml
   mounts `/root/.docker:/root/.docker:ro`), so no extra login step.

   The hardcoded `BUILDER_IMAGE` digest in `build.sh` stays as the
   local-developer / offline-rebuild fallback. Comments updated in
   `build.sh`, `Dockerfile.builder`, and `linux/build/README.md` to
   match the new flow.
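
   The digest handoff in this item could be sketched as below. Both function names are hypothetical, and the check in `build.sh` may be stricter or looser than this pattern.

   ```shell
   # Read the registry digest of an image that has already been pushed.
   resolve_digest() {
     docker inspect --format '{{index .RepoDigests 0}}' "$1"
   }

   # Return 0 only when the reference is pinned by digest: name@sha256:<64 hex>.
   is_digest_form() {
     printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
   }
   ```

   Rejecting tag-form references such as `:latest` keeps the build pinned to exactly the image that the `builder-image` job produced.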

2. **reprepro wrapper for the benign "No priority for X" case.**

   Pinned derivative-maker's `2100_create-debian-packages` (with
   --target iso) re-imports source packages from snapshot.debian.org
   into a local apt repo via `reprepro --basedir … includedsc local
   <foo>.dsc`. The local repo's `conf/distributions` ships no
   `DscOverride` entries, so any source package whose `.dsc` lacks an
   explicit Priority field trips:

       No priority for 'X', skipping.
       There have been errors!

   …and reprepro exits 255. dm-reprepro-wrapper bubbles that up,
   2100_create-debian-packages aborts. The current offender is
   `virtualbox_*.dsc` (key import is now fine — debian-keyring landed in
   commit 4aa59ba — but the priority field gap remains). VirtualBox is
   not in SilverMetal's `--target iso` set, so the sane behaviour is
   "log it, continue".

   New `linux/build/docker/silvermetal-reprepro-wrap.sh` shadows
   `/usr/bin/reprepro` at `/usr/local/bin/reprepro` (PATH precedence).
   It runs the real reprepro, captures merged stdout+stderr, and:
   - if rc != 0 AND every non-blank output line matches one of the
     known-benign patterns ("No priority for 'X', skipping." plus the
     trailing "There have been errors!"), emits the output, logs one
     line of explanation to stderr, and exits 0;
   - otherwise emits the output and propagates rc unchanged.

   Any *other* reprepro error path stays fatal — only the specific
   "No priority for X" pattern is neutralised. `dm-reprepro-wrapper`
   resolves `reprepro` via `$PATH` so it picks up the wrapper
   transparently.
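
The wrapper logic described in item 2 might be sketched like this. It is an illustration, not the contents of silvermetal-reprepro-wrap.sh: `REAL_REPREPRO` and `reprepro_wrap` are hypothetical names, and this sketch additionally requires non-empty output before treating a failure as benign.

```shell
# Path to the real binary; overridable only so the logic can be exercised
# without reprepro installed (an assumption of this sketch).
REAL_REPREPRO=${REAL_REPREPRO:-/usr/bin/reprepro}

reprepro_wrap() {
  local out rc unexpected
  rc=0
  # Capture merged stdout+stderr and remember the real exit code.
  out=$("$REAL_REPREPRO" "$@" 2>&1) || rc=$?
  if [ -n "$out" ]; then
    printf '%s\n' "$out"
  fi
  # Collect non-blank lines that do NOT match a known-benign pattern.
  unexpected=$(printf '%s\n' "$out" \
    | grep -v '^[[:space:]]*$' \
    | grep -Ev "^No priority for '[^']*', skipping\.$|^There have been errors!$")
  if [ "$rc" -ne 0 ] && [ -n "$out" ] && [ -z "$unexpected" ]; then
    echo "reprepro-wrap: only benign 'No priority' errors (rc=$rc), exiting 0" >&2
    return 0
  fi
  return "$rc"
}
```

Matching every non-blank line against an allow-list, rather than searching for the benign message anywhere in the output, is what keeps any other reprepro error fatal.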

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:30:08 +01:00
4444dc11f3 feat(linux/build): scaffold reproducible ISO build pipeline (M1.1)
Vendors Kicksecure derivative-maker as a pinned submodule (18.1.7.4),
adds the wrapper + verify + diagnose scripts, the pinned builder image,
and the reproducibility-gated Gitea Actions workflow. Base flavour only —
no hardening overlay (that's M1.2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 04:25:48 +01:00