Run #4273 confirmed two things:
1. The reproducibility gate works end-to-end. Both builds produced
ISOs (1077194752 vs 1077202944 bytes — 8 KiB delta, exactly one
squashfs block worth of compressed-payload drift) and the compare
step caught it.
2. diffoscope, run on the whole 1 GB ISO inside the silvermetal-builder
container, gets OOM-killed before producing any output:
    diagnose-divergence.sh: line 44: 13 Killed
    diffoscope --max-report-size 100000000 --html ... --text ... A.iso B.iso
The host has 19 GiB free, but diffoscope's full recursion through
ISO -> squashfs -> thousands of inner files needs more memory than
that for a 1 GB image. --max-report-size only caps the size of the
report, not diffoscope's working set.
Rewrite diagnose-divergence.sh to do staged, cheap-to-expensive
analysis:
1. sha256 + sizes (always)
2. xorriso TOC of both ISOs (every node: mode/size/mtime/path) -> diff
3. Pull just live/filesystem.squashfs out of each ISO,
sha256 it + `unsquashfs -ll` it, diff the listings — this is
where the per-file-size signal lives.
4. Targeted diffoscope on the squashfs payload only, with
--max-container-depth 2 + --max-text-report-size 5MB + --no-html
+ a 10-minute timeout. Bounded enough to finish without the OOM.
Drops `set -e` — every step `|| true`s itself so we get partial output
even when one stage fails.
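
A minimal sketch of the staged flow described above, not the actual
diagnose-divergence.sh. The `stage` and `diagnose` function names, the
intermediate file names (toc-A.txt, A.squashfs, ...) and the exact xorriso
invocations are assumptions made for illustration; the diffoscope limits are
written as plain byte counts. Each stage records its own exit status instead
of aborting, which is the effect of dropping `set -e`.

```shell
# Run one stage, capturing stdout+stderr to a file. Never aborts the caller:
# a non-zero status is recorded in the output file instead.
# Note: diff and diffoscope exit 1 when they find differences, so a recorded
# status is not necessarily an error.
stage() {
  local out="$1"; shift
  "$@" >"$out" 2>&1 || echo "[exit $?] $*" >>"$out"
  return 0
}

diagnose() {
  local a="$1" b="$2" out="$3" side iso
  mkdir -p "$out"

  # 1. Hashes and sizes (always).
  stage "$out/sha256.txt" sha256sum "$a" "$b"
  stage "$out/sizes.txt"  stat -c '%s %n' "$a" "$b"

  for side in A B; do
    if [ "$side" = A ]; then iso="$a"; else iso="$b"; fi
    # 2. ISO table of contents, one ls -l style line per node.
    stage "$out/toc-$side.txt" xorriso -indev "$iso" -find / -exec lsdl --
    # 3. Pull out only the squashfs and list its contents.
    stage "$out/extract-$side.log" xorriso -osirrox on -indev "$iso" \
      -extract /live/filesystem.squashfs "$out/$side.squashfs"
    stage "$out/sqfs-ls-$side.txt" unsquashfs -ll "$out/$side.squashfs"
  done
  stage "$out/toc-diff.txt"        diff -u "$out/toc-A.txt" "$out/toc-B.txt"
  stage "$out/sqfs-ls-diff.txt"    diff -u "$out/sqfs-ls-A.txt" "$out/sqfs-ls-B.txt"
  stage "$out/squashfs-sha256.txt" sha256sum "$out/A.squashfs" "$out/B.squashfs"

  # 4. Bounded diffoscope on the squashfs payload only (10-minute cap).
  stage "$out/sqfs-diff.log" timeout 600 diffoscope \
    --max-container-depth 2 --max-text-report-size 5000000 \
    --text "$out/sqfs-diff.txt" "$out/A.squashfs" "$out/B.squashfs"
}

# Usage (paths are placeholders): diagnose A.iso B.iso ./divergence
```

Routing every command through `stage` keeps the "partial output even when
one stage fails" property in one place rather than repeating `|| true` on
each line.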
Workflow tail-into-log step now prints the new staged outputs:
* toc-diff.txt — what changed at the ISO level
* sqfs-ls-diff.txt — which inner files have different sizes/mtimes
* sqfs-diff.txt — diffoscope on the squashfs only
* squashfs-sha256.txt
* iso-header-cmp.txt — first-8KB cmp -l for header-level drift
* sizes.txt / sha256.txt / checklist.md as before
Should land us a focused list of "these N files inside the squashfs
have different bytes" — that's what we need to find what's leaking
non-determinism into the build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run #4272 hit the M1.1 reproducibility gate as designed — both builds
completed, ISOs differed (A=ff2e7444…, B=9ec7f3da…), diagnose-divergence
fired. Two things stopped that diagnostic from being useful:
1. **diffoscope wasn't available.** diagnose-divergence.sh runs in the
catthehacker job container, which has cmp but no diffoscope. The
silvermetal-builder image we built two minutes earlier *does* have
diffoscope-minimal (Dockerfile.builder line 109). Run the diagnostic
inside that image: docker run --volumes-from $self_cid + the digest
the builder-image job passed in via BUILDER_IMAGE. Mounts the same
/workspace path so REPO_ROOT-relative resolution in
diagnose-divergence.sh works unchanged.
2. **The artifact was unreachable.** actions/upload-artifact@v3 against
Gitea 1.25.2 reports "successfully uploaded" but the
/api/v1/repos/.../actions/runs/{id}/artifacts list comes back empty,
and every download path probed returns 404. Known v3 incompatibility
— v3 uses the legacy GitHub Services API endpoint that Gitea
doesn't expose for retrieval.
Workaround: tail the divergence content into the workflow log
directly, so it shows up in `gitea actions logs` regardless of
upload-artifact's behaviour. Specifically: sizes.txt, sha256.txt,
checklist.md, head -n 400 of diff.txt (or cmp.txt as fallback).
That's enough to see what's diverging without needing the artifact.
Upload-artifact step kept in place for whenever Gitea's API gets
sorted (fix-once-then-forget).
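
The tail-into-log workaround could look roughly like this. The `emit` and
`tail_divergence` helper names are hypothetical, and treating an empty
diff.txt the same as a missing one is an assumption; the file names and the
400-line cap come from the description above.

```shell
# Print one file into the job log under a banner. An optional second argument
# caps the number of lines. Fails if the file is missing or empty.
emit() {
  local file="$1" limit="${2:-0}"
  [ -s "$file" ] || return 1
  echo "===== ${file##*/} ====="
  if [ "$limit" -gt 0 ]; then
    head -n "$limit" "$file"
  else
    cat "$file"
  fi
}

tail_divergence() {
  local dir="$1" f
  for f in sizes.txt sha256.txt checklist.md; do
    emit "$dir/$f" || echo "(missing or empty: $f)"
  done
  # Prefer the diffoscope report; fall back to the raw cmp output.
  emit "$dir/diff.txt" 400 || emit "$dir/cmp.txt" 400 \
    || echo "(no diff.txt or cmp.txt)"
}

# Usage (directory is a placeholder): tail_divergence ./divergence
```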
The self-discovery loop (docker ps + inspect filtering by
/workspace/SilverLABS/SilverMetal mount destination) is the same one
build.sh uses; concurrency: 1 in this workflow guarantees a single
match.
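
For reference, the self-discovery step can be sketched as below. This is an
illustration, not a copy of what build.sh does: the `find_self_cid` name is
hypothetical, and the command passed to `docker run` in the usage comment is
left as a placeholder.

```shell
# Print the ID of the first running container that has a mount whose
# destination equals the given path; return 1 if none matches.
find_self_cid() {
  local dest="$1" cid
  for cid in $(docker ps -q); do
    if docker inspect \
         --format '{{range .Mounts}}{{println .Destination}}{{end}}' "$cid" \
       | grep -qxF "$dest"; then
      echo "$cid"
      return 0
    fi
  done
  return 1
}

# Usage sketch:
#   self_cid=$(find_self_cid /workspace/SilverLABS/SilverMetal) || exit 1
#   docker run --rm --volumes-from "$self_cid" "$BUILDER_IMAGE" <diagnostic command>
```

`grep -x` requires a whole-line match, so a mount at a longer path that
merely contains the workspace path does not count.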
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes that unblock the M1.1 iter loop. Both belong in CI;
iter1-15 was wrong to require human-in-the-loop steps to make progress.
1. **CI now builds Dockerfile.builder.**
`.gitea/workflows/build-iso-linux.yaml` grows a `builder-image` job
that runs ahead of `build-and-verify`. It rebuilds the
silvermetal-builder image from
`linux/build/docker/Dockerfile.builder`, pushes it
to `docker-registry.silverlabs.uk/silvermetal-builder:m1.1-<sha>` (and
`:latest`), reads the resulting digest off `docker inspect`, and
feeds it forward as a job output. `build-and-verify` consumes that
digest as the `BUILDER_IMAGE` env override that `build.sh` already
honours (and validates is digest-form on line ~37).
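
A digest-form check of the kind mentioned above might look like this. It is
a sketch under assumptions: the `is_digest_ref` name and the exact regex are
illustrative and are not taken from `build.sh`.

```shell
# Succeed only for digest-pinned references: <name>@sha256:<64 lowercase hex>.
# Tag-only references such as name:latest are rejected.
is_digest_ref() {
  printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Usage sketch:
#   is_digest_ref "$BUILDER_IMAGE" || { echo "BUILDER_IMAGE must be digest-pinned" >&2; exit 1; }
# One way to read a pushed image's digest (assumes the image has been pushed,
# so that RepoDigests is populated):
#   docker inspect --format '{{index .RepoDigests 0}}' <image>
```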
That kills the old workflow where every Dockerfile.builder change
required a human to `docker build` + `docker push` on 10.0.0.51 by
hand and then bump the digest in `build.sh` in lockstep. The crash
that triggered this (exit 126 mid-iter16 build run) was a symptom of
that off-CI step still existing.
Both jobs run on the existing `silvermetal-builder` runner; the host
docker daemon is shared via DooD and is already authenticated to
`docker-registry.silverlabs.uk` (linux/build/runner/docker-compose.yml
mounts `/root/.docker:/root/.docker:ro`), so no extra login step.
The hardcoded `BUILDER_IMAGE` digest in `build.sh` stays as the
local-developer / offline-rebuild fallback. Comments updated in
`build.sh`, `Dockerfile.builder`, and `linux/build/README.md` to
match the new flow.
2. **reprepro wrapper for the benign "No priority for X" case.**
Pinned derivative-maker's `2100_create-debian-packages` (with
--target iso) re-imports source packages from snapshot.debian.org
into a local apt repo via `reprepro --basedir … includedsc local
<foo>.dsc`. The local repo's `conf/distributions` ships no
`DscOverride` entries, so any source package whose `.dsc` lacks an
explicit Priority field trips:
    No priority for 'X', skipping.
    There have been errors!
…and reprepro exits 255. dm-reprepro-wrapper bubbles that up,
2100_create-debian-packages aborts. The current offender is
`virtualbox_*.dsc` (key import is now fine — debian-keyring landed in
commit 4aa59ba — but the priority field gap remains). VirtualBox is
not in SilverMetal's `--target iso` set, so the sane behaviour is
"log it, continue".
New `linux/build/docker/silvermetal-reprepro-wrap.sh` shadows
`/usr/bin/reprepro` at `/usr/local/bin/reprepro` (PATH precedence).
It runs the real reprepro, captures merged stdout+stderr, and:
- if rc != 0 AND every non-blank output line matches one of the
known-benign patterns ("No priority for 'X', skipping." plus the
trailing "There have been errors!"), emits the output, logs one
line of explanation to stderr, and exits 0;
- otherwise emits the output and propagates rc unchanged.
Any *other* reprepro error path stays fatal — only the specific
"No priority for X" pattern is neutralised. `dm-reprepro-wrapper`
resolves `reprepro` via `$PATH` so it picks up the wrapper
transparently.
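
The decision logic described above can be sketched as follows. This is not
the contents of silvermetal-reprepro-wrap.sh: the `only_benign` and
`run_wrapped` names are hypothetical, the real binary is passed in as an
argument to keep the sketch testable, and requiring at least one
"No priority" line (so that empty output with a non-zero rc stays fatal) is
an added assumption.

```shell
# Succeed only if the captured output contains at least one "No priority"
# skip and every other non-blank line is a known-benign one.
only_benign() {
  local out="$1"
  printf '%s\n' "$out" | grep -q "^No priority for '.*', skipping\.\$" || return 1
  # Drop blank and benign lines; anything left over is a real error.
  if printf '%s\n' "$out" \
     | grep -Ev -e '^[[:space:]]*$' \
                -e "^No priority for '.*', skipping\.\$" \
                -e '^There have been errors!$' \
     | grep -q .; then
    return 1
  fi
  return 0
}

# Run the real reprepro ($1) with the remaining arguments, capture merged
# stdout+stderr, re-emit it, and downgrade only the benign failure to 0.
run_wrapped() {
  local real="$1" out rc
  shift
  out=$("$real" "$@" 2>&1)
  rc=$?
  if [ -n "$out" ]; then printf '%s\n' "$out"; fi
  if [ "$rc" -ne 0 ] && only_benign "$out"; then
    echo "reprepro-wrap: rc=$rc downgraded to 0 (only 'No priority' skips)" >&2
    return 0
  fi
  return "$rc"
}

# Usage sketch, as the body of /usr/local/bin/reprepro:
#   run_wrapped /usr/bin/reprepro "$@"; exit $?
```

Declaring `out` and `rc` before the command substitution matters: writing
`local out=$(...)` would make `$?` reflect `local` rather than reprepro.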
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vendors Kicksecure derivative-maker as a pinned submodule (18.1.7.4),
adds the wrapper + verify + diagnose scripts, the pinned builder image,
and the reproducibility-gated Gitea Actions workflow. Base flavour only —
no hardening overlay (that's M1.2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>