The boot.wim now carries WinPE-NetFx/PowerShell (collector), growing the image by ~0.4GB,
and each build persists a ~5GB ISO to C:\silvermetal\out. On the single-volume runner
that accumulation starved oscdimg ('Insufficient disk space'). Clear prior output +
stale smbuild work dirs at job start so free space self-heals each run.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A prior aborted build left a DISM image mounted in the fixed WorkDir,
locking install.wim and breaking the Stage 2 extract clean-up. Add a
Stage 0 that discards any orphaned SilverMetal mounts + loaded hives
before recreating the work dirs, and run CI in an ephemeral per-job
RUNNER_TEMP WorkDir so concurrent/aborted runs can't collide.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
RUNNER_TEMP is ephemeral; copy the validated build output to C:\silvermetal\out\
so it can be retrieved out of band (e.g. for VM boot-testing).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Their job is done (runner topology mapped, C: extended, ISO staged). The build
+ offline-validation pipeline is green on the runner.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Base eval ISO staged at C:\silvermetal\base.iso on GITEA-RUN-WIN (SHA256
2CEE70BD...CB29 pinned in inputs.manifest.json). Repo var now points at that
local path, so the build reads locally - no NAS share auth / no CI creds.
Dropped -SkipInputVerify so the build verifies the pinned hash.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Master creds must not live in this public repo's Actions, so ISO staging is
handled out-of-band. runner-prep now only extends C: into the resized virtual
disk. Quoted the step name (trailing-colon YAML fix).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Temporary diagnostic to see the silverlabs-runner-win host identity, drives,
share mounts/stored creds, and ISO reachability before wiring the base-ISO
source. To be removed once the source is settled.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
SILVERMETAL_BASE_ISO_URL now accepts an HTTP(S) URL or a UNC/local path. For a
UNC share that the SYSTEM-context runner can't read anonymously, optional repo
secrets SILVERMETAL_ISO_SHARE_USER/_PASS map the share root via net use first.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implement build.ps1 (M2): mount/extract the base ISO, offline-service
install.wim (inject GPD drivers if staged, debloat appx, bake SetupComplete.cmd
+ hardening modules into \Windows\Setup\Scripts), inject autounattend.xml,
oscdimg UEFI repack, emit SHA-256 + SBOM. Elevation + oscdimg guarded.
Add .gitea/workflows/build-iso-windows.yaml: runs on the self-hosted
silverlabs-runner-win (windows-latest), ensures ADK Deployment Tools, acquires
the base ISO from repo var SILVERMETAL_BASE_ISO_URL or a pre-staged path, builds,
validates the baked payload offline, uploads SBOM/SHA (+ISO on dispatch/tag),
attaches to a Gitea release on win-v* tags. Mirrors build-iso-linux.yaml.
Add tests/Assert-IsoStructure.ps1: the no-nested-virt CI gate - mounts the built
ISO + install.wim read-only and asserts autounattend.xml, SetupComplete.cmd, and
the hardening modules are correctly baked. Full QEMU boot+Verify is a follow-on.
Switch autounattend to Windows' native SetupComplete.cmd auto-run (SYSTEM, end
of setup) instead of a duplicate FirstLogonCommands call.
Untested until first runner execution (dev box is ARM64). All PS parse-clean;
autounattend XML + workflow YAML valid.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Run #4281 cleared every layer above the ISO9660 wrapper:
SHA256 (squashfs payload)
caed117ca72c6c1d9204c49dd749d5f7b372f3a19cac1b2a7e66bee452a8d501 /tmp/.../a.squashfs
caed117ca72c6c1d9204c49dd749d5f7b372f3a19cac1b2a7e66bee452a8d501 /tmp/.../b.squashfs
…squashfs is now byte-identical, ISO TOC is identical, file listing
diff is empty, but ISO SHA still differs. The remaining drift is in
the ISO9660 metadata region between the system area (first 32 KiB)
and the file payload start.
Two complementary changes:
1. xorriso post-process now sets *every* date field xorriso writes,
not just the obvious two:
-alter_date_r all
    atime + mtime + btime on all nodes, not just mtime. ISO9660
    directory records carry creation+modification timestamps.
-volume_date c m x f u s
    every volume-descriptor date: c=creation m=modification
    x=expiration f=effective u=system area s=path table
Default for any unset volume_date is "now", which is what was
leaking through despite us setting c+m.
2. diagnose-divergence.sh now does whole-file cmp -l (capped at 200
lines so 1 GiB of all-different doesn't drown the report) and on
any divergence, dumps a 128-byte xxd window from each ISO around
the first differing byte plus a unified diff between the two
windows. This tells us in the next failure log "first byte differs
at offset N (LBA M), bytes around it look like X" — pinpoints the
ISO9660 region without needing artifact download.
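The capped cmp plus hex-window approach in item 2 can be sketched roughly as
below. This is a minimal illustration, not the contents of
diagnose-divergence.sh: the function names, the `SECTOR`/`WINDOW` variables and
the way the window is centred are assumptions. Only the two output file names
and the 200-line / 128-byte limits come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names are illustrative, not the real script.

SECTOR=2048   # ISO9660 logical block size in bytes
WINDOW=128    # bytes of context dumped from each image

# Print the 0-based offset of the first differing byte, or nothing if the
# files agree over their common length (size differences are not reported).
first_diff_offset() {
  local off=""
  # cmp -l prints "<1-based offset> <octal A> <octal B>" per differing byte.
  read -r off _ < <(cmp -l "$1" "$2" 2>/dev/null | head -n 1) || true
  [ -n "$off" ] && echo $(( off - 1 ))
  return 0
}

# Turn a byte offset into the "offset N (LBA M)" line for the log.
describe_offset() {
  echo "first byte differs at offset $1 (LBA $(( $1 / SECTOR )))"
}

# Hex-dump WINDOW bytes of file $1, roughly centred on offset $2.
dump_window() {
  local start=$(( $2 - WINDOW / 2 ))
  (( start < 0 )) && start=0
  xxd -s "$start" -l "$WINDOW" "$1"
}

# diagnose <a.iso> <b.iso> <out-dir>
diagnose() {
  local a=$1 b=$2 out=$3 off
  mkdir -p "$out"
  # Cap the byte-level listing so a fully divergent image stays readable.
  cmp -l "$a" "$b" 2>/dev/null | head -n 200 > "$out/iso-cmp-first-200.txt" || true
  off=$(first_diff_offset "$a" "$b")
  if [ -z "$off" ]; then
    echo "no differing byte within the common length"
    return 0
  fi
  describe_offset "$off"
  diff -u <(dump_window "$a" "$off") <(dump_window "$b" "$off") \
    > "$out/iso-around-first-diff.diff" || true
}
```

Dividing the offset by 2048 gives the logical block, which is what lets the
log line be matched against a volume descriptor, path table or directory
record without downloading the images.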
Workflow tail-into-log step wired up the two new files
(iso-cmp-first-200.txt, iso-around-first-diff.diff).
If iter34 still fails the gate, the new diagnostic tells us exactly
which structure (volume descriptor, path table, directory record,
boot catalog…) is still drifting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run #4273 confirmed two things:
1. The reproducibility gate works end-to-end. Both builds produced
ISOs (1077194752 vs 1077202944 bytes — 8 KiB delta, exactly one
squashfs block worth of compressed-payload drift) and the compare
step caught it.
2. diffoscope, run on the whole 1 GB ISO inside the silvermetal-builder
container, gets OOM-killed before producing any output:
diagnose-divergence.sh: line 44: 13 Killed
diffoscope --max-report-size 100000000 --html ... --text ... A.iso B.iso
The host has 19 GiB free, but diffoscope's full recursion through
ISO -> squashfs -> ~thousands of inner files needs more memory than
that for a 1 GB image. Setting --max-report-size only caps the
output, not the working-set.
Rewrite diagnose-divergence.sh to do staged, cheap-to-expensive
analysis:
1. sha256 + sizes (always)
2. xorriso TOC of both ISOs (every node: mode/size/mtime/path) -> diff
3. Pull just live/filesystem.squashfs out of each ISO,
sha256 it + `unsquashfs -ll` it, diff the listings — this is
where the per-file-size signal lives.
4. Targeted diffoscope on the squashfs payload only, with
--max-container-depth 2 + --max-text-report-size 5MB + --no-html
+ a 10-minute timeout. Bounded enough to finish without the OOM.
Drops `set -e` — every step `|| true`s itself so we get partial output
even when one stage fails.
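A minimal sketch of the "no `set -e`, every stage tolerates its own failure"
pattern, assuming a hypothetical `run_stage` helper and `OUT_DIR` variable
(neither is taken from the real script). It uses coreutils `timeout`, so it
can only wrap external commands, not shell functions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one diagnostic stage and never abort the script.
# Deliberately no `set -e`; each stage records its own exit status instead.

OUT_DIR=${OUT_DIR:-./divergence}

# run_stage <name> <timeout-seconds> <command...>
# Writes combined stdout+stderr to $OUT_DIR/<name>.txt and always returns 0.
run_stage() {
  local name=$1 limit=$2 rc=0
  shift 2
  mkdir -p "$OUT_DIR"
  timeout "$limit" "$@" > "$OUT_DIR/$name.txt" 2>&1 || rc=$?
  if [ "$rc" -eq 124 ]; then
    # timeout(1) exits 124 when the time limit was hit.
    echo "stage $name: timed out after ${limit}s (continuing)" >&2
  elif [ "$rc" -ne 0 ]; then
    echo "stage $name: exit $rc (continuing)" >&2
  fi
  return 0
}

# Example calls, cheapest first (paths are placeholders):
#   run_stage sha256 60 sha256sum "$ISO_A" "$ISO_B"
#   run_stage sqfs-ls-a 300 unsquashfs -ll "$SQFS_A"
```

Because a failed or timed-out stage still leaves its partial output file
behind, the later log-tailing step has something to print even when the
expensive stage never finishes.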
Workflow tail-into-log step now prints the new staged outputs:
* toc-diff.txt — what changed at the ISO level
* sqfs-ls-diff.txt — which inner files have different sizes/mtimes
* sqfs-diff.txt — diffoscope on the squashfs only
* squashfs-sha256.txt
* iso-header-cmp.txt — first-8KB cmp -l for header-level drift
* sizes.txt / sha256.txt / checklist.md as before
Should land us a focused list of "these N files inside the squashfs
have different bytes" — that's what we need to find what's leaking
non-determinism into the build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run #4272 hit the M1.1 reproducibility gate as designed — both builds
completed, ISOs differed (A=ff2e7444…, B=9ec7f3da…), diagnose-divergence
fired. Two things stopped that diagnostic from being useful:
1. **diffoscope wasn't available.** diagnose-divergence.sh runs in the
catthehacker job container, which has cmp but no diffoscope. The
silvermetal-builder image we built two minutes earlier *does* have
diffoscope-minimal (Dockerfile.builder line 109). Run the diagnostic
inside that image: docker run --volumes-from $self_cid + the digest
the builder-image job passed in via BUILDER_IMAGE. Mounts the same
/workspace path so REPO_ROOT-relative resolution in
diagnose-divergence.sh works unchanged.
2. **The artifact was unreachable.** actions/upload-artifact@v3 against
Gitea 1.25.2 reports "successfully uploaded" but the
/api/v1/repos/.../actions/runs/{id}/artifacts list comes back empty,
and every download path probed returns 404. Known v3 incompatibility
— v3 uses the legacy GitHub Services API endpoint that Gitea
doesn't expose for retrieval.
Workaround: tail the divergence content into the workflow log
directly, so it shows up in `gitea actions logs` regardless of
upload-artifact's behaviour. Specifically: sizes.txt, sha256.txt,
checklist.md, head -n 400 of diff.txt (or cmp.txt as fallback).
That's enough to see what's diverging without needing the artifact.
Upload-artifact step kept in place for whenever Gitea's API gets
sorted (fix-once-then-forget).
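The log-tailing workaround could look roughly like this. It is a sketch under
assumptions: the `print_section` and `dump_divergence` names and the banner
format are invented for illustration, and applying the 400-line cap to the
cmp.txt fallback as well is a guess. The file names and the cap itself are the
ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "tail into the workflow log" step.

# print_section <file> [max-lines]
# Echo a non-empty file under a banner; return 1 if it is missing or empty.
print_section() {
  local file=$1 max=${2:-0}
  [ -s "$file" ] || return 1
  echo "===== $(basename "$file") ====="
  if [ "$max" -gt 0 ]; then
    head -n "$max" "$file"
  else
    cat "$file"
  fi
}

# dump_divergence <dir>
dump_divergence() {
  local dir=$1 f
  for f in sizes.txt sha256.txt checklist.md; do
    print_section "$dir/$f" || echo "(missing: $f)"
  done
  # Prefer the diff report; fall back to the cmp output if it is absent.
  print_section "$dir/diff.txt" 400 \
    || print_section "$dir/cmp.txt" 400 \
    || echo "(no diff.txt or cmp.txt)"
}
```

Printing a placeholder for missing files keeps the log shape stable, so an
absent report is visible rather than silently skipped.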
The self-discovery loop (docker ps + inspect filtering by
/workspace/SilverLABS/SilverMetal mount destination) is the same one
build.sh uses; concurrency: 1 in this workflow guarantees a single
match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled changes that unblock the M1.1 iter loop. Both belong in CI;
iter1-15 was wrong to require human-in-the-loop steps to make progress.
1. **CI now builds Dockerfile.builder.**
`.gitea/workflows/build-iso-linux.yaml` grows a `builder-image` job
that runs ahead of `build-and-verify`. It rebuilds the silvermetal-
builder image from `linux/build/docker/Dockerfile.builder`, pushes it
to `docker-registry.silverlabs.uk/silvermetal-builder:m1.1-<sha>` (and
`:latest`), reads the resulting digest off `docker inspect`, and
feeds it forward as a job output. `build-and-verify` consumes that
digest as the `BUILDER_IMAGE` env override that `build.sh` already
honours (and validates as digest-form on line ~37).
That kills the old workflow where every Dockerfile.builder change
required a human to `docker build` + `docker push` on 10.0.0.51 by
hand and then bump the digest in `build.sh` in lockstep. The crash
that triggered this (exit 126 mid-iter16 build run) was a symptom of
that off-CI step still existing.
Both jobs run on the existing `silvermetal-builder` runner; the host
docker daemon is shared via DooD and is already authenticated to
`docker-registry.silverlabs.uk` (linux/build/runner/docker-compose.yml
mounts `/root/.docker:/root/.docker:ro`), so no extra login step.
The hardcoded `BUILDER_IMAGE` digest in `build.sh` stays as the
local-developer / offline-rebuild fallback. Comments updated in
`build.sh`, `Dockerfile.builder`, and `linux/build/README.md` to
match the new flow.
2. **reprepro wrapper for the benign "No priority for X" case.**
Pinned derivative-maker's `2100_create-debian-packages` (with
--target iso) re-imports source packages from snapshot.debian.org
into a local apt repo via `reprepro --basedir … includedsc local
<foo>.dsc`. The local repo's `conf/distributions` ships no
`DscOverride` entries, so any source package whose `.dsc` lacks an
explicit Priority field trips:
No priority for 'X', skipping.
There have been errors!
…and reprepro exits 255. dm-reprepro-wrapper bubbles that up and
2100_create-debian-packages aborts. The current offender is
`virtualbox_*.dsc` (key import is now fine — debian-keyring landed in
commit 4aa59ba — but the priority field gap remains). VirtualBox is
not in SilverMetal's `--target iso` set, so the sane behaviour is
"log it, continue".
New `linux/build/docker/silvermetal-reprepro-wrap.sh` shadows
`/usr/bin/reprepro` at `/usr/local/bin/reprepro` (PATH precedence).
It runs the real reprepro, captures merged stdout+stderr, and:
- if rc != 0 AND every non-blank output line matches one of the
known-benign patterns ("No priority for 'X', skipping." plus the
trailing "There have been errors!"), emits the output, logs one
line of explanation to stderr, and exits 0;
- otherwise emits the output and propagates rc unchanged.
Any *other* reprepro error path stays fatal — only the specific
"No priority for X" pattern is neutralised. `dm-reprepro-wrapper`
resolves `reprepro` via `$PATH` so it picks up the wrapper
transparently.
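The wrapper logic described in item 2 can be sketched as follows. This is an
illustration only, not the shipped silvermetal-reprepro-wrap.sh: the
`REAL_REPREPRO` override, the function names and the exact regular expression
are assumptions, and the requirement that at least one "No priority" line be
present is an extra guard added here so that an empty failing output is not
neutralised.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a PATH-shadowing reprepro wrapper.

REAL_REPREPRO=${REAL_REPREPRO:-/usr/bin/reprepro}

# Succeeds only if every non-blank line of $1 is a known-benign message and
# at least one "No priority for 'X', skipping." line was seen.
only_benign() {
  local line seen=0
  local re="^No priority for '[^']+', skipping[.]\$"
  while IFS= read -r line; do
    [[ $line =~ ^[[:space:]]*$ ]] && continue
    if [[ $line =~ $re ]]; then
      seen=1
    elif [[ $line != "There have been errors!" ]]; then
      return 1
    fi
  done <<< "$1"
  [ "$seen" -eq 1 ]
}

main() {
  local output rc=0
  # Capture merged stdout+stderr of the real binary.
  output=$("$REAL_REPREPRO" "$@" 2>&1) || rc=$?
  if [ -n "$output" ]; then
    printf '%s\n' "$output"
  fi
  if [ "$rc" -ne 0 ] && only_benign "$output"; then
    echo "reprepro-wrap: rc=$rc neutralised (only 'No priority' skips)" >&2
    return 0
  fi
  # Any other failure, or success, is propagated unchanged.
  return "$rc"
}

# Installed as /usr/local/bin/reprepro, the script would end with:
#   main "$@"; exit $?
```

Matching whole lines rather than substrings is what keeps an unrelated error
that merely mentions a priority from being swallowed.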
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vendors Kicksecure derivative-maker as a pinned submodule (18.1.7.4),
adds the wrapper + verify + diagnose scripts, the pinned builder image,
and the reproducibility-gated Gitea Actions workflow. Base flavour only —
no hardening overlay (that's M1.2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>