Cross-stack bench cleanup — 24/24 unanimous on every engine
Re-ran the full cross-stack matrix after patching two bench-driver bugs (C/TS token-decode fallback, vllm REPS=1 noise). All three engines × six client languages now produce byte-identical Codec frames per cell — including vllm, which previously read as 0/24 unanimous in the post-mortem.
The 2026-05-08 cross-stack run's §7 post-mortem flagged three sources of variance, the loudest being vllm reading as 0/24 unanimous on Codec cells. Running them down turned up three concrete issues:
- The C and TS demos were emitting `tokens_emitted=0` for compressed cells (the `tokens` field was only populated on `identity` decode), which threw off the unanimity check.
- vllm at 2 K tokens has ~10–20 % wire-byte variance from non-deterministic batching even at temperature=0; it needs ≥2 reps to land a stable median.
- JSON-SSE rows have 10–16 B per-client framing-accounting drift that’s structural, not noise.
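The unanimity check described above can be sketched in a few lines. This is a minimal illustration, not the actual driver: the result-dict field names, the `transport` values, and the `sse_drift_budget` tolerance are all assumptions chosen to mirror the rules stated in the list (byte-identical Codec frames, bounded structural drift on JSON-SSE rows).

```python
from collections import defaultdict

def unanimity(results, sse_drift_budget=16):
    """Per (engine, cell, transport) group: Codec frames must be
    byte-identical across clients; JSON-SSE rows tolerate a small,
    bounded framing-accounting drift (structural, not noise).

    `results` is a list of dicts with hypothetical keys:
    engine, cell, client, transport ("codec" | "json-sse"), frame_bytes.
    """
    groups = defaultdict(list)
    for r in results:
        groups[(r["engine"], r["cell"], r["transport"])].append(r["frame_bytes"])

    verdicts = {}
    for (engine, cell, transport), frames in groups.items():
        if transport == "codec":
            # Strict: every client must produce the exact same bytes.
            ok = all(f == frames[0] for f in frames)
        else:
            # JSON-SSE: allow per-client framing drift up to the budget.
            sizes = [len(f) for f in frames]
            ok = max(sizes) - min(sizes) <= sse_drift_budget
        verdicts[(engine, cell, transport)] = ok
    return verdicts
```

A driver bug like the `tokens_emitted=0` fallback shows up here immediately: the affected client's frame bytes diverge, and the whole cell flips to non-unanimous.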
Both patches shipped (commits 7c12286, eb574b6) and the bench was re-run on the same lab box (vinez@192.168.1.88, 2× RTX 3090).
Cross-language unanimity, all engines:
| Engine | Codec cells unanimous | Notes |
|---|---|---|
| sglang | 24/24 | clean |
| vllm | 24/24 | was 0/24 in earlier run |
| llama.cpp | 24/24 | only ≤5 B drift remains on JSON-SSE rows |
Headline reduction at 2 K tokens (msgpack + gzip, Python row):
| Engine | JSON-SSE | Codec msgpack + gzip | Reduction |
|---|---|---|---|
| sglang | 485.2 KB | 354 B | 1,404× |
| vllm | 517.8 KB | 3,874 B | 137× |
| llama.cpp | 529.2 KB | 16.1 KB | 33× |
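The reduction column follows directly from the first two. A quick sanity check, assuming KB means 1,024 bytes (which is what reproduces the table's rounded ratios):

```python
KIB = 1024  # table sizes appear to use KB = 1,024 bytes

rows = {
    "sglang":    (485.2 * KIB, 354),
    "vllm":      (517.8 * KIB, 3_874),
    "llama.cpp": (529.2 * KIB, 16.1 * KIB),
}

# Reduction = JSON-SSE bytes / Codec bytes, rounded to the nearest integer.
reductions = {engine: round(sse / codec) for engine, (sse, codec) in rows.items()}
```

This reproduces 1,404×, 137×, and 33× for sglang, vllm, and llama.cpp respectively.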
The Benchmarks panel on the front page now points at the new run, and the engineRows numbers have been refreshed to match.