
Cross-stack bench cleanup — 24/24 unanimous on every engine

Re-ran the full cross-stack matrix after patching two bench-driver bugs (C/TS token-decode fallback, vllm REPS=1 noise). All three engines × six client languages now produce byte-identical Codec frames per cell — including vllm, which previously read as 0/24 unanimous in the post-mortem.

The 2026-05-08 cross-stack run's §7 post-mortem flagged three sources of variance, the loudest being vllm reading as 0/24 unanimous on Codec cells. Tracking them down surfaced three distinct issues:

  1. C and TS demos were emitting tokens_emitted=0 for compressed cells (the tokens field was only populated on identity decode), which threw off the unanimity check.
  2. vllm at 2 K tokens has ~10–20 % wire-byte variance from non-deterministic batching even at temperature=0; needs ≥2 reps to land a stable median.
  3. JSON-SSE rows have 10–16 B per-client framing-accounting drift that’s structural, not noise.
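The unanimity check in item 1 amounts to a byte-level comparison of each client's Codec frames. A minimal sketch, with all names and frame contents hypothetical (the real bench driver's internals aren't shown in this post):

```python
import hashlib

def cell_unanimous(frames_by_client: dict) -> bool:
    # Hypothetical re-creation of the unanimity check: a Codec cell
    # counts as unanimous only if every client language produced a
    # byte-identical frame (compared via digest).
    digests = {hashlib.sha256(frame).hexdigest()
               for frame in frames_by_client.values()}
    return len(digests) == 1

# Before the fix, the C and TS demos wrote tokens_emitted=0 on
# compressed cells (the field was only populated on identity decode),
# so their serialized frames diverged from the other clients'.
buggy = {
    "python": b"\x01codec|tokens=2048",
    "go":     b"\x01codec|tokens=2048",
    "c":      b"\x01codec|tokens=0",   # token-decode fallback bug
    "ts":     b"\x01codec|tokens=0",   # same bug in the TS demo
}
fixed = {client: b"\x01codec|tokens=2048" for client in buggy}
```

With the fallback patched, all clients emit the same bytes and the cell reads unanimous again.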

Patches shipped (commits 7c12286, eb574b6) and the bench re-ran on the same lab box (vinez@192.168.1.88, 2× RTX 3090).
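The REPS fix for item 2 boils down to reporting a median over repetitions rather than a single noisy run. A rough sketch, assuming a hypothetical `run_cell` callable that returns wire bytes for one rep:

```python
from statistics import median

def stable_wire_bytes(run_cell, reps: int = 3) -> float:
    # vllm batching is non-deterministic even at temperature=0,
    # giving ~10-20% wire-byte variance per rep; REPS=1 (the earlier
    # run's setting) reports that noise as signal. With reps >= 2 the
    # median settles on a representative value.
    if reps < 2:
        raise ValueError("need >= 2 reps for a stable median")
    return median(run_cell() for _ in range(reps))

# Simulated reps with ~10-20% spread around a 3,874 B centre:
samples = iter([4300.0, 3874.0, 3500.0])
result = stable_wire_bytes(lambda: next(samples), reps=3)
```

Here `result` is the middle sample, 3874.0, regardless of which rep happened to run hottest.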

Cross-language unanimity, all engines:

| Engine | Codec cells unanimous | Notes |
| --- | --- | --- |
| sglang | 24/24 | clean |
| vllm | 24/24 | was 0/24 in earlier run |
| llama.cpp | 24/24 | only ≤5 B drift remains on JSON-SSE rows |

Headline reduction at 2 K tokens (msgpack + gzip, Python row):

| Engine | JSON-SSE | Codec msgpack + gzip | Reduction |
| --- | --- | --- | --- |
| sglang | 485.2 KB | 354 B | 1,404× |
| vllm | 517.8 KB | 3,874 B | 137× |
| llama.cpp | 529.2 KB | 16.1 KB | 33× |
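The reduction column is just the wire-byte ratio between the two formats. A quick check of the table's arithmetic, assuming KB means 1,024 bytes:

```python
def reduction(json_sse_kb: float, codec_bytes: float) -> float:
    # Reduction factor = JSON-SSE wire bytes / Codec wire bytes,
    # with KB = 1,024 bytes.
    return json_sse_kb * 1024 / codec_bytes

# Python-row figures from the table above:
assert round(reduction(485.2, 354)) == 1404        # sglang
assert round(reduction(517.8, 3874)) == 137        # vllm
assert round(reduction(529.2, 16.1 * 1024)) == 33  # llama.cpp
```

All three rows round to the published factors, so the table is internally consistent.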

The Benchmarks panel on the front page now points at the new run, and the engineRows numbers have been refreshed accordingly.