v0.5.0 — efficiency, observability, and cohort honesty
Wire-additive over v0.4 (v0.4 → v0.5 happy-path bytes identical). Four new opt-in surfaces — delta-varint stream encoding, discoverable Zstandard dictionaries, GPU-side latent quantize, bolt-on tool dispatcher. 11 client artifacts bumped to 0.5.0 across npm, PyPI, NuGet, crates.io, Maven Central. Engine cohort cut to sglang + vLLM + llama.cpp + ComfyUI + diffusers (TGI dropped). 72/72 wire + 72/72 decode unanimous on the cross-stack matrix; numbers byte-identical to v0.4.1, confirming the wire-additive invariant. Upstream PRs filed at sgl-project/sglang#25544 and vllm-project/vllm#42896, both DCO-signed and through bot review.
v0.5.0 ships four new wire surfaces, all opt-in, without changing the v0.4 happy path. Every existing v0.4 client decodes a v0.5 server's output byte-for-byte unless it explicitly negotiates a new surface via stream_format, Accept-Encoding, or a new env var.
Four new opt-in wire surfaces
Delta-varint stream encoding. New stream_format values "msgpack-delta" and "protobuf-delta". Frames carry base_id plus zigzag-encoded deltas against the prior frame’s last identifier; stateless framing preserved. ~10–15% wire reduction pre-zstd, ~3–5% post-zstd. Python reference impl; engine-side emit pending in v0.5.x.
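A minimal sketch of the zigzag + varint delta idea, not the reference implementation. It assumes the first delta in a frame is taken against base_id and each later delta against the previous identifier in the same frame, and that varints use the common LEB128 layout. The function names are hypothetical.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed int to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)


def write_varint(value: int, out: bytearray) -> None:
    """Append an unsigned LEB128 varint: 7 payload bits per byte."""
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
        if shift > 63:  # cap the shift so hostile input cannot grow unbounded
            raise ValueError("varint too long")


def encode_frame(base_id: int, ids: list[int]) -> bytes:
    """Encode ids as zigzag varint deltas, starting from base_id (assumed layout)."""
    out = bytearray()
    prev = base_id
    for tok in ids:
        write_varint(zigzag_encode(tok - prev), out)
        prev = tok
    return bytes(out)


def decode_frame(base_id: int, payload: bytes) -> list[int]:
    """Decode a payload produced by encode_frame, given the same base_id."""
    ids, prev, pos = [], base_id, 0
    while pos < len(payload):
        z, pos = read_varint(payload, pos)
        prev += zigzag_decode(z)
        ids.append(prev)
    return ids
```

Because the frame carries its own base_id, a decoder in this sketch needs no state from earlier frames, which is one way to read "stateless framing preserved".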
Discoverable Zstandard dictionaries. Engines now publish their pre-trained dicts at <origin>/.well-known/codec/dicts/<sha256>.zstd. Hash-pinned: the client MUST verify that the fetched bytes hash to the <sha256> URL component. Closes the v0.4.1 silent-COPY-dicts-drop regression class: dictionary drift now fails loudly (404 or hash-mismatch) instead of falling back silently to identity bytes. Release-checklist §1.7 codifies a four-sub-gate audit; the v0.5 cut caught a llama.cpp regression where master was vanilla upstream without the codec patches and the engine was silently serving identity-encoded msgpack.
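The hash-pinning rule can be sketched on the client side as follows. This is an illustration using only the Python standard library, not the shipped client code. It assumes the URL component is the lowercase hex SHA-256 of the dictionary bytes, and the helper names are hypothetical.

```python
import hashlib
import urllib.request


def dict_url(origin: str, sha256_hex: str) -> str:
    """Build the well-known dictionary URL for a given origin and hash."""
    return f"{origin.rstrip('/')}/.well-known/codec/dicts/{sha256_hex}.zstd"


def verify_dict(body: bytes, sha256_hex: str) -> bytes:
    """Return body only if its SHA-256 matches the pinned hash; fail loudly otherwise."""
    actual = hashlib.sha256(body).hexdigest()
    if actual != sha256_hex.lower():
        raise ValueError(f"dictionary hash mismatch: expected {sha256_hex}, got {actual}")
    return body


def fetch_dict(origin: str, sha256_hex: str, timeout: float = 10.0) -> bytes:
    """Fetch and verify a dictionary.

    urllib raises urllib.error.HTTPError on a 404, so a missing dictionary
    surfaces as an exception rather than a silent fallback.
    """
    with urllib.request.urlopen(dict_url(origin, sha256_hex), timeout=timeout) as resp:
        return verify_dict(resp.read(), sha256_hex)
```

Both failure modes named above (404 and hash mismatch) raise here, so a caller has to handle them explicitly rather than continue with identity bytes.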
GPU-side latent quantize fast path. LatentStreamEncoderOptions.gpu_quantize=True accepts a CUDA torch.Tensor, quantizes on-device, and transfers the int4/int8 result instead of the fp16 latent. ~75% PCIe reduction on int4 SDXL; smaller wins at SD-1.5.
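The ~75% figure follows from element width alone: int4 is a quarter of fp16. The arithmetic below is a back-of-the-envelope sketch, assuming a 4-channel latent at 1/8 spatial resolution (128×128 for a 1024×1024 SDXL image, 64×64 for a 512×512 SD-1.5 image) and ignoring any per-tensor scale or zero-point metadata. The helper name is hypothetical.

```python
def transfer_bytes(shape: tuple[int, ...], bits_per_element: int) -> int:
    """Bytes needed to move a tensor of this shape at the given element width."""
    count = 1
    for dim in shape:
        count *= dim
    return (count * bits_per_element + 7) // 8  # round up to whole bytes


# Assumed latent shapes (batch, channels, height, width); not taken from the release.
SDXL_LATENT = (1, 4, 128, 128)
SD15_LATENT = (1, 4, 64, 64)

sdxl_fp16 = transfer_bytes(SDXL_LATENT, 16)   # 131072 bytes
sdxl_int8 = transfer_bytes(SDXL_LATENT, 8)    # 65536 bytes
sdxl_int4 = transfer_bytes(SDXL_LATENT, 4)    # 32768 bytes
sdxl_reduction = 1 - sdxl_int4 / sdxl_fp16    # 0.75

sd15_fp16 = transfer_bytes(SD15_LATENT, 16)   # 32768 bytes
sd15_int4 = transfer_bytes(SD15_LATENT, 4)    # 8192 bytes
```

In this model the ratio is the same for both latents, while the absolute number of bytes saved per latent is four times smaller at the SD-1.5 shape.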
Bolt-on tool dispatcher. The engine can dispatch directly to tools published via the @codecai/tool-kit manifest, without ever detokenizing the model’s <tool_call> region. Manifest schema + _codec_meta envelope let a tool author publish pre-tokenized IDs that flow into and out of the engine’s generation context.
11 packages at 0.5.0
- npm: @codecai/{web, web-safety, web-llm, maps-cli, mcp-leaf, tool-kit, wire-compress}
- PyPI: codecai
- NuGet: Codec.Net
- crates.io: codec-rs
- Maven Central: ai.codec:codec
New cross-cohort surfaces: a rewritten compression picker that is content-aware and per-stack-aware, with a typed PickReasonCode enum; a policies-enumerate subcommand on @codecai/maps-cli (resolves v0.4-OQ4); and @codecai/tool-kit promoted to a first-class family member with a runnable reference tool (@codecai/codec-time-tool).
Engine cohort
wdunn001/codec-{sglang,vllm,llamacpp,comfyui,diffusers}:v0.5.0 and :latest live on Docker Hub. Each image bakes the canonical zstd dicts at /opt/codec/dicts/, ships the /opt/codec/check-dict-availability.sh probe, and is dep-verified for import brotli, zstandard, msgpack before push.
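A dependency check of the kind described above could look like the following. This is a sketch of one possible pre-push gate, not the script used for these images; the function name is hypothetical, and only the three module names come from the text.

```python
import importlib.util

# Module names taken from the release text; everything else here is illustrative.
REQUIRED_MODULES = ("brotli", "zstandard", "msgpack")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

A build step could call missing_modules() inside the image and abort the push when the returned list is non-empty.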
Upstream PRs filed at sgl-project/sglang#25544 and vllm-project/vllm#42896. Both DCO-signed; both through five gemini-code-assist bot review-fix iterations (struct.unpack bytes path, hardened _decode_varint shift-cap, async dispatch, cached registry, manifest dict-shape guard).
wdunn001/codec-tgi is dropped: TGI is treated as a dead project, and the cohort is now five engines.
Bench: byte-identical to v0.4.1
The §1 and §1b numbers are unchanged from v0.4.1, which is exactly what wire-additive is supposed to mean. The §1.7 and §1.9 gates added in this release exist to guarantee that, not to change it.
§1b engine-output @ 2K tokens, Codec msgpack + dict-zstd:
| Engine | JSON-SSE | Best Codec | Reduction |
|---|---|---|---|
| llama.cpp | 528.8 KB | 140 B | 3,868× |
| sglang | 485.2 KB | 291 B | 1,707× |
| vllm | 517.8 KB | 3.9 KB | 137× |
§2 cross-language interop: 72/72 wire-unanimous + 72/72 decode-unanimous across three engines and six client languages. vllm required REPS=4 to median out its documented ~10–20% scheduler variance at T=0; ran clean on the second pass.
IETF Internet-Draft
draft-dunn-codec-00 rewritten to RFC 2026 compliance. Required sections present, kramdown-rfc compatible frontmatter, threat model expanded with five inline Codec-specific threats (binary-WAF blindness, capability-trust, discovery cache poisoning, frame-size + varint exhaustion, sentinel-identifier integrity), explicit out-of-specification behaviour table, liberal/conservative acceptance rules, implementation-experience section. Companion SUBMITTING.md walkthrough covers the kdrfc → datatracker submission flow.
Migration
v0.4.1 → v0.5.0 is non-breaking. Bump the package version; nothing else changes for existing v0.4 consumers. To opt into new surfaces, set the appropriate env var or request field — see the CHANGELOG entry for the per-surface opt-in matrix.