codec-comfyui (Docker)

ComfyUI image-generation server with the Codec v0.3 latent transport patch. Streams VAE latents on the wire instead of decoded pixels — 48× smaller, decoder runs at the leaf.

codec-comfyui is a pre-built Docker image of ComfyUI with the Codec v0.3 latent transport patch applied. Stand it up like any image-gen server, point any Codec-aware client at it, and image generations ship as VAE latents instead of decoded pixels — same physics as text-token streams in codec-sglang / codec-vllm, but for diffusion.

Why latents and not pixels: a 512×512 RGB frame at fp16 is ~1.5 MB; the SD-1 latent that produced it is 4×64×64 fp16 = 32 KB, a 48× reduction. With per-channel int8 quantization on top, the wire weight collapses further. The client does vae_decode locally and never re-encodes, so the round-trip pixel quality is bounded by the published per-pipeline LPIPS thresholds (see spec/PIPELINES.md).
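
The same arithmetic in code, as a sanity check (the four fp32 per-channel scales on the int8 line are an assumption about the scale layout, not a spec quote):

// Wire-size arithmetic for one 512×512 SD-1 generation.
const pixelBytes  = 512 * 512 * 3 * 2;       // RGB fp16 frame: 1,572,864 B (~1.5 MB)
const latentBytes = 4 * 64 * 64 * 2;         // 4×64×64 fp16 latent: 32,768 B (32 KB)
console.log(pixelBytes / latentBytes);       // 48, the headline reduction
const int8Bytes   = 4 * 64 * 64 + 4 * 4;     // int8 pipeline: 1 B per value + four fp32 scales
console.log(Math.round(pixelBytes / int8Bytes)); // ~96 with per-channel int8 on top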

This image is built from the wdunn001/ComfyUI fork at branch feat/codec-latent-transport. The fork is the canonical surface — ComfyUI’s plugin/custom-node architecture would let us ship the codec endpoints as a custom node, but the latent-frame emitter and zstd-dict overlay touch enough of the request loop that maintaining a downstream fork is cleaner.

Quick start

Default boot loads stabilityai/sd-vae-ft-mse (SD-1 VAE) and serves it.

docker run -d --gpus all \
  -p 8080:8080 \
  -v codec-models:/models \
  --shm-size 8g \
  wdunn001/codec-comfyui:latest
# Codec wire format — msgpack frames of LatentStreamHeader + LatentFrame
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Accept: application/x-codec-msgpack" \
  -H "Accept-Encoding: zstd" \
  -d '{
    "model": "sd1.5",
    "prompt": "a wide-angle photograph of a snowy mountain at dusk",
    "stream_format": "msgpack",
    "modality":      "image-latents",
    "latent_space":  "stabilityai/sd-vae-ft-mse",
    "pipeline":      "int8",
    "size": "512x512", "steps": 30, "seed": 42
  }'
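
For typed callers, that body maps onto a small interface. This is a sketch: CodecImageRequest is an illustrative name, not a type exported by @codecai/web, and the seven pipeline literals come from the table in the Pipelines section below.

// Illustrative shape of the generation request; mirrors the curl body above.
interface CodecImageRequest {
  model: string;                      // e.g. "sd1.5"
  prompt: string;
  stream_format: "msgpack";
  modality: "image-latents";
  latent_space: string;               // VAE identifier, e.g. "stabilityai/sd-vae-ft-mse"
  pipeline: "raw" | "int8" | "int4" | "int8-adaptive"
          | "int4-adaptive" | "delta+int8" | "delta+int4";
  size?: string;                      // "WIDTHxHEIGHT", e.g. "512x512"
  steps?: number;
  seed?: number;
}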

The response carries:

  • Content-Encoding: zstd (when a per-pipeline zstd dict is loaded)
  • Codec-Latent-Map: sha256:… — the latent-space-map document hash, so the client can fail fast if it doesn’t have a matching map loaded
  • Codec-Zstd-Dict: sha256:… — the active dict identifier

Body is one LatentStreamHeader followed by one LatentFrame (image) or N LatentFrames (video).
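
A client can fail fast against those headers before parsing a single frame. A sketch follows; loadedMapHashes stands in for however your client tracks the latent-space maps it has loaded:

// Check the map and dict hashes before decoding anything.
const resp = await fetch("http://localhost:8080/v1/images/generations", {
  /* …method, headers, body as in the curl above… */
});
const mapHash  = resp.headers.get("Codec-Latent-Map");  // "sha256:…"
const dictHash = resp.headers.get("Codec-Zstd-Dict");   // "sha256:…" when a dict is active

const loadedMapHashes = new Set<string>([/* hashes of the maps this client bundles */]);
if (mapHash && !loadedMapHashes.has(mapHash)) {
  throw new Error(`no latent-space map loaded for ${mapHash}`);
}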

Pipelines

codec-comfyui advertises the seven Codec v0.3 pipelines documented in spec/PIPELINES.md:

Pipeline         Wire shape                              Reduction vs raw          Use case
raw              Packed tensor in row-major order        —                         Bit-exact baseline
int8             Per-channel symmetric int8              2× over fp16              Default for SD-family images
int4             Per-channel symmetric int4 (packed)     4× over fp16              Aggressive lossy mode
int8-adaptive    int8 with per-keyframe scales           ~2×                       Heterogeneous frames
int4-adaptive    int4 with per-keyframe scales           ~4×                       Same use case, more lossy
delta+int8       int8 residual against prior keyframe    2× + temporal collapse    Video only
delta+int4       int4 residual against prior keyframe    4× + temporal collapse    Video, most aggressive

Adding a pipeline is an additive v0.3+ point release — the registry is normative, not extensible per-deployment.
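
As an illustration of what the int8 wire shape means at the decoder, here is a minimal dequantization sketch. The field names (data, scales) are illustrative, not the actual LatentFrame schema, which LatentStreamDecoder handles for you:

// Per-channel symmetric int8 dequantization: value = q * scale[channel].
// Symmetric means no zero-point; the int8 range is centred on zero.
function dequantInt8(
  data: Int8Array,        // C*H*W quantized values, channel-first
  scales: Float32Array,   // one scale per channel
  channels: number,
): Float32Array {
  const perChannel = data.length / channels;
  const out = new Float32Array(data.length);
  for (let c = 0; c < channels; c++) {
    const s = scales[c];
    for (let i = 0; i < perChannel; i++) {
      out[c * perChannel + i] = data[c * perChannel + i] * s;
    }
  }
  return out;
}

The delta+int8 pipeline applies the same dequantization to a residual and adds it to the previously decoded keyframe; the adaptive variants carry fresh scales at each keyframe.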

Pointing a Codec client at it

Any @codecai/web client (v0.4+) speaks the latent wire shape via LatentStreamDecoder:

import {
  decodeLatentHeaderMsgpack,
  decodeLatentFrameMsgpack,
  LatentStreamDecoder,
} from "@codecai/web";

const resp = await fetch("http://localhost:8080/v1/images/generations", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "application/x-codec-msgpack",
    "Accept-Encoding": "zstd",
  },
  body: JSON.stringify({ /* …request as above… */ }),
});

// Frames stream length-prefixed; iterate them as Uint8Array chunks.
// splitLengthPrefixed is a sketch defined after this block; decodeMsgpackStream
// in @codecai/web is the real streaming helper.
const body = new Uint8Array(await resp.arrayBuffer());
const [headerBytes, ...frameChunks] = splitLengthPrefixed(body);
const header = decodeLatentHeaderMsgpack(headerBytes);
const decoder = new LatentStreamDecoder(header);

for (const chunk of frameChunks) {
  const frame = decodeLatentFrameMsgpack(chunk);
  const latent = decoder.decodeFrame(frame); // Float32Array, channel-first
  // Hand `latent` to a browser-side VAE (WebGPU / ONNX-Web / etc.)
}
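
The split used above could look like the following. This is a sketch that assumes each msgpack chunk is preceded by a 4-byte little-endian length; the prefix width and endianness are assumptions, and decodeMsgpackStream in @codecai/web implements the real framing:

// Split a fully buffered response body into length-prefixed msgpack chunks.
function splitLengthPrefixed(body: Uint8Array): Uint8Array[] {
  const view = new DataView(body.buffer, body.byteOffset, body.byteLength);
  const chunks: Uint8Array[] = [];
  let offset = 0;
  while (offset + 4 <= body.byteLength) {
    const len = view.getUint32(offset, true); // little-endian u32, an assumption
    offset += 4;
    chunks.push(body.subarray(offset, offset + len));
    offset += len;
  }
  return chunks;
}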

The Python client (codecai) and the polyglot clients (rust / java / dotnet / c) carry the same parser surface — a single tokenizer-map and latent-space-map registry; one wire shape; six languages.

When to use this

  • Use codec-comfyui when you want browser- or edge-side VAE decoding, when you’re streaming frames into a downstream vision model that accepts latents directly, or when bandwidth is the bottleneck.
  • Use upstream ComfyUI when you need the full ComfyUI workflow surface (custom nodes, queue management, the visual graph editor) and pixel output is fine.

The Codec patch is fully backwards-compatible per request — JSON-SSE clients see exactly the upstream behaviour.
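
For example, a request that sends neither the codec Accept header nor stream_format gets the upstream path. This is a sketch assuming the negotiation keys off those two fields from the quick start:

// No codec Accept header, no stream_format: "msgpack", so the upstream path applies.
const upstream = await fetch("http://localhost:8080/v1/images/generations", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "sd1.5", prompt: "a snowy mountain at dusk", size: "512x512" }),
});
// Response is whatever the base ComfyUI version returns: pixels, not LatentFrames.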

See also

  • codec-diffusers — sister image, also a v0.3 latent server. Doubles as the bench/golden perceptual reference.
  • codec-metamcp — gateway in front of latent servers + tool servers.
  • Protocol overview — the wire-format spec that this image’s framing implements.