# codec-comfyui (Docker)
ComfyUI image-generation server with the Codec v0.3 latent transport patch. Streams VAE latents on the wire instead of decoded pixels — 48× smaller, decoder runs at the leaf.
codec-comfyui is a pre-built Docker image of ComfyUI with the Codec v0.3 latent transport patch applied. Stand it up like any image-gen server, point any Codec-aware client at it, and image generations ship as VAE latents instead of decoded pixels — same physics as text-token streams in codec-sglang / codec-vllm, but for diffusion.
Why latents and not pixels: a 512×512 RGB frame at fp16 is ~1.5 MB; the SD-1 latent that produced it is 4×64×64 fp16 = 32 KB, a 48× reduction. With per-channel int8 quantization on top, the wire weight collapses further. The client does vae_decode locally and never re-encodes, so the round-trip pixel quality is bounded by the published per-pipeline LPIPS thresholds (see spec/PIPELINES.md).
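The arithmetic behind those numbers checks out as a quick sanity calculation:

```typescript
// Back-of-envelope wire weights for one 512x512 SD-1 generation (fp16 = 2 bytes/element).
const pixelBytes = 512 * 512 * 3 * 2;  // decoded RGB frame at fp16
const latentBytes = 4 * 64 * 64 * 2;   // 4x64x64 SD-1 latent at fp16
const int8Bytes = 4 * 64 * 64 * 1;     // int8 pipeline: 1 byte/element (plus small per-channel scale overhead)

console.log(pixelBytes / 1024 ** 2);   // 1.5  (MiB)
console.log(latentBytes / 1024);       // 32   (KiB)
console.log(pixelBytes / latentBytes); // 48   (the 48x reduction)
console.log(pixelBytes / int8Bytes);   // 96   (with int8 on top)
```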
This image is built from the wdunn001/ComfyUI fork at branch feat/codec-latent-transport. The fork is the canonical surface — ComfyUI’s plugin/custom-node architecture would let us ship the codec endpoints as a custom node, but the latent-frame emitter and zstd-dict overlay touch enough of the request loop that maintaining a downstream fork is cleaner.
## Quick start
Default boot loads stabilityai/sd-vae-ft-mse (SD-1 VAE) and serves it.
```shell
docker run -d --gpus all \
  -p 8080:8080 \
  -v codec-models:/models \
  --shm-size 8g \
  wdunn001/codec-comfyui:latest
```
```shell
# Codec wire format — msgpack frames of LatentStreamHeader + LatentFrame
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Accept: application/x-codec-msgpack" \
  -H "Accept-Encoding: zstd" \
  -d '{
    "model": "sd1.5",
    "prompt": "a wide-angle photograph of a snowy mountain at dusk",
    "stream_format": "msgpack",
    "modality": "image-latents",
    "latent_space": "stabilityai/sd-vae-ft-mse",
    "pipeline": "int8",
    "size": "512x512", "steps": 30, "seed": 42
  }'
```
The response carries:
- `Content-Encoding: zstd` (when a per-pipeline zstd dict is loaded)
- `Codec-Latent-Map: sha256:…` — the latent-space map document hash, so the client can fail fast if it doesn't have a matching map loaded
- `Codec-Zstd-Dict: sha256:…` — the active dict identifier
Body is one LatentStreamHeader followed by one LatentFrame (image) or N LatentFrames (video).
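A minimal sketch of consuming that body, assuming each msgpack frame is preceded by a 4-byte big-endian length prefix; the normative framing lives in PROTOCOL.md, and `splitFrames` here is illustrative, not part of any Codec client library:

```typescript
// Hypothetical splitter for a length-prefixed frame stream.
// ASSUMPTION: 4-byte big-endian length before each msgpack frame;
// confirm against the spec before relying on this layout.
function* splitFrames(buf: Uint8Array): Generator<Uint8Array> {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let off = 0;
  while (off + 4 <= buf.byteLength) {
    const len = view.getUint32(off); // DataView defaults to big-endian
    off += 4;
    if (off + len > buf.byteLength) break; // trailing partial frame: wait for more bytes
    yield buf.subarray(off, off + len);    // zero-copy view of one msgpack frame
    off += len;
  }
}
```

The first yielded frame would be the `LatentStreamHeader`; every subsequent one a `LatentFrame`.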
## Pipelines
codec-comfyui advertises the seven Codec v0.3 pipelines documented in spec/PIPELINES.md:
| Pipeline | Wire shape | Reduction vs raw | Use case |
|---|---|---|---|
| `raw` | Packed tensor in row-major order | 1× | Bit-exact baseline |
| `int8` | Per-channel symmetric int8 | 2× over fp16 | Default for SD-family images |
| `int4` | Per-channel symmetric int4 (packed) | 4× over fp16 | Aggressive lossy mode |
| `int8-adaptive` | int8 with per-keyframe scales | ~2× | Heterogeneous frames |
| `int4-adaptive` | int4 with per-keyframe scales | ~4× | Same use case, more lossy |
| `delta+int8` | int8 residual against prior keyframe | 2× + temporal collapse | Video only |
| `delta+int4` | int4 residual against prior keyframe | 4× + temporal collapse | Video, most aggressive |
Adding a pipeline is an additive v0.3+ point release — the registry is normative, not extensible per-deployment.
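To make the `int8` row concrete, here is an illustrative sketch of per-channel symmetric quantization over a channel-first latent. The function names and exact rounding are assumptions for exposition, not the server's API; the normative math is in PIPELINES.md:

```typescript
// Per-channel symmetric int8: each channel gets one fp32 scale = maxAbs/127,
// and every element is rounded to q = round(x / scale), reconstructed as q * scale.
function quantizeInt8(latent: Float32Array, channels: number): { q: Int8Array; scales: Float32Array } {
  const perChannel = latent.length / channels;
  const q = new Int8Array(latent.length);
  const scales = new Float32Array(channels);
  for (let c = 0; c < channels; c++) {
    let maxAbs = 0;
    for (let i = 0; i < perChannel; i++) {
      maxAbs = Math.max(maxAbs, Math.abs(latent[c * perChannel + i]));
    }
    const scale = maxAbs / 127 || 1; // avoid divide-by-zero on an all-zero channel
    scales[c] = scale;
    for (let i = 0; i < perChannel; i++) {
      q[c * perChannel + i] = Math.round(latent[c * perChannel + i] / scale);
    }
  }
  return { q, scales };
}

function dequantizeInt8(q: Int8Array, scales: Float32Array, channels: number): Float32Array {
  const perChannel = q.length / channels;
  const out = new Float32Array(q.length);
  for (let c = 0; c < channels; c++) {
    for (let i = 0; i < perChannel; i++) {
      out[c * perChannel + i] = q[c * perChannel + i] * scales[c];
    }
  }
  return out;
}
```

The `delta+` variants apply the same quantizer to the residual between a frame and the prior keyframe rather than to the latent itself, which is why they only make sense for video.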
## Pointing a Codec client at it
Any `@codecai/web` client (v0.4+) speaks the latent wire shape via `LatentStreamDecoder`:
```typescript
import {
  decodeLatentHeaderMsgpack,
  decodeLatentFrameMsgpack,
  LatentStreamDecoder,
} from "@codecai/web";

const resp = await fetch("http://localhost:8080/v1/images/generations", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Accept": "application/x-codec-msgpack",
    "Accept-Encoding": "zstd",
  },
  body: JSON.stringify({ /* …request as above… */ }),
});

// Frames stream length-prefixed; iterate them as Uint8Array chunks
// (see decodeMsgpackStream for the streaming helper).
const [headerBytes, ...frameChunks] = /* …split per the stream protocol… */;

const header = decodeLatentHeaderMsgpack(headerBytes);
const decoder = new LatentStreamDecoder(header);

for (const chunk of frameChunks) {
  const frame = decodeLatentFrameMsgpack(chunk);
  const latent = decoder.decodeFrame(frame); // Float32Array, channel-first
  // Hand `latent` to a browser-side VAE (WebGPU / ONNX-Web / etc.)
}
```
The Python client (`codecai`) and the polyglot clients (rust / java / dotnet / c) carry the same parser surface: a single tokenizer-map and latent-space-map registry, one wire shape, six languages.
## When to use this
- Use `codec-comfyui` when you want browser- or edge-side VAE decoding, when you're streaming frames into a downstream vision model that accepts latents directly, or when bandwidth is the bottleneck.
- Use upstream ComfyUI when you need the full ComfyUI workflow surface (custom nodes, queue management, the visual graph editor) and pixel output is fine.
The Codec patch is fully backwards-compatible per request — JSON-SSE clients see exactly the upstream behaviour.
## Source & links
- Image: `wdunn001/codec-comfyui:latest` on Docker Hub.
- Codec patch source: github.com/wdunn001/ComfyUI.
- Image build recipe: github.com/wdunn001/codec-supervisor/blob/main/Dockerfile.comfyui.
- v0.3 spec section: Codec PROTOCOL.md § Latent Modality.
- Pipeline math: Codec PIPELINES.md.
## See also
- codec-diffusers — sister image, also a v0.3 latent server. Doubles as the bench/golden perceptual reference.
- codec-metamcp — gateway in front of latent servers + tool servers.
- Protocol overview — the wire format spec the framing in this image speaks.