Quickstart

90 seconds from "never seen Codec" to a streaming binary completion in your terminal.

Pick the language you’d like to write the client in. The server (sglang or vLLM) speaks Codec on the same /v1/completions endpoint it already serves; no special build required.

Server prerequisites. You need an LLM server that speaks Codec on its completions endpoint. The fastest path is the pre-built codec-sglang Docker image:

docker run --gpus all -p 8080:8080 wdunn001/codec-sglang:latest

and you’re done. (The examples below talk to http://localhost:8000; adjust either the -p mapping or the URL so the two agree.) If you’d rather DIY, see vanilla sglang for cherry-picking the two PRs into your own build. vLLM support is in flight (PR #41765).

TypeScript / Node

npm install @codecai/web
import { loadMap, Detokenizer, decodeStream } from "@codecai/web";

const map = await loadMap({
  url:  "https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
  hash: "sha256:887311099cdc09e7022001a01fa1da396750d669b7ed2c242a000b9badd09791",
});

const resp = await fetch("http://localhost:8000/v1/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "Qwen/Qwen2.5-7B-Instruct",
    prompt: "Explain entropy in one paragraph.",
    stream_format: "msgpack",
    max_tokens: 256,
  }),
});

const detok = new Detokenizer(map);
for await (const frame of decodeStream(resp.body!, "msgpack")) {
  process.stdout.write(detok.render(frame.ids, { partial: !frame.done }));
}

Full walkthrough: TypeScript guide.

Python

pip install codecai
import asyncio, httpx
from codecai import Detokenizer, decode_msgpack_stream, load_map

async def main():
    m = await load_map(
        url="https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
        hash="sha256:887311099cdc09e7022001a01fa1da396750d669b7ed2c242a000b9badd09791",
    )
    detok = Detokenizer(m)
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST", "http://localhost:8000/v1/completions",
            json={
                "model": "Qwen/Qwen2.5-7B-Instruct",
                "prompt": "Explain entropy in one paragraph.",
                "stream_format": "msgpack",
                "max_tokens": 256,
            },
        ) as resp:
            async for frame in decode_msgpack_stream(resp.aiter_raw()):
                print(detok.render(frame.ids, partial=not frame.done), end="", flush=True)

asyncio.run(main())

Full walkthrough: Python guide.

.NET

dotnet add package Codec.Net
using System.Net.Http.Json;
using Codec;

var map = await MapLoader.LoadAsync(new LoadOptions {
    Url  = "https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
    Hash = "sha256:887311099cdc09e7022001a01fa1da396750d669b7ed2c242a000b9badd09791",
});

using var http = new HttpClient();
using var req = new HttpRequestMessage(HttpMethod.Post, "http://localhost:8000/v1/completions") {
    Content = JsonContent.Create(new {
        model = "Qwen/Qwen2.5-7B-Instruct",
        prompt = "Explain entropy in one paragraph.",
        stream_format = "msgpack",
        max_tokens = 256,
    }),
};
using var resp = await http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead);

var detok = new Detokenizer(map);
await using var body = await resp.Content.ReadAsStreamAsync();
await foreach (var frame in StreamDecoder.DecodeMsgpackStreamAsync(body)) {
    Console.Write(detok.Render(frame.Ids, new DetokenizeOptions { Partial = !frame.Done }));
}

Full walkthrough: .NET guide.

C

# CMakeLists.txt
include(FetchContent)
FetchContent_Declare(codec
  GIT_REPOSITORY https://github.com/wdunn001/Codec.git
  GIT_TAG        main
  SOURCE_SUBDIR  packages/c
)
FetchContent_MakeAvailable(codec)
target_link_libraries(your_app PRIVATE codec::codec)
/* main.c */
#include <codec/codec.h>
/* See packages/c/examples/stream_decode.c for an end-to-end runnable program. */

Full walkthrough: C guide.


What you just did

In every language, the recipe is the same four steps:

  1. Load a vocab map — tells your client which tokenizer the server’s IDs belong to. Maps are sha256-content-addressed and cached.
  2. POST a completion request — identical to your normal /v1/completions call, with one extra field: stream_format: "msgpack" (or "protobuf").
  3. Decode the binary stream — the helper functions yield a stream of CodecFrames, each carrying {ids, done, finish_reason}.
  4. Detokenize at the edge — only when a human is going to read it. Internal hops keep the IDs.
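The loop those four steps describe can be sketched with toy stand-ins. Everything here (`Frame`, `ToyDetokenizer`, the vocab) is illustrative, not the real codecai API; a real detokenizer also handles merges and byte-level pieces:

```python
from __future__ import annotations
from dataclasses import dataclass

# Toy stand-in for the frames a Codec stream yields (step 3).
@dataclass
class Frame:
    ids: list[int]
    done: bool
    finish_reason: str | None = None

class ToyDetokenizer:
    """Maps token IDs to text using a vocab map (step 4)."""
    def __init__(self, vocab: dict[int, str]):
        self.vocab = vocab

    def render(self, ids: list[int], partial: bool = True) -> str:
        # A plain lookup is enough to show the shape of the loop;
        # unknown IDs render as the replacement character.
        return "".join(self.vocab.get(i, "\ufffd") for i in ids)

vocab = {1: "Entropy", 2: " measures", 3: " disorder", 4: "."}
detok = ToyDetokenizer(vocab)

# Pretend these frames arrived over the binary stream.
frames = [
    Frame([1, 2], done=False),
    Frame([3, 4], done=True, finish_reason="stop"),
]

out = "".join(detok.render(f.ids, partial=not f.done) for f in frames)
print(out)  # Entropy measures disorder.
```

The point of the `partial` flag is that intermediate frames may end mid-token; only the final render is guaranteed to be complete text.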

That’s the whole API. The same four-step shape appears in every binding.
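The `sha256:` hashes pinned in step 1 are plain SHA-256 digests of the map file, so a client can verify a downloaded map before trusting or caching it. A minimal sketch of that check (the loader's real verification logic may differ):

```python
import hashlib

def verify_map(data: bytes, expected: str) -> bytes:
    """Check downloaded vocab-map bytes against a pinned 'sha256:<hex>' digest."""
    scheme, _, hexdigest = expected.partition(":")
    if scheme != "sha256":
        raise ValueError(f"unsupported hash scheme: {scheme}")
    actual = hashlib.sha256(data).hexdigest()
    if actual != hexdigest:
        raise ValueError(f"map hash mismatch: got sha256:{actual}")
    return data  # safe to parse, and to cache under its digest

# A stand-in payload; real maps are the JSON files served from the CDN.
fake_map = b'{"0": "<pad>", "1": "hello"}'
pinned = "sha256:" + hashlib.sha256(fake_map).hexdigest()
verify_map(fake_map, pinned)  # passes silently; a tampered file would raise
```

Because the digest doubles as the cache key, a map that verifies once can be reused offline without re-fetching.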