.NET — Codec.Net

ASP.NET / console-friendly binding. .NET 8+, IAsyncEnumerable streams, full nullability, AOT-friendly.

Codec.Net is the .NET binding. It targets .NET 8 and newer, ships with full nullable annotations, and exposes streaming as IAsyncEnumerable<CodecFrame>; everything is await foreach-shaped.

Install

dotnet add package Codec.Net

The four-step shape

using Codec;

// 1. Vocab map → MapLoader.LoadAsync (fetch + verify)
// 2. Request   → HttpClient, stream_format = "msgpack"
// 3. Stream    → IAsyncEnumerable<CodecFrame> via StreamDecoder
// 4. IDs → text at the edge via Detokenizer

1. Load the vocab map

var map = await MapLoader.LoadAsync(new LoadOptions {
    Url  = "https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
    Hash = "sha256:887311099cdc09e7022001a01fa1da396750d669b7ed2c242a000b9badd09791",
});

MapLoader.LoadAsync fetches, verifies the hash, and caches by hash for future calls.
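Pinning by hash is what lets LoadAsync prove the bytes it fetched are the map you meant. Conceptually the check is just SHA-256 over the raw bytes, rendered as lowercase hex and compared against the pinned `sha256:` string. A standalone sketch of that shape (illustrative, not the library's actual internals):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Pinned digest, as it would appear in LoadOptions.Hash.
// (This is the well-known SHA-256 test vector for "abc".)
string pinned = "sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";

// Bytes of the downloaded map — here, a stand-in payload.
byte[] mapBytes = Encoding.UTF8.GetBytes("abc");

// Verify: lowercase hex of SHA-256 must match the pinned value exactly.
string hex = Convert.ToHexString(SHA256.HashData(mapBytes)).ToLowerInvariant();
bool ok = pinned == $"sha256:{hex}";
Console.WriteLine(ok ? "hash ok" : "hash mismatch");  // prints "hash ok"
```

On a mismatch the real loader refuses the map rather than caching it; the cache key is the hash itself, so a pinned hash is also a cache hit on every later call.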

2. Send a request

using System.Net.Http;
using System.Net.Http.Json;

using var http = new HttpClient {
    DefaultRequestHeaders = { { "Accept-Encoding", "gzip" } },
};

using var req = new HttpRequestMessage(HttpMethod.Post, "http://localhost:8000/v1/completions") {
    Content = JsonContent.Create(new {
        model = "Qwen/Qwen2.5-7B-Instruct",
        prompt = "Explain entropy in one paragraph.",
        stream_format = "msgpack",
        max_tokens = 256,
    }),
};

using var resp = await http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead);
resp.EnsureSuccessStatusCode();

HttpCompletionOption.ResponseHeadersRead is the .NET equivalent of “don’t buffer the body” — without it, HttpClient will read the entire response into memory before handing it to you.

3. Decode the binary stream

await using var body = await resp.Content.ReadAsStreamAsync();

await foreach (var frame in StreamDecoder.DecodeMsgpackStreamAsync(body)) {
    // frame.Ids: ReadOnlyMemory<uint>
    // frame.Done: bool
    // frame.FinishReason: string?
}

Use DecodeProtobufStreamAsync if your server is configured for protobuf.

4. Detokenize at the edge

var detok = new Detokenizer(map);

await foreach (var frame in StreamDecoder.DecodeMsgpackStreamAsync(body)) {
    var text = detok.Render(frame.Ids, new DetokenizeOptions { Partial = !frame.Done });
    Console.Write(text);
}

Detokenizer is stateful — the same instance persists UTF-8 buffer state across calls so split sequences render correctly. Set Partial = true while the stream is open.
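Why statefulness matters: a multi-byte UTF-8 character can be split across two frames. The BCL's own stateful Decoder demonstrates the mechanism, and is a reasonable stand-in for the buffer state the text above says Detokenizer keeps between Render calls:

```csharp
using System;
using System.Text;

// "é" is two bytes in UTF-8 (0xC3 0xA9). Suppose a frame boundary
// falls between them.
byte[] part1 = { 0xC3 };
byte[] part2 = { 0xA9 };

// A stateful decoder holds the lone lead byte instead of emitting a
// replacement character, then completes the character on the next chunk.
Decoder decoder = Encoding.UTF8.GetDecoder();
char[] buf = new char[4];
int n1 = decoder.GetChars(part1, 0, part1.Length, buf, 0);  // 0 chars: held back
int n2 = decoder.GetChars(part2, 0, part2.Length, buf, 0);  // 1 char: "é"
Console.WriteLine($"{n1} then {n2}: {new string(buf, 0, n2)}");  // "0 then 1: é"
```

A stateless decode of each half would have produced U+FFFD garbage at every split, which is exactly what Partial = true guards against.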

Encoding (sending IDs, not text)

var tok = new BPETokenizer(map);
int[] ids = tok.Encode("System: be concise.\nUser: what's BPE?");

using var req = new HttpRequestMessage(HttpMethod.Post, "http://localhost:8000/v1/completions") {
    Content = JsonContent.Create(new {
        model = "Qwen/Qwen2.5-7B-Instruct",
        prompt = ids,            // int[] — server reads as token IDs
        stream_format = "msgpack",
        max_tokens = 256,
    }),
};

Watching for tool calls

var watcher = new ToolWatcher(map, "<tool_call>", "</tool_call>");

await foreach (var frame in StreamDecoder.DecodeMsgpackStreamAsync(body)) {
    foreach (var ev in watcher.Feed(frame.Ids)) {
        switch (ev.Kind) {
            case WatcherEventKind.Passthrough:
                Forward(ev.Ids);
                break;
            case WatcherEventKind.Captured:
                var text = detok.Render(ev.Ids);
                var (tool, args) = ParseToolCall(text);
                await Dispatch(tool, args);
                break;
        }
    }
}

Single uint compare per token, no detokenization on the hot path. See Tool calling.
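To see why the hot path is cheap, here is a self-contained toy version of the scan. The marker token IDs are made-up stand-ins (the real ToolWatcher resolves the "<tool_call>"/"</tool_call>" strings through the map), and real events carry spans rather than growing lists; only the shape of the loop is the point:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical marker token IDs, standing in for <tool_call> / </tool_call>.
const uint Open = 151657, Close = 151658;

var passthrough = new List<uint>();
var captured = new List<uint>();
bool inside = false;

// A couple of integer compares per token; no detokenization anywhere.
foreach (uint id in new uint[] { 1, 2, Open, 7, 8, Close, 3 })
{
    if (id == Open)  { inside = true;  continue; }  // start capturing
    if (id == Close) { inside = false; continue; }  // captured span complete
    (inside ? captured : passthrough).Add(id);
}

Console.WriteLine(string.Join(",", passthrough));  // 1,2,3
Console.WriteLine(string.Join(",", captured));     // 7,8
```

Detokenization happens only on the captured span, once, when the tool call is dispatched.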

Translating across vocabularies

var qwen  = await MapLoader.LoadAsync(new LoadOptions {
    Url  = "https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
    Hash = "sha256:887311099cdc09e7022001a01fa1da396750d669b7ed2c242a000b9badd09791",
});
var llama = await MapLoader.LoadAsync(new LoadOptions {
    Url  = "https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/meta-llama/llama-3.json",
    Hash = "sha256:79b707aea8c2b41c2883ec7913b0c4a0c880044ac844d89a9a03e779eb92db04",
});

var tr = new Translator(qwen, llama);

await foreach (var frame in StreamDecoder.DecodeMsgpackStreamAsync(qwenBody)) {
    var llamaIds = tr.Translate(frame.Ids, partial: !frame.Done);
    ForwardToLlamaAgent(llamaIds);
}

See Translator.

ASP.NET integration

A typical pattern: an ASP.NET endpoint proxies a Codec stream from an internal model server out to a browser as SSE for a chat UI — doing the detokenization at the edge.

using System.Text.Json;  // JsonSerializer, used below

app.MapPost("/chat", async (HttpContext ctx, ChatRequest req, IHttpClientFactory http) => {
    ctx.Response.ContentType = "text/event-stream";
    var detok = new Detokenizer(_map);

    using var inner = http.CreateClient();
    using var inboundReq = new HttpRequestMessage(HttpMethod.Post, _modelServer + "/v1/completions") {
        Content = JsonContent.Create(new {
            model = req.Model, prompt = req.Prompt,
            stream_format = "msgpack", max_tokens = req.MaxTokens,
        }),
    };
    using var inbound = await inner.SendAsync(inboundReq, HttpCompletionOption.ResponseHeadersRead);
    await using var inBody = await inbound.Content.ReadAsStreamAsync();

    await foreach (var frame in StreamDecoder.DecodeMsgpackStreamAsync(inBody)) {
        var text = detok.Render(frame.Ids, new DetokenizeOptions { Partial = !frame.Done });
        await ctx.Response.WriteAsync($"data: {JsonSerializer.Serialize(new { text, done = frame.Done })}\n\n");
        await ctx.Response.Body.FlushAsync();
    }
});
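For reference, each iteration of that loop writes one SSE event: a `data:` line carrying the JSON payload, terminated by a blank line. A quick standalone check of the framing:

```csharp
using System;
using System.Text.Json;

// One frame's worth of output, as the handler above would emit it.
string text = "Entropy is";
bool done = false;
string sse = $"data: {JsonSerializer.Serialize(new { text, done })}\n\n";
Console.Write(sse);  // data: {"text":"Entropy is","done":false}
```

The blank line is load-bearing: it is what tells EventSource in the browser that the event is complete.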

Production checklist

  • Pin the map hash.
  • HttpCompletionOption.ResponseHeadersRead — without it, HttpClient buffers the whole response and bypasses streaming entirely.
  • Reuse HttpClient and Detokenizer — both are designed for it. You don't need a new Detokenizer per stream, but a reused instance carries UTF-8 buffer state, so call Reset() at stream boundaries.
  • Cancellation tokens. All async APIs accept CancellationToken; pass yours through so timeouts and disposal propagate cleanly.
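A minimal sketch of the cancellation pattern from the last bullet: link the caller's token to an overall timeout and pass the linked token through every call. Task.Delay here stands in for a stalled stream read; the shape is the same when the token goes to SendAsync and the decode loop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// One linked source: fires when either the caller cancels or the timeout hits.
using var caller = new CancellationTokenSource();
using var cts = CancellationTokenSource.CreateLinkedTokenSource(caller.Token);
cts.CancelAfter(TimeSpan.FromMilliseconds(50));  // overall stream deadline

bool aborted = false;
try
{
    // Stand-in for a read that never completes (a stalled model server).
    await Task.Delay(TimeSpan.FromSeconds(10), cts.Token);
}
catch (OperationCanceledException)
{
    aborted = true;  // timeout or caller cancellation; either way, clean exit
    Console.WriteLine("stream aborted cleanly");
}
```

Disposal of the enclosing scope (e.g. a closed browser tab in the ASP.NET case, via HttpContext.RequestAborted) then propagates through the same token instead of leaking the upstream connection.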

See also