Java — codec
JDK 17+ binding. java.net.http.HttpClient, Iterator + Flow.Publisher streams, Jackson JSON, msgpack-java for frames. Maven-ready.
codec (Maven artifact io.github.wdunn001:codec) is the Java binding. It targets JDK 17+ so that it can use records, sealed types, and the modern java.net.http.HttpClient. Streaming is exposed both as Iterator<CodecFrame> (synchronous, the canonical shape) and as Flow.Publisher<CodecFrame> (for reactive callers).
Install
<dependency>
  <groupId>io.github.wdunn001</groupId>
  <artifactId>codec</artifactId>
  <version>0.1.0</version>
</dependency>
(Maven Central publication is queued. Until then, run mvn install from packages/java/ to install the artifact into your local repository.)
The library brings two transitive dependencies: com.fasterxml.jackson.core:jackson-databind for map JSON and org.msgpack:msgpack-core for frame parsing. Everything else is JDK built-in — sha256 via java.security.MessageDigest, regex with Pattern.UNICODE_CHARACTER_CLASS, HTTP via java.net.http.HttpClient.
The four-step shape
import ai.codec.*;
// 1. fetch + verify map → MapLoader.load(...)
// 2. send → HttpClient (JDK)
// 3. stream → frames → StreamDecoder.decodeMsgpackStream(...)
// 4. IDs → text → Detokenizer.render(...)
1. Load the vocab map
TokenizerMap map = MapLoader.load(new LoadOptions.Builder()
    .url("https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json")
    .hash("sha256:887311099cdc09e7022001a01fa1da396750d669b7ed2c242a000b9badd09791")
    .build());
The hash is verified against the bytes on the wire before the JSON is parsed. A mismatch throws TokenizerMapHashMismatchException. For an async fetch, use MapLoader.loadAsync(...), which returns a CompletableFuture<TokenizerMap>.
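To make the "verify before parse" order concrete, here is a minimal sketch of checking a sha256: pin against raw bytes using only the JDK. HashPinSketch, sha256Hex, and matchesPin are hypothetical names for illustration, not part of the codec API, and MapLoader's internals may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;

// Hypothetical helper (not the library's implementation): checks a
// "sha256:<hex>" pin against raw bytes before anything parses them.
final class HashPinSketch {

    // Lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] bytes) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(bytes));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when the bytes hash to the pinned value.
    static boolean matchesPin(byte[] bytes, String pin) {
        String prefix = "sha256:";
        if (!pin.startsWith(prefix)) {
            throw new IllegalArgumentException("unsupported pin format: " + pin);
        }
        byte[] expected = pin.substring(prefix.length())
                .toLowerCase(Locale.ROOT)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] actual = sha256Hex(bytes).getBytes(StandardCharsets.US_ASCII);
        // MessageDigest.isEqual compares without short-circuiting on the first difference.
        return MessageDigest.isEqual(expected, actual);
    }
}
```

In this sketch a caller would read the full response body into a byte array, call matchesPin, and only hand the bytes to a JSON parser when it returns true.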
2. Send a request
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
HttpClient http = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(10))
    .build();
String body = """
    {"model":"Qwen/Qwen2.5-7B-Instruct",
    "prompt":"Explain entropy in one paragraph.",
    "stream_format":"msgpack",
    "max_tokens":256}
    """;
HttpRequest req = HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:8000/v1/completions"))
    .header("Content-Type", "application/json")
    .header("Accept-Encoding", "gzip")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build();
HttpResponse<java.io.InputStream> resp = http.send(req,
    HttpResponse.BodyHandlers.ofInputStream());
HttpResponse.BodyHandlers.ofInputStream() is the JDK equivalent of “don’t buffer the body” — exactly what streaming needs.
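One caveat on the request above: it advertises Accept-Encoding: gzip, and java.net.http.HttpClient does not decompress response bodies on its own. Whether StreamDecoder handles compressed input is not stated here, so the following is a sketch of unwrapping the stream by hand, under the assumption that it does not. BodyStreams and decoded are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: returns a stream of decoded bytes, unwrapping gzip
// only when the server says it applied it.
final class BodyStreams {

    // contentEncoding would typically come from
    // resp.headers().firstValue("Content-Encoding").
    static InputStream decoded(InputStream raw, Optional<String> contentEncoding)
            throws IOException {
        boolean gzip = contentEncoding
                .map(value -> value.trim().equalsIgnoreCase("gzip"))
                .orElse(false);
        // GZIPInputStream inflates lazily as the caller reads, so streaming is preserved.
        return gzip ? new GZIPInputStream(raw) : raw;
    }
}
```

With this in place, the stream handed to the decoder would be BodyStreams.decoded(resp.body(), resp.headers().firstValue("Content-Encoding")). Dropping the Accept-Encoding header is the simpler alternative if compression is not needed.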
3. Decode the binary stream
import java.util.Iterator;
Iterator<CodecFrame> frames = StreamDecoder.decodeMsgpackStream(resp.body());
while (frames.hasNext()) {
    CodecFrame frame = frames.next();
    // frame.ids(): int[]
    // frame.done(): boolean
    // frame.finishReason(): String (nullable)
}
For reactive callers there’s a Flow.Publisher<CodecFrame> variant:
import java.util.concurrent.Flow;
Flow.Publisher<CodecFrame> publisher =
    StreamDecoder.publishMsgpack(resp.body());
publisher.subscribe(mySubscriber);
decodeProtobufStream / publishProtobuf cover the length-prefixed protobuf wire mode.
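The mySubscriber passed to subscribe(...) above is left undefined. As a sketch of what it might look like, here is a Flow.Subscriber that requests one item at a time. It is generic so it compiles without the codec library; in real use T would be CodecFrame. OneAtATimeSubscriber is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;

// Hypothetical subscriber: pulls one item at a time and records what it sees.
final class OneAtATimeSubscriber<T> implements Flow.Subscriber<T> {
    final List<T> received = new CopyOnWriteArrayList<>();
    final CountDownLatch finished = new CountDownLatch(1);
    private Flow.Subscription subscription;

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        subscription.request(1); // nothing flows until demand is signalled
    }

    @Override
    public void onNext(T item) {
        received.add(item);
        subscription.request(1); // ask for the next item only after handling this one
    }

    @Override
    public void onError(Throwable error) {
        finished.countDown();
    }

    @Override
    public void onComplete() {
        finished.countDown();
    }
}
```

Requesting one item per onNext keeps the publisher from running ahead of the consumer. A real handler would render or forward each frame in onNext instead of collecting it.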
4. Detokenize at the edge
Detokenizer detok = new Detokenizer(map);
while (frames.hasNext()) {
    CodecFrame frame = frames.next();
    String text = detok.render(frame.ids(),
        new DetokenizeOptions(/* partial */ !frame.done(), /* renderSpecial */ false));
    System.out.print(text);
}
Detokenizer is stateful — partial multi-byte UTF-8 sequences buffer across render() calls when partial=true. A 4-byte 🚀 split across two frames round-trips identically. Call detok.reset() between unrelated streams.
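The buffering behaviour can be illustrated with plain JDK classes. The sketch below is not Detokenizer's implementation; it only shows the general technique of carrying an incomplete UTF-8 tail from one call to the next. Utf8Carry and feed are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: decodes UTF-8 chunk by chunk, holding back any
// incomplete trailing sequence until the rest of its bytes arrive.
final class Utf8Carry {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private byte[] pending = new byte[0];

    String feed(byte[] chunk) {
        // Prepend whatever was left over from the previous call.
        ByteBuffer in = ByteBuffer.allocate(pending.length + chunk.length);
        in.put(pending).put(chunk).flip();
        // UTF-8 never yields more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        // endOfInput=false: an incomplete trailing sequence is left unread in 'in'.
        decoder.decode(in, out, false);
        pending = new byte[in.remaining()];
        in.get(pending);
        out.flip();
        return out.toString();
    }

    // Drop carried bytes between unrelated streams.
    void reset() {
        pending = new byte[0];
        decoder.reset();
    }
}
```

Feeding the first two bytes of 🚀 (0xF0 0x9F) returns an empty string, and feeding the remaining two (0x9A 0x80) returns the complete character.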
Encoding (sending IDs, not text)
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;
BPETokenizer tok = new BPETokenizer(map);
int[] ids = tok.encode("System: be concise.\nUser: what's BPE?");
ObjectMapper json = new ObjectMapper();
String body = json.writeValueAsString(Map.of(
    "model", "Qwen/Qwen2.5-7B-Instruct",
    "prompt", ids, // int[]: the server reads these as token IDs
    "stream_format", "msgpack",
    "max_tokens", 256
));
The encoder applies BPE merges greedily by merge priority, not left to right. The resulting IDs are bit-identical to the upstream tokenizer's across all reference bindings.
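The difference between priority order and left-to-right order is easiest to see on a toy input. The sketch below uses a made-up three-symbol sequence and a made-up rank table; it is not BPETokenizer's code and omits pre-tokenization, byte fallback, and ID lookup. BpeMergeSketch and merge are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy: repeatedly merges the adjacent pair with the best
// (lowest) rank until no ranked pair remains.
final class BpeMergeSketch {

    // ranks maps "left right" to a priority; lower numbers merge first.
    // The space-joined key is a toy convention and assumes symbols contain no spaces.
    static List<String> merge(List<String> symbols, Map<String, Integer> ranks) {
        List<String> parts = new ArrayList<>(symbols);
        while (parts.size() > 1) {
            int best = -1;
            int bestRank = Integer.MAX_VALUE;
            for (int i = 0; i + 1 < parts.size(); i++) {
                Integer rank = ranks.get(parts.get(i) + " " + parts.get(i + 1));
                if (rank != null && rank < bestRank) {
                    bestRank = rank;
                    best = i;
                }
            }
            if (best < 0) {
                break; // no adjacent pair is in the merge table
            }
            parts.set(best, parts.get(best) + parts.get(best + 1));
            parts.remove(best + 1);
        }
        return parts;
    }
}
```

For symbols a, b, c with ranks {"b c": 0, "a b": 1}, the result is [a, bc]: the higher-priority pair "b c" merges first even though "a b" appears earlier. A left-to-right scan would have produced [ab, c] instead.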
Watching for tool calls
ToolWatcher watcher = new ToolWatcher(map, "<tool_call>", "</tool_call>");
while (frames.hasNext()) {
    CodecFrame frame = frames.next();
    for (WatcherEvent ev : watcher.feed(frame.ids())) {
        switch (ev.kind()) {
            case PASSTHROUGH -> forward(ev.ids());
            case CAPTURED -> {
                int[] ids = toIntArray(ev.ids()); // long[] → int[]
                String text = detok.render(ids, new DetokenizeOptions(false, false));
                ToolCall call = parseToolCall(text);
                dispatch(call.name(), call.args());
            }
        }
    }
}
The watcher’s IDs are long[] because Java has no unsigned 32-bit primitive — long losslessly carries the full uint32 range. A single compare per token; no detokenization on the hot path. See Tool calling.
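The toIntArray helper in the snippet above is not defined on this page. One possible definition is sketched below, assuming every ID fits in a signed int (true for any vocabulary under about 2.1 billion entries) and failing loudly otherwise. IdArrays and toUnsignedLongs are hypothetical names added for illustration.

```java
// Hypothetical conversions between the watcher's long[] IDs and int[] IDs.
final class IdArrays {

    // Narrows each ID, throwing ArithmeticException if one falls outside int range.
    static int[] toIntArray(long[] ids) {
        int[] out = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = Math.toIntExact(ids[i]);
        }
        return out;
    }

    // Widens each int as an unsigned 32-bit value, so -1 becomes 4294967295.
    static long[] toUnsignedLongs(int[] ids) {
        long[] out = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = Integer.toUnsignedLong(ids[i]);
        }
        return out;
    }
}
```

Math.toIntExact is chosen over a plain (int) cast so that an out-of-range ID surfaces as an exception instead of silently wrapping to a negative number.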
Translating across vocabularies
TokenizerMap qwen = MapLoader.load(new LoadOptions.Builder()
    .url("https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json")
    .hash("sha256:887311099cdc09e7022001a01fa1da396750d669b7ed2c242a000b9badd09791")
    .build());
TokenizerMap llama = MapLoader.load(new LoadOptions.Builder()
    .url("https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/meta-llama/llama-3.json")
    .hash("sha256:79b707aea8c2b41c2883ec7913b0c4a0c880044ac844d89a9a03e779eb92db04")
    .build());
Translator tr = new Translator(qwen, llama);
while (frames.hasNext()) {
    CodecFrame frame = frames.next();
    int[] llamaIds = tr.translate(frame.ids(), !frame.done());
    forwardToLlama(llamaIds);
}
See Translator.
Spring / reactive integration
For a Spring WebFlux endpoint that proxies a Codec stream out as SSE:
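import reactor.adapter.JdkFlowAdapter;
import reactor.core.publisher.Flux;
@PostMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> chat(@RequestBody ChatRequest req) {
    Detokenizer detok = new Detokenizer(map);
    // Flux.from(...) takes org.reactivestreams.Publisher, so bridge the JDK Flow.Publisher first.
    return JdkFlowAdapter.flowPublisherToFlux(StreamDecoder.publishMsgpack(openCodecStream(req)))
        .map(frame -> detok.render(frame.ids(),
            new DetokenizeOptions(!frame.done(), false)));
}
Flux.from(...) accepts org.reactivestreams.Publisher rather than java.util.concurrent.Flow.Publisher, so the Flow.Publisher is adapted to a Reactor Flux via JdkFlowAdapter.flowPublisherToFlux(...). RxJava and Mutiny work the same way through their respective Publisher adapters.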
Production checklist
- Pin the map hash in LoadOptions.
- Use HttpResponse.BodyHandlers.ofInputStream(). Buffering handlers such as ofString() or ofByteArray() hold the whole response in memory before returning.
- Reuse HttpClient and Detokenizer. Both are designed for reuse; new Detokenizer instances mid-stream drop the UTF-8 buffer. Call reset() at stream boundaries instead.
- Maven Central is the target. Until publish, install locally with mvn install from packages/java/.
See also
- Tool calling
- Translator
- Maven Central listing (publish queued)
- packages/java/ on GitHub