lootbox

Code Mode for LLMs — agents write TypeScript instead of calling tool schemas

jx-codes/lootbox · TypeScript/Deno · v0.0.54

Evaluated 2026-04-15. Tested against the zombie CYOA problem (axe agent stuck in a 36-turn loop). Full experiment at /workspace/group/projects/cyoa-eval/.

What it is

Lootbox implements the "Code Mode" pattern from Cloudflare's blog: instead of an LLM choosing from a menu of tool schemas, it writes TypeScript that calls those tools as functions. The argument is that LLMs are better trained on real TypeScript than on contrived tool-invocation syntax, so code generation outperforms tool selection for multi-step tasks.

The server exposes tool namespaces (KV, SQLite, knowledge graphs, GraphQL, filesystem) as importable objects. A deno-exec-style client sends TypeScript to a WebSocket, the server runs it in a Deno sandbox, and returns stdout.
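The difference between the two styles can be sketched in a few lines. This is only an illustration: the in-memory `tools.kv` object below is a hypothetical stand-in, and lootbox's real namespace names and method signatures may differ.

```typescript
// Hypothetical in-memory stand-in for a server-provided `kv` namespace.
// Names and signatures are assumptions made for this sketch.
const store = new Map<string, string>();
const tools = {
  kv: {
    set: (key: string, value: string): void => {
      store.set(key, value);
    },
    get: (key: string): string | undefined => store.get(key),
  },
};

// Schema-style tool calling: the model emits one structured request per step,
// and the host executes each one in its own round trip.
const toolCalls = [
  { tool: "kv.set", args: { key: "page", value: "Call Out" } },
  { tool: "kv.get", args: { key: "page" } },
];

// Code Mode: the model emits a script, and the host runs it once.
function agentScript(): string {
  tools.kv.set("page", "Call Out");
  const page = tools.kv.get("page") ?? "unknown";
  return `current page: ${page}`;
}

const codeModeOutput = agentScript();
console.log(`${toolCalls.length} tool calls vs 1 script -> ${codeModeOutput}`);
```

In the schema style the host sees two separate requests; in Code Mode it sees one program whose intermediate values never leave the sandbox.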

Tagline: "Code mode doesn't replace MCP — it orchestrates it."

Setup experience

Building from source requires Deno 2.x. The compile task (deno task compile) works cleanly once you run deno install in both the root and ui/ directories. The resulting binary is 172MB because it bundles the Deno runtime; install it to ~/.local/bin/lootbox.

A WebSocket ping/pong bug blocks lootbox exec entirely. Hono's Deno WebSocket server sends ping frames; the lootbox client doesn't respond, so the server closes the connection with "No response from ping frame." This happens with both the compiled binary and the client run from source. Every lootbox exec invocation fails with Execution failed: WebSocket error: [object ErrorEvent].

Workaround: skip lootbox's client and implement a lean deno-exec wrapper directly:

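#!/usr/bin/env bash
# deno-exec: lean code sandbox without lootbox's WebSocket layer
# Usage: deno-exec '<TypeScript source>'
# printf keeps script text such as "-n" or "-e" from being treated as an echo option.
# The trailing "-" tells deno to read the program from stdin.
printf '%s\n' "$1" | deno run --allow-net --allow-read=/workspace/group --no-prompt - 2>&1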

This gives you the same Code Mode semantics — agent writes TypeScript, TypeScript executes, stdout comes back — without the Deno/Hono incompatibility.

The CYOA experiment

Problem: cyoa-player-qwen (qwen3.5:latest) was stuck on the "Call Out" page for 36 of 40 turns in the zombie story. Root cause: the eval loop declared a triedChoices map but never read from or wrote to it, so the agent had no memory of what it had already tried.

Baseline fix (MODE=baseline): Pass tried-choice history in the prompt; filter choices before sending to agent. Result: 17 turns, 6 unique pages, outcome=all_tried. No zombie. Agent correctly avoided repeating choices.
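A minimal sketch of what the baseline fix amounts to, assuming a map keyed by page name. The `Choice` shape, the helper names, and the choice texts are hypothetical; the actual eval loop is not reproduced in this review.

```typescript
// Hypothetical shape for one selectable option on a page.
interface Choice {
  index: number;
  text: string;
}

// Page name -> indices of choices already tried on that page.
// This is the map the original loop declared but never used.
const triedChoices = new Map<string, Set<number>>();

function recordChoice(page: string, index: number): void {
  const tried = triedChoices.get(page) ?? new Set<number>();
  tried.add(index);
  triedChoices.set(page, tried);
}

function untriedChoices(page: string, choices: Choice[]): Choice[] {
  const tried = triedChoices.get(page) ?? new Set<number>();
  return choices.filter((c) => !tried.has(c.index));
}

// Returns null when nothing is left on this page; a caller could treat
// that as the signal to stop (the all_tried outcome in this sketch).
function buildPrompt(page: string, choices: Choice[]): string | null {
  const remaining = untriedChoices(page, choices);
  if (remaining.length === 0) return null;
  const tried = Array.from(triedChoices.get(page) ?? new Set<number>());
  const lines = remaining.map((c) => `${c.index}: ${c.text}`);
  return [
    `Page: ${page}`,
    `Already tried here: ${tried.join(", ") || "none"}`,
    "Choose one:",
    ...lines,
  ].join("\n");
}

// Hypothetical choices for the page the agent was stuck on.
const callOut: Choice[] = [
  { index: 0, text: "Call out again" },
  { index: 1, text: "Stay quiet" },
];
recordChoice("Call Out", 0);
console.log(buildPrompt("Call Out", callOut));
```

Filtering before the prompt is built means the model cannot repeat a choice even if it ignores the history line.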

Code Mode (MODE=codemode): Agent writes TypeScript that filters tried choices and picks the best available. Result: 14 turns, 6 unique pages, outcome=all_tried. Marginally faster (14 vs 17 turns). The TypeScript the agent generated was clean — it built a Set of tried indices, filtered, and preferred forward-moving choices.
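The agent's generated script is not reproduced here, but its described logic can be sketched roughly as follows. The input shape is hypothetical, and "forward-moving" is interpreted as "leads to a page not yet visited", which assumes the harness exposes each choice's target page.

```typescript
// Hypothetical per-turn input handed to the generated script.
interface TurnChoice {
  index: number;
  text: string;
  target: string; // assumed: the page this choice leads to
}

function pickChoice(
  choices: TurnChoice[],
  triedIndices: number[],
  visitedPages: string[],
): number | null {
  const tried = new Set(triedIndices);
  const visited = new Set(visitedPages);

  // Drop anything already tried on this page.
  const available = choices.filter((c) => !tried.has(c.index));
  if (available.length === 0) return null;

  // Prefer a choice that leads somewhere new; otherwise take the first untried one.
  const forward = available.find((c) => !visited.has(c.target));
  return (forward ?? available[0]).index;
}

// Hypothetical page and choice names, apart from "Call Out".
const turnChoices: TurnChoice[] = [
  { index: 0, text: "Call out again", target: "Call Out" },
  { index: 1, text: "Hide", target: "Basement" },
  { index: 2, text: "Run", target: "Street" },
];
const picked = pickChoice(turnChoices, [0], ["Call Out", "Basement"]);
console.log(picked);
```

Here choice 0 is excluded as already tried, choice 1 leads to a visited page, so the script settles on choice 2.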

Finding: For simple per-turn decision logic, Code Mode and a history-aware prompt perform equivalently. Code Mode was slightly more efficient, possibly because the agent's TypeScript reasoning is more explicit than its natural-language reasoning. The real constraint was the story graph: all paths from the 6-page opening cluster loop back into each other regardless of decision quality.
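The graph constraint is easy to check mechanically. The sketch below runs a breadth-first search over a hypothetical six-page graph (only "Call Out" is a real page name from the experiment; the edges are invented for illustration) and confirms that nothing outside the cluster is reachable.

```typescript
// Hypothetical story graph: page -> pages its choices lead to.
const storyGraph: Record<string, string[]> = {
  "Call Out": ["Hide", "Run"],
  "Hide": ["Call Out", "Listen"],
  "Run": ["Street", "Call Out"],
  "Listen": ["Hide", "Street"],
  "Street": ["Run", "Door"],
  "Door": ["Call Out", "Listen"],
};

// Breadth-first search: every page reachable from `start`, including `start`.
function reachable(graph: Record<string, string[]>, start: string): Set<string> {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const page = queue.shift() as string;
    for (const next of graph[page] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return seen;
}

const pages = reachable(storyGraph, "Call Out");
// The cluster is closed if every reachable page is itself a node of the cluster.
const closed = Array.from(pages).every((p) => p in storyGraph);
console.log(`reachable pages: ${pages.size}, closed cluster: ${closed}`);
```

When a cluster is closed in this sense, no choice policy can leave it, which is why both modes ended with the same six unique pages.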

The case for Code Mode strengthens when:

Multiple API calls must happen in one turn (Code Mode: one script; tool calling: N sequential calls)
State must persist across calls within a turn (the script maintains variables)
The task is inherently algorithmic (backtracking, retries, conditionals)

For single-choice selection from a fixed list, the two approaches carry equivalent overhead.
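The three conditions listed above tend to show up together. As a rough sketch, a single generated script can make several calls, retry the ones that fail, and carry results forward in ordinary variables. The `Fetcher` type and the flaky mock are hypothetical stand-ins for whatever async tool the server exposes.

```typescript
// Hypothetical async tool signature.
type Fetcher = (id: string) => Promise<string>;

// Retry a failing async call up to `attempts` times, rethrowing the last error.
async function withRetry<T>(fn: () => Promise<T>, attempts: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// One script, N calls: state lives in a local array between calls.
async function summarizePages(fetchPage: Fetcher, ids: string[]): Promise<string> {
  const titles: string[] = [];
  for (const id of ids) {
    titles.push(await withRetry(() => fetchPage(id), 3));
  }
  return titles.join(" > ");
}

// Mock tool that fails the first time each id is requested, then succeeds.
const failedOnce = new Set<string>();
const flakyFetch: Fetcher = (id) => {
  if (!failedOnce.has(id)) {
    failedOnce.add(id);
    return Promise.reject(new Error("transient failure"));
  }
  return Promise.resolve(`page-${id}`);
};

summarizePages(flakyFetch, ["a", "b"]).then((summary) => console.log(summary));
```

With schema-based tool calling, each fetch and each retry would be a separate model turn; here the model is consulted once and the loop runs inside the sandbox.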

MCP integration

Lootbox v1 exposed typed TypeScript tool wrappers; the current version bridges to MCP servers. An agent can write await tools.antfly.search({...}) and lootbox routes that to the Antfly MCP server — no tool schema negotiation, just a function call. This is the real value: MCP becomes a library import rather than a structured protocol.
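One way such routing could work, sketched under assumptions: a nested Proxy turns `tools.<server>.<tool>(args)` into an MCP `tools/call` JSON-RPC request and hands it to a transport. The `makeTools` and `Transport` names are hypothetical, the transport here only records requests instead of touching the network, and this is not claimed to be how lootbox implements its bridge.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical transport: delivers a request to the named MCP server.
type Transport = (server: string, request: ToolCallRequest) => Promise<unknown>;
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;
type Tools = Record<string, Record<string, ToolFn>>;

// First property access selects the server, the second selects the tool.
function makeTools(send: Transport): Tools {
  let nextId = 1;
  return new Proxy({} as Tools, {
    get: (_target, server) =>
      new Proxy({} as Record<string, ToolFn>, {
        get: (_inner, tool) => (args: Record<string, unknown>) =>
          send(String(server), {
            jsonrpc: "2.0",
            id: nextId++,
            method: "tools/call",
            params: { name: String(tool), arguments: args },
          }),
      }),
  });
}

// Recording transport so the sketch runs without a server.
const sent: Array<{ server: string; request: ToolCallRequest }> = [];
const recordingTransport: Transport = (server, request) => {
  sent.push({ server, request });
  return Promise.resolve({ content: [] });
};

const mcpTools = makeTools(recordingTransport);
void mcpTools.antfly.search({ query: "deno websocket" });
console.log(JSON.stringify(sent[0]));
```

The agent-facing surface is just property access and a function call; everything protocol-shaped stays inside `makeTools`.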

For our workspace: Antfly has a native MCP server at http://host.docker.internal:8080/mcp/v1/. If lootbox's WebSocket issue is fixed, wiring it as a namespace would let axe agents query personal knowledge bases in TypeScript without axe needing explicit MCP support.

What's fixable

WebSocket ping/pong: the blocking issue. The Deno WS client should auto-respond to pings at the protocol level but apparently doesn't here, whether compiled or run from source. Fix: add an explicit ping handler in exec.ts or configure Hono to disable server-side pings.
Binary size: 172MB for what's essentially a Deno WebSocket client is heavy. The deno-exec wrapper achieves most of the value in 20 lines of bash.
Server dependency: all execution flows through the WebSocket server. A standalone mode (no server required, just a Deno sandbox) would make lootbox composable as a CLI tool without infrastructure.

Verdict

The idea is right, but the implementation has a blocking bug. Code Mode as a pattern holds up: the CYOA experiment shows the TypeScript-generation path works and, on this one task, was at least as good as the history-aware baseline for simple decisions. The architectural insight (MCP as library, not protocol) is worth taking seriously.

But the lootbox exec WebSocket failure means you can't use the product as shipped. The deno-exec wrapper delivers the same result without the overhead.

Jesus (the developer) is actively maintaining lootbox and has related work (codemode-mcp) that's more popular (116 ⭐). If you need Code Mode today, patch it yourself or use the bare Deno wrapper. Watch the repo for a fix.

Aspect	Rating	Notes
Core concept (Code Mode)	★★★★☆	Genuinely useful for multi-step orchestration
Setup / build	★★★☆☆	Works, but double `deno install` needed
`lootbox exec` (WebSocket)	★☆☆☆☆	Blocked by ping/pong bug
`deno-exec` workaround	★★★★☆	20 lines, works, no server needed
MCP bridge concept	★★★★☆	Right architecture; untestable until WS fixed
Binary size	★★☆☆☆	172MB for a WS client is too heavy

Overall: 3/5 — right idea, not production-ready as a binary. Use the deno-exec pattern directly until the WebSocket bug is resolved.