axe

Local LLM agent runner. TOML-defined agents, CLI invocation, Ollama backend.

Goals

Run multi-step LLM agent workflows locally — task decomposition, diary retrieval, podcast scripting — without cloud API calls and without writing a custom LLM runner.

Effectiveness

Solid. Handles all current agents reliably once configured correctly. The TOML agent format is low-friction and the sub-agent composition model maps cleanly onto the workflows we actually run (decomposer spawns a researcher, podcaster delegates to pipeline, etc.).

What made it effective

Bonus utility

The run_command tool inside agents lets them call shell commands mid-generation. inertia-decomposer uses this to retrieve diary context before decomposing — the agent fetches its own context rather than requiring the caller to pre-fetch it.
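For illustration, an agent definition with a tool grant might look roughly like this (a hypothetical sketch; the field names are assumptions, not axe's documented TOML schema):

```toml
# inertia-decomposer.toml -- hypothetical sketch of an axe agent definition.
# Field names are assumptions; consult axe's actual schema.
name = "inertia-decomposer"
model = "gemma2:27b"

# System prompt instructing the agent to fetch its own diary context
# rather than requiring the caller to pre-fetch it.
system_prompt = """
Before decomposing the task, run the diary retrieval command
and use its output as context.
"""

# Hypothetical tool grant: run_command lets the agent shell out mid-generation.
tools = ["run_command"]
```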

Friction / pain points / surprises

Default timeout is 120s — too short for any serious agent. inertia-decomposer (two 27b calls) needs --timeout 540; the podcaster agent (full pipeline run via run_command) needs --timeout 1800 or more. The TOML has no timeout field, so the flag must be passed at every call site. A bare context deadline exceeded on stderr is the only indication that something timed out.
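Since the timeout must travel with every invocation, it helps to centralize the per-agent values in one place. A minimal sketch (the helper and its defaults are hypothetical; only the per-agent timeout values come from experience above):

```typescript
// Hypothetical helper: centralize per-agent timeouts, since the agent TOML
// has no timeout field and --timeout must be passed at every call site.
const AGENT_TIMEOUT_SECONDS: Record<string, number> = {
  "inertia-decomposer": 540, // two 27b calls
  "podcaster": 1800,         // full pipeline run via run_command
};

const DEFAULT_TIMEOUT_SECONDS = 300; // axe's 120s default is too short

function axeArgs(agent: string, prompt: string): string[] {
  const timeout = AGENT_TIMEOUT_SECONDS[agent] ?? DEFAULT_TIMEOUT_SECONDS;
  return [agent, "--timeout", String(timeout), prompt];
}
```

Every call site then builds its argv via `axeArgs(...)` instead of hardcoding the flag.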

Batch endpoint doesn't recover from ConnectionClosed. axe uses Ollama's non-streaming endpoint. Long generations on a loaded model drop the socket. The whole request fails with no retry. We absorb this via per-task fault isolation (failed tasks retry on the next nightly run), but streaming would eliminate the failure mode entirely.
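A sketch of what the streaming alternative looks like against Ollama's public /api/generate endpoint, which emits newline-delimited JSON objects when stream is true (the NDJSON framing matches Ollama's documented API; wiring this into axe is hypothetical):

```typescript
// Sketch: consume Ollama's streaming endpoint so a dropped socket loses only
// the tail of a generation rather than failing the whole request.

// Each streamed line is a JSON object; `response` carries a token chunk and
// the final object has `done: true`. The last line may arrive partial.
function parseNdjsonChunks(buffer: string): { tokens: string[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // keep any partial trailing line for later
  const tokens: string[] = [];
  for (const line of lines) {
    if (!line.trim()) continue;
    const obj = JSON.parse(line);
    if (typeof obj.response === "string") tokens.push(obj.response);
  }
  return { tokens, rest };
}

async function generateStreaming(model: string, prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify({ model, prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let output = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parsed = parseNdjsonChunks(buffer);
    output += parsed.tokens.join("");
    buffer = parsed.rest;
  }
  return output;
}
```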

50-turn conversation limit breaks long-running run_command agents. The podcaster agent's single job is to run bun src/pipeline.ts and print the result. The pipeline's verbose output causes the agent to exceed axe's 50-turn limit before the command finishes. Workaround: run the pipeline directly rather than wrapping it in an axe agent.

LLM agents split multi-part shell commands across separate tool calls, losing environment state. The podcaster system prompt included export NO_PROXY=... && bun src/pipeline.ts. The LLM would sometimes run source .secrets in one tool call and bun ... in a separate subprocess, so the NO_PROXY export never reached bun. Environment setup that must survive into child processes belongs in .secrets or an equivalent persistent env file, not in a system prompt command.
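One caller-side shape for that fix, sketched below: parse the persistent env file and inject it into the subprocess environment directly, so nothing depends on the LLM chaining `export` and the command in one tool call (the loader and its `KEY=VALUE` format are assumptions about what a `.secrets`-style file contains):

```typescript
// Sketch: load a .secrets-style KEY=VALUE file into the environment that
// child processes inherit, instead of relying on the LLM to keep
// `export ... && ...` inside a single tool call.
import { readFileSync } from "node:fs";

function loadEnvFile(path: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const line of readFileSync(path, "utf8").split("\n")) {
    // Accept both `KEY=value` and `export KEY=value` lines.
    const m = line.match(/^(?:export\s+)?([A-Z_][A-Z0-9_]*)=(.*)$/);
    if (m) env[m[1]] = m[2];
  }
  return env;
}

// Usage: spread into spawn() so every child process sees NO_PROXY etc.
// spawn("bun", ["src/pipeline.ts"], { env: { ...process.env, ...loadEnvFile(".secrets") } });
```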

No --help flag. axe --help attempts to list agents and errors if the config directory is missing. Discovering available flags required reading dist/cli/args.js in the source tree.

Exit code 3 (transient provider error) has no built-in retry. axe exits 3 on Ollama connection drops or generation timeouts, but does not retry internally. Any caller that doesn't handle exit 3 explicitly propagates the failure immediately. Fix: wrap every runAxe call in a retry loop (3 attempts, 15s backoff). The [retry] TOML config applies to tool-call failures inside the agent conversation, not to provider-level transport failures.
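The retry loop described above can be sketched like this (a minimal version; `AxeResult` and the injected `run` callback are hypothetical stand-ins for the actual `runAxe` caller):

```typescript
// Sketch of the caller-side retry: 3 attempts with 15s backoff, retrying
// only on exit code 3 (transient provider error). Other exit codes are
// returned immediately since retrying won't help.
type AxeResult = { exitCode: number; stdout: string; stderr: string };

async function runAxeWithRetry(
  run: () => Promise<AxeResult>,
  attempts = 3,
  backoffMs = 15_000,
): Promise<AxeResult> {
  let last: AxeResult | undefined;
  for (let i = 0; i < attempts; i++) {
    last = await run();
    if (last.exitCode !== 3) return last; // success or non-transient failure
    if (i < attempts - 1) await new Promise((r) => setTimeout(r, backoffMs));
  }
  return last!; // all attempts exhausted with exit 3
}
```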

context deadline exceeded from Ollama looks identical to an axe timeout. Both print the same string to stderr and exit 3. The distinction matters for diagnosis: one is Ollama's internal generation timeout (caused by a large prompt or loaded GPU), the other is axe's own request timeout. OTEL traces are the only reliable way to tell them apart.

axe's stderr is swallowed when the caller pipes stdout only. If you spawn axe with stdio: ["pipe", "pipe", "inherit"], stderr flows to the container's log but never reaches the OTEL span. Exit 3 errors appear in traces as "axe essay-outliner failed (exit 3)" with no underlying cause. Fix: stdio: ["pipe", "pipe", "pipe"], capture stderr, attach it to the span as axe.stderr, and include it in the thrown Error message. Doing this surfaced "unable to load model / CUDA out of memory" errors that had been invisible in the trace explorer.
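The spawn-side half of that fix looks roughly like this (a sketch; attaching the captured stderr to the OTEL span as `axe.stderr` happens in the caller's tracing code, not shown):

```typescript
// Sketch: pipe all three stdio streams, capture stderr, and include it in
// the thrown Error so the failure cause survives into traces instead of
// vanishing into the container log.
import { spawn } from "node:child_process";

function runWithStderr(cmd: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["pipe", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    child.stdout?.on("data", (d) => (stdout += d));
    child.stderr?.on("data", (d) => (stderr += d));
    child.on("close", (code) => {
      if (code === 0) resolve(stdout);
      // Carry stderr in the error message so spans show the real cause.
      else reject(new Error(`${cmd} failed (exit ${code}): ${stderr.trim()}`));
    });
  });
}
```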

Zod schemas validating axe output drift silently from agent TOML constraints. When essay-outliner.toml was updated to target 7–10 sections at 600–800 words, the Zod schema validating its output still had max(7) / max(550). axe exits 0, Zod throws, the checkpoint never advances, and the pipeline retries indefinitely with no progress. The agent TOML and the caller's validator are two halves of a contract with no shared source of truth.
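One way to close that gap is a single shared constraints object that both the prompt text and the validator are derived from; a minimal sketch, with hypothetical names (the real fix could equally be generating the Zod schema from the agent TOML):

```typescript
// Sketch: one source of truth for the outline contract. The agent prompt
// and the caller-side validator are both derived from this object, so a
// change to the constraints cannot drift between the two halves.
const OUTLINE_CONSTRAINTS = {
  minSections: 7,
  maxSections: 10,
  minWordsPerSection: 600,
  maxWordsPerSection: 800,
};

// Rendered into the agent's prompt so numbers are never hardcoded in TOML.
function constraintPrompt(c = OUTLINE_CONSTRAINTS): string {
  return `Produce ${c.minSections}-${c.maxSections} sections of ` +
    `${c.minWordsPerSection}-${c.maxWordsPerSection} words each.`;
}

// Caller-side validator derived from the same object (stand-in for Zod).
function validateOutline(sections: string[], c = OUTLINE_CONSTRAINTS): boolean {
  return sections.length >= c.minSections && sections.length <= c.maxSections;
}
```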