OpenTelemetry

Distributed tracing and observability instrumentation.

Goals

Instrument every program in the workspace with traces so we can observe what actually happens during long-running pipeline runs: which tasks are processed, how long each axe call takes, and where failures cluster. Established as a workspace-wide policy on 2026-03-25.

Effectiveness

Live as of 2026-03-26. Both the podcast pipeline and Inertia Mill now ship spans to otel-explorer.pages.dev, a self-hosted Cloudflare Pages + D1 OTLP collector deployed from the FOSS otel-explorer project. The instrumented spans cover the major operations in each program: per-axe-agent calls, retrieve/HyDE, the full run loop, and per-task decomposition. The first real evaluation of whether the data is useful is still pending, because the nightly cron has not yet run with instrumentation live.

What made it effective

The SDK design is clean once you've accepted the package count. tracer.startActiveSpan() composes naturally with async/await and try/finally for reliable span.end(). The pattern is minimal to learn and hard to misuse.
The exporter is fire-and-forget: export failures are swallowed silently, so instrumentation cannot break the programs it observes. This matters for the reliability of the podcast pipeline.
Cloudflare Pages + D1 works well as a zero-ops OTLP collector target: no server to run, no ingress to configure. D1 stores the spans; Workers serve the UI.
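
The startActiveSpan() pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the workspace's actual code: SpanLike and TracerLike are hypothetical stand-ins that mirror only the methods used here (so the snippet runs without the OpenTelemetry packages installed), and traced is a hypothetical helper name. In a real program the tracer would come from trace.getTracer() in @opentelemetry/api and the status code from SpanStatusCode.ERROR.

```typescript
// Hypothetical stand-ins for the subset of @opentelemetry/api used below.
interface SpanLike {
  setStatus(status: { code: number; message?: string }): unknown;
  recordException(err: Error): unknown;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const STATUS_ERROR = 2;

// Run `work` inside an active span. The span is always ended, and a failure
// is recorded on the span before being rethrown to the caller.
async function traced<T>(
  tracer: TracerLike,
  name: string,
  work: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await work(span);
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      span.recordException(error);
      span.setStatus({ code: STATUS_ERROR, message: error.message });
      throw err; // tracing must not change the program's error behavior
    } finally {
      span.end(); // runs on both the success and the failure path
    }
  });
}
```

A call site would then look something like traced(tracer, "axe.call", async (span) => runAxe(task)), where "axe.call" and runAxe are placeholder names. Putting span.end() in finally is what makes the pattern hard to misuse: there is no code path on which the span is left open.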

Bonus utility

The self-hosted collector (otel-explorer.pages.dev) doubles as a shared ingest target for any program in the workspace: one endpoint, one token, one UI to inspect all services.

Friction / pain points / surprises

Six packages for one concern. Our OpenTelemetry JavaScript setup pulls in @opentelemetry/api, @opentelemetry/sdk-trace-node, @opentelemetry/sdk-trace-base, @opentelemetry/exporter-trace-otlp-http, @opentelemetry/resources, and @opentelemetry/semantic-conventions as separate packages. The split is principled (API vs. SDK vs. exporter) but adds package management surface area.

The exporter's auth story requires manual wiring. The OTLPTraceExporter constructor accepts a headers option, and the OTLP exporter specification defines OTEL_EXPORTER_OTLP_HEADERS for generic key=value headers, but there is no dedicated environment variable for a bearer token. OTEL_EXPORTER_OTLP_ENDPOINT is recognized automatically; the auth token has to be wired by hand. We use OTEL_INGEST_TOKEN (a custom variable) and inject it as an Authorization: Bearer header. Not complex, but not automatic.
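
A minimal sketch of that wiring, assuming the token lives in OTEL_INGEST_TOKEN as described above. The helper name otlpAuthHeaders is hypothetical, and the exporter call appears only as a comment so the snippet runs without the OpenTelemetry packages installed.

```typescript
type Env = Record<string, string | undefined>;

// Build the headers object for the exporter from the custom token variable.
// Returns an empty object when the token is unset, so a missing token leads
// to unauthenticated requests instead of a crash at startup.
function otlpAuthHeaders(env: Env): Record<string, string> {
  const token = env["OTEL_INGEST_TOKEN"];
  return token ? { Authorization: `Bearer ${token}` } : {};
}

// Intended use (assumes @opentelemetry/exporter-trace-otlp-http is installed):
//
//   import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
//   const exporter = new OTLPTraceExporter({
//     headers: otlpAuthHeaders(process.env),
//   });
//
// With no explicit url option, the exporter is expected to take its endpoint
// from OTEL_EXPORTER_OTLP_ENDPOINT.
```

Keeping the header construction in a small pure function makes the one piece of custom wiring easy to test in isolation.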

Inertia Mill's binary build bundles OTEL. Because the mill compiles to a single binary with bun build --compile, all OTEL packages are baked in at build time. Any update to the collector URL or token requires a rebuild and redeploy. For the podcast pipeline (run from source), this isn't an issue.

Subprocess stderr must be explicitly piped to appear in spans. OTEL spans only contain what you explicitly attach. A child process (e.g. axe) writing to stderr is invisible to the tracer unless the caller captures it with stdio: "pipe" and attaches it as a span attribute. The common pattern of stdio: "inherit" for subprocess debugging is directly at odds with observability: what goes to the terminal stays out of the trace. The discipline is: always pipe, always attach, always include in the thrown Error. Otherwise "exit 3" is the only signal in the trace for what might be VRAM exhaustion or a missing model blob.
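
The "always pipe, always attach, always include in the thrown Error" discipline might look like the sketch below. It is an illustration under assumptions, not the pipelines' actual code: runCaptured is a hypothetical helper, SpanLike is a stand-in for the setAttribute method of an OpenTelemetry span, and the attribute names subprocess.exit_code and subprocess.stderr are made up for this example. It uses spawnSync for brevity; an async pipeline would more likely use spawn.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical stand-in for the one span method this helper needs.
interface SpanLike {
  setAttribute(key: string, value: string | number): unknown;
}

const MAX_STDERR_CHARS = 4096; // keep only the tail so attributes stay small

// Run a child process with stdout and stderr piped (never inherited), attach
// the exit code and stderr to the span, and include stderr in any thrown Error.
function runCaptured(span: SpanLike, command: string, args: string[]): string {
  const result = spawnSync(command, args, {
    stdio: ["ignore", "pipe", "pipe"],
    encoding: "utf8",
  });
  if (result.error) throw result.error; // e.g. the binary was not found

  const stderr = result.stderr.trim().slice(-MAX_STDERR_CHARS);
  span.setAttribute("subprocess.exit_code", result.status ?? -1);
  if (stderr) span.setAttribute("subprocess.stderr", stderr);

  if (result.status !== 0) {
    throw new Error(
      `${command} exited with ${result.status}: ${stderr || "(no stderr)"}`,
    );
  }
  return result.stdout;
}
```

With this shape, a failing axe call leaves both the exit code and the child's own error message in the trace, at the cost of no longer seeing stderr live in the terminal.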