Podcast Pipeline

Automated essay-to-audio pipeline. Fetches bookmarks from Raindrop.io, synthesises a long-form essay via Ollama/qwen, generates TTS audio with KittenTTS, and publishes to Cloudflare Pages + R2.

Goals

Produce a weekly podcast episode from saved articles without manual writing or recording — outline generation, section drafting, aside injection, intro/outro, TTS, upload, deploy, all in one bun src/pipeline.ts run.

Effectiveness

Works when it runs. The essay quality is good; the TTS is passable; the OTEL tracing via otel-explorer gives a clear view of where time is spent and where failures occur. The pipeline has shipped multiple episodes.

The gap is reliability: the first containerised run took eight launches (seven failure/fix/relaunch cycles) before a clean publish. The pipeline now runs, but it has yet to complete end-to-end without intervention from a fresh start.


After-Action Report — 2026-04-04 to 2026-04-05

Eight launches. One clean publish. Here is what broke and why.

1. npx not available in the Docker container

What happened: publish.ts and feed.ts called npx wrangler and npx marss. The Docker image has Bun, not Node/npm. npx does not exist; the process exits with status undefined (SIGKILL from the OS, not a normal exit code).

Why it took three launches to fix: the two call sites were in different files (publish.ts, then feed.ts). The first failure was fixed in isolation; the second call site only surfaced on the next launch. A single grep -rn "npx" src/ after the first failure would have caught both.

Fix: replace npx with bunx in both files. The pattern: when a failure reveals a wrong assumption, search for every instance of that assumption before writing any fix.


2. publish.ts hardcoded the espeak-ng path

What happened: loadEnvironment() always set PHONEMIZER_ESPEAK_LIBRARY to /home/node/.local/espeak-ng/... — the NanoClaw-specific path. Inside the Docker container, espeak-ng is installed via apt at /usr/lib/x86_64-linux-gnu/. The hardcoded path doesn't exist; phonemizer fails to initialise; TTS crashes.

Fix: Only override the espeak paths if the custom directory exists. When it doesn't, trust the environment (which the Dockerfile sets correctly).
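The guard can be sketched as follows. This is illustrative, not the pipeline's actual loadEnvironment(): the function name, the library filename, and the env-mutation shape are assumptions; only the custom directory path and the "check before override" rule come from the incident above.

```typescript
import { existsSync } from "node:fs";

// Hypothetical sketch: override the phonemizer's espeak path only when the
// NanoClaw-style custom install is actually present on disk; otherwise leave
// the environment alone and trust what the Dockerfile already exported.
const CUSTOM_ESPEAK_DIR = "/home/node/.local/espeak-ng";

function configureEspeak(
  env: Record<string, string | undefined>,
  customDir: string = CUSTOM_ESPEAK_DIR,
): void {
  if (existsSync(customDir)) {
    // Custom build present (the NanoClaw host): point phonemizer at it.
    env.PHONEMIZER_ESPEAK_LIBRARY = `${customDir}/lib/libespeak-ng.so.1`;
  }
  // Otherwise: no override. In the Docker container the apt-installed
  // library under /usr/lib/x86_64-linux-gnu/ is found via the environment
  // the Dockerfile sets.
}
```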


3. Raindrops tagged before publish succeeded

What happened: The pipeline tagged source articles as podcasted in Raindrop.io at step 9, then attempted to publish at step 10. When publish failed (which it did repeatedly), the articles were already hidden from future runs. Eight articles were silently consumed by failed runs and never made it into a published episode.

Fix: Swap steps 9 and 10. Tag after a successful publish, not before. The cost of re-using an article in two episodes (if publish fails between tag and a second run) is lower than permanently losing it.

Debt left: The eight leaked articles were manually untagged. There is no automated detection for this drift.
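The reordering amounts to a one-line invariant, sketched below with stand-in step functions (the real steps talk to Cloudflare and Raindrop.io; these names are illustrative). The point is only the control flow: a publish failure throws before tagging runs, so articles are never consumed by a failed run.

```typescript
// Illustrative sketch of the reordered pipeline tail: publish first, tag
// sources only once publish has succeeded.
async function finalize(
  publishEpisode: () => Promise<void>,
  tagAsPodcasted: () => Promise<void>,
): Promise<void> {
  await publishEpisode(); // formerly step 10, now first; throws on failure
  await tagAsPodcasted(); // formerly step 9, only reached after a clean publish
}
```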


4. PER_ARTICLE_CAP not updated when raindrop count doubled

What happened: The per-article character cap was set to 40,000 when the pipeline fetched 3 articles (3 × 40K = 120K chars ≈ 30K tokens — within the 32K token context). When the raindrop count was raised to 6, the cap was not adjusted. 6 × 40K = 240K chars ≈ 60K tokens — nearly double the context window. Ollama rejected the request; axe essay-outliner exited with code 3 on every attempt including all three retries; the pipeline spent 21 minutes failing before giving up.

Why the mistake happened: The cap was a hardcoded constant whose correct value depended on another constant (raindrop count) in a different file. The relationship was documented in a comment, not enforced in code.

Fix: Derive PER_ARTICLE_CAP from the actual article count at runtime: (CONTEXT_CHARS - OUTPUT_RESERVE) / articleCount. Changing the raindrop count now automatically adjusts the cap.
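A minimal sketch of the derivation, with assumed budget constants: roughly 4 chars/token against the 32K-token context, minus a reserve for the model's output. The specific reserve value is an illustration chosen so that 3 articles reproduce the original 40K cap.

```typescript
// Derive the per-article cap from the article count at runtime instead of
// hardcoding it next to a comment.
const CONTEXT_CHARS = 128_000; // ~32K tokens * ~4 chars/token (assumption)
const OUTPUT_RESERVE = 8_000;  // chars held back for the response (assumption)

function perArticleCap(articleCount: number): number {
  return Math.floor((CONTEXT_CHARS - OUTPUT_RESERVE) / articleCount);
}
```

With these numbers, 3 articles get 40,000 chars each and 6 articles get 20,000, so raising the raindrop count automatically tightens the cap.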


5. Zod schema not updated when outliner TOML limits changed

What happened: The outliner TOML was updated to produce 7–10 sections with word quotas of 600–800. The Zod schema in pipeline.ts still enforced the old limits (5–7 sections, max 550 words). The outliner produced valid JSON that the schema rejected.

Why the mistake happened: Same class as #4: two places encoding the same constraint, maintained independently.

Fix: Updated Zod schema to match TOML. Long-term fix: the schema should be the single source of truth, or the TOML format field should be generated from the schema.
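The single-source-of-truth direction can be sketched in plain TypeScript (a stand-in for the actual Zod schema; names and shapes here are illustrative): the limits live in one object, and both the runtime validator and the prompt's format hint are derived from it, so the two can no longer drift apart.

```typescript
// One object holds the outline constraints (the updated limits from above).
const OUTLINE_LIMITS = {
  minSections: 7,
  maxSections: 10,
  minWords: 600,
  maxWords: 800,
} as const;

interface OutlineSection { title: string; wordQuota: number; }

// Runtime validation derived from the shared constants.
function validateOutline(sections: OutlineSection[]): boolean {
  const { minSections, maxSections, minWords, maxWords } = OUTLINE_LIMITS;
  return (
    sections.length >= minSections &&
    sections.length <= maxSections &&
    sections.every(s => s.wordQuota >= minWords && s.wordQuota <= maxWords)
  );
}

// The outliner's format hint is rendered from the same constants.
function formatHint(): string {
  const { minSections, maxSections, minWords, maxWords } = OUTLINE_LIMITS;
  return `Produce ${minSections}-${maxSections} sections of ${minWords}-${maxWords} words each.`;
}
```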


6. Transient exit-3 on essay-outliner with no retry

What happened: Ollama returned a 500 (model loading, OOM pressure, or context overflow) on the essay-outliner call. axe exits 3 on transient provider errors. The pipeline had no retry logic and crashed immediately.

Fix: Added retry loop (3 retries, 15s backoff) to both runAxe and runAxeAsync on exit 3. This surfaces in the OTEL trace as repeated axe:essay-outliner spans within the same step:outline span.
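The retry shape looks roughly like this (an assumed sketch; the real runAxe/runAxeAsync also stream output and emit OTEL spans). Exit 3 is retried with a fixed backoff; any other code returns immediately, which is what preserves the exit-1 "do not retry" semantics.

```typescript
// Retry only on axe exit code 3 ("transient provider error").
const MAX_RETRIES = 3;
const BACKOFF_MS = 15_000;

async function runAxeWithRetry(
  runOnce: () => Promise<number>,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<number> {
  let code = await runOnce();
  for (let attempt = 0; code === 3 && attempt < MAX_RETRIES; attempt++) {
    await sleep(BACKOFF_MS); // 15s pause before each retry
    code = await runOnce();
  }
  return code; // 0 = success, 1 = bad request (never retried), 3 = retries exhausted
}
```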


Systemic observations

Configuration constants that depend on each other are the main source of silent bugs. Incidents 4 and 5 are the same mistake. The fix in both cases is structural: derive, don't duplicate.

The OTEL traces are the only reliable record of what happened. The pipeline log captures high-level step names but not input sizes, error details, or timing. The traces captured the 21-minute retry loop on incident 4 precisely. Every future incident diagnosis started at otel-explorer.

The checkpoint is both the recovery mechanism and an attack surface. Because failed steps don't checkpoint, a failure in publish means the entire essay (5–10 minutes of compute) survives in the checkpoint and the next run resumes from there. But a failure in tagging (incident 3) means articles are consumed without producing a checkpoint entry. The pipeline treats the checkpoint as a progress log, not a transaction log — there is no rollback.
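The progress-log behaviour described above reduces to a small pattern (illustrative names, not the pipeline's actual API): a step's result is written only after the step completes, so a throwing step leaves no entry and the next run re-executes it, while earlier entries are never rolled back. That asymmetry is exactly what incident 3 exploited.

```typescript
// Checkpoint-as-progress-log: record a step's result only on success.
type Checkpoint = Record<string, unknown>;

async function runStep<T>(
  ckpt: Checkpoint,
  name: string,
  step: () => Promise<T>,
): Promise<T> {
  if (name in ckpt) return ckpt[name] as T; // resume: completed steps are skipped
  const result = await step();              // a failure here checkpoints nothing
  ckpt[name] = result;                      // progress recorded only on success
  return result;                            // note: no rollback of earlier entries
}
```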

Exit code semantics are load-bearing. axe exit 3 means "transient provider error, safe to retry." Exit 1 means "bad request, do not retry." The pipeline originally treated all non-zero exits identically. The retry logic now distinguishes them.