Claude Code
AI coding assistant by Anthropic. Operates as an agentic pair programmer with tool use, file editing, and long-horizon task execution.
Goals
Replace the slow loop of reading code → thinking → writing code → running tests with a faster loop where the agent does the mechanical parts: file search, reading, editing, running commands, interpreting output. The goal wasn't autocomplete — it was autonomous completion of multi-step engineering tasks.
Effectiveness
High, for well-scoped tasks. The Inertia Mill ports/adapters refactor, the podcast pipeline, the reviews site — all done through Claude Code, all substantially faster than solo. The agent handles the context-gathering and cross-file consistency that makes refactoring tedious. For tasks that are open-ended in approach but have clear acceptance criteria, it's excellent.
What made it effective
- Tool use is the core differentiator. The agent reads files before editing, runs tests to verify changes, searches the codebase before proposing structure. It doesn't hallucinate file contents — it reads them.
- The memory system (MEMORY.md + topic files) lets the agent maintain context across sessions: secrets locations, SSH key preferences, architectural decisions, lessons learned. The agent builds on previous sessions rather than starting cold (see the sketch after this list).
- Skills extend the base capability with domain-specific agents (inertia-mill, antfly, podcaster) installable as tools.
- Plan mode surfaces the approach before writing any code — useful for architectural decisions where the wrong approach is costly to undo.
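A minimal sketch of the idea behind the memory bullet above, assuming a directory of markdown files with MEMORY.md as the index; the loader and file layout are illustrative, not Claude Code's actual internals:

```typescript
// Illustrative only: hydrate cross-session context from MEMORY.md plus topic files.
// The directory layout and helper are hypothetical, not the product's implementation.
import { readFile, readdir } from "node:fs/promises";
import { join } from "node:path";

async function loadMemory(memoryDir: string): Promise<string> {
  // MEMORY.md acts as the index; topic files carry the detail
  // (secrets locations, SSH key preferences, architectural decisions).
  const index = await readFile(join(memoryDir, "MEMORY.md"), "utf8");
  const entries = await readdir(memoryDir);
  const topics = await Promise.all(
    entries
      .filter((name) => name.endsWith(".md") && name !== "MEMORY.md")
      .map(async (name) => `## ${name}\n${await readFile(join(memoryDir, name), "utf8")}`)
  );
  // The concatenation is what lets a new session build on old ones instead of starting cold.
  return [index, ...topics].join("\n\n");
}
```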
Bonus utility
The agent notices things. Branding drift (inertia- instead of regular-), loose untracked files, expired patterns. The Jeeves dynamic — restoring order without being asked — has been more valuable than expected.
Friction / pain points / surprises
Context limits cause compaction, which loses nuance. Long sessions get summarised; the summary is accurate but lossy. Architectural reasoning from early in a session ("why we chose X over Y") can disappear. The memory system partially mitigates this but requires discipline to populate.
The agent sometimes over-engineers. Asked to fix a bug, it may refactor the surrounding code. Asked to add a feature, it may introduce abstraction for a single use case. Requires active steering: "just fix this, don't clean up the surrounding code."
Tool call latency accumulates. Each file read, grep, or bash invocation adds round-trip time. On tasks requiring many reads before any writes, the pace feels slow. Speculative parallel reads (reading 4 files at once) help but require the agent to anticipate what it needs.
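A sketch of what speculative parallel reads amount to, assuming a Node-style runtime; the helper name and example paths are mine, not the agent's actual tooling:

```typescript
// Sketch: issue several file reads at once so total latency tracks the slowest
// read rather than the sum of sequential round trips.
import { readFile } from "node:fs/promises";

async function speculativeRead(paths: string[]): Promise<Map<string, string>> {
  const contents = await Promise.all(
    // A path the agent guessed wrong comes back empty instead of failing the batch.
    paths.map((p) => readFile(p, "utf8").catch(() => ""))
  );
  const out = new Map<string, string>();
  paths.forEach((p, i) => out.set(p, contents[i]));
  return out;
}

// e.g. the four files the agent expects to need before any edit:
// await speculativeRead(["src/deploy.ts", "src/feed.ts", "package.json", "README.md"]);
```

The trade-off is the one noted above: the saving only materialises if the agent guesses the right files up front.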
Permission prompts interrupt flow. Every new type of action (new bash command, writing a new file path) requires approval. Reasonable as a safety model; friction in practice when doing exploratory work that touches many different commands.
Fixes the presenting symptom, not the class. The npx → bunx incident: three separate deploy failures on three separate pipeline runs, each exposing a different call site (deploy.ts, then feed.ts), each fixed individually as it surfaced. The right move on the first failure was grep -rn "npx" src/ — one command, all instances, one fix. The agent diagnosed correctly each time but scoped the repair too narrowly. The pattern: when a failure reveals a wrong assumption ("npx is available in this container"), the fix should invalidate that assumption everywhere it appears, not just where the error happened to land.
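A sketch of what the class-level fix looks like, as opposed to the three symptom-level ones; the helper and the PACKAGE_RUNNER variable are hypothetical, not what the pipeline actually does:

```typescript
// Sketch: route every package-runner invocation through one helper so the
// "npx is available in this container" assumption lives in exactly one place.
import { spawnSync } from "node:child_process";

// The single point to change (or override) when the container has bunx but not npx.
const RUNNER = process.env.PACKAGE_RUNNER ?? "bunx";

export function runTool(tool: string, args: string[] = []): void {
  const result = spawnSync(RUNNER, [tool, ...args], { stdio: "inherit" });
  if (result.status !== 0) {
    throw new Error(`${RUNNER} ${tool} exited with status ${result.status}`);
  }
}

// deploy.ts and feed.ts would both call runTool(...) instead of hard-coding the runner,
// so the next wrong assumption fails in one place, once.
```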
On this review itself
This review is opinion backed by memory. The memory is unreliable — compacted, selective, self-serving in the way all self-assessments are. What would make it credible is a corpus of session logs: timestamped records of what was attempted, what succeeded, what failed, and what had to be retried. The OTEL traces from the podcast pipeline are close to this — structured, immutable, searchable. The gap is that traces exist only for instrumented pipelines, not for the agent's own actions. A session where the agent makes three wrong assumptions before converging looks, in retrospect, like competence. The trace would show the thrashing.
The question is what to call records that capture both failures and successes. "Incident reports" connotes outages. "Postmortems" connotes blame. "After-action reports" — the military term — is closer: a structured debrief of what was attempted, what worked, what didn't, and what changes. These reviews would be better as summaries derived from after-action reports than as impressions written from memory.
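A hedged sketch of what one such record might contain; the field names are illustrative, not an existing schema:

```typescript
// Sketch of an after-action record: what was attempted, what worked,
// what didn't, and what changes. The shape is a guess at a minimal useful schema.
interface AfterActionReport {
  startedAt: string;                       // ISO timestamp
  endedAt: string;
  task: string;                            // what was attempted
  attempts: Array<{
    approach: string;
    outcome: "succeeded" | "failed" | "retried";
    evidence: string;                      // test output, trace ID, commit hash
  }>;
  wrongAssumptions: string[];              // e.g. "npx is available in this container"
  followUps: string[];                     // what changes as a result
}
```

Reviews like this one would then be summaries over a corpus of these records, with the thrashing visible rather than remembered away.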