html-to-markdown

HTML → Markdown converter written in Go.

Goals

Convert extracted article HTML into clean Markdown for storage in Antfly and for consumption by the essay writer. It would pair with a content extractor (Readability, Defuddle, etc.) in a two-step pipeline: fetch the page and extract the article body as HTML, then convert that HTML to Markdown.
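The division of work between the two steps can be sketched as two injected stages. This is a minimal TypeScript sketch, assuming the page HTML has already been fetched upstream; `Extractor`, `Converter`, and `pageToMarkdown` are hypothetical names, not APIs from html-to-markdown, Readability, or Defuddle.

```typescript
// Hypothetical stage signatures; not taken from any real library.
type Extractor = (pageHtml: string) => string; // full page HTML -> article-body HTML
type Converter = (articleHtml: string) => string; // article HTML -> Markdown

// Step 1 decides what the article is; step 2 only formats what step 1 kept.
// Anything the extractor drops (nav bars, footers, banners) never reaches
// the converter, so it cannot show up in the stored Markdown.
function pageToMarkdown(
  pageHtml: string,
  extract: Extractor,
  convert: Converter,
): string {
  const articleHtml = extract(pageHtml);
  return convert(articleHtml).trim();
}
```

Keeping the stages as plain functions means html-to-markdown can be dropped in as the converter when clean article HTML is already available, while a tool that does both steps can replace the pair entirely.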

Verdict

Excellent converter, wrong layer for our problem. html-to-markdown is a formatting tool, not an extractor: it faithfully converts whatever HTML it is given, including nav bars, footers, and cookie banners, and has no notion of which part of the page is the article. We need something that identifies the main content first, and Defuddle does both steps (extraction and Markdown conversion) in one pass. Use this only if we already have clean article HTML from another source.

What makes it effective

Friction / pain points / surprises

Go dependency. The library is Go-only, so calling it from our Bun/TypeScript pipeline means either shelling out to a Go binary or using the REST API; neither fits cleanly into an inline sources.ts function.
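To make the "shelling out" option concrete, here is a minimal sketch using Node's `child_process.spawnSync` (Bun also offers `Bun.spawnSync`; the Node API is used so the sketch runs under either runtime). It assumes the html-to-markdown v2 CLI is installed on `PATH` as `html2markdown` and reads HTML on stdin, writing Markdown to stdout; verify that against the CLI's `--help` for the installed version. `convertViaCli` is a hypothetical helper name.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: pipes HTML through an external converter process.
// Assumes `html2markdown` (html-to-markdown v2 CLI) reads stdin and writes
// Markdown to stdout; the command is a parameter so it can be swapped.
function convertViaCli(html: string, command = "html2markdown"): string {
  const result = spawnSync(command, [], {
    input: html, // written to the child's stdin
    encoding: "utf8", // return stdout/stderr as strings
    maxBuffer: 16 * 1024 * 1024, // long articles can exceed the 1 MiB default
  });
  if (result.error) {
    // e.g. ENOENT when the binary is not installed on this machine
    throw new Error(`failed to run ${command}: ${result.error.message}`);
  }
  if (result.status !== 0) {
    throw new Error(`${command} exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

Even in sketch form the costs are visible: a process spawn per page, a binary that must be installed wherever the pipeline runs, and an extra failure mode for a missing or crashing executable.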

Not an extractor. This point is worth repeating: it converts the whole page's HTML without filtering. The output will include navigation, headers, footers, and every other element present in the source. A separate extraction step is mandatory.
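To show what even the crudest extraction step involves, here is a deliberately naive TypeScript pre-filter that deletes a few semantic "chrome" elements by tag name before the HTML is handed to a converter. It is an illustration under stated limits, not how Readability or Defuddle work; `stripPageChrome` and `CHROME_ELEMENTS` are hypothetical names.

```typescript
// Naive illustration only: removes <nav>, <header>, <footer>, and <aside>
// elements with a regex. Known failures: nested elements of the same tag,
// chrome built from plain <div>s (e.g. most cookie banners), and an
// article's own <header>, which often holds the title and would be lost.
const CHROME_ELEMENTS = /<(nav|header|footer|aside)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

function stripPageChrome(pageHtml: string): string {
  return pageHtml.replace(CHROME_ELEMENTS, "");
}
```

Its output is what would then be passed to html-to-markdown. The list of failure modes in the comment is the argument for a real extractor: deciding what the article is takes content-level heuristics, not tag names.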

Overkill for our use case. The podcast pipeline stores plain prose; we don't need table-of-contents conversion, image links, or footnote formatting. Defuddle's built-in Markdown output handles the content we care about without a second pass.

When to reach for it