Pydoll

Async Chromium automation library for Python — no WebDriver required.

Goals

Potential scraper for paywalled or JavaScript-rendered articles that plain HTTP fetch can't reach. If Raindrop sources increasingly live behind JS gates, a headless browser becomes necessary.

Verdict

Not the right tool for this use case right now. Pydoll is a full browser automation framework — the right answer when you need to log in, click through, fill forms, or extract data from a SPA. Our current sources (two Substacks and a blog) are server-rendered and fetchable without a browser. The weight of a Chromium binary + async Python process inside a Bun pipeline doesn't pay off for content that defuddle can handle with a single HTTP GET.
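To make the "fetchable without a browser" point concrete: a server-rendered page already carries its article text in the raw HTML, so a plain GET plus a parser is enough. A minimal stdlib sketch, where the sample HTML and the `ArticleText` class are illustrative stand-ins, not how defuddle actually works:

```python
from html.parser import HTMLParser

# Hypothetical sample standing in for a server-rendered Substack/blog page:
# the article text is already present in the raw response, so no headless
# browser (and no JS execution) is needed to see it.
SAMPLE = "<html><body><article><p>Siege logistics, part one.</p></article></body></html>"

class ArticleText(HTMLParser):
    """Collect text appearing inside <article> tags of already-rendered HTML."""
    def __init__(self):
        super().__init__()
        self.in_article = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article = True

    def handle_endtag(self, tag):
        if tag == "article":
            self.in_article = False

    def handle_data(self, data):
        if self.in_article and data.strip():
            self.chunks.append(data.strip())

parser = ArticleText()
parser.feed(SAMPLE)
print(" ".join(parser.chunks))  # → Siege logistics, part one.
```

A JS-rendered SPA fails exactly this test: the raw response contains a script bundle instead of the article, and that is the point at which a browser-driving tool like Pydoll becomes worth its weight.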

Keep it in mind if sources shift toward paywalled or heavily JS-rendered sites.

What makes it effective

Friction / pain points / surprises

Python in a Bun/TypeScript pipeline. Calling Pydoll from sources.ts means either subprocess overhead or a separate microservice. A Python scraping sidecar is maintainable, but it adds a process boundary and another surface to monitor and restart.

Chromium binary weight. A full Chromium install is 200–300 MB and requires specific system libraries. The container would need provisioning in ensure-deps.sh, and cold start time for a headless browser launch is measured in seconds per URL.

Stealth features are probably unnecessary for our sources. Substack and ACOUP are not actively blocking scrapers. The stealth machinery is overhead for sites that don't require it.

6.7k stars but young. Active development is good, but a young project also means API churn. Check the changelog before pinning a version.

When to reach for it