Antfly

Distributed search engine: BM25 full-text, vector, and graph search in one service.

Goals

A single search backend for three distinct retrieval patterns: BM25 keyword search over the diary (SQI pattern), semantic vector search over GitHub stars, and full-text search over the benchristel wiki. We wanted one service instead of running Elasticsearch, a vector DB, and a separate BM25 index side by side.

Effectiveness

Effective for everything we've actually used. BM25 search over the diary and wiki works well. The SQI pattern (indexing synthetic queries per chunk rather than the chunks themselves) significantly improves recall on short, context-dependent diary entries. Vector search on GitHub stars works but hasn't been load-tested.
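The SQI indexing step can be sketched as follows. This is a minimal illustration, not our actual pipeline. `generate_queries` stands in for whatever produces the synthetic queries (for example an LLM call); it is a hypothetical callable. The `<chunk_id>#q<n>` key format is also an assumption. Each synthetic query becomes its own indexed document that points back to its chunk. BM25 hits on the queries are then collapsed to unique chunks in rank order.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SyntheticQueryDoc:
    doc_id: str    # hypothetical key format: "<chunk_id>#q<n>"
    query: str     # the text BM25 actually scores
    chunk_id: str  # pointer back to the original diary chunk


def build_sqi_docs(
    chunks: dict[str, str],
    generate_queries: Callable[[str], Iterable[str]],
) -> list[SyntheticQueryDoc]:
    """Index synthetic queries per chunk instead of the chunk text itself."""
    docs: list[SyntheticQueryDoc] = []
    for chunk_id, text in chunks.items():
        for i, query in enumerate(generate_queries(text)):
            docs.append(SyntheticQueryDoc(f"{chunk_id}#q{i}", query, chunk_id))
    return docs


def hits_to_chunks(hit_ids: Iterable[str], docs: Iterable[SyntheticQueryDoc]) -> list[str]:
    """Map ranked synthetic-query hits back to unique chunk ids, best first."""
    chunk_of = {d.doc_id: d.chunk_id for d in docs}
    seen: set[str] = set()
    ranked: list[str] = []
    for hit_id in hit_ids:
        chunk_id = chunk_of[hit_id]
        if chunk_id not in seen:  # several queries can match the same chunk
            seen.add(chunk_id)
            ranked.append(chunk_id)
    return ranked
```

The deduplication step matters because one short diary entry usually has several synthetic queries, and more than one of them can match the same search.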

What made it effective

Single /api/v1/query endpoint handles BM25, vector, and hybrid queries with one request shape, so there is less per-backend code
Table-level index configuration: some tables have embeddings and some don't, and each table is its own retrieval context
Batch insert via POST /api/v1/tables/<table>/batch takes a plain JSON dict, which makes tables low-friction to populate
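Building the requests for these two endpoints with only the standard library might look like the sketch below. It assumes Antfly is reachable at a hypothetical http://localhost:8080 and that the query endpoint also takes a JSON body via POST. `query_request` and `batch_request` are hypothetical helper names. The body fields are left to the caller because these notes don't record them.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical: wherever Antfly is listening


def _json_post(url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_request(body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    # One endpoint for BM25, vector, and hybrid; only the body changes.
    return _json_post(f"{base_url}/api/v1/query", body)


def batch_request(table: str, body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    # Batch insert into a single table; the table name is URL-escaped.
    table_path = urllib.parse.quote(table, safe="")
    return _json_post(f"{base_url}/api/v1/tables/{table_path}/batch", body)
```

Sending either request is then `urllib.request.urlopen(req)`. Because every retrieval mode goes through the same builder, per-backend client code shrinks to choosing the body.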

Bonus utility

Graph search is available but unused. Once the diary and wiki have relationship edges, traversal-based retrieval (e.g. "what tasks relate to this concept?") is possible without adding another service.
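To illustrate what traversal-based retrieval means here, the sketch below runs a local breadth-first walk over a relationship map. It does not use Antfly's graph API, which we haven't tried. The node naming scheme (`concept:`, `task:`, `wiki:`) and the edges are hypothetical.

```python
from __future__ import annotations

from collections import deque


def related(edges: dict[str, list[str]], start: str, max_hops: int = 2) -> list[str]:
    """Breadth-first traversal: nodes reachable from `start` within `max_hops` edges."""
    seen = {start}
    found: list[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found
```

"What tasks relate to this concept?" then becomes a filter over the traversal: `[n for n in related(edges, "concept:search") if n.startswith("task:")]`.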

Friction / pain points / surprises

/api/v1/search returns HTML, not search results. The URL looks like it should be the query endpoint, but it serves the dashboard. Use /api/v1/query instead. This has bitten us more than once.
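One cheap guard is to refuse to parse anything that looks like HTML before treating it as search results. It fails loudly instead of letting a dashboard page reach the consumer. This is a sketch: it assumes you have the response's Content-Type header and raw body in hand. `parse_query_response` and `NotQueryResults` are hypothetical names.

```python
import json


class NotQueryResults(ValueError):
    """Raised when a response looks like the HTML dashboard instead of query JSON."""


def parse_query_response(content_type: str, body: bytes) -> dict:
    """Parse a query response, rejecting HTML (e.g. a hit on /api/v1/search)."""
    mime = content_type.split(";", 1)[0].strip().lower()
    # Check both the header and the first byte, in case the header is missing or wrong.
    if mime == "text/html" or body.lstrip().startswith(b"<"):
        raise NotQueryResults(
            "got HTML instead of search results; "
            "use /api/v1/query, not /api/v1/search (the dashboard)"
        )
    return json.loads(body)
```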

benchristel-wiki has no embeddings index. HyDE search against the wiki therefore falls back to extracting key terms from the hypothetical document and running BM25 on them. Semantic search on the wiki would require re-indexing with an embeddings index. That hasn't been done yet, so recall depends entirely on keyword overlap.
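A frequency-based version of that key-term fallback might look like the following. It is a sketch, not necessarily how our fallback picks terms. The stopword list, the minimum token length, and `k` are all hypothetical choices. The output string is an ordinary keyword query, which is why recall hinges on the wiki sharing vocabulary with the hypothetical document.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = frozenset(
    "a an and are as at be but by for from has have in is it its of on or "
    "that the this to was were will with".split()
)


def key_terms(hypothetical_doc: str, k: int = 8) -> list[str]:
    """Pick the k most frequent non-stopword terms from a HyDE document."""
    tokens = re.findall(r"[a-z0-9]+", hypothetical_doc.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]


def bm25_fallback_query(hypothetical_doc: str, k: int = 8) -> str:
    """Join the key terms into a plain keyword query for the BM25 index."""
    return " ".join(key_terms(hypothetical_doc, k))
```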

The response shape is nested: { "responses": [{ "hits": { "hits": [...] } }] }. You have to go through responses[0].hits.hits to reach the actual hits, and every consumer needs the same unwrap boilerplate.
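Writing that unwrap once as a shared helper keeps the boilerplate out of each consumer. The sketch below assumes only the shape shown above. It returns an empty list instead of raising when a level is missing. The helper names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def unwrap_hits(response: dict[str, Any], index: int = 0) -> list[dict[str, Any]]:
    """Return responses[index].hits.hits, or [] if any level is missing."""
    responses = response.get("responses") or []
    if not 0 <= index < len(responses):
        return []
    inner = responses[index].get("hits") or {}
    return inner.get("hits") or []


def unwrap_all(response: dict[str, Any]) -> list[list[dict[str, Any]]]:
    """One hit list per entry in `responses`."""
    return [unwrap_hits(response, i) for i in range(len(response.get("responses") or []))]
```

Returning `[]` instead of raising is a judgment call. It treats "no results" and "unexpected shape" the same, which is convenient for display code but can hide an endpoint mix-up like the /api/v1/search one.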