# LangAlpha

Financial research agent with PTC (Programmatic Tool Calling) — the LLM writes and runs Python code in a Daytona sandbox to call MCP-backed financial data tools.

Repo: ginlix-ai/LangAlpha · Python/FastAPI · LangGraph · React frontend

Evaluated 2026-04-14. Three models tested: qwen3.5:latest (9.7B), qwen3.5:27b (128K), Gemma 4 31B. Full experiment log at /workspace/group/projects/langalpha/LAB_NOTEBOOK.md.
## What it is
LangAlpha is a self-hostable AI research assistant for stock analysis. The core idea — PTC — is genuinely novel: instead of calling financial tools directly, the agent writes Python code that runs in a Daytona sandbox. This means you can ask for a DCF model and get back executable code + charts, not just prose.
Flash mode skips the sandbox for quick answers: web search, market data, SEC filings, direct tool dispatch.
The market data stack is real and clean: Yahoo Finance (no API key), real-time quotes, SEC EDGAR integration, analyst ratings, revenue breakdowns.
## What we ran
- Docker Compose deployment (custom `docker-compose.local.yml`, no bind mounts for DooD compatibility)
- `SANDBOX_PROVIDER=memory` (no Daytona), `OPENAI_BASE_URL` pointed at Ollama
- Three models, 14 experiments across three phases
## What we found
### Market data API: ★★★★☆
The `/api/v1/market-data/stocks/{symbol}/overview` endpoint is solid. Real-time prices, PE ratios, analyst ratings, cash flow, revenue by segment — all from Yahoo Finance with no key. Worth extracting as a standalone library.
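A quick way to exercise this endpoint from a script. This is a minimal sketch: the endpoint path comes from the deployment we tested, but the base URL and port are assumptions, and the response shape is whatever the backend returns.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default local backend address

def overview_url(symbol: str) -> str:
    # Endpoint path as documented above; only BASE_URL is assumed.
    return f"{BASE_URL}/api/v1/market-data/stocks/{symbol}/overview"

def fetch_overview(symbol: str) -> dict:
    # Live call; requires the backend to be running locally.
    with urllib.request.urlopen(overview_url(symbol)) as resp:
        return json.load(resp)
```

The same check works from the shell with `curl "$BASE_URL/api/v1/market-data/stocks/AAPL/overview"`.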
### Flash agent on Ollama: ★★★☆☆ (Gemma 4) / ★☆☆☆☆ (qwen3.5)
Three models, two distinct failure modes:
**qwen3.5 9B (32K ctx):** Secretary skill onboarding loop. Every query — "AAPL price", "use `get_company_overview` for AAPL" — intercepted by the secretary skill and replaced with a canned "I'm your research secretary, ready to help..." greeting. Zero financial tool calls across all trials.
**qwen3.5 27B (128K ctx):** Same loop, plus hallucination: asked for AAPL, it returned ASML data and invented an "AI chip market leaders" topic from nothing. Expanding `num_ctx` to 131072 via `POST /api/create` provided no benefit and worsened instruction-following.
**Gemma 4 31B:** Makes tool calls, correctly populates schemas, handles the 24-tool flash setup. The persistent NVDA substitution we observed was not a weight bias — it was the shared-flash-workspace checkpointer loading prior wrong conversation turns as context. Once the postgres checkpointer was cleared, Gemma 4 correctly routed "What is AAPL trading at?" → `get_company_overview(symbol="AAPL")`. Direct Ollama tests with the full flash system prompt and 24 tools confirmed this: AAPL returned correctly on a clean context.
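The direct Ollama checks described above can be reproduced against `POST /api/chat`, which accepts an OpenAI-style `tools` array. The sketch below builds such a request payload; the single tool schema is a hypothetical reduction of `get_company_overview` (the real flash setup registers 24 tools), and the model tag is a placeholder.

```python
# Hypothetical single-tool reduction of the 24-tool flash setup;
# the schema follows Ollama's /api/chat tool-calling payload format.
OVERVIEW_TOOL = {
    "type": "function",
    "function": {
        "name": "get_company_overview",
        "description": "Fetch price, ratios, and analyst data for one ticker.",
        "parameters": {
            "type": "object",
            "properties": {"symbol": {"type": "string"}},
            "required": ["symbol"],
        },
    },
}

def build_chat_request(model: str, user_msg: str) -> dict:
    # POST this dict as JSON to http://localhost:11434/api/chat
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [OVERVIEW_TOOL],
        "stream": False,
    }

payload = build_chat_request("gemma-placeholder", "What is AAPL trading at?")
```

On a clean context, a tool-capable model should answer with a `tool_calls` entry naming `get_company_overview` with `symbol="AAPL"` rather than prose.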
The remaining real problems are the secretary skill onboarding loop (qwen3.5-specific) and silent PTC failure on memory sandbox.
### PTC mode: broken on memory sandbox
With `SANDBOX_PROVIDER=memory`, PTC dispatch silently falls back to flash behavior. No error. The workspace is created but never initialized. This is the core value proposition of LangAlpha — and it requires Daytona.
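A fail-loud guard would make this degradation visible instead of silent. A sketch of the idea; the function and provider names are assumptions, not LangAlpha's actual code:

```python
def dispatch_ptc(sandbox_provider: str, query: str) -> str:
    # Hypothetical guard: refuse to silently degrade PTC to flash behavior
    # when the configured sandbox cannot actually execute code.
    if sandbox_provider == "memory":
        raise RuntimeError(
            "PTC requested but SANDBOX_PROVIDER=memory: the memory provider "
            "never initializes the workspace, so generated code cannot run. "
            "Configure a Daytona sandbox or fall back to flash mode explicitly."
        )
    return f"executing PTC for: {query}"
```

This trades a silent wrong answer for a loud configuration error, which is the better failure mode for the feature that is the product's differentiator.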
## What's fixable (and what isn't)
**NVDA anchor** — root cause: shared flash workspace checkpointer contamination. All flash queries for a given user share a single deterministic flash workspace ID (`uuid5(namespace, user_id)`). LangGraph's postgres checkpointer accumulates every prior conversation turn in that workspace. After ~30 NVDA-returning exchanges, the model was loading its own prior wrong outputs as in-context examples and pattern-matching forward. Not a weight bias, not a prompt issue. Fix: clear the `checkpoints` and `checkpoint_writes` tables, or scope flash workspace IDs per-session rather than per-user.

We patched NVDA out of all 5 locations in prompts and skills anyway (`flash_identity.md.j2`, `plan_mode.md.j2`, `sec/tool.py`, `SKILL.md`, `onboarding.md`) — that's correct hygiene for any model — but the ticker confusion would have stopped once the checkpointer was cleared regardless of the prompt patches.
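The per-session scoping fix can be illustrated with the stdlib's deterministic `uuid5`. The namespace constant and the key format are assumptions; the report only establishes that the real ID is `uuid5(namespace, user_id)`:

```python
import uuid

# Hypothetical namespace standing in for LangAlpha's real constant
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")

def flash_workspace_id_per_user(user_id: str) -> uuid.UUID:
    # Current behavior: one deterministic workspace per user, so the
    # checkpointer replays every prior turn into every new query.
    return uuid.uuid5(NAMESPACE, user_id)

def flash_workspace_id_per_session(user_id: str, session_id: str) -> uuid.UUID:
    # Proposed fix: fold the session into the key so each session starts
    # with a fresh checkpointer history.
    return uuid.uuid5(NAMESPACE, f"{user_id}:{session_id}")
```

Because `uuid5` is deterministic, the per-user ID is identical across sessions (the contamination path), while the per-session variant yields a distinct ID, and thus a clean history, per session.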
Verified: with a clean checkpointer and the full 24-tool flash setup, Gemma 4 correctly routes "What is AAPL trading at?" → `get_company_overview(symbol="AAPL")`.

**Secretary skill priority** — the onboarding state machine overrides user queries. A `"skip_onboarding": true` preference flag or a message-count check would fix repeat-user UX.

**Token budget** — the 120K summarization threshold in `agent_config.yaml` is wrong for 32K models. It should be model-aware.

**`get_user_data` schema bug** — the secretary skill calls `get_user_data(file='.watchlist.md')`, missing the required `entity` field. Gemma 4 avoids this; qwen3.5 trips over it.
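A model-aware replacement for the fixed 120K token budget could derive the summarization trigger from the model's context window. A sketch; the 75% fraction and the function name are assumptions, not values from `agent_config.yaml`:

```python
def summarization_threshold(num_ctx_tokens: int, fraction: float = 0.75) -> int:
    # Trigger summarization once history fills `fraction` of the model's
    # context window, instead of a hard-coded 120K that a 32K model can
    # never reach before overflowing.
    return int(num_ctx_tokens * fraction)
```

For a 32K model this yields a ~24.5K trigger, and for a 128K model ~98K — both safely inside the window, unlike the fixed 120K value.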
## What requires Daytona
PTC (the differentiator) is inert without a live sandbox. The memory provider exists for testing but produces no observable difference in output. Budget ~$30/mo for Daytona cloud or run it self-hosted to unlock the actual product.
## Deployment notes
- `Dockerfile.backend` is missing `alembic.ini` and `migrations/` — add both `COPY` lines before `uv sync`
- Run `uv run alembic upgrade head` (with `cd /app`) before first backend start
- No Supabase required for local OSS mode — all requests attributed to `local-dev-user`
- Frontend build fails (node_modules issue); backend API is usable standalone via curl
## Verdict
Usable on Ollama with Gemma 4, but PTC requires Daytona. Flash mode works correctly on Gemma 4 once the checkpointer is clean. qwen3.5 is blocked by the secretary skill loop regardless of context size. With Daytona + Gemma 4 or a frontier model it would be ★★★★☆ — the PTC pattern is genuinely novel and the market data layer is production-quality. Without Daytona (memory sandbox), PTC silently falls back to flash and the differentiator is inert.