LangAlpha

Financial research agent with PTC (Programmatic Tool Calling) — the LLM writes and runs Python code in a Daytona sandbox to call MCP-backed financial data tools.

Repo: ginlix-ai/LangAlpha · Python/FastAPI · LangGraph · React frontend

Evaluated 2026-04-14. Three models tested: qwen3.5:latest (9.7B, 32K context), qwen3.5:27b (128K context), and Gemma 4 31B. Full experiment log at /workspace/group/projects/langalpha/LAB_NOTEBOOK.md.

What it is

LangAlpha is a self-hostable AI research assistant for stock analysis. The core idea, PTC, is genuinely novel: instead of dispatching financial tools directly, the agent writes Python code that runs in a Daytona sandbox and calls the tools from there. This means you can ask for a DCF model and get back executable code and charts, not just prose.

Flash mode skips the sandbox for quick answers, dispatching tools directly for web search, market data, and SEC filings.

The market data stack is real and clean: Yahoo Finance (no API key), real-time quotes, SEC EDGAR integration, analyst ratings, revenue breakdowns.

What we ran

What we found

Market data API: ★★★★☆

The /api/v1/market-data/stocks/{symbol}/overview endpoint is solid. Real-time prices, PE ratios, analyst ratings, cash flow, revenue by segment — all from Yahoo Finance with no key. Worth extracting as a standalone library.
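A minimal sketch of calling the overview endpoint from Python. The base URL (http://localhost:8000) and the assumption that the response body is a JSON object are ours; the helper names are hypothetical, and only the path itself comes from the evaluation.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: a local self-hosted instance; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def overview_url(symbol: str, base_url: str = BASE_URL) -> str:
    """Build the overview URL for a ticker symbol (hypothetical helper)."""
    safe_symbol = quote(symbol.strip(), safe="")
    return f"{base_url.rstrip('/')}/api/v1/market-data/stocks/{safe_symbol}/overview"


def fetch_overview(symbol: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET the overview and decode it, assuming the body is a JSON object."""
    with urllib.request.urlopen(overview_url(symbol, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With a running instance, fetch_overview("AAPL") would return the decoded payload; the field names inside it are not documented here, so inspect the result before relying on any key.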

Flash agent on Ollama: ★★★☆☆ (Gemma 4) / ★☆☆☆☆ (qwen3.5)

Three models, two distinct failure modes:

qwen3.5 9B (32K ctx): Secretary skill onboarding loop. Every query, including "AAPL price" and "use get_company_overview for AAPL", was intercepted by the secretary skill and answered with a canned "I'm your research secretary, ready to help..." greeting. Zero financial tool calls across all trials.

qwen3.5 27B (128K ctx): Same loop, plus hallucination: asked for AAPL, it returned ASML data, and it invented an "AI chip market leaders" topic from nothing. Expanding num_ctx to 131072 via POST /api/create provided no benefit and worsened instruction-following.

Gemma 4 31B: Makes tool calls, populates schemas correctly, and handles the 24-tool flash setup. The persistent NVDA substitution we observed was not a weight bias; the shared flash workspace checkpointer was loading earlier wrong conversation turns as context. Once the Postgres checkpointer was cleared, Gemma 4 correctly routed "What is AAPL trading at?" → get_company_overview(symbol="AAPL"). Direct Ollama tests with the full flash system prompt and 24 tools confirmed this: AAPL returned correctly on a clean context.

The remaining real problems are the secretary skill onboarding loop (qwen3.5-specific) and the silent PTC failure on the memory sandbox.

PTC mode: broken on memory sandbox

With SANDBOX_PROVIDER=memory, PTC dispatch silently falls back to flash behavior. No error is raised. The workspace is created but never initialized. PTC is the core value proposition of LangAlpha, and it requires Daytona.
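One way to surface this failure instead of hiding it is a fail-fast guard at dispatch time. The sketch below is not LangAlpha code: the function name, the exception class, and the "ptc" mode string are all hypothetical, and only the SANDBOX_PROVIDER variable and its "memory" value come from our runs.

```python
import os
from typing import Mapping, Optional


class SandboxUnavailableError(RuntimeError):
    """Raised when PTC is requested but no real sandbox is configured."""


def require_real_sandbox(mode: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured provider, refusing PTC on the memory provider.

    Hypothetical guard: `mode` is assumed to be "ptc" or "flash".
    """
    env = os.environ if env is None else env
    provider = env.get("SANDBOX_PROVIDER", "").strip().lower()
    if mode == "ptc" and provider == "memory":
        raise SandboxUnavailableError(
            "PTC requested with SANDBOX_PROVIDER=memory; "
            "code would not execute. Configure Daytona or use flash mode."
        )
    return provider
```

Calling a check like this before PTC dispatch would turn the silent fallback into an explicit error that the user can act on.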

What's fixable (and what isn't)

  1. NVDA anchor — root cause: shared flash workspace checkpointer contamination. All flash queries for a given user share a single deterministic flash workspace ID (uuid5(namespace, user_id)). LangGraph's postgres checkpointer accumulates every prior conversation turn in that workspace. After ~30 NVDA-returning exchanges, the model was loading its own prior wrong outputs as in-context examples and pattern-matching forward. Not a weight bias, not a prompt issue. Fix: clear the checkpoints and checkpoint_writes tables, or scope flash workspace IDs per-session rather than per-user.

    We patched NVDA out of all 5 locations in prompts and skills anyway (flash_identity.md.j2, plan_mode.md.j2, sec/tool.py, SKILL.md, onboarding.md) — that's correct hygiene for any model — but the ticker confusion would have stopped once the checkpointer was cleared regardless of the prompt patches.

    Verified: with a clean checkpointer and the full 24-tool flash setup, Gemma 4 correctly routes "What is AAPL trading at?" → get_company_overview(symbol="AAPL").

  2. Secretary skill priority — the onboarding state machine overrides user queries. A "skip_onboarding": true preference flag or a message-count check would fix repeat-user UX.

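  3. Token budget: the 120K-token summarization threshold in agent_config.yaml exceeds the entire context window of a 32K model, so summarization cannot trigger before the context is already full. The threshold should be model-aware.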

  4. get_user_data schema bug — secretary skill calls get_user_data(file='.watchlist.md') missing the required entity field. Gemma 4 avoids this; qwen3.5 trips over it.
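Fixes 1 to 3 above can be sketched in a few lines. This is illustrative only and not taken from the LangAlpha codebase: the namespace value, the function names, the shape of the preferences dict, and the 0.75 context fraction are all assumptions. Only uuid5(namespace, user_id), the "skip_onboarding" flag, and the 120K threshold come from the notes above.

```python
import uuid
from typing import Mapping, Optional

# Hypothetical namespace; the real one lives somewhere in the LangAlpha source.
FLASH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "langalpha/flash-workspace")


def flash_workspace_id(user_id: str, session_id: Optional[str] = None) -> uuid.UUID:
    """Fix 1: scope the deterministic ID per session instead of per user.

    With session_id=None this mirrors the current per-user behaviour.
    """
    key = user_id if session_id is None else f"{user_id}:{session_id}"
    return uuid.uuid5(FLASH_NAMESPACE, key)


def should_run_onboarding(preferences: Mapping[str, object], prior_message_count: int) -> bool:
    """Fix 2: skip onboarding on an explicit flag or for returning users."""
    if preferences.get("skip_onboarding", False):
        return False
    return prior_message_count == 0


def summarization_threshold(context_window: int,
                            configured: int = 120_000,
                            fraction: float = 0.75) -> int:
    """Fix 3: cap the configured threshold at a fraction of the model's window.

    The 0.75 fraction is an arbitrary illustrative choice.
    """
    return min(configured, int(context_window * fraction))
```

For a 32K model (context_window=32768) this caps the threshold at 24576 tokens, while a 200K-context model keeps the configured 120000. Per-session IDs also mean a contaminated history cannot follow a user into a new session, at the cost of losing cross-session flash memory.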

What requires Daytona

PTC (the differentiator) is inert without a live sandbox. The memory provider exists for testing, but under it PTC output is indistinguishable from flash mode. Budget ~$30/mo for Daytona cloud or run it self-hosted to unlock the actual product.

Deployment notes

Verdict

Usable on Ollama with Gemma 4, but PTC requires Daytona. Flash mode works correctly on Gemma 4 once the checkpointer is clean. qwen3.5 is blocked by the secretary skill loop regardless of context size. With Daytona + Gemma 4 or a frontier model it would be ★★★★☆ — the PTC pattern is genuinely novel and the market data layer is production-quality. Without Daytona (memory sandbox), PTC silently falls back to flash and the differentiator is inert.