feat(toolrouter): usage-based auto-preload (preload.auto_preload_count)

@zen-agent · checks: n/a · …zen/toolrouter-auto-preload-86zin6 → master · 38 files · +3010 −38 · review: @abir-taheer · updated 1mo ago

Description

Adds `preload.auto_preload_count: N` to tool-router session create. When set, the server resolves the top-N most-used tools for the (projectId, userId) pair from past tool-router usage and adds them to `session.tools` alongside the manual `preload.tools` list. Gated behind the LaunchDarkly flag `toolrouterAutoPreloadEnabled` (default off).

What's in this PR

API surface (snake_case wire field):

- `preload.auto_preload_count: number` on POST / PATCH session (v3 + v3.1)
- Resolved slugs ride through `session.tools` / `tool_router_tools` alongside the manual `preload.tools`: no new top-level field on the public response, and no internal metadata (`computed_at` / `expires_at` / `reason`) leaks to the wire
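
For illustration, a minimal sketch of a create-request body carrying the new field, and of how resolved slugs might be folded into the same list as the manual ones. Only the field names `preload.tools` and `preload.auto_preload_count` come from this PR; the type names, the `mergePreloadTools` helper, and the example slugs are hypothetical.

```typescript
// Hypothetical request-body shape; only the two `preload` field names come from the PR.
interface PreloadConfig {
  tools?: string[];            // manual preload list
  auto_preload_count?: number; // top-N usage-based tools to add
}

interface CreateSessionBody {
  preload?: PreloadConfig;
}

// Hypothetical helper: merge manual and auto-resolved slugs into one list,
// manual first, dropping case-insensitive duplicates.
function mergePreloadTools(manual: string[], autoResolved: string[]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const slug of [...manual, ...autoResolved]) {
    const key = slug.toUpperCase();
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(slug);
    }
  }
  return merged;
}

// Example body: one manual tool plus up to five usage-based tools.
const exampleBody: CreateSessionBody = {
  preload: { tools: ["GITHUB_CREATE_ISSUE"], auto_preload_count: 5 },
};
```

Because the resolved slugs land in the existing tool list, clients that ignore the new field should see no change in response shape.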

Architecture:

- Per-(project, user) ranked usage pool lives in a new `user_tool_usage_pool` Postgres table. One row per pair; the full ranked pool sits in a single JSONB column (we always read/write it together).
- The pool is refreshed from ClickHouse `tool_execution_logs` once per `refreshAfterHours` (LD flag, default 24h) under a row-level `refreshLockedUntil` lock that other workers see and skip.
- Per-session snapshot lives in a new `auto_preload_snapshot` JSONB column on `tool_router_sessions`, so background TTL refresh never bumps `configVersion` and PATCH optimistic concurrency stays clean.
- Selector pipeline (pure): dedup against manual preload + helper slugs → drop slugs deprecated since the pool was computed → re-apply session filters → drop oversize tools (chars/N heuristic) → take top-N by score → sort alphabetically (prompt-cache stability).
- Stale-preservation: if the fresh ClickHouse query returns no rows AND the existing pool is non-empty, the row is preserved (`computedAt` bumped, data untouched), so quiet users keep their preferences. Hard ceiling at `maxStalePreservationDays` (default 30d).
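
The selector pipeline described above can be sketched as a single pure function. This is an illustrative sketch, not the code in this PR: every type and function name here (including this version of `estimateToolTokens`) is hypothetical, and only the order of the steps follows the description.

```typescript
// All names are hypothetical; only the step order follows the PR description.
interface PoolEntry { slug: string; score: number }
interface ToolDetails { slug: string; toolkit: string; schemaJson: string }

interface SelectorInput {
  pool: PoolEntry[];                       // ranked usage pool for (projectId, userId)
  excluded: string[];                      // manual preload + helper slugs
  detailsBySlug: Map<string, ToolDetails>; // current catalog; deprecated slugs are absent
  allowedToolkits: Set<string> | null;     // session filter; null means no restriction
  requestedCount: number;
  maxTokensPerTool: number;
  charsPerToken: number;
}

// chars/N heuristic: approximate the token count from the serialized schema length.
function estimateToolTokens(details: ToolDetails, charsPerToken: number): number {
  return Math.ceil(details.schemaJson.length / charsPerToken);
}

function selectAutoPreload(input: SelectorInput): string[] {
  const excluded = new Set(input.excluded.map((s) => s.toUpperCase()));
  const seen = new Set<string>();
  const candidates: PoolEntry[] = [];
  for (const entry of input.pool) {
    const key = entry.slug.toUpperCase();
    // 1. Dedup (case-insensitive) against manual preload + helper slugs.
    if (excluded.has(key) || seen.has(key)) continue;
    // 2. Drop slugs that no longer have tool details (deprecated since pool computed).
    const details = input.detailsBySlug.get(entry.slug);
    if (!details) continue;
    // 3. Re-apply session filters (here: a toolkit allowlist).
    if (input.allowedToolkits && !input.allowedToolkits.has(details.toolkit)) continue;
    // 4. Drop oversize tools.
    if (estimateToolTokens(details, input.charsPerToken) > input.maxTokensPerTool) continue;
    seen.add(key);
    candidates.push(entry);
  }
  return candidates
    .sort((a, b) => b.score - a.score)           // 5. highest score first
    .slice(0, Math.max(0, input.requestedCount)) //    take top-N
    .map((e) => e.slug)
    .sort();                                     // 6. code-unit order for a stable prompt
}
```

Sorting the final slugs, rather than keeping score order, means two sessions with the same selected set produce the same tool ordering even if the scores shift slightly between refreshes.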

ClickHouse query (filters justified against prod data, 2026-05-12):

- TR-originated, non-meta, successful calls only: `tool_router_session_id != ''`, `provider != 'composio'`, `errorRequest = ''`
- No sandbox filter: verified that `sandbox_id` is session-scoped (set after workbench provisioning), not call-scoped, so filtering on it would over-drop legitimate direct MCP calls. Sub-tool rows from `COMPOSIO_MULTI_EXECUTE_TOOL` carry `source = 'mcp'` (identical to direct MCP), so there is no way to distinguish them today, and both are agentic signals anyway.
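
As a rough sketch of how these filters might be expressed, the helper below builds a parameterized query string. The table name `tool_execution_logs` and the three filter predicates come from this PR; the columns `project_id`, `user_id`, `tool_slug`, and `created_at`, the plain `count()` ranking, and the `buildPoolQuery` helper itself are assumptions made for illustration. The `{name:Type}` placeholders are ClickHouse's server-side query parameter syntax; the SQL has not been run against the real schema.

```typescript
// Hypothetical config and return types for the sketch.
interface PoolQueryConfig {
  lookbackDays: number;
  minExecutionCount: number;
  poolSize: number;
}

interface PoolQuery {
  query: string;
  params: Record<string, string | number>;
}

// The PR states the lookback is clamped to <= 30 days in code.
const MAX_LOOKBACK_DAYS = 30;

function buildPoolQuery(projectId: string, userId: string, cfg: PoolQueryConfig): PoolQuery {
  const lookbackDays = Math.min(Math.max(1, Math.floor(cfg.lookbackDays)), MAX_LOOKBACK_DAYS);
  // Column names other than the three filter columns are assumed, not taken from the PR.
  const query = `
    SELECT tool_slug, count() AS executions, max(created_at) AS last_used_at
    FROM tool_execution_logs
    WHERE project_id = {projectId:String}
      AND user_id = {userId:String}
      AND created_at >= subtractDays(now(), {lookbackDays:UInt32})
      AND tool_router_session_id != ''  -- likely the "TR-originated" filter
      AND provider != 'composio'        -- likely the "non-meta" filter
      AND errorRequest = ''             -- likely the "successful" filter
    GROUP BY tool_slug
    HAVING executions >= {minExecutionCount:UInt32}
    ORDER BY executions DESC
    LIMIT {poolSize:UInt32}`;
  return {
    query,
    params: {
      projectId,
      userId,
      lookbackDays,
      minExecutionCount: cfg.minExecutionCount,
      poolSize: cfg.poolSize,
    },
  };
}
```

Passing ids as bound parameters rather than interpolating them keeps the query text identical across users. Note the deliberate absence of any `sandbox_id` predicate, per the reasoning above.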

Failure modes are non-blocking:

- ClickHouse timeout / error → session is created without auto-preload (snapshot `null`), logged.
- Pool empty after filters → snapshot has `resolved: []`, `reason: 'auto_preload_count_zero_after_filters'`.
- Pool DB read fails → in-memory empty pool, snapshot `null`.
- The only `Err` that propagates is `auto_preload_count > maxCount` (400 with explicit message).
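
The failure-mode table above can be read as a small mapping from outcome to snapshot. The sketch below assumes a discriminated union for the outcomes; the type names, function names, and the error message text are hypothetical, while the `reason` string and the 400 status are taken from the list above.

```typescript
// Hypothetical outcome type covering the cases listed above.
type PoolOutcome =
  | { kind: "clickhouse_error" }
  | { kind: "pool_read_error" }
  | { kind: "ok"; resolved: string[] };

interface AutoPreloadSnapshot {
  resolved: string[];
  reason?: string;
}

type CountCheck = { ok: true } | { ok: false; status: 400; message: string };

// The only blocking failure: requested count above the configured cap.
function checkRequestedCount(count: number, maxCount: number): CountCheck {
  if (count > maxCount) {
    return { ok: false, status: 400, message: `auto_preload_count must be <= ${maxCount}` };
  }
  return { ok: true };
}

// Everything else degrades to "no auto-preload" instead of failing session create.
function snapshotFor(outcome: PoolOutcome): AutoPreloadSnapshot | null {
  switch (outcome.kind) {
    case "clickhouse_error":
    case "pool_read_error":
      return null; // session is still created, just without auto-preload
    case "ok":
      return outcome.resolved.length === 0
        ? { resolved: [], reason: "auto_preload_count_zero_after_filters" }
        : { resolved: outcome.resolved };
  }
}
```

Keeping validation separate from resolution makes it easy to see that a backend outage can never turn into a 4xx/5xx on session create.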

DB migration (v86):

- New column `ToolRouterSession.autoPreloadSnapshot` (JSONB nullable)
- New table `user_tool_usage_pool` with (projectId, userId) unique index + `computedAt` index
- Liquibase + corresponding Prisma schema additions
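
To make the refresh and stale-preservation rules from the Architecture section concrete, here is a sketch of the decisions applied to one pool row. `computedAt` and `refreshLockedUntil` are named in this PR; the `dataUpdatedAt` field (used to measure how long data has been preserved), the action names, and both functions are assumptions, since the PR text does not say how the 30-day ceiling is tracked.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

interface RankedTool { slug: string; score: number }

// Hypothetical in-memory view of one user_tool_usage_pool row.
interface PoolRow {
  computedAt: Date;               // last refresh attempt
  dataUpdatedAt: Date;            // ASSUMED field: last time the pool data actually changed
  refreshLockedUntil: Date | null;
  pool: RankedTool[];
}

type RefreshAction = "serve_fresh" | "serve_stale_locked" | "refresh";

function decideRefresh(row: PoolRow | null, now: Date, refreshAfterHours: number): RefreshAction {
  if (row === null) return "refresh";
  const ageMs = now.getTime() - row.computedAt.getTime();
  if (ageMs < refreshAfterHours * HOUR_MS) return "serve_fresh";
  // Another worker holds the lock: skip the refresh and serve what we have.
  if (row.refreshLockedUntil !== null && row.refreshLockedUntil.getTime() > now.getTime()) {
    return "serve_stale_locked";
  }
  return "refresh";
}

function applyFreshResult(
  existing: PoolRow | null,
  fresh: RankedTool[],
  now: Date,
  maxStalePreservationDays: number,
): PoolRow {
  if (fresh.length === 0 && existing !== null && existing.pool.length > 0) {
    const preservedDays = (now.getTime() - existing.dataUpdatedAt.getTime()) / DAY_MS;
    if (preservedDays <= maxStalePreservationDays) {
      // Stale-preservation: bump computedAt, leave the data untouched.
      return { ...existing, computedAt: now, refreshLockedUntil: null };
    }
  }
  // Normal path, first-time empty result, or preservation ceiling exceeded.
  return { computedAt: now, dataUpdatedAt: now, refreshLockedUntil: null, pool: fresh };
}
```

Both functions are pure, so the lock acquisition and the actual row write would sit around them in the orchestrator.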

LD flags (11 new, all project-scoped):

| Flag | Default | Purpose |
| --- | --- | --- |
| `toolrouterAutoPreloadEnabled` | `false` | Master kill switch |
| `toolrouterAutoPreloadMaxCount` | `25` | Cap on `auto_preload_count` (requests above it are rejected with 400) |
| `toolrouterAutoPreloadLookbackDays` | `30` | ClickHouse lookback (clamped to ≤30 in code) |
| `toolrouterAutoPreloadHalflifeDays` | `7` | Decay half-life |
| `toolrouterAutoPreloadPoolSize` | `50` | Per-user pool depth |
| `toolrouterAutoPreloadMinExecutionCount` | `2` | Floor to consider a tool |
| `toolrouterAutoPreloadRefreshAfterHours` | `24` | Pool refresh interval |
| `toolrouterAutoPreloadMaxStalePreservationDays` | `30` | Drop preserved pools older than this |
| `toolrouterAutoPreloadMaxTokensPerTool` | `2000` | Per-tool size guard |
| `toolrouterAutoPreloadCharsPerTokenEstimate` | `4` | Heuristic ratio |
| `toolrouterAutoPreloadClickhouseTimeoutMs` | `1500` | Pool query deadline |
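
To show how the half-life, execution floor, and pool-size flags could interact, here is one possible scoring sketch. The PR does not spell out the scoring formula, so the formula used here (each execution weighted by `0.5 ** (ageDays / halflifeDays)`) is an assumption, as are all type and function names.

```typescript
// Hypothetical input: one record per successful execution.
interface UsageEvent { slug: string; ageDays: number }

interface ScoringConfig {
  halflifeDays: number;      // cf. toolrouterAutoPreloadHalflifeDays
  minExecutionCount: number; // cf. toolrouterAutoPreloadMinExecutionCount
  poolSize: number;          // cf. toolrouterAutoPreloadPoolSize
}

function rankPool(events: UsageEvent[], cfg: ScoringConfig): { slug: string; score: number }[] {
  const stats = new Map<string, { count: number; score: number }>();
  for (const e of events) {
    const s = stats.get(e.slug) ?? { count: 0, score: 0 };
    s.count += 1;
    // ASSUMED formula: an execution loses half its weight every halflifeDays.
    s.score += Math.pow(0.5, e.ageDays / cfg.halflifeDays);
    stats.set(e.slug, s);
  }
  return Array.from(stats.entries())
    .filter(([, s]) => s.count >= cfg.minExecutionCount) // floor on raw executions
    .map(([slug, s]) => ({ slug, score: s.score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, cfg.poolSize);                             // keep only the pool depth
}
```

With the defaults in the table, two executions from a week ago would weigh about as much as one execution today under this formula.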

Known limitation (follow-up)

PATCH accepts auto_preload_count but does not recompute the snapshot on filter changes. The read path re-fetches per-tool schemas at request time and execution enforces the gate, so this degrades to "agent sees a tool but the call returns a filter error" — not a correctness issue. Full PATCH-triggered recompute is a planned follow-up.

How did I test this PR

- Unit tests (`pnpm vitest run src/lib/toolRouterV2/features/auto_preload/ src/lib/toolRouterV2/features/session/util/autoPreloadSnapshot.unit.test.ts src/lib/toolRouterV2/utils/preload.unit.test.ts`): 53 tests pass. Coverage:
  - Selector: alphabetical sort for prompt-cache stability, `requestedCount` cap, dedup against manual + helper slugs (case-insensitive), drops slugs missing tool details (deprecated since pool), respects toolkit allowlist, drops oversized tools, empty-pool clean exit, `estimateToolTokens` chars/N heuristic
  - Snapshot orchestrator: fresh-pool serves as-is (no ClickHouse hit), stale-pool refresh under lock, contention path serves stale pool, stale-preservation on empty fresh, first-time empty persists empty, ClickHouse error falls through to existing pool, stale-beyond-ceiling drops + refreshes
  - Snapshot serializer: parse/serialize round-trip, malformed JSON → `null` (graceful degradation), `isSnapshotStale` boundary
  - Existing 28 preload tests still pass (no regression)
- Type check (`pnpm check-types`): exit 0
- Lint (`pnpm lint`): 0 errors, 1 pre-existing warning unrelated to this change
- ClickHouse query semantics verified against prod data via `METABASE_POST_API_DATASET` (1.82M sub-tool rows over 3 days, source distribution confirmed)
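
For readers who want a feel for what the snapshot serializer tests exercise, the sketch below shows a tolerant parser and a staleness check. It is not the implementation in this PR: the `StoredSnapshot` shape is inferred from the field names mentioned in the description (`resolved`, `computed_at`, `expires_at`, `reason`), the function bodies are hypothetical, and treating "exactly at `expires_at`" as stale is an assumption about the boundary.

```typescript
// Shape inferred from field names in the PR description; not confirmed.
interface StoredSnapshot {
  resolved: string[];
  computed_at: string; // ISO-8601 timestamp
  expires_at: string;  // ISO-8601 timestamp
  reason?: string;
}

// Returns null on anything malformed so the session read path can degrade gracefully.
function parseSnapshot(raw: string | null): StoredSnapshot | null {
  if (raw === null) return null;
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const { resolved, computed_at, expires_at, reason } = value as {
      resolved?: unknown;
      computed_at?: unknown;
      expires_at?: unknown;
      reason?: unknown;
    };
    if (!Array.isArray(resolved) || !resolved.every((s) => typeof s === "string")) return null;
    if (typeof computed_at !== "string" || typeof expires_at !== "string") return null;
    const snapshot: StoredSnapshot = { resolved: resolved as string[], computed_at, expires_at };
    if (typeof reason === "string") snapshot.reason = reason;
    return snapshot;
  } catch {
    return null; // malformed JSON
  }
}

// ASSUMPTION: a snapshot is stale from the instant it expires (inclusive boundary).
function isSnapshotStale(snapshot: StoredSnapshot, now: Date): boolean {
  return now.getTime() >= Date.parse(snapshot.expires_at);
}
```

Returning `null` instead of throwing keeps a corrupted JSONB value from breaking session reads; the session simply behaves as if no auto-preload had been computed.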

🤖 Generated with Claude Code

@zen-agent · 1mo ago

CI note: env-var-check is failing on every PR opened against this repo right now (verified across zen/*, cryo/*, and other recent PRs — workflow env-check.yml). The OpenAI API key the workflow uses returns 401 (invalid_api_key) on PR-triggered runs while master pushes pass. Pre-existing repo-level issue, not from this change.

@github-actions[bot] · 1mo ago

🔍 Suggested Reviewers

Based on git blame analysis of 31 file(s):

| Contributor | Contribution | Files |
| --- | --- | --- |
| Zen | 43% | 31 |
| Dhawal Upadhyay | 17% | |

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...terV2/features/auto_preload/compute_for_session.ts | 5.80% | 146 Missing :warning: |
| ...features/auto_preload/dbUtils/userToolUsagePool.ts | 8.00% | 138 Missing :warning: |
| ...RouterV2/features/auto_preload/clickhouse_query.ts | 11.11% | 48 Missing :warning: |
| ...llo/src/lib/toolRouterV2/features/session/patch.ts | 38.15% | 47 Missing :warning: |
| ...pps/apollo/src/common/lib/external/launchDarkly.ts | 20.93% | 34 Missing :warning: |
| ...lib/toolRouterV2/features/auto_preload/snapshot.ts | 75.37% | 33 Missing :warning: |
| ...lo/src/lib/toolRouterV2/features/session/create.ts | 29.54% | 31 Missing :warning: |
| ...c/lib/toolRouterV2/features/auto_preload/config.ts | 6.89% | 27 Missing :warning: |
| apps/apollo/src/lib/toolRouterV2/schema/config.ts | 0.00% | 10 Missing :warning: |
| ...lo/src/pages/api/v3/tool_router/session_schemas.ts | 75.60% | 10 Missing :warning: |
| ... and 7 more | | |

| Flag | Coverage Δ | |
| --- | --- | --- |
| e2e-tests | `5.79% <1.84%> (+0.03%)` | :arrow_up: |
| self-hosted-tests | `5.61% <1.84%> (+0.04%)` | :arrow_up: |
| thermos-unit-tests | `?` | |
| unit-tests | `58.67% <45.45%> (+0.39%)` | :arrow_up: |

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| ...ib/toolRouterV2/features/auto_preload/constants.ts | `100.00% <100.00%> (ø)` | |
| ...rc/lib/toolRouterV2/features/auto_preload/types.ts | `100.00% <100.00%> (ø)` | |
| ...pollo/src/lib/toolRouterV2/features/session/get.ts | `95.30% <100.00%> (+0.13%)` | :arrow_up: |
| ...lo/src/lib/toolRouterV2/features/session/schema.ts | `100.00% <ø> (ø)` | |
| ...terV2/features/session/util/autoPreloadSnapshot.ts | `100.00% <100.00%> (ø)` | |
| apps/apollo/src/lib/toolRouterV2/schema/api.ts | | |
