PR Custody

feat: add cost instrumentation to direct LLM call paths

@AgentWrapperchecks n/achecks…feat/instrument-direct-llm-cost-metrics → next2 files · +191 −4updated 3mo ago

▸Description

Why

~60% of actual Bedrock spend is invisible to Datadog because two code paths make direct AnthropicBedrock calls without emitting cost metrics. The agent SDK path (cortex.agents.run.cost_usd) only covers agent-based calls, missing:

call_generic_llm() — used by curl_tester summarization, repro classification, failure categorizer (~$4,000-4,500/day)
Watchdog's ClaudeLLMProvider — runs every 30min with Opus (~$135-234/day)

What

Added Bedrock pricing table (Sonnet 4.5, Haiku 4.5, Opus 4.5/4.6) with 10% markup over Anthropic direct pricing, including cached/uncached input token rates
Instrumented ClaudeLLMProvider.call() in cortex/generic/llm.py — emits cost after both structured output and standard text response paths
Instrumented ClaudeLLMProvider.call() in watchdog/clients/llm/client.py — emits cost after .parse() call
Emits cortex.llm.direct_call.{cost_usd, input_tokens, output_tokens} with model and caller tags

How to Test

Deploy to staging, verify metrics appear in Datadog under cortex.llm.direct_call.cost_usd
Check tags: model:claude-sonnet-4-5, caller:generic_llm for cortex; model:claude-opus-4-6, caller:watchdog for watchdog
Verify cost values are reasonable (compare with AWS Cost Explorer Bedrock line items)

Pre-Review Checklist

I have self-reviewed this PR

Notes

Uses distribution metric type (same as cortex.agents.run.cost_usd) for proper percentile aggregation
Pricing is hardcoded rather than fetched dynamically — Bedrock pricing changes are rare and a code change is safer than a runtime dependency
The caller tag allows filtering dashboard by source (generic_llm vs watchdog)

🤖 Generated with Claude Code

loading diff…