Description
Customer tool-execution responses were leaking the production ElastiCache hostname through a Redis ConnectionError:
{
"data": {
"message": "Error -2 connecting to prod-thermos-elasticache-tl43sv.serverless.use1.cache.amazonaws.com:6379. Name or service not known.",
"status_code": null
},
"successful": false,
"error": "Error -2 connecting to prod-thermos-elasticache-tl43sv.serverless.use1.cache.amazonaws.com:6379. Name or service not known.",
"log_id": "log_kz6egqY9f0zd"
}
The leak was a direct str(exception) forward in APITool.execute's generic Exception branch (mercury/tools/api/tool.py). The same pattern existed in poll / setup / refresh / fetch_authz_entities, and in the serverless-layer traceback paths (ToolFunction.invoke, RecipeFunction.invoke, handler.run). The captured log_recorder entries forwarded by Thermos as response.Logs were a second leak channel.
What changed
New module mercury/utils/sanitization.py:
sanitize_external_error(message, *, context, log_original): detects internal-infrastructure markers and replaces the entire message with "An internal error occurred. Please try again later."
sanitize_external_payload(payload): recursive scrub of arbitrary nested structures (dict / list / tuple) — string values are passed through sanitize_external_error, non-strings are preserved.
scrub_log_entries(entries): walks log_recorder-shaped entry lists; scrubs message strings and recursively scrubs extras (which is Any).
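The recursive scrub described above can be sketched as follows. This is a simplified illustration, not the module's actual code: the single-regex marker check, the dropped `context` / `log_original` keyword arguments, and the `"message"` / `"extras"` entry keys are assumptions made to keep the example self-contained.

```python
import re
from typing import Any

GENERIC_MESSAGE = "An internal error occurred. Please try again later."

# Hypothetical stand-in marker; the real module matches a wider pattern set.
_MARKER = re.compile(r"\.cache\.amazonaws\.com\b", re.IGNORECASE)


def sanitize_external_error(message: str) -> str:
    """Replace the entire message when it contains an internal marker."""
    return GENERIC_MESSAGE if _MARKER.search(message) else message


def sanitize_external_payload(payload: Any) -> Any:
    """Recursively scrub strings inside dicts, lists, and tuples."""
    if isinstance(payload, str):
        return sanitize_external_error(payload)
    if isinstance(payload, dict):
        return {key: sanitize_external_payload(value) for key, value in payload.items()}
    if isinstance(payload, list):
        return [sanitize_external_payload(item) for item in payload]
    if isinstance(payload, tuple):
        return tuple(sanitize_external_payload(item) for item in payload)
    # Non-string leaves (numbers, None, booleans, ...) are preserved as-is.
    return payload


def scrub_log_entries(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Scrub message strings and nested extras in recorder-shaped entries."""
    scrubbed = []
    for entry in entries:
        clean = dict(entry)  # shallow copy so the caller's entry is not mutated
        if isinstance(clean.get("message"), str):
            clean["message"] = sanitize_external_error(clean["message"])
        if "extras" in clean:
            clean["extras"] = sanitize_external_payload(clean["extras"])
        scrubbed.append(clean)
    return scrubbed
```

Replacing the whole message rather than redacting the matched substring avoids leaving partial hints (ports, error codes tied to a specific backend) in the customer-facing text.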
Patterns matched (deliberately narrow):
| Pattern | What it catches |
|---|---|
| Internal URI schemes | redis://, rediss://, memcached://, postgres(ql)?://, mysql://, mariadb://, mongodb(+srv)?://, amqp(s)?://, kafka:// |
| AWS managed-service hostnames | *.cache.amazonaws.com (ElastiCache), *.rds.amazonaws.com (RDS), *.redshift.amazonaws.com, *.compute.internal, *.ec2.internal, and, among ELB hostnames, only internal-*.elb.amazonaws.com / internal-*.elasticloadbalancing.amazonaws.com (the AWS naming convention for non-internet-facing ELBs) |
| Private DNS suffixes | *.svc.cluster.local, *.cluster.local, *.composio.internal, *.internal.composio.dev |
| RFC 1918 / loopback / link-local IPs | 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16 (incl. AWS metadata 169.254.169.254) |
| Composio service-host patterns | prod-thermos-…, apollo-internal, mercury-worker-…, etc. |
Explicitly NOT matched (per codex review iter 1, #23476/pulls/comments): public internet-facing AWS endpoints — <name>.<region>.elb.amazonaws.com, *.elasticbeanstalk.com, ec2-*.compute-1.amazonaws.com. Customers' upstream services may sit on those; a 5xx from one of them is a legitimate third-party error and must remain readable.
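To illustrate how narrow matching of this kind can work, here is a minimal sketch covering a subset of the patterns. The regexes, the `has_internal_marker` name, and the use of `ipaddress` for the CIDR checks are assumptions for illustration; the actual module may be structured differently.

```python
import ipaddress
import re

# Illustrative subset only; names and exact regexes are assumptions.
_INTERNAL_URI = re.compile(
    r"\b(?:rediss?|postgres(?:ql)?|mongodb(?:\+srv)?|amqps?)://", re.IGNORECASE
)
_INTERNAL_HOST = re.compile(
    r"(?:\.cache\.amazonaws\.com|\.rds\.amazonaws\.com|\.compute\.internal"
    r"|\.svc\.cluster\.local)\b"
    # Only ELB hostnames whose first label starts with "internal-" are matched.
    r"|(?<![\w.-])internal-[\w.-]+\.elb\.amazonaws\.com\b",
    re.IGNORECASE,
)
# Dotted quads that are not part of a longer dotted number.
_IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")
_PRIVATE_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "127.0.0.0/8",
        "169.254.0.0/16",
    )
]


def has_internal_marker(message: str) -> bool:
    """Return True if the message mentions internal infrastructure."""
    if _INTERNAL_URI.search(message) or _INTERNAL_HOST.search(message):
        return True
    for candidate in _IPV4.findall(message):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. "999.1.1.1" is not a valid address
        if any(address in network for network in _PRIVATE_NETS):
            return True
    return False
```

Checking IPs against parsed networks instead of a hand-written regex keeps the RFC 1918 boundaries exact, so 172.15.x and 172.32.x pass through while 172.16.x through 172.31.x are caught.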
Applied at every error-response site:
| File | Sites |
|---|---|
| mercury/tools/api/tool.py | execute (HTTPError + ExecutionFailed + Exception, plus a recursive scrub of ExecutionFailed.extra that also filters out reserved keys), poll (HTTPError + Exception), setup (HTTPError + Exception), refresh (HTTPError + Exception), fetch_authz_entities (Exception) |
| mercury/serverless/base.py | ToolFunction.invoke and RecipeFunction.invoke catch-all traceback branches |
| mercury/serverless/handler.py | Top-level run() Exception and proto-serialization-error branches; a defense-in-depth final-guard pass over result["error"], nested result["result"]["error"], data.message, and data.http_error (gated on successful is not True and isinstance(..., str), so successful responses with structured data.message payloads, e.g. Gmail send/draft, are not mutated); and scrub_log_entries(result["logs"]) to close the log_recorder side-channel |
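The gating in the final-guard pass can be sketched like this. The flat response shape (taken from the bug report), the `final_guard` name, and the choice to apply the success gate only to the `data` fields are assumptions; the handler's real structure nests these fields differently.

```python
from typing import Any, Callable


def final_guard(
    response: dict[str, Any], sanitize: Callable[[str], str]
) -> dict[str, Any]:
    """Last-chance scrub of error-bearing string fields before returning."""
    guarded = dict(response)

    # A non-string error (None, dict, ...) is left alone.
    if isinstance(guarded.get("error"), str):
        guarded["error"] = sanitize(guarded["error"])

    data = guarded.get("data")
    # Successful responses may carry structured or user-authored data.message
    # (for example an email body), so they are never touched.
    if guarded.get("successful") is not True and isinstance(data, dict):
        data = dict(data)
        for key in ("message", "http_error"):
            if isinstance(data.get(key), str):
                data[key] = sanitize(data[key])
        guarded["data"] = data
    return guarded
```

Using `is not True` rather than a falsy check means a missing or malformed `successful` field is treated as a failure and scrubbed, which is the safer default for a last line of defense.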
The original message is still logged at error level server-side via the loguru sinks (Datadog/stdout) — internal-infra details remain visible for debugging — but sanitize_external_error temporarily suppresses the log_recorder ContextVar around its own log call so the diagnostic line never lands in the customer-facing result["logs"] list.
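The suppression technique can be sketched with the standard library alone. Everything here is a hypothetical stand-in: `log_recorder` is modeled as a ContextVar holding a plain list, `record_and_log` plays the role of a sink that feeds both the recorder and the server-side log, and `logging` substitutes for loguru.

```python
import contextlib
import logging
from contextvars import ContextVar
from typing import Iterator, Optional

# Hypothetical recorder: a per-request list of customer-visible log lines.
log_recorder: ContextVar[Optional[list[str]]] = ContextVar(
    "log_recorder", default=None
)

logger = logging.getLogger("sanitization_sketch")


def record_and_log(message: str) -> None:
    """Send a line to the server-side logger and, if active, the recorder."""
    recorder = log_recorder.get()
    if recorder is not None:
        recorder.append(message)
    logger.error(message)


@contextlib.contextmanager
def recorder_suppressed() -> Iterator[None]:
    """Temporarily detach the recorder so enclosed log lines stay server-side."""
    token = log_recorder.set(None)
    try:
        yield
    finally:
        # Restore the previous recorder even if the body raises.
        log_recorder.reset(token)
```

A call such as `with recorder_suppressed(): record_and_log(raw_message)` still reaches the server-side logger, while the recorder list attached to the request is left unchanged.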
Reviewer feedback addressed
| Iter | Source | Finding | Fix commit |
|---|---|---|---|
| 1 | codex | AWS pattern over-matches public ELB / Elastic Beanstalk / EC2 public DNS hostnames | ccbe921fbc |
| 2 | codex / cursor-bugbot | Final-guard corrupts data.message on successful responses | 90f5890869 |
| 3 | codex | sanitize_external_error's own log call leaks raw text into result["logs"] (recorder side-channel); pre-existing log lines also leak | 7023a665fe |
| 4 | codex | ExecutionFailed.extra returned verbatim; scrub_log_entries doesn't recurse | 8f03a0d97f |
How did I test this PR
Scoped pytest run (754 passed, 6 skipped, 1 pre-existing failure deselected):
$ pytest tests/test_tools/ tests/test_utils/test_sanitization.py \
tests/test_utils/test_http.py tests/test_serverless/ \
--deselect tests/test_serverless/test_execute_recipe.py::TestExecuteRecipe::test_execute_recipe_with_run_composio_tool
===== 754 passed, 6 skipped, 1 deselected in 75.44s =====
The deselected test is a pre-existing failure on master (Weathermap config.json missing); I verified this by stashing my changes and re-running.
New tests (tests/test_utils/test_sanitization.py, tests/test_serverless/test_handler_final_guard.py, tests/test_tools/test_api_tool.py):
- Pattern coverage: ElastiCache / RDS / EC2-internal / k8s svc.cluster.local / Redis/Postgres/MongoDB/AMQP URIs / RFC 1918 / loopback / link-local / AWS metadata 169.254.169.254 / Composio service-host patterns.
- Passthrough coverage: public IPs (8.8.8.8, 1.1.1.1), public AWS customer endpoints (*.elb.amazonaws.com, *.elasticbeanstalk.com, ec2-*.compute-1.amazonaws.com), third-party HTTPS URLs, validation errors, edge of RFC 1918 (172.15.x, 172.32.x), None / empty input.
- Handler boundary: successful response with structured data.message (Gmail-style) passes through verbatim; failure response with internal-infra leak is scrubbed end-to-end; outer-level error string is scrubbed; non-string outer error is left alone; log-recorder entries with nested internal-infra extras are scrubbed; sanitize_external_error's own log call does not land in result["logs"].
- End-to-end APITool: a ConnectionError raising the exact bug-report message produces a response where the leaked hostname appears nowhere in the serialized output; ExecutionFailed.extra with leaky nested values is recursively scrubbed and extra["error"] cannot override the official error field.
- Regression: TestRegressionBugReport::test_full_bug_report_message pins the exact response shape from the original report.
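The "appears nowhere in the serialized output" assertion can be sketched as below. `fake_execute` is a hypothetical stand-in for the sanitized APITool.execute path, and the substring check inside it is a placeholder for the real sanitizer; only the bug-report message and response shape come from the report above.

```python
import json

LEAKED_HOST = "prod-thermos-elasticache-tl43sv.serverless.use1.cache.amazonaws.com"
BUG_MESSAGE = f"Error -2 connecting to {LEAKED_HOST}:6379. Name or service not known."
GENERIC_MESSAGE = "An internal error occurred. Please try again later."


def fake_execute() -> dict:
    """Hypothetical stand-in for an execute() call that hits the Redis error."""
    try:
        raise ConnectionError(BUG_MESSAGE)
    except Exception as exc:
        message = str(exc)
        # Placeholder for the real sanitizer call.
        safe = GENERIC_MESSAGE if ".cache.amazonaws.com" in message else message
        return {
            "data": {"message": safe, "status_code": None},
            "successful": False,
            "error": safe,
        }


def test_hostname_absent_from_serialized_response() -> None:
    # Serializing the whole response catches leaks in any field, not just "error".
    serialized = json.dumps(fake_execute())
    assert LEAKED_HOST not in serialized
    assert "amazonaws.com" not in serialized
```

Asserting against the serialized string rather than individual keys means a newly added field that forwards the raw exception would fail the test without the test needing to know about it.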
Lint / typecheck:
$ make chk # ruff format + lint + mypy: PASSED
$ make snt # all CI sanity checks: PASSED
E2E note: the local Apollo + Thermos stack routes to staging/prod Mercury Lambda; internal-infra connection errors are not injectable from the public API path. The unit + handler-boundary tests cover the actual code surface.
Triggered by: Srujan A srujan@composio.dev | Source: slack
Session: https://zen-api-production-4c98.up.railway.app/dashboard/#/chat/zen-8ec724feef3b
🤖 Generated with Claude Code