Description
Closes the easy half of the May 2026 SSRF audit on Mercury and adds a CI rule
that prevents new actions from re-introducing the same shape.
This PR ships three things:
- Deletes 20 SSRF-vulnerable actions that had ≤ 50 calls / 30 days. Most
had zero traffic; the largest had 50 calls / 12 projects.
- Adds ci_checks/lint_caller_url_to_http_request.py, an AST lint that
blocks any new action where a caller-controlled URL or headers dict reaches
self.http_request(...) without going through a wrapper.
- Extends ci_checks/action_deletion.py with an allowlist file so the
intentional security deletions don't fail the existing
“deployed-action-disappeared” check.
What was deleted (20 actions, all vulnerable, all ≤ 50 calls in the last 30 days)
Each entry has a corresponding row in ci_checks/artefacts/deletion_allowlist.json
with a one-line reason and the 30-day usage that justified deletion.
| Tier | Toolkit / action | 30d calls | Why |
|---|---|---|---|
| 1 | salesforce_service_cloud/retrieve_connected_app_private_key | 0 | Caller-named env var / file path → cred exfil |
| 1 | intelliprint/merge_files | 0 | Caller file_path opened → bytes POSTed to merge endpoint |
| 1 | brightdata/filter_dataset | 3 | Caller-supplied paths opened → multipart POST |
| 2 | fluxguard/fluxguard_webhook_notification | 0 | Caller webhook_url + body returned (incl. error path) |
| 2 | formdesk/formdesk_webhook_integration | 0 | Caller webhook_url + caller method (P0 RCE bridge in self-host) |
| 2 | docsbot_ai/docsbot_upload_file_to_cloud_storage | 0 | Caller upload_url + caller headers |
| 2 | dock_certs/retrieve_credential | 0 | Caller id falls back to HTTP fetch |
| 2 | truvera/retrieve_credential | 0 | Same shape (likely fork) |
| 2 | imgix/imgix_blend | 0 | Caller base_image_url body returned |
| 2 | pdfmonkey/preview_template | 0 | Caller preview_url body returned |
| 2 | pdfmonkey/download_document_file | 21 | Caller download_url body returned |
| 2 | zendesk/create_attachments | 0 | Caller file.url (file:// gated only to /tmp) |
| 2 | zendesk/create_custom_object_record_attachment | 0 | Same shape |
| 2 | excel/upload_workbook | 50 | Caller source_url, only http(s):// gate |
| 2 | googlephotos/upload_photo (slug GOOGLEPHOTOS_UPLOAD_MEDIA) | 26 | Caller url zero validation |
| 2 | deepgram/speech_to_text_pre_recorded | 49 | Caller audio_url; transcript leaked via STT |
| 2 | scrape_do/proxy_mode | 1 | Caller url + hardcoded verify=False |
| 4 | nango/proxy_get | 6 | Caller custom_headers proxied unrestricted |
| 4 | scrapfly/scrapfly_scrape | 13 | Caller url + caller headers |
| 4 | scrapingbee/scrapingbee_proxy_mode | 0 | Caller url + caller headers |
Total surface removed: 20 actions across 18 toolkits. The toolkits themselves
remain functional (every one has other, non-vulnerable actions still in place).
What was NOT deleted (residual risk — read this before merging)
Four actions in the audit have >50 calls / 30 days and are NOT touched in this
PR. They are still SSRF-shaped and remain exploitable until follow-up work:
| Slug | 30d calls | 30d projects | 30d connections | Class |
|---|---|---|---|---|
| googledrive_upload_from_url | 43,983 | 633 | 1,329 | Tier 3 + Tier 4 amplifier (caller URL + source_headers + verify_ssl=False opt-in) |
| supabase_invoke_edge_function | 1,673 | 49 | 59 | Tier 4 amplifier (caller headers merged into outbound request) |
| share_point_upload_from_url | 1,162 | 20 | 31 | Tier 2 + Tier 4 (caller file_url + caller source_headers) |
| supabase_deploy_function | 914 | 42 | 55 | Tier 2 (caller file_url fetched, deployed, body leaked on error) |
Two follow-up paths to close these without Apollo changes:
- (A) In-place fix: add mercury/utils/url_validation.py (DNS-resolve +
reject RFC1918 / 127.0.0.0/8 / 169.254.0.0/16 / IPv6 link-local; mirror of
apps/apollo/src/common/utils/url/ssrf.ts), wrap the 4 callsites, strip
caller-supplied Authorization / Metadata-* headers. ~150 LOC + 4 callsite
edits.
- (B) Egress proxy in front of Mercury with a per-toolkit allowlist sourced
from each apps/<x>/config.ts baseUrl(). Architecturally cleaner backstop
for all SSRF, including the 60+ unaudited candidates the new lint surfaces
in --all mode (sendbird, epic_games, etc.).
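As a rough illustration of path (A), the validator could look something like the sketch below. This is not the contents of mercury/utils/url_validation.py (which does not exist yet) nor a port of ssrf.ts; the names UnsafeURLError, is_public_ip, strip_sensitive_headers, and the injectable resolve parameter are assumptions made for the example. It relies on the standard library's ipaddress.is_global to reject private, loopback, and link-local ranges.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


class UnsafeURLError(ValueError):
    """Raised when a URL must not be fetched on a caller's behalf (hypothetical name)."""


def is_public_ip(raw: str) -> bool:
    """True only for globally routable addresses."""
    ip = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 zone id
    # Judge IPv4-mapped IPv6 (::ffff:10.0.0.1) by the embedded IPv4 address.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    # is_global is False for RFC1918, 127.0.0.0/8, 169.254.0.0/16, fe80::/10, etc.
    return ip.is_global


def validate_external_url(url: str, resolve=socket.getaddrinfo) -> str:
    """Return url unchanged if every resolved address is public, else raise."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise UnsafeURLError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise UnsafeURLError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = resolve(parts.hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise UnsafeURLError(f"cannot resolve {parts.hostname!r}") from exc
    for info in infos:
        address = info[4][0]  # sockaddr tuple starts with the IP string
        if not is_public_ip(address):
            raise UnsafeURLError(f"{parts.hostname!r} resolves to {address}")
    return url


def strip_sensitive_headers(headers: dict) -> dict:
    """Drop caller-supplied Authorization and Metadata-* headers."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "authorization"
        and not name.lower().startswith("metadata-")
    }
```

One caveat with any resolve-then-fetch check: the HTTP client resolves the hostname again when it connects, so a DNS answer that changes between the two lookups can still slip through. That gap is part of why the egress proxy in (B) is the stronger backstop.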
The Tier 5 master amplifier (metadata['base_url'] overridable via
custom_auth_params.base_url, ~29k callsites in Mercury) is still open and
must be closed at Apollo's ZCustomAuthParams.base_url Zod schema. That's
explicitly out of scope here — this PR is Mercury-only.
How the new CI check prevents future SSRF actions
ci_checks/lint_caller_url_to_http_request.py is an AST lint that runs in
--diff mode on every Mercury PR (changed apps/<x>/actions/*.py files only).
The single load-bearing rule:
A URL or headers dict that traces back to a request.<field> MUST NOT reach
self.http_request(...) unless it has been wrapped in validate_external_url(...)
(or the URL is pinned to metadata['base_url'] + path).
Detection logic (intra-procedural, no false positives on path-segment IDs)
For each apps/<x>/actions/*.py:
- Collect tainted fields from every Pydantic BaseModel declared in the file:
  - URL fields = field name in {url, uri, endpoint, webhook, callback} OR ends in _url|_uri|_endpoint|_webhook|_callback OR annotated as HttpUrl|AnyUrl|AnyHttpUrl|Url.
  - Header fields = field name headers or ending in _headers AND annotated as Dict|dict|Mapping.
- Track aliases: a single-statement assignment local = request.<tainted_field> propagates taint to local. Sufficient for every audit shape.
- Walk every self.http_request(...) call: flag if the url= kwarg references a tainted URL field (directly or via alias) AND the expression doesn't pass through a safe wrapper (validate_external_url / safe_external_url); flag if headers= references a tainted header field.
Why this is the right level of strictness
The first cut substring-matched webhook / endpoint and false-positived on every webhook-CRUD action in the codebase (webhook_id is a path segment, not a URL). The strict-suffix heuristic shipped here:
- Catches every Tier 2/3/4 shape from the audit (file_url, webhook_url, audio_url, source_url, base_image_url, source_headers, custom_headers, ...).
- Allows the canonical safe pattern metadata['base_url'] + path even when the path contains request.<id>.
- Allows explicit URL validation through validate_external_url(...).
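The difference between the first cut and the shipped heuristic can be shown in a few lines. The two predicates below are illustrative reconstructions from the description above, not the lint's actual functions; substring_match and strict_match are made-up names.

```python
URL_NAMES = frozenset({"url", "uri", "endpoint", "webhook", "callback"})
URL_SUFFIXES = ("_url", "_uri", "_endpoint", "_webhook", "_callback")


def substring_match(name: str) -> bool:
    """First cut: any field mentioning webhook/endpoint is treated as a URL."""
    return "webhook" in name or "endpoint" in name


def strict_match(name: str) -> bool:
    """Shipped shape: exact name or strict suffix only."""
    return name in URL_NAMES or name.endswith(URL_SUFFIXES)


if __name__ == "__main__":
    for field in ("webhook_id", "endpoint_id", "webhook_url", "audio_url"):
        print(field, substring_match(field), strict_match(field))
```

Path-segment IDs such as webhook_id match the substring test but not the strict one, while webhook_url and audio_url still match the strict test.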
What happens on a future PR that introduces SSRF
Author writes:

    class FooRequest(BaseModel):
        target_url: str = Field(...)

    class Foo(ApiAction[...]):
        def execute(self, request, metadata):
            return self.http_request(method="get", url=request.target_url)

CI fails on that PR with:
- apps/foo/actions/foo.py:7 > SSRF: caller-controlled URL passed to
self.http_request(url=...). Wrap with validate_external_url(...) or pin the
URL to metadata['base_url'] + path.
The author has three options:
- Pin to metadata['base_url']: the toolkit's allowed host(s) are declared in apps/<x>/config.ts. Canonical safe pattern.
- Wrap in validate_external_url(...): explicit allowlist of public IPs / public DNS / scheme.
- Add # ci-skip: caller-url-to-http-request (<reason + ticket>): acknowledges residual risk in code, with a tracking ticket for follow-up. Requires PR review to land.
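For the first option, a pinned URL might be built along these lines. The helper pinned_url is hypothetical and not part of Mercury; it only shows the idea that caller input fills percent-encoded path segments while the host always comes from metadata['base_url'].

```python
from urllib.parse import quote


def pinned_url(metadata: dict, *segments: str) -> str:
    """Join caller-supplied segments under the toolkit's own base_url.

    Each segment is percent-encoded with no safe characters, so a caller
    cannot smuggle in '/', '@', '?', or '#' to alter the host or query.
    """
    base = metadata["base_url"].rstrip("/")
    return base + "/" + "/".join(quote(str(s), safe="") for s in segments)


if __name__ == "__main__":
    meta = {"base_url": "https://api.example.com/"}  # example host, not a real toolkit
    print(pinned_url(meta, "hooks", "abc123"))
```

The sketch does not normalise a segment that is exactly "..", so it should be read as fixing the host rather than as a complete path-traversal defence.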
Coverage scope and known limitations
- Mercury-local only. The Apollo Tier 5 amplifier must be closed at the schema layer in apps/apollo/src/lib/tool_execution/common/schema.ts, not here. This PR doesn't touch Apollo.
- Intra-procedural. A helper function that builds the URL from base_url + segment then returns it is opaque to the lint. Authors who go through such helpers may need to baseline via # ci-skip until the helper is recognized as safe (or until we add the helper to a _SAFE_BUILDERS list).
- Existing violations are not auto-fixed. Running the lint with --all surfaces ~67 candidates beyond the 20 deleted in this PR (sendbird/epic_games/etc., plus the 4 high-traffic survivors). They are not annotated in code; PR CI runs in --diff mode so they don't block unrelated PRs. They will fire when someone touches those files, which is the right time to address them.
- request.headers.update(local) patterns can slip past the alias tracker (e.g. supabase/invoke_edge_function). This is intentional residual risk, documented above.
Why one lint, not five
Tiers 2/3/4 of the audit collapse to “caller-controlled URL or headers reach outbound HTTP without validation.” That's one rule. Tier 1 (env-var / file reads with caller key) is a different vulnerability class and is already covered by existing lint_local_file_open.py + lint_direct_env_access.py. Tier 5 is an Apollo concern. Five linters would mean five places to keep current with how http_request is invoked, five sets of false-positive ci-skip annotations, and five maintenance burdens for the same security signal.
Action deletion CI — allowlist mechanism
ci_checks/action_deletion.py would have failed loudly on every deleted enum
because each is registered in Thermos. Rather than disable the check, this PR
extends it to read ci_checks/artefacts/deletion_allowlist.json.
The allowlist is append-only in spirit — every entry has the deleted enum,
date, PR, and a one-line reason. Future deprecations / security deletions add a
new entry rather than mutating the rule. If an entry is silently removed, the
CI starts failing again, which is the correct behavior.
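A minimal sketch of how such an allowlist check could work is below. It is not the code in ci_checks/action_deletion.py: the JSON key names (enum, date, pr, reason), the assumption that the file is a JSON list, and the function names are all invented for illustration.

```python
import json
from pathlib import Path

# Hypothetical key names; the real allowlist schema may differ.
REQUIRED_KEYS = {"enum", "date", "pr", "reason"}


def load_allowlist(path) -> set:
    """Read the allowlist and return the set of enums cleared for deletion."""
    entries = json.loads(Path(path).read_text())
    allowed = set()
    for entry in entries:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"allowlist entry missing keys: {sorted(missing)}")
        allowed.add(entry["enum"])
    return allowed


def deletion_violations(deployed, present, allowed) -> list:
    """Deployed enums that vanished from the tree without an allowlist entry."""
    return sorted((set(deployed) - set(present)) - set(allowed))
```

With this shape, removing an entry from the file puts its enum back into the violation list, which matches the fail-again behavior described above.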
How did I test this PR
- Unit tests for the new lint (tests/test_lints/test_caller_url_to_http_request.py, 12 cases):

      pytest tests/test_lints/test_caller_url_to_http_request.py -q
      # 12 passed in 0.04s

  Coverage: direct caller URL, alias-via-local-var, caller headers, HttpUrl
  annotation, base_url-pinned (allowed), validate_external_url wrapper
  (allowed), wrapper through alias (allowed), metadata-only headers (allowed),
  no URL/header fields (allowed), bool field with _headers suffix (not flagged),
  caller URL concat with literal path (flagged), path-segment IDs webhook_id
  / endpoint_id / app_id (NOT flagged; false-positive regression test).
- Action-deletion CI passes with the allowlist:

      make config-json
      python -m ci_checks.action_deletion
      # Found 18 changed app(s); Checked 18 app(s), 0 violation(s) in 0 file(s)

- New lint passes in --diff mode on this PR (no new SSRF actions
  introduced):

      python -m ci_checks.lint_caller_url_to_http_request
      # No action files changed. Nothing to check.

- Lint survey of the codebase in --all mode surfaces 67 candidates across
  22 toolkits, including the 4 high-traffic survivors and several unaudited
  shapes (sendbird, epic_games, givebutter). Tracked as residual risk; not in
  scope for this PR.
- AST validation of every modified tool.py:

      python -c "import ast; ast.parse(open('apps/<x>/tool.py').read())"
      # OK for all 18 tool.py files

- Ruff format + check clean on all modified files; tests pass post-format.
Follow-up work tracked separately
- Patch the 4 high-traffic survivors with validate_external_url (Path A above).
- Apollo ZCustomAuthParams.base_url per-toolkit allowlist (Tier 5 master amplifier, ~29k callsites neutralised in one PR).
- Address the ~63 unaudited lint hits the new check surfaces with --all.
- Egress proxy in front of Mercury with toolkit-allowlist sourced from config.ts.
- Self-hosted hardening: automountServiceAccountToken: false on Mercury pod, default-deny NetworkPolicy, onprem-testbed SSRF acceptance scenario.
🤖 Generated with Claude Code