Production Mercury logs (env production, service mercury, 2026-05-17/18 window) leaked 32 unique real user email addresses through Google API URL paths embedded in HTTPError tracebacks. Sample lines (from mercury-logs Datadog index):
Error executing GET https://www.googleapis.com/gmail/v1/users/escurel001%40gmail.com/history?startHistoryId=704851&...
Error executing GET https://www.googleapis.com/gmail/v1/users/cyndyruzichpsyd%40gmail.com/history?...
Error executing DELETE https://www.googleapis.com/calendar/v3/calendars/dr.rizqiaminia.raclinic%40gmail.com/events/...
Forbidden for url: https://www.googleapis.com/gmail/v1/users/spencerpauly%40gmail.com/messages?q=foo

URL patterns leaking emails (last 24h):
https://www.googleapis.com/gmail/v1/users/EMAIL/history?... (11x)
https://www.googleapis.com/calendar/v3/calendars/EMAIL/events?... (11x)
https://www.googleapis.com/gmail/v1/users/EMAIL/messages?... (8x)
https://www.googleapis.com/calendar/v3/calendars/EMAIL/events?... (variants, 11x)
https://www.googleapis.com/calendar/v3/calendars/EMAIL/events/{eventId} (2x)

mercury/utils/http.py::_sanitize_url decodes the query via parse_qsl and re-encodes it, applying _EMAIL_RE.sub to each value. After urlunsplit it runs _EMAIL_RE over the full URL, but _EMAIL_RE only matches a literal @, not the URL-encoded %40. Google's REST API embeds the account email as escurel001%40gmail.com in the path segment (which is never decoded), so the encoded email survives sanitisation and lands in Datadog through requests.exceptions.HTTPError's for url: <url> rendering.
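The gap can be reproduced in isolation. The snippet below is a minimal sketch, not the code in mercury/utils/http.py: EMAIL_RE is a hypothetical stand-in for _EMAIL_RE (the real pattern may differ), and the address jane.doe@example.com is made up for illustration. It shows that urlsplit leaves the path percent-encoded, so a literal-@ pattern finds nothing there, while parse_qsl decodes query values and the same pattern does match.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical stand-in for _EMAIL_RE: matches only a literal "@".
EMAIL_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Made-up URL in the same shape as the Gmail history endpoint.
url = (
    "https://www.googleapis.com/gmail/v1/users/"
    "jane.doe%40example.com/history?startHistoryId=1"
)
parts = urlsplit(url)

# urlsplit does not decode the path, so the email stays as "...%40...".
path_hit = EMAIL_RE.search(parts.path)   # None: no literal "@" in the path
full_hit = EMAIL_RE.search(url)          # None: same reason, full URL

# parse_qsl decodes query values, so an email in the query IS visible.
decoded = parse_qsl("q=jane.doe%40example.com")  # [('q', 'jane.doe@example.com')]
query_hit = EMAIL_RE.search(decoded[0][1])       # match object

print(path_hit, full_hit, query_hit is not None)
```

The asymmetry between the decoded query and the undecoded path is why only the path-embedded emails were reaching the logs.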
This is the complementary half of PR #22997 / #23358 (which generalised embedded-URL handling in _sanitize_traceback): those PRs caught access_token=… inside the URL, but the URL-encoded email vector in the path was still unguarded.
Add _EMAIL_PCT_RE (user%40domain.tld) and _mask_email_pct alongside the existing _EMAIL_RE / _mask_email. _sanitize_url and _sanitize_traceback now run both passes:
user@domain (existing).
user%40domain (new): masks the encoded form to u***%40domain so the URL is still well-formed and obviously a redacted email when read in logs.

Per-change justification:
_sanitize_url: adds the %40 pass after the existing _EMAIL_RE.sub. Necessary because parse_qsl does not touch path segments, and _EMAIL_RE does not match %40. Without this pass, every Gmail / Calendar / Drive HTTPError keeps leaking the account email.
_sanitize_traceback: routes embedded URLs through _sanitize_url (so the path-level fix flows through automatically), then runs both literal and %40 email passes on the remaining free-floating text. This defends against %40 emails outside URL fragments (e.g. pydantic validation errors echoing URL-encoded payload values).

Scoped tests covering the existing sanitisation surface plus 5 new regression tests against the prod-observed vectors:
cd /workspace/mercury && source .venv/bin/activate
pytest tests/test_utils/test_http.py -v
Result: 162 passed in 0.94s. New cases:
TestSanitizeUrl::test_masks_url_encoded_emails_in_path: /gmail/v1/users/escurel001%40gmail.com/history reproducer.
TestSanitizeUrl::test_masks_url_encoded_emails_in_calendar_path: /calendar/v3/calendars/dr.rizqiaminia.raclinic%40gmail.com/events/... reproducer.
TestSanitizeUrl::test_masks_url_encoded_emails_in_query_value: Drive ?q=fullText+contains+%27brandon%40gmail.com%27.
TestSanitizeTraceback::test_strips_url_encoded_email_inside_embedded_url: HTTPError ... for url: .../users/spencerpauly%40gmail.com/messages.
TestSanitizeTraceback::test_strips_url_encoded_email_outside_url: defense in depth for non-URL contexts.

Manual repro before/after with the exact prod log line:
>>> _sanitize_url("https://www.googleapis.com/gmail/v1/users/escurel001%40gmail.com/history?startHistoryId=704851")
# before: '... /users/escurel001%40gmail.com/history?startHistoryId=704851' # LEAK
# after: '... /users/e***%40gmail.com/history?startHistoryId=704851'
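For readers without the diff open, the two-pass masking can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the PR's implementation: the pattern bodies, the names EMAIL_RE / EMAIL_PCT_RE / mask_emails, and the u***@domain shape for the literal pass are all hypothetical, and the example addresses are made up. Only the u***%40domain output shape for the encoded pass is taken from the description above.

```python
import re

# Hypothetical building blocks; the real _EMAIL_RE / _EMAIL_PCT_RE may differ.
_LOCAL = r"[A-Za-z0-9._+-]"
_DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

# Group 1 = first character of the local part, group 2 = domain.
EMAIL_RE = re.compile(rf"({_LOCAL}){_LOCAL}*@({_DOMAIN})")
EMAIL_PCT_RE = re.compile(rf"({_LOCAL}){_LOCAL}*%40({_DOMAIN})")


def mask_email(m: re.Match) -> str:
    # Assumed output shape for the literal pass: u***@domain
    return f"{m.group(1)}***@{m.group(2)}"


def mask_email_pct(m: re.Match) -> str:
    # Keep the separator encoded so the URL stays well-formed: u***%40domain
    return f"{m.group(1)}***%40{m.group(2)}"


def mask_emails(text: str) -> str:
    """Run the literal-@ pass, then the %40 pass, over arbitrary text."""
    text = EMAIL_RE.sub(mask_email, text)
    return EMAIL_PCT_RE.sub(mask_email_pct, text)


masked_url = mask_emails(
    "https://www.googleapis.com/gmail/v1/users/jane.doe%40example.com/history?startHistoryId=1"
)
masked_text = mask_emails("lookup failed for bob@example.com")
print(masked_url)
print(masked_text)
```

One limitation of this simplified pattern: it does not account for other percent-escapes directly adjacent to the local part (for example a leading %27), which is one reason query values are better masked after decoding, as _sanitize_url already does via parse_qsl.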
Lint: ruff check mercury/utils/http.py tests/test_utils/test_http.py — clean.
Origin: cron-a3f8b1896f41 / zen-cron-2d865c43b920
Triggered by: rahul.lingala@composio.dev | Source: cron
Session: https://zen-api-production-4c98.up.railway.app/dashboard/#/chat/zen-cron-2d865c43b920