Description
TINYFISH_MCP_RUN_WEB_AUTOMATION_ASYNC (and any MCP tool whose schema combines a JSON-Schema pattern with a typed format) crashed Mercury Lambda. The user-reported burst was 26 × HTTP 500 in 2 s for org ok_l9XRaH2lMBCf / project pr_D7nmB72h_IdY / session trs_7s7Z61_nKTGn. Two distinct Pydantic v2.11 strictness issues hit in sequence:
Bug 1 — import time: regex kwarg removed.
File "<string>", line 236, in RunWebAutomationAsyncRequest
File "/var/lang/lib/python3.10/site-packages/pydantic/fields.py", line 1090, in Field
raise PydanticUserError('`regex` is removed. use `pattern` instead', code='removed-kwargs')
datamodel-code-generator still emits Field(..., regex=...) for JSON-Schema pattern constraints, but Pydantic v2 removed the kwarg outright in favor of pattern= (the trace above is from v2.11), so the generated module fails at import.
Bug 2 — validation time: pattern= rejected on non-string types.
TypeError: Unable to apply constraint 'pattern' to supplied value <UUID> for schema of type 'uuid'
For schemas combining format: uuid with pattern, datamodel-code-generator emits session_id: UUID = Field(..., pattern='...'). Pydantic v2 refuses to apply the string pattern constraint to a UUID-typed field, so the model raises at validation time. The same applies to date, datetime, time, AnyUrl, Decimal, bytes, timedelta, IPv4Address, IPv6Address, IPvAnyAddress, etc.
Both fixes live in mercury/mcp/list_tools.py::_generate_pydantic_model_from_schema, mirroring the existing const=True strip from PR #15999.
# Bug 1 fix: translate kwarg, only when in real Field() arg context
generated_code = re.sub(r"([,(])(\s*)regex=", r"\1\2pattern=", generated_code)

# Bug 2 fix: strip pattern= from Field() calls whose annotation is non-str
_NON_STR_TYPE_RE = re.compile(
    r":\s*(?:Optional\[\s*)?"
    r"(?:UUID|AnyUrl|datetime|date|time|timedelta|Decimal|bytes|"
    r"IPv4Address|IPv6Address|IPvAnyAddress|"
    r"IPv4Interface|IPv6Interface|IPvAnyInterface|"
    r"IPv4Network|IPv6Network|IPvAnyNetwork)\b"
)
_PATTERN_KWARG_RE = re.compile(r",\s*pattern=(?:'[^']*'|\"[^\"]*\")")
stripped_lines = []
for _line in generated_code.split("\n"):
    if "Field(" in _line and _NON_STR_TYPE_RE.search(_line):
        _line = _PATTERN_KWARG_RE.sub("", _line)
    stripped_lines.append(_line)
generated_code = "\n".join(stripped_lines)
The bug-1 regex requires , or ( before regex= so it only touches kwargs and not literal regex= text inside string descriptions (caught by Codex review). pattern= on plain str fields is preserved — only non-string typed fields are stripped, and the format already enforces the value class.
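The two passes can be exercised outside Mercury with a small stdlib-only sketch. The GENERATED string below is a hypothetical stand-in for datamodel-code-generator output, and patch_generated_code is an illustrative wrapper name, not the real function. It assumes each Field() call sits on a single line, since the Bug 2 pass is line-based.

```python
import re

# Hypothetical generator output; real output will differ in detail.
GENERATED = """\
class RunWebAutomationAsyncRequest(BaseModel):
    goal: str = Field(..., description='set regex=... to filter', regex='^[a-z ]+$')
    session_id: Optional[UUID] = Field(None, regex='^[0-9a-f-]{36}$')
"""

# Same expressions as in the fix above.
NON_STR_TYPE_RE = re.compile(
    r":\s*(?:Optional\[\s*)?"
    r"(?:UUID|AnyUrl|datetime|date|time|timedelta|Decimal|bytes|"
    r"IPv4Address|IPv6Address|IPvAnyAddress|"
    r"IPv4Interface|IPv6Interface|IPvAnyInterface|"
    r"IPv4Network|IPv6Network|IPvAnyNetwork)\b"
)
PATTERN_KWARG_RE = re.compile(r",\s*pattern=(?:'[^']*'|\"[^\"]*\")")


def patch_generated_code(code: str) -> str:
    """Apply the Bug 1 rename, then the Bug 2 strip, to generated source."""
    # Bug 1: rename the kwarg only when it follows ',' or '('.
    code = re.sub(r"([,(])(\s*)regex=", r"\1\2pattern=", code)
    # Bug 2: drop pattern= on lines whose annotation is a non-str type.
    out = []
    for line in code.split("\n"):
        if "Field(" in line and NON_STR_TYPE_RE.search(line):
            line = PATTERN_KWARG_RE.sub("", line)
        out.append(line)
    return "\n".join(out)


patched = patch_generated_code(GENERATED)
print(patched)
```

In this sample the str field keeps its constraint as pattern=, the description text containing regex= is left alone, and the Optional[UUID] field ends up as Field(None).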
Future-proofing notes
I checked the other v1 → v2 deprecations that datamodel-code-generator can emit:
min_items= / max_items= for arrays: deprecated in favor of min_length= / max_length=, still accepted in v2.11 (DeprecationWarning).
allow_mutation=False: deprecated in favor of frozen=True, still accepted.
Only regex= is hard-removed. If a future Pydantic release removes the others, the same re.sub pattern can be extended.
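If that extension is ever needed, one possible shape is a rename table driving the same anchored substitution. This is only a sketch: KWARG_RENAMES and rename_field_kwargs are hypothetical names, and nothing like this exists in the PR. allow_mutation=False is deliberately left out because its replacement (frozen=True) inverts the value, so a plain rename would be wrong.

```python
import re

# Hypothetical rename table: Pydantic v1 kwarg -> v2 spelling.
KWARG_RENAMES = {
    "regex": "pattern",
    "min_items": "min_length",
    "max_items": "max_length",
}

# Same anchoring as the Bug 1 fix: the kwarg must follow ',' or '('.
_RENAME_RE = re.compile(
    r"([,(])(\s*)(" + "|".join(map(re.escape, KWARG_RENAMES)) + r")="
)


def rename_field_kwargs(code: str) -> str:
    """Rewrite each listed kwarg to its v2 name, keeping spacing intact."""
    return _RENAME_RE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{KWARG_RENAMES[m.group(3)]}=",
        code,
    )


sample = "tags: List[str] = Field(..., min_items=1, max_items=5)"
print(rename_field_kwargs(sample))
```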
How did I test this PR
Unit tests (tests/test_serverless/test_mcp_model_generation.py):
test_schema_with_pattern_no_regex_kwarg — basic regex= → pattern= translation, validates pattern still enforced.
test_tinyfish_async_workflow_schema — exact TINYFISH session_id pattern, regression guard.
test_pattern_translation_with_extra_kwargs — covers all Field() kwarg orderings (incl. first-arg case).
test_regex_in_description_is_not_corrupted — guards Codex-review concern that broad substitution would rewrite regex= text inside descriptions.
test_regex_substitution_does_not_match_quoted_text — covers paren+newline + missing-space orderings, documents residual risk.
test_pattern_stripped_when_field_type_is_uuid — bug 2 minimal repro.
test_tinyfish_run_web_automation_async_schema — full prod TINYFISH schema (uri + uuid + pattern), import + instantiation with real-shape data.
test_pattern_preserved_on_str_field — ensures we don't strip pattern from str fields.
pytest tests/test_serverless/test_mcp_model_generation.py -v → 33 passed
nox -s chk (ruff format/lint, mypy diff) → ALL CHECKS PASSED
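One residual case follows directly from the bug-1 regex itself: a description whose text happens to contain regex= right after a comma or opening paren is still rewritten. Whether this is exactly the case that test_regex_substitution_does_not_match_quoted_text documents is an assumption. The sample strings below are hypothetical.

```python
import re

# Same expression as the Bug 1 fix.
KWARG_RE = re.compile(r"([,(])(\s*)regex=")


def translate(code: str) -> str:
    """Rename regex= to pattern= when it follows ',' or '('."""
    return KWARG_RE.sub(r"\1\2pattern=", code)


# Preceded by a word character and a space: left untouched.
safe = "name: str = Field(..., description='pass regex=... to filter')"
# Preceded by '(' inside the string literal: still rewritten.
risky = "name: str = Field(..., description='options (regex=..., flags=...)')"

print(translate(safe) == safe)
print(translate(risky))
```

The rewrite in the second case only alters human-readable description text, so it does not affect validation behavior.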
End-to-end MCP-protocol test (real network, real Pydantic v2.11):
The agent sandbox cannot complete TINYFISH OAuth (Cloudflare bot detection on accounts.tinyfish.ai), so I stood up a local MCP server that mirrors the prod TINYFISH session_id schema (format:uuid + pattern) and ran the full Mercury MCP path against it:
- list_mcp_tools_from_url('http://127.0.0.1:9876/mcp'): streamablehttp_client connected, listed the tool, and generated the bundle via _bundle_mcp_tool → no regex=, no pattern= on the UUID field.
- Loaded the bundle module under Pydantic v2.11 → class definitions imported.
- Built RunWebAutomationAsyncRequest(url=..., goal=..., session_id='5399ba3b-...') → validation passed.
- RunWebAutomationAsync().execute(request, {}): issued the real MCP tools/call over HTTP, got back the response, and extracted it via _extract_mcp_result.
PASS: tool returned: run_id=run_5399ba3b status=RUNNING echo_from=mock_mcp_server
Negative controls (re-injecting the bugs into the generated module to confirm the fix is what's saving us):
- Inject a str field with regex= → PydanticUserError: `regex` is removed. use `pattern` instead at import (matches prod stack trace).
- Inject pattern= on the existing session_id: UUID → TypeError: Unable to apply constraint 'pattern' to supplied value 5399ba3b-... for schema of type 'uuid' at validation.
Live execute against prod TINYFISH was attempted but blocked twice — 1811 ConnectedAccountEntityIdRequired (no connected account) and Cloudflare bot detection during browser-based OAuth. Documented as a follow-up smoke test to run from a non-sandbox browser once the partner connection is available.
Triggered by: dhawal@composio.dev | Source: slack
Session: https://zen-api-production-4c98.up.railway.app/dashboard/#/chat/zen-d27a1b180ed0