Why
Production create-app workflows (e.g., 8jacktro/microsoft_teams, 1fpk8sx7/outlook) are failing with "Control request timeout: initialize" despite 91-95% of actions passing test-and-fix. No PRs are created because the Tagger/ScopeMapper exception propagates and fails the entire workflow.
The root cause is that ThreadPoolExecutor(max_workers=2) runs Tagger and ScopeMapper in parallel, and ScopeMapper spawns up to 40 parallel Claude agents for scope discovery. The concurrent Claude SDK client initializations cause timeout errors, and the unprotected .result() calls propagate the exception.
What
- Wrap
tagger_future.result() and scope_future.result() in try/except blocks
- On failure, log the error, record it in
run_summary, and default to empty tags/scopes
- Add imports for
TagClassifierResponse and MapActionScopesResponse to construct fallback objects
- PR creation proceeds normally since empty metadata is a no-op
How to Test
make fmt && make chk — ruff passes (pyrefly failure is pre-existing)
uv run pytest cortex/tests/test_workflows/ -v — 2/2 pass
- The fix is safe because:
- Empty
tags_by_action / scopes_by_action → empty metadata dict → batch_update_action_metadata() is a no-op
success = len(passed_actions) > 0 is unaffected
- PR creation proceeds normally
Pre-Review Checklist
Notes
- Tagger and ScopeMapper are non-critical metadata enrichment (tags + OAuth scopes). Actions are already built, tested, and passing at this point.
- A follow-up could add retry logic or stagger the parallel agents, but graceful degradation is the immediate priority.
🤖 Generated with Claude Code