The simple-tracing integration tests have been failing in CI for ~1 week with:
Failed: Workflow did not complete within 120s
The 120s timeout is misleading. The runner-side logs from the failing run (example) show the runner container exiting with code 1 after ~2s, inside infra/setup_mercury.py:
WORKFLOW_CONFIG provided - using environment variable
No INTEGRATOR_BRANCH specified, skipping integrator setup
Found pre-built Mercury at /tmp/mercury
Already on branch master, pulling latest changes...
Error: Failed to pull latest changes from master
The test then waits the full 120s for a DB state transition that will never come — cortex.workflow_dispatcher (which writes terminal state to cortex_execution) is never reached.
The captured stderr from the failing git pull was being silently dropped by the wrapper try/except, so we can't tell which subprocess call failed or why. This PR doesn't fix the underlying bug — it adds enough diagnostics to root-cause it from the next CI run.
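As a rough illustration of the kind of diagnostics described here, the sketch below logs a breadcrumb, the exit code, and both output streams for each subprocess call, and keeps the original error attached when the wrapper re-raises. It is not the actual code in infra/setup_mercury.py: `run_step`, `pull_latest`, and the exact `git pull` arguments are hypothetical names and values chosen for the example.

```python
import subprocess
import sys
import traceback


def run_step(name, cmd, cwd=None):
    """Run one subprocess and log its outcome instead of discarding it.

    Hypothetical helper: the name and log format are assumptions.
    """
    # Breadcrumb first, so the failing step is known even if output is lost.
    print(f"[setup_mercury] step: {name}", flush=True)
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    print(f"[setup_mercury] {name} exit code: {result.returncode}", flush=True)
    if result.stdout:
        print(f"[setup_mercury] {name} stdout:\n{result.stdout}", flush=True)
    if result.stderr:
        print(f"[setup_mercury] {name} stderr:\n{result.stderr}",
              file=sys.stderr, flush=True)
    # Raises subprocess.CalledProcessError on a non-zero exit code.
    result.check_returncode()
    return result


def pull_latest(repo_dir, branch):
    """Hypothetical wrapper: `from exc` keeps the original error in the chain."""
    try:
        # The remote name "origin" is an assumption for this sketch.
        run_step("git pull", ["git", "pull", "origin", branch], cwd=repo_dir)
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"Failed to pull latest changes from {branch}"
        ) from exc


def main():
    try:
        pull_latest("/tmp/mercury", "master")
    except RuntimeError:
        # Prints the RuntimeError and the CalledProcessError it was raised from.
        traceback.print_exc()
        return 1
    return 0
```

A script built this way would end with `sys.exit(main())`, so the container still exits non-zero while the full exception chain lands in the runner logs.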
- test_agent_traces.py, 9× test_docker_entrypoint.py, test_health_endpoint → pass
- test_api_creates_execution_record, test_poller_service_starts_without_import_errors → pass
- test_full_workflow_execution_and_state_transitions, test_workflow_with_custom_branch, test_workflow_updates_state_transitions → fail
All three groups use the same skip_agent_execution=True config; the difference is whether the test depends on the runner getting past setup_mercury.py.
infra/setup_mercury.py (diagnostic-only):
- stdout/stderr/exit-code are now logged on every git pull (previously captured and silently discarded).
- [setup_mercury] step: <name> breadcrumb before each subprocess call, so we know which one fails even if its output is lost.
- _diagnostic_dump() helper prints sanitised runtime context: GITHUB_ACCESS_TOKEN presence + length (never the value), git version, mercury HEAD, submodule config, protos remote URL. Called before pull and after failure.
- main() now surfaces the underlying CalledProcessError chain, not just the wrapper RuntimeError message.
tests/integration/test_simple_tracing_workflow.py (small behavioral change):
- WORKFLOW_TIMEOUT 120s → 600s so 6 reruns of a fast-failing runner don't burn 12 minutes of CI wall time, and so legitimately slow runs (cold caches, custom-branch checkout of 47K files) have headroom.
This is diagnostic-only. The actual root-cause fix (whatever's wrong with git pull/submodule setup at runtime) is intentionally deferred to a follow-up PR once the next CI run shows us:
- _configure_submodule_auth, _set_submodule_remote, git pull, _update_submodules, …
- whether GITHUB_ACCESS_TOKEN is present at runtime and well-formed
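For reference, a minimal sketch of a sanitised runtime dump along the lines described above. The real `_diagnostic_dump()` is not shown in this description, so `sanitise`, `describe_token`, `git_output`, `diagnostic_dump`, and the choice of git queries are all assumptions made for illustration.

```python
import os
import re
import subprocess


def sanitise(text):
    """Mask credentials embedded in URLs, e.g. https://user:token@host/..."""
    return re.sub(r"(://)[^/@\s]+@", r"\1***@", text)


def describe_token(env_var="GITHUB_ACCESS_TOKEN"):
    """Report presence and length of a secret, never its value."""
    value = os.environ.get(env_var)
    if not value:
        return f"{env_var}: missing"
    return f"{env_var}: present, length={len(value)}"


def git_output(args, cwd=None):
    """Best-effort git query; returns sanitised output or an error note."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Diagnostics must never crash setup: report the failure type only.
        return f"<unavailable: {type(exc).__name__}>"
    return sanitise((result.stdout or result.stderr).strip())


def diagnostic_dump(repo_dir, label):
    """Print runtime context; `label` says when it ran (e.g. 'before pull')."""
    print(f"[setup_mercury] diagnostics ({label})", flush=True)
    print("  " + describe_token(), flush=True)
    print("  git version: " + git_output(["--version"]), flush=True)
    print("  HEAD: " + git_output(["rev-parse", "HEAD"], cwd=repo_dir), flush=True)
    print("  remotes: " + git_output(["remote", "-v"], cwd=repo_dir), flush=True)
```

Calling `diagnostic_dump("/tmp/mercury", "before pull")` and again with `"after failure"` would give two snapshots to compare in the next CI run. Masking everything between `://` and `@` is deliberately coarse: it hides usernames as well as tokens, which seems an acceptable trade for never leaking a credential into logs.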