Ships Stream 4 (action-slug freshness audit) and Stream 5 (fix-action workflow lifecycle) from the bug-analysis orchestration brief, plus the class-completeness audit + prevention layers for both halves.
This PR addresses two cleanup classes: (a) action-slug drift in the bug analysis pipeline (114 bugs across 48 analyzed slugs whose slugs no longer resolved to live actions in the prod API), and (b) the absence of any visibility into the fix-action workflow lifecycle for the 5 RUN_FIXER bugs the orchestrator earmarked. The bugs trace to specific production failures in our test runs; the evidence links below take a reviewer from PR → cluster → spreadsheet row → per-log HTML payloads.
Source data: int-1/bug_logs/all_logs.json (1,259 ClickHouse log records, ~5MB) and dashboard_backup_v6/action_level_analysis.json (624 unique action slugs, 1,498 total bugs).
Scope: 579 of the 624 slugs were analyzed (the remainder lacked usable data). Of those:
Bugs whose evidence links now resolve correctly after the in-place patches: 114 bugs across 48 analyzed slug rows (out of 1,498 total bugs in the corpus). Sample evidence:
| canonical mapping | tool | sample log | sample TC | per-log HTML | rationale |
|---|---|---|---|---|---|
| HEYGEN_ADD_NEW_ASSET → HEYGEN_UPLOAD_ASSET | heygen | log_FzZ6dgIu-fN_ | TC-435 Heygen | /tmp/bug_logs_html/log_FzZ6dgIu-fN_.html | verb ADD→UPLOAD rename; the only Stream 5 SUCCESS depended on this Stream 4 mapping |
| MICROSOFT_TEAMS_TEAMS_LIST_PEOPLE → MICROSOFT_TEAMS_LIST_PEOPLE | microsoft_teams | log_fTpCIgalywbF | TC_MICROSOFTTEAMS_014 | /tmp/bug_logs_html/log_fTpCIgalywbF.html | duplicate-segment collapse |
| TODOIST_DELETE_SECTION → TODOIST_DELETE_SECTION2 | todoist | log_Jr3QAU3c-xUK | TC_TODOIST_290 | /tmp/bug_logs_html/log_Jr3QAU3c-xUK.html | trailing 2 suffix added (pattern: 4 todoist actions like this) |
| BOOQABLE_GET_ORDERS → BOOQABLE_GET_ORDER | booqable | log_07FOUy8UFOOj | — | /tmp/bug_logs_html/log_07FOUy8UFOOj.html | plural→singular rename |
| CRAWL_API → BRIGHTDATA_CRAWL_API | brightdata | — | TC_BRIGHTDATA-003 | — | toolkit prefix added (pattern: 4 brightdata slugs missing the prefix) |
| BOTPRESS_CREATE_CONVERSATION (mis-attributed) | api_sports → botpress | log_jBX-bbhdaKy1, log_dx2TZra1FW_J | — | /tmp/bug_logs_html/log_jBX-bbhdaKy1.html | source-data quality bug: these BOTPRESS_* slugs are real, current actions under the botpress toolkit, but the analysis attributed them to api_sports |
| DOPPLER_SECRETOPS_ENVIRONMENTS_CREATE (unprobeable) | doppler_secretops | log_TOO6Kg42Nk_f | TC_DOPPLER_032 | /tmp/bug_logs_html/log_TOO6Kg42Nk_f.html | toolkit exists in live catalog (29 tools) but lacks a connected account in our envs, so the validator short-circuits before action validation. Plausible candidate: DOPPLER_ENVIRONMENTS_CREATE under the newer doppler toolkit (62 tools, created 2026-01-10) |
Spreadsheet rows: the test cases above are in the integrator bug-analysis sheet (filter by TC ID).
Canonical artifacts in this PR:
- bug_logs/stale_action_slugs.json: the canonical mapping (46 auto-accepted + 4 mis-attributed + 9 needs-human + 1 unprobeable, with full disposition rationale)
- bug_logs/stale_action_slugs_report.md: human-readable report
- bug_logs/live_action_slugs_by_toolkit.json: raw probe cache (1,024 toolkits across the live catalog; 85 with bug data)
- bug_logs/slug_patch_report.json: substitution audit trail with backup paths

Source data: the 5 RUN_FIXER bugs from int-1/bug_logs/dashboard_backup_v6/fixer_dive.html (10 bugs total in the dive; 5 had RUN_FIXER verdicts).
| bug | tool | action | TC | log id | per-log HTML | workflow_id | terminal state | mercury PR | reviewer |
|---|---|---|---|---|---|---|---|---|---|
| #2 | fireflies | FIREFLIES_FETCH_AI_APP_OUTPUTS2 | TC_FF_001 | log_2MZj1WXiJ2wf | /tmp/bug_logs_html/log_2MZj1WXiJ2wf.html | 2ukwrpfk | FIX_REJECTED | — | reject (instruction described non-existent bug; useful negative signal) |
| #3 | zendesk | ZENDESK_CREATE_ZENDESK_USER | TC_ZENDESK_064 | log_YwezMRkKTvcc | /tmp/bug_logs_html/log_YwezMRkKTvcc.html | dtn3qmpb | FIX_REJECTED | — | reject (no code changes were made; useful negative signal about fix-brief generation quality) |
| #4 | zoho_books | ZOHO_BOOKS_CREATE_USER | TC_ZOHOB_038 | log_l2u1Yr76yZTH | /tmp/bug_logs_html/log_l2u1Yr76yZTH.html | pcgrymzh | FAILED_AUTH_ISSUE | — | n/a (env issue, not instruction defect) |
| #6 | googlesheets | GOOGLESHEETS_ADD_SHEET | TC_38 | log_dRTD64JwvFSz | /tmp/bug_logs_html/log_dRTD64JwvFSz.html | 3wt9r8l7 | STALLED_INDEFINITELY (8h+ silent at Reviewer step; will hit 36h TIMED_OUT budget by 2026-04-11T03:47Z) | — | n/a |
| #9 | heygen | HEYGEN_UPLOAD_ASSET (rewritten from HEYGEN_ADD_NEW_ASSET via Stream 4) | TC-435 Heygen | log_FzZ6dgIu-fN_ | /tmp/bug_logs_html/log_FzZ6dgIu-fN_.html | 83mxzcdu | SUCCESS | #20702 | high (the "questionable" one — test sent a .txt as a video — actually fixed by the agent inferring the right mimetype) |
Cluster: these 5 bugs are the RUN_FIXER verdict cluster from fixer_dive.html (the 10-bug deep dive of FIXER_AGENT-classified bugs from /tmp/ci_classifications_final.json). Cluster size: 5 RUN_FIXER + 5 reclassified (DONT_RUN_FIXER: 2 NEEDS_HUMAN, 2 NOT_A_BUG, 1 TEST_AND_FIX). All 5 RUN_FIXER triggered, 0 missing, 0 extra.
Lifecycle artifacts in this PR:
- bug_logs/fix_action_runs.json: schema 1.1 record per workflow with workflow_id, original_action, slug_rewritten, triggered_at, terminal_state_reached_at, mercury_branch_name, audit_state, and full evidence fields
- bug_logs/stream5_phase2_payloads/bug{3,4,6,9}_*.json: the exact instruction payloads sent to each workflow (audit trail)

Action slugs drift over time (e.g. FIREFLIES_FETCH_AI_APP_OUTPUTS → ..._OUTPUTS2, HEYGEN_ADD_NEW_ASSET → HEYGEN_UPLOAD_ASSET), and our analysis data accumulates stale references that break dashboard links and confuse remediation workflows. Stream 4 produces a canonical slug mapping (46 verified renames covering 114 stale-slug bug citations), patches the affected analysis files in-place, and ships a weekly cron that surfaces new drift within a week of its appearance. Stream 5 separately probes the fix-action workflow lifecycle by triggering the 5 RUN_FIXER bugs end-to-end: 1 SUCCESS (heygen → mercury #20702), 2 FIX_REJECTED (the fixer correctly refused bad proposals, a useful negative signal about fix-brief generation quality), 1 FAILED_AUTH_ISSUE, 1 STALLED_INDEFINITELY. Together with the new check_stalled_runs.py lifecycle monitor, we now have both retroactive remediation and forward-looking prevention for both classes.
Walks every unique action slug from the 1,498-bug analysis and checks it against the live prod API.
POST /workflows/fix-action/run (using a deliberately bogus action_name + placeholder instruction). The Pydantic validator returns 422 with Available actions: [...]. A 200/201/202 would mean a real workflow was triggered, so the audit aborts on the first 2xx and never silently retries.

5-rule structural matcher → noun-stemming + verb-compatible fuzzy fallback:
| rule | example |
|---|---|
| prefix added | CRAWL_API → BRIGHTDATA_CRAWL_API |
| version suffix added | TODOIST_DELETE_SECTION → TODOIST_DELETE_SECTION2 |
| version suffix stripped | TODOIST_CREATE_COMMENT_V1 → TODOIST_CREATE_COMMENT |
| duplicate-segment collapsed | MICROSOFT_TEAMS_TEAMS_LIST_PEOPLE → MICROSOFT_TEAMS_LIST_PEOPLE |
| article removed | WRIKE_CREATE_A_FOLDER → WRIKE_CREATE_FOLDER |
| verb+noun preserving fuzzy | BOOQABLE_GET_PRODUCTS → BOOQABLE_GET_PRODUCT |
| human verified | WRIKE_GET_GROUP_BY_ID → WRIKE_QUERY_SPECIFIC_GROUP (and 12 more) |
Verb-compatible pairs allow safe substitutions like GET ↔ FETCH, UPDATE ↔ PATCH ↔ MODIFY, ADD ↔ UPLOAD, but reject GET ↔ DELETE, CREATE ↔ LIST, etc.
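The matcher described above can be sketched roughly as follows. This is an illustrative reconstruction and not the code in audit_slugs.py or refine_stale_mapping.py: the function names, the naive plural stemmer, the 0.5 overlap threshold, and the choice to try only a trailing 2 for the suffix rule are all assumptions. Only the rule names and the verb pairs come from the tables and text above.

```python
import re

# Verb groups taken from the pairs named in the text; the grouping itself is an assumption.
VERB_GROUPS = [{"GET", "FETCH"}, {"UPDATE", "PATCH", "MODIFY"}, {"ADD", "UPLOAD"}]
ARTICLES = {"A", "AN", "THE"}


def verbs_compatible(a, b):
    """True when two verbs are identical or belong to the same safe group."""
    return a == b or any(a in g and b in g for g in VERB_GROUPS)


def collapse_duplicates(segments):
    """Drop a segment that repeats its immediate predecessor."""
    out = []
    for seg in segments:
        if not out or out[-1] != seg:
            out.append(seg)
    return out


def structural_candidates(stale, toolkit):
    """Yield (rule, candidate) pairs for the five structural rules."""
    segs = stale.split("_")
    yield "prefix added", f"{toolkit.upper()}_{stale}"
    yield "version suffix added", stale + "2"  # simplification: only tries "2"
    yield "version suffix stripped", re.sub(r"(_V\d+|\d+)$", "", stale)
    yield "duplicate-segment collapsed", "_".join(collapse_duplicates(segs))
    yield "article removed", "_".join(s for s in segs if s not in ARTICLES)


def stem(noun):
    """Naive plural stripping (hypothetical stand-in for the real stemmer)."""
    return noun[:-1] if noun.endswith("S") and len(noun) > 3 else noun


def split_slug(slug, toolkit):
    """Split a slug into (verb, stemmed noun set) after removing the toolkit prefix."""
    prefix = toolkit.upper() + "_"
    body = slug[len(prefix):] if slug.startswith(prefix) else slug
    verb, *nouns = body.split("_")
    return verb, {stem(n) for n in nouns}


def fuzzy_match(stale, toolkit, live, threshold=0.5):
    """Best live slug with a compatible verb and enough noun overlap, else None."""
    verb, nouns = split_slug(stale, toolkit)
    best, best_score = None, 0.0
    for cand in sorted(live):
        cverb, cnouns = split_slug(cand, toolkit)
        if not verbs_compatible(verb, cverb) or not (nouns | cnouns):
            continue
        score = len(nouns & cnouns) / len(nouns | cnouns)  # Jaccard overlap
        if score >= threshold and score > best_score:
            best, best_score = cand, score
    return best


def match_slug(stale, toolkit, live):
    """Try the structural rules first, then fall back to the fuzzy matcher."""
    for rule, cand in structural_candidates(stale, toolkit):
        if cand != stale and cand in live:
            return rule, cand
    cand = fuzzy_match(stale, toolkit, live)
    return ("verb+noun preserving fuzzy", cand) if cand else None
```

Running the structural rules before the fuzzy fallback keeps the cheap, unambiguous renames out of the scoring path, and the verb gate is what prevents a GET slug from ever being mapped onto a DELETE action even when the nouns match exactly.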
Patches applied (in-place against int-1's worktree, originals archived under int-4/bug_logs/backups/):
- dashboard_backup_v6/action_level_analysis.json: 83 substitutions
- dashboard_backup_v6/log_id_only_analysis.json: 39 substitutions
- dashboard_backup_v6/remarks_bugs_analysis.json: 132 substitutions
- dashboard_backup_v6/bug_analysis.html: 96 substitutions
- dashboard_backup_v6/fixer_dive.html: 7 substitutions + run-status injection

| check | result |
|---|---|
| Doppler "orphan" diagnosis | Reclassified as unprobeable_no_connected_account. The toolkit exists in the live catalog (29 tools, created 2025-10-17) but has no connected account in our prod/stg envs, so the validator short-circuits with HTTP 404 before reaching action validation. There is also a newer doppler toolkit (62 tools) with a DOPPLER_ENVIRONMENTS_CREATE candidate. Both are recorded. |
| Full live-catalog comparison | Fetched the entire prod catalog (1,024 toolkits via dashboard/postman-dashboard/list-toolkits, paginated). All 85 toolkits referenced by the bug analysis are present in the live catalog (0 missing). The remaining 939 live toolkits aren't in the bug data — out of scope for this audit, in scope for the weekly drift check below. |
| 26 needs-review entries dispositioned | All 26 entries now have explicit verdicts: 13 human-verified (added to auto-accepted, confidence 1.0), 4 mis-attributed in source (BOTPRESS_* slugs incorrectly attributed to api_sports — they're real, current actions under the botpress toolkit), 9 needs-human with specific category (action_removed, action_split, renamed_uncertain). |
bug_logs/check_slug_drift.py re-probes every toolkit in the committed baseline and surfaces three drift categories (new_actions, removed_actions, unprobeable). Reuses the same 2xx-abort safety guard as the audit.
.github/workflows/slug-drift-check.yml wires this into a weekly cron (Sunday 06:00 UTC). On drift, the workflow opens a bug-triage issue with the diff. On a safety abort (exit 2 — validator returned 2xx, baseline missing, etc.), it opens a separate priority-high issue. The final step fails on any non-zero exit code so neither drift nor a safety abort can pass silently.
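A minimal sketch of the drift check's core logic, under stated assumptions. It is not the contents of check_slug_drift.py: the function names, the exact text format of the 422 body, and the use of exit code 1 for drift are hypothetical. The source confirms only the three drift categories, the 2xx abort, and exit 2 for a safety abort.

```python
import re


class SafetyAbort(Exception):
    """Raised when a probe returned 2xx, i.e. a real workflow may have been triggered."""


def interpret_probe(status, body):
    """Return the set of live actions, or None when the toolkit is unprobeable."""
    if 200 <= status < 300:
        raise SafetyAbort(f"validator returned {status}")
    if status != 422:
        return None  # e.g. the 404 seen for toolkits without a connected account
    # Assumption: the action list follows the literal text "Available actions:".
    match = re.search(r"Available actions:\s*\[(.*?)\]", body, re.S)
    if match is None:
        return None
    return set(re.findall(r"[A-Z][A-Z0-9_]*", match.group(1)))


def diff_toolkit(baseline, live):
    """Compare one toolkit's committed baseline against its live action set."""
    if live is None:
        return {"unprobeable": True, "new_actions": [], "removed_actions": []}
    return {
        "unprobeable": False,
        "new_actions": sorted(live - baseline),
        "removed_actions": sorted(baseline - live),
    }


def exit_code(reports):
    """0 when nothing changed, 1 on drift (assumed); a SafetyAbort maps to exit 2 in the caller."""
    drift = any(
        r["unprobeable"] or r["new_actions"] or r["removed_actions"] for r in reports
    )
    return 1 if drift else 0
```

Raising on any 2xx rather than returning a value means a caller cannot accidentally treat a triggered workflow as an ordinary probe result; the abort has to be handled explicitly.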
Workflow 2ukwrpfk already reached terminal state before handoff:
- FIREFLIES_FETCH_AI_APP_OUTPUTS2: FIX_REJECTED (reviewer_score=reject)
- ai_filters nor sentences; the instruction described a non-existent bug. Likely meant get_transcript_by_id instead.

Used the Stream 4 mapping to substitute HEYGEN_ADD_NEW_ASSET → HEYGEN_UPLOAD_ASSET. All instruction payloads follow the fireflies template (Core issue / Code pointers / Evidence / Our understanding / Open questions).
| bug | tool | action | workflow | state | PR |
|---|---|---|---|---|---|
| #2 | fireflies | FIREFLIES_FETCH_AI_APP_OUTPUTS2 | 2ukwrpfk | FIX_REJECTED | — |
| #3 | zendesk | ZENDESK_CREATE_ZENDESK_USER | dtn3qmpb | FIX_REJECTED | — |
| #4 | zoho_books | ZOHO_BOOKS_CREATE_USER | pcgrymzh | FAILED_AUTH_ISSUE | — |
| #6 | googlesheets | GOOGLESHEETS_ADD_SHEET | 3wt9r8l7 | STALLED_INDEFINITELY (audit-side; execution_state still STARTED) | — |
| #9 | heygen | HEYGEN_UPLOAD_ASSET | 83mxzcdu | SUCCESS | #20702 (reviewer score: high) |
The "questionable" heygen one (test sent a .txt as a video) actually succeeded with a high reviewer score, vindicating the slug rewrite from Stream 4.
| check | result |
|---|---|
| RUN_FIXER coverage | All 5 RUN_FIXER bugs in fixer_dive.html (idx {2, 3, 4, 6, 9}) were triggered. 0 missing, 0 extra. |
| Schema completeness | fix_action_runs.json normalised to schema 1.1: every entry now has workflow_id (as a field), original_action, slug_rewritten boolean, triggered_at, terminal_state_reached_at, mercury_branch_name (parsed from run_log), audit_state, tc_id, log_id, file, task_arn. Fireflies metadata backfilled from the brief. |
| googlesheets re-poll | Re-polled 3wt9r8l7 once. Still STARTED with updated_at unchanged at 2026-04-09T15:56:33Z (8h+ silent at the Reviewer step). Marked audit_state: STALLED_INDEFINITELY with stalled_since timestamp. The actual execution_state field still reads STARTED because we never received a state-change event from the DB; the new audit_state is the int-4 audit's authoritative classification. The workflow will hit the 36-hour TIMED_OUT budget by 2026-04-11T03:47Z if it doesn't recover. |
bug_logs/check_stalled_runs.py walks fix_action_runs.json and flags any non-terminal workflow whose updated_at hasn't moved in --threshold-hours (default 6). Already-classified STALLED_INDEFINITELY entries are acknowledged separately so a stuck run doesn't keep alerting forever. Exits 1 on any stall — intended to be wired into either the existing poller loop or a small standalone alert script.
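The stall rule above can be sketched as follows. This is an assumption-laden illustration rather than the code in check_stalled_runs.py: the set of terminal states, the presence of execution_state and updated_at on every record, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed terminal states; the real script may use a different list.
TERMINAL = {"SUCCESS", "FIX_REJECTED", "FAILED_AUTH_ISSUE", "TIMED_OUT"}


def parse_ts(value):
    """Parse an ISO-8601 timestamp with a trailing Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stalls(runs, now, threshold_hours=6.0):
    """Return (new_stalls, acknowledged) lists of workflow ids."""
    cutoff = now - timedelta(hours=threshold_hours)
    new_stalls, acknowledged = [], []
    for run in runs:
        if run.get("audit_state") == "STALLED_INDEFINITELY":
            # Already classified: report separately instead of re-alerting.
            acknowledged.append(run["workflow_id"])
        elif (
            run.get("execution_state") not in TERMINAL
            and parse_ts(run["updated_at"]) <= cutoff
        ):
            new_stalls.append(run["workflow_id"])
    return new_stalls, acknowledged
```

For example, a record for 3wt9r8l7 with execution_state STARTED and updated_at 2026-04-09T15:56:33Z would be flagged as a new stall when checked at 2026-04-10T00:00:00Z with the default 6-hour threshold, and would move to the acknowledged list once audit_state is set.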
bug_logs/

| script | purpose |
|---|---|
| audit_slugs.py | Stream 4 probe sweep (raw fuzzy matcher) |
| refine_stale_mapping.py | Stream 4 second-pass smarter matcher |
| finalize_stale_mapping.py | Stream 4 third-pass: human-verified verdicts for the 26 needs-review entries |
| apply_slug_patches.py | In-place dashboard/JSON patcher with backups |
| check_slug_drift.py | Weekly drift detector (prevention layer) |
| trigger_stream5_phase2.py | Builds + POSTs the 4 fixer payloads |
| poll_fix_action_runs.py | Polls all 5 workflows to terminal state |
| check_stalled_runs.py | Stall detector (prevention layer) |
| update_fixer_dive_status.py | Injects "Run Status" cards into fixer_dive.html |
- doppler_secretops (renamed to doppler) and the BOTPRESS-in-api_sports rows are examples: the slug audit only handles per-toolkit action lists, not toolkit-level renames. Recorded as a discovery in the int-4 inbox for either a Stream 8 expansion or a separate follow-up task.
- action_removed, action_split, renamed_uncertain) need product-side decisions, not pattern matching.
- mcp__rube__* tools; stdlib urllib only.
- fix-action/run (200/201/202 all treated as fatal abort).
- bug_logs/backups/ (not committed; 4.6 MB) before any modification.
- int-1 bug_logs/orchestrator_log.md.
- apply_slug_patches.py is a no-op
- workflow_ids
- check_stalled_runs.py correctly detects the googlesheets stall; check_slug_drift.py parses + executes
- make chk is a no-op for this PR (no app_tester/ files touched)

🤖 Generated with Claude Code