
[LOW PRIORITY] docs: Manus auth feasibility analysis - 82% of auth failures automatable

@AgentWrapper · checks n/a · docs/manus-auth-analysis-report → next · 8 files · +1255 −0 · updated 4mo ago

Description

Why

PR #942 concluded that auth failures are "only 1% of production failures" and recommended LOW PRIORITY for the Manus auth management service. However, that analysis counted only the auth failures recorded within the `FAILED` state and missed those in the separate `FAILED_AUTH_ISSUE` state.

This analysis corrects that finding:

- Actual auth failure rate: 12.3% of all workflow executions (not 1%)
- Auth failures exceed regular code failures (12.3% vs 8.8%)
- 82% of auth failures are automatable with the proposed Manus service
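
The difference between the two counting methods can be sketched as follows. This is a minimal illustration, not the query used in the analysis: the sample is synthetic, sized only to mirror the percentages above, and `SUCCEEDED` is a placeholder name for every non-failure state.

```python
from collections import Counter

def failure_rates(states):
    """Return each failure state's share of all executions, as a fraction."""
    total = len(states)
    if total == 0:
        return {}
    counts = Counter(states)
    # Report FAILED and FAILED_AUTH_ISSUE separately so that auth failures
    # are not hidden by looking at the FAILED state alone.
    return {
        state: counts[state] / total
        for state in ("FAILED", "FAILED_AUTH_ISSUE")
    }

# Synthetic sample (assumption): 1000 executions matching 12.3% and 8.8%.
sample = ["FAILED_AUTH_ISSUE"] * 123 + ["FAILED"] * 88 + ["SUCCEEDED"] * 789
rates = failure_rates(sample)
print(f"auth failures: {rates['FAILED_AUTH_ISSUE']:.1%}")
print(f"code failures: {rates['FAILED']:.1%}")
```

An analysis that inspects only the `FAILED` bucket never sees the `FAILED_AUTH_ISSUE` rows, which is how the earlier 1% figure could arise.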

What

Feasibility Analysis (`docs/manus-auth-feasibility-analysis.md`)

Analyzed the latest 100 production auth failures individually:

| Category | Count | Manus Solvable? |
| --- | --- | --- |
| Invalid/Revoked Credentials | 31 | Yes |
| Code Fix Unverifiable | 28 | Yes |
| Insufficient Permissions/Scope | 13 | Yes |
| Other/Unclear | 11 | No |
| Token/Key Expired | 8 | Yes |
| Paid Plan Required | 2 | No |
| IP Whitelist Restriction | 2 | Yes |
| Code Bug (misclassified) | 2 | No |
| Rate Limiting (misclassified) | 2 | No |
| Master/Enterprise Key Required | 1 | No |

Result: 82/100 (82%) solvable by Manus service
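
As a quick check, the 82/100 result can be reproduced from the category table. The sketch below copies the rows verbatim; the names `CATEGORIES` and `solvable_share` are illustrative and do not come from the runbook scripts.

```python
# (category, count, solvable by Manus) rows copied from the table above.
CATEGORIES = [
    ("Invalid/Revoked Credentials", 31, True),
    ("Code Fix Unverifiable", 28, True),
    ("Insufficient Permissions/Scope", 13, True),
    ("Other/Unclear", 11, False),
    ("Token/Key Expired", 8, True),
    ("Paid Plan Required", 2, False),
    ("IP Whitelist Restriction", 2, True),
    ("Code Bug (misclassified)", 2, False),
    ("Rate Limiting (misclassified)", 2, False),
    ("Master/Enterprise Key Required", 1, False),
]

def solvable_share(rows):
    """Return (solvable count, total count) for the categorized failures."""
    total = sum(count for _, count, _ in rows)
    solvable = sum(count for _, count, ok in rows if ok)
    return solvable, total

solvable, total = solvable_share(CATEGORIES)
print(f"{solvable}/{total} ({solvable / total:.0%}) solvable")
```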

Auth Issues Runbook (`infra/runbooks/auth-issues/`)

Added agent-usable scripts for investigating auth failures:

| Script | Purpose |
| --- | --- |
| `auth-stats.sh` | Overall stats: failure rates, workflow types, tag distribution |
| `toolkit-breakdown.sh` | Per-toolkit failure counts with assessments |
| `error-details.py` | Extract detailed error messages for specific toolkits |
| `list-auth-failures.sh` | List individual failures with filters |

AI Agent Guide Update

Added a Runbooks section to `infra/AI_AGENT_GUIDE.md` that references the new auth-issues runbook.

Key Findings

- Token refresh (39 failures: 31 invalid/revoked credentials plus 8 expired tokens/keys): Simplest win - Manus logs in and generates a new key
- Code fixes unverifiable (28 failures): Service accounts with proper scopes would enable testing
- Permission/scope gaps (13 failures): Manus reconfigures OAuth app permissions
- ROI: ~227 failures/week recoverable, reducing the total failure rate by ~10 percentage points (82% of the 12.3% auth failure rate)
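
A back-of-envelope check of the ROI figure, assuming the 82% solvable share from the 100-failure sample holds across all auth failures. The weekly volumes it derives are implied by the numbers in this PR, not measured values, and `roi_estimate` is a hypothetical helper.

```python
AUTH_FAILURE_RATE = 0.123    # share of all workflow executions (from this PR)
SOLVABLE_SHARE = 0.82        # share of sampled auth failures Manus could resolve
RECOVERABLE_PER_WEEK = 227   # approximate figure quoted in Key Findings

def roi_estimate(auth_rate, solvable_share, recoverable_per_week):
    """Derive the recoverable share of executions and implied weekly volumes."""
    # Share of ALL executions that would stop failing.
    recoverable_rate = auth_rate * solvable_share
    # Work backwards from the recoverable count to the implied totals.
    auth_failures_per_week = recoverable_per_week / solvable_share
    executions_per_week = auth_failures_per_week / auth_rate
    return {
        "recoverable_rate": recoverable_rate,
        "auth_failures_per_week": auth_failures_per_week,
        "executions_per_week": executions_per_week,
    }

estimate = roi_estimate(AUTH_FAILURE_RATE, SOLVABLE_SHARE, RECOVERABLE_PER_WEEK)
print(f"recoverable share of executions: {estimate['recoverable_rate']:.1%}")
print(f"implied auth failures/week: {estimate['auth_failures_per_week']:.0f}")
print(f"implied executions/week: {estimate['executions_per_week']:.0f}")
```

The recoverable share comes out at roughly a tenth of all executions, which matches the ~10% reduction quoted above when read as percentage points.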

Revised Recommendation

MEDIUM-HIGH priority (up from LOW in PR #942). Start with token refresh automation (highest volume, simplest implementation).
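
The "highest volume" claim can be checked by grouping the solvable categories into workstreams and ranking them. The grouping below is an assumption inferred from Key Findings (39 = 31 + 8), and the workstream labels other than "token refresh" are illustrative.

```python
from collections import defaultdict

# Solvable category -> (assumed workstream, count from the feasibility table).
WORKSTREAMS = {
    "Invalid/Revoked Credentials": ("token refresh", 31),
    "Token/Key Expired": ("token refresh", 8),
    "Code Fix Unverifiable": ("service accounts for testing", 28),
    "Insufficient Permissions/Scope": ("OAuth permission reconfiguration", 13),
    "IP Whitelist Restriction": ("IP whitelist handling", 2),
}

def rank_workstreams(mapping):
    """Sum failure counts per workstream and sort by volume, largest first."""
    totals = defaultdict(int)
    for workstream, count in mapping.values():
        totals[workstream] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = rank_workstreams(WORKSTREAMS)
for name, count in ranking:
    print(f"{count:>3}  {name}")
```

Volume alone does not capture implementation effort, so the "simplest implementation" half of the recommendation still rests on the judgment stated above.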

Test plan

- All runbook scripts tested against prod DB
- `auth-stats.sh` produces correct stats
- `toolkit-breakdown.sh` shows per-toolkit breakdown
- `error-details.py` extracts detailed errors
- `list-auth-failures.sh` filters work correctly

🤖 Generated with Claude Code
