Simplification pass + prod no-op tests
Per review feedback that the surface area was too large for an observe-only POC, this pass removes ~970 net lines (+818 / −1,786) without losing the integrity properties Codex/Bugbot validated. Commit aefd2bafc.
What got cut
- Dropped UserLearningsStoreError and its raise/swallow contract. The master flag makes prod a no-op, so distinguishing "deliberate skip" from "transient Redis outage" wasn't worth the extra branching for the POC. All Redis errors are now logged and swallowed at the store layer; the extractor doesn't try to translate them. The stale test_redis_store_errors.py is removed.
- Collapsed the ~20 rejection counters in the applier into broader buckets (rejected_create / rejected_increment / rejected_evidence_tool_mismatch / etc.). Tests now assert on outcomes (created? incremented? retired?) rather than specific counter names, so they survive future bucket churn.
- inspect_user_learnings.py slimmed from 373 to 150 lines: dropped the JSON output mode and most CLI flags, since POC analytics doesn't need that surface yet.
- Dropped fallback paths in the multi-execute parsing that weren't carrying their own weight (duplicate slug/index fan-out loops, the _extract_per_tool_status_code candidate cascade).
- Pipeline hook simplified back to a single helper that lazy-imports the user_learnings package, runs the extractor, and swallows exceptions. No bool return, no error-message round-trip.
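For readers unfamiliar with the pattern, here is a minimal sketch of the hook shape described above. The module path user_learnings.extractor and its extract(session) entry point are hypothetical stand-ins, not the PR's actual names. The point is that the import happens inside the function (so heavy dependencies load only when the stage runs) and that every failure, including a failed import, is logged and swallowed with no return value.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def run_user_learnings_stage(session, module_name="user_learnings.extractor"):
    """Hypothetical hook: lazy-import the extractor, run it, swallow any failure.

    Returns nothing on purpose: callers don't branch on the outcome.
    """
    try:
        # Deferred import: heavy dependencies load only when the stage runs.
        extractor = importlib.import_module(module_name)
        extractor.extract(session)  # hypothetical entry point
    except Exception:
        # Observe-only stage: log and continue rather than fail the pipeline.
        logger.exception("user_learnings stage failed; continuing")
```

Dropping the bool return means there is no caller-side branching to test; the only contract is "never raises".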
Bugbot Low addressed (0bc438cf follow-up)
_extract_per_tool_status_code previously accepted any int as a status code, including "code": 0 ("no error code"). That made _classify treat it as a definitive non-transient HTTP status, bypassing the body-substring fallback. Now only ints in 100–599 count as HTTP status codes.
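The effect of the range check can be illustrated with hypothetical stand-ins for _extract_per_tool_status_code and _classify. The names http_status_or_none and classify, and the transient status and body-marker sets, are assumptions for illustration only; the PR's real classification rules aren't shown here.

```python
from typing import Optional

# Hypothetical transient signals; the real _classify rules are not shown in this PR text.
TRANSIENT_STATUSES = frozenset({408, 429, 500, 502, 503, 504})
TRANSIENT_BODY_MARKERS = ("timeout", "rate limit", "temporarily unavailable")


def http_status_or_none(value: object) -> Optional[int]:
    """Accept only ints in the HTTP range 100-599; anything else (e.g. 0) is not a status."""
    if isinstance(value, int) and 100 <= value <= 599:
        return value
    return None


def classify(code: object, error_body: Optional[str]) -> str:
    status = http_status_or_none(code)
    if status is not None:
        # A real HTTP status is treated as definitive.
        return "transient" if status in TRANSIENT_STATUSES else "non_transient"
    # No usable status: fall back to substring matching on the error body.
    body = (error_body or "").lower()
    return "transient" if any(m in body for m in TRANSIENT_BODY_MARKERS) else "non_transient"
```

Under the old any-int rule, classify(0, "Upstream request timeout") would have short-circuited to non_transient because 0 is not a transient status. With the range check it reaches the body fallback and returns transient.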
Truncation comparison (response to "are we sending session data to LLM, what truncation?")
| Field | This PR | Existing pipeline (workflow + error analysis) |
|---|---|---|
| Per-call params | 5,000 chars | SESSION_LOG_REQUEST=5000 / WORKFLOW_LOG_REQUEST=8000 |
| Per-call error_body | 5,000 chars | SESSION_LOG_ERROR=5000 / WORKFLOW_LOG_ERROR=12000 |
| Strings inside JSON | max_str_len=400 / risky=200 via sanitize_for_llm | Workflow uses 2000; session_log uses 200 |
| prior_observations_for_review | 8 per tool (cap) | n/a |
This PR uses the same mask_pii + wrap_untrusted + sanitize_for_llm chain as the rest of rube_learning.
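To make the two truncation layers in the table concrete, here is a sketch that clips every string inside a JSON payload to 400 chars and then caps the serialized field at 5,000 chars. This is not sanitize_for_llm: the helper names and the "...[truncated]" marker are hypothetical, and the risky=200 path, mask_pii, and wrap_untrusted are deliberately omitted.

```python
import json
from typing import Any

# Caps taken from the table above; the risky=200 path is omitted here.
PER_CALL_CHAR_CAP = 5_000
MAX_STR_LEN = 400
MARKER = "...[truncated]"  # hypothetical marker text


def truncate_text(text: str, limit: int) -> str:
    """Cap text at `limit` chars, ending with MARKER when cut (assumes limit > len(MARKER))."""
    if len(text) <= limit:
        return text
    return text[: limit - len(MARKER)] + MARKER


def clip_json_strings(obj: Any, max_len: int) -> Any:
    """Recursively cap every string value (not dict keys) inside a JSON-like structure."""
    if isinstance(obj, str):
        return truncate_text(obj, max_len)
    if isinstance(obj, dict):
        return {k: clip_json_strings(v, max_len) for k, v in obj.items()}
    if isinstance(obj, list):
        return [clip_json_strings(v, max_len) for v in obj]
    return obj


def prepare_params_for_llm(params: dict) -> str:
    # Clip individual strings first, then cap the serialized field as a whole.
    clipped = clip_json_strings(params, MAX_STR_LEN)
    return truncate_text(json.dumps(clipped), PER_CALL_CHAR_CAP)
```

Clipping strings before the whole-field cap means one oversized value can't crowd out the rest of the payload before the 5,000-char limit applies.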
Prod no-op tests added (response to "in prod path is fully no-op right?")
New test_pipeline_integration.py adds 6 cases verifying that, with USER_LEARNINGS_ENABLED unset:
- No LLM call is made
- No Redis traffic is generated
- The cost of the extractor module's expensive imports (Presidio, OpenAI, etc.) isn't paid
- Force mode (--user-learnings-only) bypasses the env gate
- Exceptions in the extractor stay caught at the boundary
test_stage_no_op_for_falsy_env_flag_values parametrizes over ["false", "0", "no", "", "off"]; test_stage_runs_extractor_when_env_flag_truthy parametrizes over ["true", "1", "yes"].
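A minimal sketch of an env gate consistent with those parametrized values. The function names user_learnings_enabled and maybe_run are hypothetical, and two behaviors are assumptions of this sketch rather than facts from the PR: values are normalized with strip().lower(), and anything outside {"true", "1", "yes"} (including unset) counts as disabled. The force argument stands in for --user-learnings-only.

```python
import os
from typing import Callable, Mapping, Optional

# Values mirror the parametrized tests above; treating anything else
# (e.g. "on") as disabled is an assumption of this sketch.
TRUTHY_VALUES = frozenset({"true", "1", "yes"})


def user_learnings_enabled(environ: Optional[Mapping[str, str]] = None, force: bool = False) -> bool:
    """Return True only when forced or when USER_LEARNINGS_ENABLED is truthy."""
    if force:  # stands in for --user-learnings-only
        return True
    env = os.environ if environ is None else environ
    # Case/whitespace normalization is an assumption, not confirmed by the PR.
    return env.get("USER_LEARNINGS_ENABLED", "").strip().lower() in TRUTHY_VALUES


def maybe_run(extract: Callable[[], None], environ: Optional[Mapping[str, str]] = None, force: bool = False) -> None:
    # The gate is checked before the extractor is touched, so a disabled
    # stage never reaches any LLM or Redis code behind `extract`.
    if user_learnings_enabled(environ, force):
        extract()
```

Passing a spy as extract (for example, a lambda that appends to a list) is one way to assert "no LLM call" without mocking the LLM client itself.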
CI
| Check | Status |
|---|---|
| Lint - Integrator | ✅ success |
| Test - Learning Pipeline | ✅ success |
43 unit tests all pass.