Description
Third branch in the polling-trigger v2 stack (#10001 → #10002 → this). Adds the orchestration runtime — the BatchPoller Temporal workflow — that drives a batch of polling-trigger runs through the claim → auth → poll → broadcast → commit pipeline pinned by the prior branches' state-machine spec + data model. Every activity ships in its own commit on top of a skeleton commit that introduces the workflow shell and stubbed activity methods.
Coexists with the legacy workflow
The workflow is intentionally named BatchPoller, distinct from the legacy workflow's registered name, so it occupies a separate slot in the Temporal registry and the new and old workers can be registered side by side without colliding. Storage is disjoint too: BatchPoller leases and commits against the workerdb polling_trigger_runs table introduced in #10001, while the legacy workflow keeps operating on the registrydb tables it has always used. With disjoint storage and disjoint workflow names, both pipelines can run simultaneously without interference; that's the migration knob.
Out of scope for this PR
No worker wiring yet. This PR only adds the workflow definition + activity methods + tests — it does not register BatchPoller (or its activities) on any Temporal worker, and does not add an fx provider / Start* entry point under workers/. As a result, merging this PR alone has zero runtime effect: nothing will dispatch BatchPoller executions and no worker will pick them up. The registration, fx wiring, task-queue plumbing, and timer-shard dispatch hook-up will land in a follow-up PR on top of this stack.
How to review
Please review commit-by-commit, not via the PR-level diff. The skeleton commit lands the workflow shell with five stubbed activity bodies (errors.New("not implemented")); each follow-up commit replaces exactly one stub with its real body and ships that activity's helpers, tests, and any new metric. The PR-level diff folds all of that together and reads like a single ~4k-line drop. Each commit individually compiles (go build ./... + go vet -tags service_isolation ./... verified at every commit) so you can also check out a single commit and poke at it in isolation if it helps.
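For orientation, here is a minimal sketch of the shape the skeleton commit's stubs might take. The five activity names come from this PR; the `Activities` receiver, the `ErrNotImplemented` sentinel, the method signatures, and the `stubbedCount` helper are hypothetical and only illustrate how each follow-up commit swaps exactly one stub for a real body.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrNotImplemented is a hypothetical sentinel. The PR describes the stubs as
// returning errors.New("not implemented"); a shared value just makes them easy to detect.
var ErrNotImplemented = errors.New("not implemented")

// Activities is a hypothetical stand-in for the workflow's dependency bundle.
type Activities struct{}

// Each method below is a stub; a later commit replaces one body at a time.
func (a *Activities) ClaimAndSnapshot(ctx context.Context) error { return ErrNotImplemented }
func (a *Activities) FetchAuth(ctx context.Context) error        { return ErrNotImplemented }
func (a *Activities) PollMercury(ctx context.Context) error      { return ErrNotImplemented }
func (a *Activities) Broadcast(ctx context.Context) error        { return ErrNotImplemented }
func (a *Activities) Commit(ctx context.Context) error           { return ErrNotImplemented }

// stubbedCount reports how many of the five activities still return the stub error.
func stubbedCount(a *Activities) int {
	ctx := context.Background()
	steps := []func(context.Context) error{
		a.ClaimAndSnapshot, a.FetchAuth, a.PollMercury, a.Broadcast, a.Commit,
	}
	n := 0
	for _, step := range steps {
		if errors.Is(step(ctx), ErrNotImplemented) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println("stubbed activities:", stubbedCount(&Activities{}))
}
```

Checked out at the skeleton commit, a helper like this would count all five; the count would drop by one with each follow-up commit.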
The stack so far
- #10001: workerdb polling_trigger_runs schema + claim/commit store
- #10002: state-machine spec + Go data shapes + ObjectStore abstraction
- this: workflow + 5 activities
Commits
| Commit | Contents |
|---|---|
| test(thermos): add uber/mock-based mocking for Apollo HTTP client | Pre-existing test infra. Used by the activity tests in this PR. |
| feat(thermos): add BatchPoller workflow skeleton | Workflow + dependency bundle + 5 stubbed activity methods, plus S3 blob plumbing, the per-trigger Mercury invocation extracted into lib/polling, and the shared service-isolation activity harness. |
| feat(thermos): wire BatchPoller ClaimAndSnapshot activity | Bootstrap activity: leases workerdb rows and writes the in-S3 blob; the retry / CAN-successor path picks up the existing blob and resets failed entries back to claimed. |
| feat(thermos): wire BatchPoller FetchAuth activity | Apollo /conn/batch fan-out with toolkit-grouped sub-chunks and Vercel-payload-too-large halve-and-retry; new polling.trigger.fetch_auth_error metric. |
| feat(thermos): wire BatchPoller PollMercury activity | Bounded-concurrency Mercury Lambda fan-out (500), per-trigger trigger_version resolution kept co-atomic with the new poll state; new polling.trigger.platform_error metric. |
| feat(thermos): wire BatchPoller Broadcast activity | Per-chunk /state POST loop with shrink-on-success durability and a 410-aware disable path; transient errors park remaining chunks for the next retry. |
| feat(thermos): wire BatchPoller Commit activity | The only PG-writing activity: single batched UPDATE, DB-first then timer-arm ordering, ForceCommit on the final CAN hop so no lease leaks past MaxRetryCounter. |
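
The halve-and-retry behaviour named in the FetchAuth row can be pictured with the sketch below. It is not the code from the commit: `sendFunc`, `errPayloadTooLarge`, and `sendHalving` are hypothetical names, the transport is a fake, and toolkit grouping is left out. It only shows the general technique of splitting a chunk in half whenever the server rejects it as too large.

```go
package main

import (
	"errors"
	"fmt"
)

// errPayloadTooLarge is a hypothetical sentinel for a payload-too-large response.
var errPayloadTooLarge = errors.New("payload too large")

// sendFunc posts one chunk of IDs; a hypothetical stand-in for the /conn/batch call.
type sendFunc func(ids []string) error

// sendHalving sends ids as a single request. If the request is rejected as too
// large, it splits the chunk in half and retries each half recursively.
// A single ID that is still too large is surfaced as an error, as is any other failure.
func sendHalving(ids []string, send sendFunc) error {
	if len(ids) == 0 {
		return nil
	}
	err := send(ids)
	if !errors.Is(err, errPayloadTooLarge) {
		return err // success (nil) or a non-size failure
	}
	if len(ids) == 1 {
		return fmt.Errorf("id %q cannot be split further: %w", ids[0], err)
	}
	mid := len(ids) / 2
	if err := sendHalving(ids[:mid], send); err != nil {
		return err
	}
	return sendHalving(ids[mid:], send)
}

func main() {
	var sizes []int
	// Fake transport for illustration: rejects any chunk with more than two IDs.
	send := func(ids []string) error {
		if len(ids) > 2 {
			return errPayloadTooLarge
		}
		sizes = append(sizes, len(ids))
		return nil
	}
	err := sendHalving([]string{"a", "b", "c", "d", "e"}, send)
	fmt.Println(err, sizes)
}
```

With the fake transport above, five IDs end up delivered as accepted chunks of two, one, and two IDs.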
How did I test this PR
- go build ./... and go vet -tags service_isolation ./... pass at every commit in the stack (verified by checking out each commit in a temporary worktree).
- Per-activity in-memory tests under apps/thermos/workers/polling/pollwf/*_test.go: verdict-projector unit tests + the Apollo wire-layer paths (halve-on-payload-too-large for FetchAuth, the 200/410/transient classifier for Broadcast) against the uber/mock HTTPClient.
- End-to-end activity smoke tests under apps/thermos/service_isolation/workflows/pollwf/ against a real workerdb + in-memory ObjectStore + in-memory TimerManager via the shared harness. Run via service_isolation/run_service_isolation_tests.sh.
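
As a rough illustration of what the Broadcast classifier tests exercise, the sketch below maps a status code to one of three verdicts and drives a per-chunk loop with it. All names (`verdict`, `classify`, `broadcastChunks`) are hypothetical. Treating every status other than 200 and 410 as transient, and dropping the remaining chunks on 410, are assumptions of this sketch rather than confirmed behaviour.

```go
package main

import (
	"fmt"
	"net/http"
)

// verdict is a hypothetical three-way outcome for one /state POST.
type verdict int

const (
	verdictDelivered verdict = iota // 200: chunk accepted
	verdictDisable                  // 410: target is gone, disable the trigger
	verdictTransient                // anything else: retry later (assumption)
)

// classify maps an HTTP status code to a verdict.
func classify(status int) verdict {
	switch status {
	case http.StatusOK:
		return verdictDelivered
	case http.StatusGone:
		return verdictDisable
	default:
		return verdictTransient
	}
}

// broadcastChunks posts chunks in order. A delivered chunk is dropped from the
// pending slice (shrink on success), a transient failure parks the undelivered
// chunks for the next retry, and a 410 reports that the trigger should be disabled.
func broadcastChunks(pending [][]string, post func(chunk []string) int) (rest [][]string, disable bool) {
	for len(pending) > 0 {
		switch classify(post(pending[0])) {
		case verdictDelivered:
			pending = pending[1:]
		case verdictDisable:
			return nil, true
		case verdictTransient:
			return pending, false
		}
	}
	return nil, false
}

func main() {
	calls := 0
	// Fake endpoint for illustration: the second POST fails with 503.
	post := func(chunk []string) int {
		calls++
		if calls == 2 {
			return http.StatusServiceUnavailable
		}
		return http.StatusOK
	}
	rest, disable := broadcastChunks([][]string{{"a"}, {"b"}, {"c"}}, post)
	fmt.Println(len(rest), disable)
}
```

In this run the first chunk is delivered, the second hits a transient failure, and the last two chunks are parked for the next attempt.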
Because no worker registers BatchPoller yet (see "Out of scope" above), there is no production / dev-environment runtime test in this PR. Runtime validation will land with the worker-wiring follow-up.