fix(thermos): raise polling cleaner per-call DB timeout to 120s

@zen-agent · checks n/a · …zen/polling-cleaner-db-timeout-72dg9l → master · 1 file · +5 −1 · review: @lingalarahul7, @anshugarg15, @rohanprabhu, @acsrujan, @sarthak · updated 1mo ago

Description

The hourly PollingTriggerCleanerWorkflow activity CleanOldPollingTriggerRunsActivity has been failing on prod_cluster (service thermos-polling-triggers) with context deadline exceeded every run for at least the past 6 hours — surfaced today by the cron-dd-errors Datadog poll (50+ errors in 6h, caller workers/batched_polling_trigger_cleaner.go:64, message Failed to get count of old polling trigger runs).

Root cause: GetOldPollingTriggerRunsInfo issues two upfront COUNT(*) queries against polling_trigger_runs (the 37.8M-row / 43GB-bloat table from the Feb 2026 incident). The activity's per-call DB timeout was 15s, far too tight for those scans. Recent timeout reconciliations (#9572 and b694e3a435 from 2026-05-11) only adjusted StartToCloseTimeout, leaving the per-call DB budget unchanged.
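The failure mode described above can be reproduced in miniature. This is a minimal sketch, not the thermos code: `slowCount`, `countWithBudget`, `timedOut`, and `demo` are hypothetical names, the 60-unit query duration is an assumed figure for illustration, and seconds are scaled down to milliseconds so the example runs quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCount stands in for a COUNT(*) scan (hypothetical helper): it "runs"
// for queryTime unless the context expires first.
func slowCount(ctx context.Context, queryTime time.Duration) (int64, error) {
	select {
	case <-time.After(queryTime):
		return 42, nil // placeholder row count
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// countWithBudget layers a per-call timeout on top of the parent context,
// which plays the role of the activity's StartToCloseTimeout.
func countWithBudget(parent context.Context, budget, queryTime time.Duration) (int64, error) {
	ctx, cancel := context.WithTimeout(parent, budget)
	defer cancel()
	return slowCount(ctx, queryTime)
}

// timedOut reports whether err is the "context deadline exceeded" error.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// demo runs the same assumed 60-unit query under a 15-unit and a 120-unit
// per-call budget, both inside a 300-unit activity budget (1 unit = 1ms here).
func demo() (tightErr, roomyErr error) {
	activity, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
	defer cancel()
	_, tightErr = countWithBudget(activity, 15*time.Millisecond, 60*time.Millisecond)
	_, roomyErr = countWithBudget(activity, 120*time.Millisecond, 60*time.Millisecond)
	return tightErr, roomyErr
}

func main() {
	tightErr, roomyErr := demo()
	fmt.Println("15-unit budget:", tightErr)
	fmt.Println("120-unit budget:", roomyErr)
}
```

The point of the sketch is that raising the outer activity timeout alone does not help: the inner per-call deadline fires first, which is consistent with the StartToCloseTimeout-only reconciliations leaving the error in place.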

Fix: bump per-call DB timeout from 15s to 120s, matching webhook_trigger_cleaner.go which already uses 120s for its analogous cleanup COUNTs. StartToCloseTimeout stays at 300s.
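As a sanity check on the new value, the worst case for the two upfront COUNT(*) queries can be compared against the unchanged activity timeout. The sketch below is only arithmetic under stated assumptions: the constant and function names are hypothetical, it assumes the per-call timeout applies to each COUNT separately, and it ignores any other DB calls the activity makes afterwards.

```go
package main

import (
	"fmt"
	"time"
)

const (
	perCallDBTimeout    = 120 * time.Second // new per-call value from this PR
	startToCloseTimeout = 300 * time.Second // activity timeout, unchanged
	upfrontCountQueries = 2                 // the two COUNT(*) queries named in the root cause
)

// worstCaseCountTime is the time spent if every COUNT runs right up to its
// per-call deadline.
func worstCaseCountTime(calls int, perCall time.Duration) time.Duration {
	return time.Duration(calls) * perCall
}

// headroom is what remains of the activity budget after the worst-case COUNTs.
func headroom(calls int, perCall, activity time.Duration) time.Duration {
	return activity - worstCaseCountTime(calls, perCall)
}

func main() {
	fmt.Println("worst case for COUNTs:", worstCaseCountTime(upfrontCountQueries, perCallDBTimeout)) // 4m0s
	fmt.Println("headroom:", headroom(upfrontCountQueries, perCallDBTimeout, startToCloseTimeout))   // 1m0s
}
```

Under these assumptions the two COUNTs can consume at most 240s of the 300s activity budget, leaving 60s for the rest of the activity.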

How did I test this PR

cd apps/thermos && go build ./workers/... — clean
go vet ./workers/... — clean
go test -count=1 -run TestPollingTriggerCleaner -timeout 90s ./workers/ — ok github.com/composio/hermes/apps/thermos/workers 0.291s

Runtime verification on local thermos isn't feasible — this activity is driven by a production Temporal schedule against the 37M-row prod table; the local registry DB has no equivalent data. The change is a one-line per-call timeout bump aligned with an existing precedent in the sibling cleaner.

Origin: cron-dd-errors / zen-cron-80a85847aa8b
Triggered by: Cron: Datadog Error Polling | Source: unknown
Session: https://zen-api-production-4c98.up.railway.app/dashboard/#/chat/zen-cron-80a85847aa8b

@zen-agent · 1mo ago

CI status: all thermos checks green (run-thermos-unit-tests, run-thermos-tests, lint-thermos, run-thermos-service-isolation-tests), all Apollo unit/lint checks green. The only red is env-var-check, which is failing with an OpenAI API 401 (Incorrect API key provided: sk-proj-...4O0A) — infra-side issue with the workflow's OPENAI_API_KEY secret, unrelated to this change (one-line Go timeout constant; no env vars touched). Reran it once and got the same 401. Per CLAUDE.md's pre-existing-failure rule, moving on; the remaining 9 pending checks are Apollo integration tests + CodeQL Go analyze and aren't materially affected by a thermos worker timeout bump.

@vercel[bot] · 1mo ago

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
apollo	Ready	Preview, Comment	May 12, 2026 6:10am
debby	Ignored		May 12, 2026 6:10am

@github-actions[bot] · 1mo ago

🔍 Suggested Reviewers

Based on git blame analysis of 1 file(s):

Contributor	Contribution	Files
lingalarahul7	78%	1
Sarthak Agrawal

Codecov Report

Flag	Coverage Δ
e2e-tests	`?`
self-hosted-tests	`5.58% <ø> (ø)`
thermos-service-isolation-tests	`65.62% <ø> (?)`
thermos-unit-tests	`7.14% <0.00%> (ø)`
unit-tests	`?`
