feat(connected-accounts): data split — bytea volatile column + reader chokepoint + planner

@2nishantg · checks n/a · feat/connected-accounts-volatile-split-v2 → master · 40 files · +2504 −425 · review: @rohanprabhu, @sohamganatra, @acsrujan, @kaavee315, @abir-taheer, @anshugarg15 · updated 1w ago


Description

connected_accounts.dataTypedEncrypted stores the entire ConnectionData blob, so every OAuth refresh — which only rotates a couple of token fields — rewrites the whole encrypted payload. The blob is large enough to be TOAST-resident, so each refresh also rewrites the TOAST chain and adds churn that shows up in autovacuum load and dead tuples.

This PR lays the foundation for splitting that payload across two columns:

dataTypedEncrypted — repurposed as the low-churn static half. Stays text / hex (legacy encoding; we can't change the on-disk format without a backfill).
volatileDataTypedEncrypted — new column for the refresh-rotating fields. bytea from day one (raw iv || ct || tag), so storage isn't paying the 2x hex bloat on every refresh write.
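
To make the storage argument concrete, here is a minimal sketch of a raw iv || ct || tag envelope and its hex-encoded size. It assumes AES-256-GCM with the 16-byte IV and 16-byte tag mentioned in this description. The helper names sealBytes and openBytes are hypothetical and are not the real securityProvider API.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LEN = 16; // assumption: 16-byte IV, as stated in this description
const TAG_LEN = 16; // GCM auth tag length (Node's default)

// Hypothetical helper: encrypt to a raw envelope laid out as iv || ct || tag.
function sealBytes(key: Buffer, plaintext: string): Buffer {
  const iv = randomBytes(IV_LEN);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, ct, cipher.getAuthTag()]);
}

// Hypothetical helper: split the envelope back into its parts and decrypt.
function openBytes(key: Buffer, envelope: Buffer): string {
  const iv = envelope.subarray(0, IV_LEN);
  const ct = envelope.subarray(IV_LEN, envelope.length - TAG_LEN);
  const tag = envelope.subarray(envelope.length - TAG_LEN);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}

const key = randomBytes(32);
const payload = JSON.stringify({ access_token: "tok", expires_at: 1700000000 });

const envelope = sealBytes(key, payload); // what a bytea column would store
const hexForm = envelope.toString("hex"); // what a text/hex column would store

console.log(envelope.length, hexForm.length); // the hex form is twice as long
```

The raw envelope costs IV + ciphertext + tag bytes, while the hex form needs two characters per byte. That doubling is the overhead the volatile column avoids on each refresh write.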

A later PR will populate the classifier and switch the OAuth refresh path over; once that lands, refreshes will only rewrite the volatile column and leave the static TOAST chain alone.

This PR is behavior-preserving. No row layout changes, no writer rewires. We're landing the column, the read contract, and the write contract so the follow-up PR can flip the switch without touching every call site at once.

Semantic contract

The volatile column doubles as the format flag: an empty Buffer means "legacy row, decrypt the static column alone"; non-empty means "split row, decrypt both halves and shallow-merge val with volatile-wins precedence". Empty is safe as a marker because an AES-GCM envelope is at minimum 32 bytes (16 B IV + 16 B tag), so it can never be zero-length.
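
The read rule above can be sketched as follows. This is an illustration under stated assumptions, not the actual chokepoint: readConnectionVal and the Decryptors interface are hypothetical names, and the decryptors are injected so the sketch does not depend on the real crypto.

```typescript
type ConnectionVal = Record<string, unknown>;

// Hypothetical decryptors, injected to keep the sketch independent of real crypto.
interface Decryptors {
  decryptStatic(hexCiphertext: string): ConnectionVal;
  decryptVolatile(rawCiphertext: Uint8Array): ConnectionVal;
}

// Empty volatile column => legacy row: the static column holds everything.
// Non-empty => split row: decrypt both halves, volatile keys win the merge.
function readConnectionVal(
  staticCT: string,
  volatileCT: Uint8Array,
  d: Decryptors,
): ConnectionVal {
  const staticVal = d.decryptStatic(staticCT);
  if (volatileCT.length === 0) {
    return staticVal;
  }
  // Shallow merge: later spread wins, so volatile overrides static.
  return { ...staticVal, ...d.decryptVolatile(volatileCT) };
}

// Stand-in "decryptors" that only parse JSON, so the example runs without keys.
const fakeDecryptors: Decryptors = {
  decryptStatic: (s) => JSON.parse(s) as ConnectionVal,
  decryptVolatile: (b) =>
    JSON.parse(String.fromCharCode(...Array.from(b))) as ConnectionVal,
};

const toBytes = (s: string): Uint8Array =>
  Uint8Array.from(Array.from(s, (c) => c.charCodeAt(0)));

const legacyRow = readConnectionVal(
  JSON.stringify({ access_token: "old", scope: "read" }),
  new Uint8Array(0),
  fakeDecryptors,
);

const splitRow = readConnectionVal(
  JSON.stringify({ access_token: "stale", scope: "read" }),
  toBytes(JSON.stringify({ access_token: "fresh" })),
  fakeDecryptors,
);
```

In the split case the same key appears in both halves, and the volatile value is the one that survives. This is why a key can sit on either side of the split without changing the effective val.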

Every row at deploy time is a legacy row. The default value of the new column is '\x' (empty bytea), so legacy rows decode identically to today.

Forward/backward compatibility. The split is decided at write time; reads always merge both halves into one val. So any code version can read any other version's writes, and the volatile-field list can change in later PRs without coordination — a given key may sit on either side of the split in any particular row, and the merge produces the same effective val either way.

Don't read dataTypedEncrypted alone. Application code is steered through the single chokepoint — the decryption path lives there, so callers can't easily reach the static column on its own.

The reader also stamps every returned ConnectionData with a symbol-keyed set of the keys actually decrypted from volatileCT (∅ on the legacy path). The planner reads that marker off prevData and forces rewrite_both whenever the set differs from the current VOLATILE_VAL_KEYS. This closes the legacy→split, classifier-grew, and classifier-shrank (data-loss) failure modes, which would otherwise trigger the moment the classifier becomes non-empty.

Marker = keys present in volatileCT, NOT the classifier at write time. The write-time classifier isn't persisted anywhere; only the actual on-disk shape is. Trade-off: a row that legitimately lacks an optional classifier key always force-rewrites both columns.

The symbol key keeps the marker invisible to JSON.stringify and Object.keys, so it cannot leak into the encrypted payload. Missing marker on prevData (any bypass-the-reader path) defaults to ∅ and forces rewrite_both.
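
A small sketch of why a symbol key works as a marker. The symbol name VOLATILE_KEYS_READ and both helpers are hypothetical; only the behavior of JSON.stringify and Object.keys with symbol-keyed properties is standard JavaScript.

```typescript
// Hypothetical marker symbol; the real name in the codebase may differ.
const VOLATILE_KEYS_READ = Symbol("volatileKeysRead");

type Stamped = Record<string, unknown> & {
  [VOLATILE_KEYS_READ]?: ReadonlySet<string>;
};

// Attach the set of keys that were actually decrypted from the volatile column.
// Note: this mutates and returns the same object.
function stampVolatileKeys(
  val: Record<string, unknown>,
  keys: Iterable<string>,
): Stamped {
  const stamped: Stamped = val;
  stamped[VOLATILE_KEYS_READ] = new Set(keys);
  return stamped;
}

// A missing marker (for example, data that bypassed the reader) reads as the empty set.
function readVolatileKeys(val: Stamped): ReadonlySet<string> {
  return val[VOLATILE_KEYS_READ] ?? new Set<string>();
}

const stamped = stampVolatileKeys(
  { access_token: "t", scope: "read" },
  ["access_token"],
);

const serialized = JSON.stringify(stamped); // symbol-keyed property is skipped
const visibleKeys = Object.keys(stamped); // string keys only
const afterRoundTrip = JSON.parse(serialized) as Stamped; // marker is gone
```

Because serialization drops the marker, a value that has been through JSON (or through encrypt then decrypt) comes back unmarked. Under the rule above, that reads as ∅ and forces rewrite_both.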

What landed

Migration. New volatile_data_typed_encrypted BYTEA NOT NULL DEFAULT '\x' on connected_accounts. Idempotent ALTER, no backfill. The legacy non-volatile column stays text (hex).
Bytea storage encoding. securityProvider.encryptString / decryptToString gained an optional encoding: 'hex' | 'bytes' overload (default 'hex', type-system-enforced: Buffer through the hex path or string through the bytes path is a compile error). The volatile column writes via 'bytes'; the legacy non-volatile column stays on 'hex' since no on-disk format change is possible there without a backfill.
Single read chokepoint. Every decrypt of connection data goes through one function that takes both encrypted columns and returns one ConnectionData. All reader call sites now select and pass both columns; downstream code is untouched. The status-update helper re-encrypts the static half with the updated status and threads the volatile half through unchanged.
Write-planner shape. A single planner decides, given previous and new ConnectionData, whether the static column needs rewriting and what to persist. In this PR the volatile classifier is empty, so the planner always rewrites the full static payload — preserving today's behavior. The "skip static rewrite" branch is wired but guarded against running until the classifier is populated, so an empty classifier can't accidentally leave stale static data on a legacy row.
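
The type-system enforcement described under "Bytea storage encoding" can be sketched with function overloads. This sketch covers only the storage-encoding step, not encryption, and the names encodeEnvelope and decodeEnvelope are hypothetical. The real overloads sit on securityProvider.encryptString / decryptToString and may be shaped differently.

```typescript
type StorageEncoding = "hex" | "bytes";

// 'hex' (the default) yields a string; 'bytes' yields raw bytes.
function encodeEnvelope(envelope: Uint8Array): string;
function encodeEnvelope(envelope: Uint8Array, encoding: "hex"): string;
function encodeEnvelope(envelope: Uint8Array, encoding: "bytes"): Uint8Array;
function encodeEnvelope(
  envelope: Uint8Array,
  encoding: StorageEncoding = "hex",
): string | Uint8Array {
  if (encoding === "bytes") return envelope;
  return Array.from(envelope, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Only (string, 'hex') and (bytes, 'bytes') are declared, so mixing them
// is rejected at compile time. The runtime checks are a second line of defense.
function decodeEnvelope(stored: string, encoding?: "hex"): Uint8Array;
function decodeEnvelope(stored: Uint8Array, encoding: "bytes"): Uint8Array;
function decodeEnvelope(
  stored: string | Uint8Array,
  encoding: StorageEncoding = "hex",
): Uint8Array {
  if (encoding === "bytes") {
    if (typeof stored === "string") {
      throw new TypeError("'bytes' encoding requires a byte array");
    }
    return stored;
  }
  if (typeof stored !== "string") {
    throw new TypeError("'hex' encoding requires a string");
  }
  if (stored.length % 2 !== 0 || /[^0-9a-f]/i.test(stored)) {
    throw new TypeError("malformed hex input");
  }
  const out = new Uint8Array(stored.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(stored.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

const raw = Uint8Array.of(0x00, 0xff, 0x10);
const asHex = encodeEnvelope(raw); // typed as string
const asBytes = encodeEnvelope(raw, "bytes"); // typed as Uint8Array
const fromHex = decodeEnvelope(asHex);
const fromBytes = decodeEnvelope(asBytes, "bytes");

// These would not compile, since no overload matches:
//   decodeEnvelope(raw);            // bytes through the hex path
//   decodeEnvelope(asHex, "bytes"); // string through the bytes path
```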

No writer is migrated to the planner's volatile_only branch yet — that's the next PR.
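
The planner decision can be sketched as below. All names here (planWrite, splitVal, WritePlan) are hypothetical, and the marker is passed as an explicit argument rather than read off prevData so that the block stays self-contained. Comparing the static halves with JSON.stringify is a simplification that is sensitive to key order; the real planner may compare differently.

```typescript
type Val = Record<string, unknown>;

type WritePlan =
  | { kind: "rewrite_both"; staticVal: Val; volatileVal: Val }
  | { kind: "volatile_only"; volatileVal: Val };

// Partition a val by the classifier (the set of volatile keys).
function splitVal(val: Val, volatileKeys: ReadonlySet<string>) {
  const staticVal: Val = {};
  const volatileVal: Val = {};
  for (const [k, v] of Object.entries(val)) {
    if (volatileKeys.has(k)) volatileVal[k] = v;
    else staticVal[k] = v;
  }
  return { staticVal, volatileVal };
}

function sameSet(a: ReadonlySet<string>, b: ReadonlySet<string>): boolean {
  return a.size === b.size && Array.from(a).every((k) => b.has(k));
}

function planWrite(
  prev: Val,
  next: Val,
  prevMarker: ReadonlySet<string>, // keys the reader decrypted from the volatile column
  volatileKeys: ReadonlySet<string>, // current classifier
): WritePlan {
  const prevParts = splitVal(prev, volatileKeys);
  const nextParts = splitVal(next, volatileKeys);
  const canSkipStatic =
    volatileKeys.size > 0 && // empty classifier: always rewrite the static payload
    sameSet(prevMarker, volatileKeys) && // on-disk shape matches the classifier
    JSON.stringify(prevParts.staticVal) === JSON.stringify(nextParts.staticVal);
  return canSkipStatic
    ? { kind: "volatile_only", volatileVal: nextParts.volatileVal }
    : { kind: "rewrite_both", ...nextParts };
}

const prev: Val = { access_token: "a1", scope: "read" };
const next: Val = { access_token: "a2", scope: "read" };
const classifier = new Set(["access_token"]);

const thisPr = planWrite(prev, next, new Set(), new Set()); // empty classifier
const legacyRow = planWrite(prev, next, new Set(), classifier); // marker is ∅
const matched = planWrite(prev, next, classifier, classifier);
const shrank = planWrite(prev, next, new Set(["access_token", "expires_at"]), classifier);
```

With an empty classifier every key lands in the static half, so the plan is always rewrite_both. That matches the behavior-preserving configuration of this PR.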

How did I test this PR

Reader tests cover three shapes: legacy single-column rows, double-encrypted rows from new writes, and synthetic split rows where the volatile half wins the merge.
Planner tests cover the PR configuration (empty classifier → always rewrite static) plus the future split-write decisions through a volatileKeys DI seam.
Reader-marker tests: ∅ for legacy rows, decrypted-volatile keys for split rows; invisible to JSON.stringify and Object.keys; no leak through encrypt→decrypt round-trip.
Planner-marker tests: forces rewrite_both on grew / shrank / marker-absent / legacy-row-readback; emits volatile_only only when classifier matches AND non-volatile halves are byte-equal.
DB column-wiring integration tests (writePlanner_dbWiring.test.ts) against a freshly-migrated DB: volatile_only leaves dataTypedEncrypted byte-identical; rewrite_both flips both columns; reader round-trips the merged payload. Plus two bytea-pinning tests — legacy-marker (empty Buffer) round-trip with reader fallback to non-volatile-alone, and a raw-byte round-trip that catches future text-encoding regressions in the Prisma write/read path (AES envelopes contain \x00, \xff, and invalid UTF-8 — a string-cast would silently corrupt them).
securityProvider.test.ts covers both encodings, cross-encoding rejections, and oversized/empty-buffer guards.
Existing call-site tests updated to mock the new chokepoint.
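
As an illustration of the failure the raw-byte round-trip test guards against, the sketch below pushes a few envelope-like bytes through a UTF-8 string cast and through hex. The byte values are arbitrary examples, not taken from the test suite.

```typescript
import { Buffer } from "node:buffer";

// Arbitrary bytes of the kind an AES envelope can contain:
// a NUL, 0xff, and sequences that are not valid UTF-8.
const envelopeBytes = Buffer.from([0x00, 0xff, 0xc3, 0x28, 0x80]);

// A string cast replaces invalid sequences with U+FFFD, so the bytes
// cannot be recovered afterwards.
const viaUtf8 = Buffer.from(envelopeBytes.toString("utf8"), "utf8");

// Hex (or passing the Buffer through untouched) is lossless.
const viaHex = Buffer.from(envelopeBytes.toString("hex"), "hex");

console.log(viaUtf8.equals(envelopeBytes)); // false: corrupted
console.log(viaHex.equals(envelopeBytes)); // true: intact
```

The corruption is silent at write time and would only surface later as a decryption failure, which is why pinning the byte-exact round trip in a test is useful.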
