Durable Work in PostgresPart 4
Partitioning at scale
Consistent hash, heartbeats, rebalance windows, and multi-worker simulation with ring ownership.
You add a fourth worker during peak. With hash(key) % N, almost every open row changes owner and workers contend for the same claims.
Consistent hashing maps workers and keys onto a ring so only ~1/N keys move when the pool changes. Heartbeats tell every process who is alive right now.
Who owns order:9182?
Every row carries a partition_key. The producer sets it. Workers only claim rows whose bucket the hash ring assigns to them. Receipt and refund for the same order share a key, so they land on the same machine and can be processed in order.
partition_bucket vs ring owner
partition_bucket is a stable hash of partition_key (e.g. hash('order:9182') % 360) stored at enqueue time. It is an index-friendly filter for claim queries. Ring ownership decides which live worker may claim those buckets right now. They are related but not the same: bucket does not encode “owner at INSERT time.”
Given a partition_key, which worker owns it today, with N workers, and after a scale event? Consistent hashing maps workers and keys onto a ring so only ~1/N keys move when the pool changes.
Imagine a circle numbered 0 to 2³²−1. Every worker and every work item gets hashed to a position on that circle. To find the owner of a key, walk clockwise from the key’s position until you hit the first worker.
Virtual nodes (vnodes)
Real systems place each physical worker at multiple points on the ring (typically 100–200 virtual nodes). Each point is a hash like hash("worker-A#37"), so they land at different positions around the circle. A worker therefore owns many small arcs, not one big wedge. That scatters load evenly and limits how many keys move when the pool changes.
function hash(str) {
let h = 2166136261;
for (let i = 0; i < str.length; i++) {
h ^= str.charCodeAt(i);
h = Math.imul(h, 16777619);
}
return h >>> 0; // unsigned 32-bit
}
function lookupOwner(key, ring) {
// ring: sorted array of { hash, workerId }
const keyHash = hash(key);
for (const node of ring) {
if (node.hash >= keyHash) return node.workerId;
}
return ring[0].workerId; // wrap around
}
Ownership follows live ring membership, not a static config file. At higher vnodes per worker, each worker owns many small ranges interleaved around the ring.
Hash ring demo
Interactive Hash Ring
Click a sample key or enter your own, then Lookup Owner. The bright arc on the ring is the hash range that owns that key. Drag vnodes to see how arc size changes.
Heartbeat TTL and claim lease measure different things: process liveness vs in-flight work duration.
Heartbeat Timeline: Worker Failure & Recovery
Timing Relationships: Heartbeat, TTL, and Lease
Which workers are actually running?
The ring is only as good as its membership list. A config file that says “three workers” is a lie the moment one process dies or autoscale adds a fourth. Heartbeats turn the workers table into a live registry.
In my defaults: 10s heartbeat interval, 30s TTL (3× interval), 90s claim lease for handlers under a few hundred ms p99. Housekeeping every worker loop (~30s) with advisory lock election.
What gets recorded: Each worker periodically writes last_seen_at, its identity, and optionally load metrics to a shared registry (Postgres table).
Failure detection: If now() - last_seen_at > TTL, the worker is considered dead. Its partitions are reassigned to the next live node on the ring.
Heartbeat contract
| Parameter | Typical Value | Why |
|---|---|---|
heartbeat_interval | 5–15 seconds | Balance DB write load vs detection speed |
ttl (grace period) | 3× interval (e.g. 30s) | Absorb network jitter and avoid false positives |
claim_lease | 60–120 seconds | How long a claimed inbox row stays locked if worker dies mid-process |
housekeeping_interval | Every worker loop (~30s) | One worker wins advisory lock and runs lease cleanup + stale-worker detection |
T+0s: Worker-2 sends heartbeat — UPDATE workers SET last_seen_at = now() WHERE id = 'worker-2'
T+5s: Normal operation — Worker-2 claims and processes inbox rows for its partition only.
T+12s: Worker-2 crashes (no more heartbeats) — Other workers continue. Worker-2’s claimed rows sit until lease expires.
T+42s: TTL exceeded, worker marked dead — UPDATE workers SET status = 'dead' WHERE last_seen_at < now() - interval '30 seconds'
T+43s: Lease cleanup releases stale claims — Worker-3 (next on ring) picks up orphaned partition rows on next poll cycle.
Checklist
Before you merge the PR, walk through this list with your team:
- Idempotency key has a unique constraint
- Claims use row locks or atomic state transitions
- Leases expire and are reaped
- Workers heartbeat
- Zombie workers are fenced (
lease_generation) - Poison messages dead-letter
- Completed rows are archived
- Lag, retries, dead letters, and oldest pending age are monitored
Source
Use the article for explanation, then use these files when you want the complete SQL and TypeScript in one place.