Durable Work in Postgres, Part 7
Cross-service handoffs
Transactional outbox, transport choices, webhooks, contracts, and when brokers fit. Work that leaves your database commit.
Inside one database, Postgres can make the business write and the work row atomic. Across services, that guarantee ends.
Use a transactional outbox at the producer, transport between services, and idempotent consumption at the receiver. The handoff is explicit: durable intent on your side, at-least-once delivery in the middle, dedupe on theirs.
When the commit boundary ends
Inside one database, the story is clean: one commit can update an order and enqueue the follow-up work. The moment work leaves that database or deployable unit, you no longer have one transaction. You have a handoff.
From that point on, you need transport: a way for intent to cross from your Postgres into someone else’s process. A boundary is anywhere ownership changes: a different database, a different deploy unit, a broker, or an external API.
Use an outbox relay on the producer and an inbox on the receiver. The transport in between only delivers bytes; it does not replace leases or claim loops on either side.
What each part can guarantee
Your database can be strongly consistent inside its boundary. The broker and search index cannot. “Paid” and “visible in search” are different guarantees with different systems behind them.
Relative to your commit, transport and downstream consumers are always eventual. Only the write inside the producer database is covered by its ACID guarantees.
| Layer | What holds | Consistency | Postgres tools |
|---|---|---|---|
| Inside your DB | Business row + outbox/inbox row commit together. Claims are exclusive | Strong (ACID, row locks) | Transactional outbox, SKIP LOCKED, leases |
| Your worker layer | Each row processed at least once. Optional per-key FIFO | At-least-once + idempotent effects | Hash ring, heartbeats, ordering guard, fencing |
| Transport | Message or HTTP delivery may retry, reorder (across keys), or lag | Eventual, at-least-once typical | Broker, webhook, outbox relay |
| Downstream consumer | Their read model, inbox, or side effect catches up later | Eventual: bounded lag is the SLO | Their idempotency + your contract |
Cross-boundary architecture
Guarantees stop at each service boundary. Make the handoff explicit: durable intent on your side, idempotent consumption on theirs, and transport in between that may retry.
Three zones: producer DB (ACID), transport (at-least-once), consumer (catches up later).
Each box is a separate consistency boundary. Handoffs need explicit contracts and idempotent consumers, not a shared transaction across services.
Transactional outbox for order:9182
Make the intent durable. When payment succeeds, you need search to know eventually. You cannot call search inside the transaction. You can insert an outbox row that says “publish OrderPaid” in the same commit as UPDATE orders.
The Transactional Outbox pattern is the bridge: write intent in the same transaction as the domain mutation, then let a relay worker publish to your transport. In this implementation, the relay is the same competing-consumers loop from earlier chapters, applied to outbound events. You get:
- Atomic intent: order marked paid + OrderPaid outbox row in one commit
- Relay: worker claims the outbox row and publishes to transport
- At-least-once publish: a crash after publish but before completion → retry (idempotent publish key)
- Consumer dedupes: subscriber uses event_id / idempotency key
```sql
BEGIN;
UPDATE orders SET status = 'paid' WHERE id = 9182; -- business fact the API must not lie about
INSERT INTO outbox (partition_key, event_type, payload, idempotency_key)
VALUES (
  'order:9182',
  'OrderPaid',
  '{"order_id":9182,"paid_at":"2026-07-01T12:00:00Z"}'::jsonb,
  'evt-order-9182-paid-v1'
); -- durable publish intent, not the wire publish
COMMIT; -- both rows exist or neither. No ghost OrderPaid events
```
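The relay half of the pattern can be sketched in memory to show why publishing is at-least-once. This is a minimal illustration, not the series' implementation: `OutboxRow`, `relayOnce`, `makeConsumer`, and the `publish` callback are hypothetical names, and the array stands in for the outbox table that a real relay would claim from with SKIP LOCKED.

```typescript
// In-memory stand-in for the outbox table (assumption: a real relay claims rows in Postgres).
type OutboxRow = {
  id: number;
  eventType: string;
  payload: string;
  idempotencyKey: string;
  status: "pending" | "done";
};

// Hypothetical transport hook: returns normally on success, throws on failure.
type Publish = (row: OutboxRow) => void;

// One relay pass: publish each pending row, then mark it done.
// `crashAfterPublish` simulates dying between the publish and the completion write.
function relayOnce(rows: OutboxRow[], publish: Publish, crashAfterPublish = false): number {
  let completed = 0;
  for (const row of rows) {
    if (row.status !== "pending") continue;
    publish(row); // the wire publish happens first
    if (crashAfterPublish) return completed; // row stays pending, so the next pass retries it
    row.status = "done"; // completion is a separate write
    completed += 1;
  }
  return completed;
}

// Receiver side: dedupe on the idempotency key so a retried publish has no second effect.
function makeConsumer() {
  const seen = new Set<string>();
  const applied: string[] = [];
  const handle = (row: OutboxRow): void => {
    if (seen.has(row.idempotencyKey)) return; // duplicate delivery, ignore
    seen.add(row.idempotencyKey);
    applied.push(row.eventType);
  };
  return { handle, applied };
}
```

Running `relayOnce` with `crashAfterPublish` set and then running it again publishes the same row twice, while the consumer applies it once. That is the whole contract: the producer may repeat itself, and the receiver must not care.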
How search learns the order paid
Payment for order:9182 lives in your database. Search does not. Three ways teams bridge that:
- Sync HTTP in the request: a partner outage becomes your outage, and retries duplicate work without idempotency. Avoid this at the service boundary.
- Transactional outbox + relay: publish intent in the same transaction as UPDATE orders. Delivery is at-least-once; the consumer dedupes; lag is measurable. Use when search or analytics must update after payment.
- CDC from the WAL: no application change on the write path, but it streams physical row changes rather than domain events and breaks easily on schema refactors. Use for legacy databases you cannot modify.
Choosing a transport
After the outbox row is durable, choose transport based on the consumer, the SLO, and who operates the middleware.
| Kind | Good when | Optimizes for | Breaks when |
|---|---|---|---|
| Message broker | Service-to-service delivery, routing keys, ops-owned middleware | Fast handoff and per-queue routing | You need long retention replay for many independent consumers |
| Event log / stream | Many subscribers, replay, high-volume shared log | Fan-out and retention | Point-to-point only with one consumer team |
| Managed queue | Cloud-native, minimal ops, push to one consumer group | Managed durability and scaling | Strict per-key ordering without paying for FIFO SKUs |
| HTTP / webhooks | One partner or SaaS exposes a URL | Simple integration surface | Partner outage blocks your relay unless you queue and retry |
| Poll / RPC pull | No push. Batch export or client-pull APIs | Receiver controls cadence | Low latency SLOs without aggressive polling |
| CDC | Downstream needs row changes, not domain events | Decoupling from app publish code | Schema refactors and domain event contracts |
Inbound: webhooks and broker consumers become inbox rows
When Stripe or a partner POSTs to you, acknowledge fast, persist durably, and process asynchronously. Their retries are your at-least-once delivery.
Return 200 after the row is inserted. Processing runs in the claim loop, not in the HTTP handler.
```typescript
// POST /webhooks/stripe
await db.query(
  `
  INSERT INTO inbox (partition_key, payload, idempotency_key)
  VALUES ($1, $2, $3)
  ON CONFLICT (idempotency_key) WHERE idempotency_key IS NOT NULL DO NOTHING -- partner retry hits here, not in handler
  `,
  [event.accountId, body, event.id],
);
return res.status(200).send(); // ack after durable row. worker claims async
```
Verify signatures at the boundary. Return 2xx only after the row is durable. Their retries hit ON CONFLICT DO NOTHING on enqueue.
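The verify-then-persist order can be sketched as below. This is a generic HMAC-SHA256 check over the raw body, assuming the partner sends a hex digest; real partners (Stripe included) document their own header format and signing scheme, so treat the signature layout here as a placeholder. `verifySignature`, `acceptWebhook`, the in-memory `inbox` map, and the choice of 400 for a bad signature are all hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Generic HMAC-SHA256 check over the raw request body.
// Assumption: the signature is a hex digest of the body; check the partner's docs for the real format.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}

type InboxRow = { partitionKey: string; payload: string; idempotencyKey: string };

// In-memory stand-in for the inbox table with a unique idempotency_key.
const inbox = new Map<string, InboxRow>();

// Returns the HTTP status the handler would send.
function acceptWebhook(rawBody: string, signatureHex: string, secret: string): number {
  if (!verifySignature(rawBody, signatureHex, secret)) return 400; // reject before touching storage
  const event = JSON.parse(rawBody) as { id: string; accountId: string };
  if (!inbox.has(event.id)) {
    inbox.set(event.id, { partitionKey: event.accountId, payload: rawBody, idempotencyKey: event.id });
  } // otherwise a partner retry: same effect as ON CONFLICT DO NOTHING
  return 200; // ack only after the row is stored
}
```

A retried delivery with a valid signature still returns 200 and leaves one row, which is what lets the partner stop retrying without the handler doing any processing.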
Contracts across boundaries
Shared transactions are gone. Shared vocabulary is not. When two teams own two databases, coordination becomes documentation plus stable ids. The table below is the minimum contract for OrderPaid on order:9182 to land safely in search.
| Contract piece | Producer owes | Consumer owes |
|---|---|---|
| event_id / idempotency key | Stable, unique per logical event | Dedupe store or unique constraint |
| partition_key / message key | Same key for all ordered events in a stream | Single consumer per key (or ordering guard) |
| Schema version | Backward-compatible changes. Bump version on breaks | Reject or dead-letter unknown versions |
| Delivery semantics | Document at-least-once. No silent drops | Idempotent handlers. Expose lag metrics |
| Replay | Retention policy or archive for re-publish | Safe to re-process historical events |
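On the consumer side, two rows of this contract (dedupe and schema version) reduce to a small decision made before any handler runs. The sketch below is one possible shape, not the series' code: the `Envelope` fields, the `ContractGuard` class, and the supported-version set are assumptions, and the in-memory set stands in for a dedupe table or unique constraint. Per-key ordering is deliberately left out.

```typescript
// Hypothetical event envelope carrying the contract fields from the table.
type Envelope = {
  eventId: string; // idempotency key, stable per logical event
  partitionKey: string; // e.g. "order:9182"
  schemaVersion: number;
  eventType: string;
};

type Decision = "apply" | "duplicate" | "dead-letter";

// Assumption for this sketch: the consumer understands versions 1 and 2.
const SUPPORTED_VERSIONS = new Set<number>([1, 2]);

class ContractGuard {
  // Stand-in for a dedupe store; a real consumer would use a unique constraint.
  private seen = new Set<string>();

  decide(e: Envelope): Decision {
    // Unknown versions are parked, not guessed at and not marked as seen.
    if (!SUPPORTED_VERSIONS.has(e.schemaVersion)) return "dead-letter";
    // At-least-once delivery means the same eventId can arrive again.
    if (this.seen.has(e.eventId)) return "duplicate";
    this.seen.add(e.eventId);
    return "apply";
  }
}
```

Because a dead-lettered event is not recorded as seen, it can be replayed and applied once the consumer has been upgraded to understand its version.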
Sagas and long flows across services
Some workflows outlive one request. Charge card, reserve inventory, ship, notify search. That is many local commits linked by events, not one distributed lock. Sagas are durable rows chained over time. Each step must tolerate retry.
Do
Idempotent steps, explicit timeouts, compensations as first-class outbox events, dashboards on end-to-end lag (order paid → search indexed).
Avoid
Synchronous chains across five HTTP calls in one request, assuming partner callbacks arrive exactly once, and blocking the user response on downstream indexing.
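A saga built this way is a small state machine whose transitions emit outbox events. The sketch below is a simplified illustration under stated assumptions: the three step names, the `advance` function, and the `run:` / `compensate:` event strings are hypothetical, and timeouts are omitted. The returned `outbox` array represents rows that would be inserted in the same commit as the state change.

```typescript
type Step = "charge_card" | "reserve_inventory" | "ship";
const STEPS: Step[] = ["charge_card", "reserve_inventory", "ship"];

type SagaState = { completed: Step[]; status: "running" | "done" | "compensating" };
type StepResult = { step: Step; ok: boolean };

// Returns the next state plus the outbox events to insert alongside it.
function advance(state: SagaState, result: StepResult): { state: SagaState; outbox: string[] } {
  if (state.status !== "running") return { state, outbox: [] }; // finished or already compensating
  const expected = STEPS[state.completed.length];
  // A retried or unexpected result must not produce a second effect.
  if (result.step !== expected) return { state, outbox: [] };
  if (!result.ok) {
    // Undo completed steps in reverse order, each as its own outbox event.
    const outbox = [...state.completed].reverse().map((s) => `compensate:${s}`);
    return { state: { completed: state.completed, status: "compensating" }, outbox };
  }
  const completed = [...state.completed, result.step];
  const next = STEPS[completed.length];
  if (next === undefined) return { state: { completed, status: "done" }, outbox: [] };
  return { state: { completed, status: "running" }, outbox: [`run:${next}`] };
}
```

Feeding the same successful result twice yields an empty `outbox` the second time, which is what lets each step tolerate retry.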
Explaining lag to product and leadership
Payment in your DB can be immediate while search and analytics catch up on a separate timeline. Use this distinction when someone asks why search lags payment by thirty seconds.
- User-facing write path: strong in your DB. “Payment recorded” is true when the API returns success.
- Derived views: eventual. Search, analytics, and recommendations catch up in seconds or minutes; define the SLO.
- External systems: eventual with retries. A webhook may arrive twice, and a partner ledger may lag. The contract defines the maximum delay. The mechanics are the same whether the other party is another team or Stripe.
- Failure mode: at-least-once everywhere outside ACID. Duplicates are normal. Idempotency is not optional.
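Putting a number on "catches up later" makes the conversation easier. As a rough sketch, end-to-end lag can be computed from two timestamps per event; the `Delivery` shape, the function names, and the idea of measuring unindexed events against the current time are assumptions for illustration, not something the series prescribes.

```typescript
// ISO-8601 timestamps; indexedAt is null while search has not caught up yet.
type Delivery = { paidAt: string; indexedAt: string | null };

// Lag in seconds for one event. Unindexed events are measured against `now`,
// so a stuck event keeps growing instead of disappearing from the report.
function lagSeconds(d: Delivery, now: Date): number {
  const end = d.indexedAt === null ? now.getTime() : Date.parse(d.indexedAt);
  return (end - Date.parse(d.paidAt)) / 1000;
}

// Summarize a batch against an SLO expressed in seconds.
function sloReport(deliveries: Delivery[], now: Date, sloSeconds: number): { maxLag: number; breaches: number } {
  const lags = deliveries.map((d) => lagSeconds(d, now));
  const breaches = lags.filter((l) => l > sloSeconds).length;
  return { maxLag: Math.max(0, ...lags), breaches };
}
```

With a 60-second SLO, an event indexed 30 seconds after payment passes, while one still unindexed two minutes later counts as a breach.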
Where this implementation fits
Match the tool to the constraint. Postgres coordinates work inside a service. Brokers and queues help at the boundary when fan-out, retention, or cross-team delivery require them.
Strong fit
- Durable work with claims, leases, and retries
- Postgres already in the stack (or inbox/outbox tables are acceptable)
- Per-entity ordering or partition affinity matters
- Producers and workers share a database, or each service boundary has an outbox/inbox
- Backlog depth and stuck rows should be visible in SQL
Weaker fit: consider other tools
- Shared event log with long retention and many independent replay consumers → Kafka / Pulsar as platform infra
- Fan-out streaming is the product, not per-service claim coordination
- Jobs-only background work, no custom ordering → pg-boss, Graphile Worker, or River
- No database on either end can participate in inbox/outbox
- Postgres is saturated after tuning and coordination itself must move off-DB
Source
Use the article for explanation, then use these files when you want the complete SQL and TypeScript in one place.