Durable Work in Postgres, Part 7

Cross-service handoffs

Work that leaves your database commit: the transactional outbox, transport choices, webhooks, contracts, and when brokers fit.

Inside one database, Postgres can make the business write and the work row atomic. Across services, that guarantee ends.

Use a transactional outbox at the producer, transport between services, and idempotent consumption at the receiver. The handoff is explicit: durable intent on your side, at-least-once delivery in the middle, dedupe on theirs.

When the commit boundary ends

Inside one database, the story is clean: one commit can update an order and enqueue the follow-up work. The moment work leaves that database or deployable unit, you no longer have one transaction. You have a handoff.

From that point on, you need transport: a way for intent to cross from your Postgres into someone else’s process. A boundary is anywhere ownership changes: a different database, a different deploy unit, a broker, or an external API.

Use an outbox relay on the producer and an inbox on the receiver. The transport in between only delivers bytes; it does not replace leases or claim loops on either side.

What each part can guarantee

Your database can be strongly consistent inside its boundary. The broker and search index cannot. “Paid” and “visible in search” are different guarantees with different systems behind them.

From the producer's point of view, transport and downstream consumers are always eventual. ACID holds only inside a single database's commit, never across the handoff.

Layer | What holds | Consistency | Postgres tools
Inside your DB | Business row + outbox/inbox row commit together. Claims are exclusive | Strong (ACID, row locks) | Transactional outbox, SKIP LOCKED, leases
Your worker layer | Each row processed at least once. Optional per-key FIFO | At-least-once + idempotent effects | Hash ring, heartbeats, ordering guard, fencing
Transport | Message or HTTP delivery may retry, reorder (across keys), or lag | Eventual, at-least-once typical | Broker, webhook, outbox relay
Downstream consumer | Their read model, inbox, or side effect catches up later | Eventual: bounded lag is the SLO | Their idempotency + your contract

Cross-boundary architecture

Guarantees stop at each service boundary. Make the handoff explicit: durable intent on your side, idempotent consumption on theirs, and transport in between that may retry.

Three zones: producer DB (ACID), transport (at-least-once), consumer (catches up later).

[Diagram: Producer outbox and consumer inbox. Producer service: API / domain logic writes orders + outbox in the same transaction (atomic commit), and a relay worker runs the claim loop (claim, lease, publish). Transport (shared infra): message broker / stream (routing, delivery, fan-out), HTTP webhook, or partner REST API, all at-least-once with retries and no distributed 2PC. Downstream service: a webhook endpoint or stream consumer inserts into an inbox (same pattern, their DB), and their workers update a read model / index that is eventually consistent.]

Each box is a separate consistency boundary. Handoffs need explicit contracts and idempotent consumers, not a shared transaction across services.

Transactional outbox for order:9182

Make the intent durable. When payment succeeds, you need search to know eventually. You cannot call search inside the transaction. You can insert an outbox row that says “publish OrderPaid” in the same commit as UPDATE orders.

The Transactional Outbox pattern is the bridge: write intent in the same transaction as the domain mutation, then let a relay worker publish to your transport. In this implementation, the relay is the same competing-consumers loop from earlier chapters, applied to outbound events. You get:

  1. Atomic intent: the order marked paid and the OrderPaid outbox row land in one commit.
  2. Relay: a worker claims the outbox row and publishes it to the transport.
  3. At-least-once publish: a crash after publish but before the row is marked complete causes a retry, so publish with a stable idempotency key.
  4. Consumer dedupes: the subscriber uses the event_id / idempotency key.

BEGIN;
UPDATE orders SET status = 'paid' WHERE id = 9182;  -- business fact the API must not lie about
INSERT INTO outbox (partition_key, event_type, payload, idempotency_key)
VALUES (
  'order:9182',
  'OrderPaid',
  '{"order_id":9182,"paid_at":"2026-07-01T12:00:00Z"}'::jsonb,
  'evt-order-9182-paid-v1'
);  -- durable publish intent, not the wire publish
COMMIT;  -- both rows exist or neither. No ghost OrderPaid events
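
The relay side of this handoff can be sketched without a database. The following is a minimal in-memory illustration, not the series' actual relay: `OutboxRow`, `Transport`, `RecordingTransport`, and `relayOnce` are hypothetical names, and the array stands in for the outbox table that a real worker would claim from Postgres. It shows why a crash between publish and complete produces a second publish with the same idempotency key.

```typescript
// In-memory stand-in for the outbox table; a real relay would claim rows in Postgres.
type OutboxRow = {
  id: number;
  eventType: string;
  payload: string;
  idempotencyKey: string;
  status: "pending" | "done";
};

// Hypothetical transport interface: anything that can publish with a dedupe key.
interface Transport {
  publish(key: string, eventType: string, payload: string): void;
}

// Test transport that records every publish attempt, duplicates included.
class RecordingTransport implements Transport {
  attempts: string[] = [];
  publish(key: string, _eventType: string, _payload: string): void {
    this.attempts.push(key);
  }
}

// One relay pass: publish each pending row, then mark it done.
// `crashAfterPublish` simulates a worker dying between publish and complete.
function relayOnce(
  outbox: OutboxRow[],
  transport: Transport,
  crashAfterPublish = false,
): void {
  for (const row of outbox) {
    if (row.status !== "pending") continue;
    transport.publish(row.idempotencyKey, row.eventType, row.payload);
    if (crashAfterPublish) return; // row stays pending, so the next pass retries it
    row.status = "done";
  }
}

const outbox: OutboxRow[] = [
  {
    id: 1,
    eventType: "OrderPaid",
    payload: '{"order_id":9182,"paid_at":"2026-07-01T12:00:00Z"}',
    idempotencyKey: "evt-order-9182-paid-v1",
    status: "pending",
  },
];
const transport = new RecordingTransport();

relayOnce(outbox, transport, true); // published, then "crashed" before complete
relayOnce(outbox, transport); // retry publishes the same key again, then completes
relayOnce(outbox, transport); // nothing pending, so no further publish
```

The transport sees the same key twice, which is exactly the duplicate the consumer's dedupe is there to absorb. The third pass is a no-op because the row is already marked done.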

How search learns the order paid

Payment for order:9182 lives in your database. Search does not. Three ways teams bridge that:

  • Sync HTTP in the request: a partner outage becomes your outage, and retries duplicate work without idempotency. Avoid this at the service boundary.
  • Transactional outbox + relay: publish intent in the same transaction as UPDATE orders. Delivery is at-least-once; the consumer dedupes; lag is measurable. Use when search or analytics must update after payment.
  • CDC from the WAL: no application change on the write path, but it streams row-level changes rather than domain events and breaks easily on schema refactors. Use for legacy databases you cannot modify.

Choosing a transport

After the outbox row is durable, choose transport based on the consumer, the SLO, and who operates the middleware.

Kind | Good when | Optimizes for | Breaks when
Message broker | Service-to-service delivery, routing keys, ops-owned middleware | Fast handoff and per-queue routing | You need long retention replay for many independent consumers
Event log / stream | Many subscribers, replay, high-volume shared log | Fan-out and retention | Point-to-point only with one consumer team
Managed queue | Cloud-native, minimal ops, push to one consumer group | Managed durability and scaling | Strict per-key ordering without paying for FIFO SKUs
HTTP / webhooks | One partner or SaaS exposes a URL | Simple integration surface | Partner outage blocks your relay unless you queue and retry
Poll / RPC pull | No push. Batch export or client-pull APIs | Receiver controls cadence | Low latency SLOs without aggressive polling
CDC | Downstream needs row changes, not domain events | Decoupling from app publish code | Schema refactors and domain event contracts

Inbound: webhooks and broker consumers become inbox rows

When Stripe or a partner POSTs to you, acknowledge fast, persist durably, and process asynchronously. Their retries are your at-least-once delivery.

Return 200 after the row is inserted. Processing runs in the claim loop, not in the HTTP handler.

// POST /webhooks/stripe
await db.query(
  `
  INSERT INTO inbox (partition_key, payload, idempotency_key)
  VALUES ($1, $2, $3)
  ON CONFLICT (idempotency_key) WHERE idempotency_key IS NOT NULL DO NOTHING  -- partner retry hits here, not in handler
`,
  [event.accountId, body, event.id],
);

return res.status(200).send(); // ack after durable row. worker claims async

Verify signatures at the boundary. Return 2xx only after the row is durable. Their retries hit ON CONFLICT DO NOTHING on enqueue.
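
To make the retry behavior concrete, here is a small in-memory sketch of the same rule. It is an illustration under assumptions, not the handler above: `Inbox`, `insertIfAbsent`, and `handleWebhook` are hypothetical names, the `Map` stands in for a unique index on idempotency_key, and signature checking is reduced to a boolean so the example stays self-contained.

```typescript
// In-memory stand-in for the inbox table with a unique idempotency_key.
type InboxRow = { partitionKey: string; payload: string; idempotencyKey: string };

class Inbox {
  private rows = new Map<string, InboxRow>();

  // Mirrors INSERT ... ON CONFLICT (idempotency_key) DO NOTHING.
  // Returns true when a new row was stored, false on a duplicate.
  insertIfAbsent(row: InboxRow): boolean {
    if (this.rows.has(row.idempotencyKey)) return false;
    this.rows.set(row.idempotencyKey, row);
    return true;
  }

  size(): number {
    return this.rows.size;
  }
}

// Hypothetical handler shape: verify, persist, then acknowledge.
// Returns the HTTP status code the endpoint would send.
function handleWebhook(
  inbox: Inbox,
  event: { id: string; accountId: string; body: string },
  signatureValid: boolean,
): number {
  if (!signatureValid) return 400; // reject before touching the inbox
  inbox.insertIfAbsent({
    partitionKey: event.accountId,
    payload: event.body,
    idempotencyKey: event.id,
  });
  return 200; // same answer for the first delivery and for retries
}

const inbox = new Inbox();
const stripeEvent = { id: "evt_123", accountId: "acct_1", body: '{"type":"charge.succeeded"}' };

const first = handleWebhook(inbox, stripeEvent, true);
const retry = handleWebhook(inbox, stripeEvent, true);
const forged = handleWebhook(inbox, { ...stripeEvent, id: "evt_999" }, false);
```

Both the first delivery and the retry are acknowledged, yet only one row exists for the worker to claim. The request with an invalid signature never reaches the inbox.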

Contracts across boundaries

Shared transactions are gone. Shared vocabulary is not. When two teams own two databases, coordination becomes documentation plus stable ids. The table below is the minimum contract for OrderPaid on order:9182 to land safely in search.

Contract piece | Producer owes | Consumer owes
event_id / idempotency key | Stable, unique per logical event | Dedupe store or unique constraint
partition_key / message key | Same key for all ordered events in a stream | Single consumer per key (or ordering guard)
Schema version | Backward-compatible changes. Bump version on breaks | Reject or dead-letter unknown versions
Delivery semantics | Document at-least-once. No silent drops | Idempotent handlers. Expose lag metrics
Replay | Retention policy or archive for re-publish | Safe to re-process historical events
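
One way to picture the consumer's half of this contract is an envelope type plus a handler that dedupes and dead-letters. This is a sketch under assumptions: the `Envelope` field names, the `SearchConsumer` class, and the choice of version 1 as the only supported schema are all hypothetical, and the in-memory `Set` stands in for a dedupe store or unique constraint.

```typescript
// Hypothetical envelope carrying the contract fields.
type Envelope = {
  eventId: string; // stable, unique per logical event
  partitionKey: string; // e.g. "order:9182"
  schemaVersion: number;
  eventType: string;
  payload: Record<string, unknown>;
};

type Outcome = "applied" | "duplicate" | "dead-lettered";

const SUPPORTED_VERSIONS = new Set([1]); // assumption: this consumer only knows v1

class SearchConsumer {
  private seen = new Set<string>(); // stand-in for a dedupe store
  readonly deadLetters: Envelope[] = [];
  readonly indexed: string[] = [];

  handle(event: Envelope): Outcome {
    if (!SUPPORTED_VERSIONS.has(event.schemaVersion)) {
      this.deadLetters.push(event); // unknown version: park it, do not guess
      return "dead-lettered";
    }
    if (this.seen.has(event.eventId)) return "duplicate";
    this.seen.add(event.eventId);
    this.indexed.push(event.partitionKey); // the side effect runs once per eventId
    return "applied";
  }
}

const orderPaid: Envelope = {
  eventId: "evt-order-9182-paid-v1",
  partitionKey: "order:9182",
  schemaVersion: 1,
  eventType: "OrderPaid",
  payload: { order_id: 9182, paid_at: "2026-07-01T12:00:00Z" },
};

const consumer = new SearchConsumer();
const outcomes: Outcome[] = [
  consumer.handle(orderPaid), // first delivery
  consumer.handle(orderPaid), // redelivery of the same event
  consumer.handle({ ...orderPaid, eventId: "evt-order-9182-paid-v2", schemaVersion: 2 }),
];
```

Because a redelivery returns "duplicate" without touching the index, replaying historical events is safe. The event with an unsupported version is kept for inspection rather than silently dropped.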

Sagas and long flows across services

Some workflows outlive one request. Charge card, reserve inventory, ship, notify search. That is many local commits linked by events, not one distributed lock. Sagas are durable rows chained over time. Each step must tolerate retry.

Do

Idempotent steps, explicit timeouts, compensations as first-class outbox events, dashboards on end-to-end lag (order paid → search indexed).

Avoid

Synchronous chains across five HTTP calls in one request, assuming partner callbacks arrive exactly once, and blocking the user response on downstream indexing.
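
A saga runner that tolerates retry and emits compensations might look like the sketch below. Everything here is hypothetical: the step names, the `StepOutcome` values, the `advance` function, and the compensation event types are invented for illustration, and the `SagaState` object stands in for a durable row that a real system would persist between attempts.

```typescript
type StepOutcome = "ok" | "retry" | "fail";

type Step = {
  name: string;
  run: () => StepOutcome;
  compensate: string; // outbox event type to emit if a later step fails
};

type SagaState = {
  done: string[];
  compensations: string[];
  status: "running" | "completed" | "compensating";
};

// Advances the saga as far as it can. Safe to call again after a retry outcome:
// steps already recorded in `done` are skipped rather than repeated.
function advance(steps: Step[], state: SagaState): SagaState {
  if (state.status !== "running") return state;
  for (const step of steps) {
    if (state.done.indexOf(step.name) !== -1) continue; // finished earlier
    const outcome = step.run();
    if (outcome === "retry") return state; // transient: keep progress, try later
    if (outcome === "fail") {
      // Undo completed steps in reverse order by queuing compensation events.
      state.compensations = steps
        .filter((s) => state.done.indexOf(s.name) !== -1)
        .reverse()
        .map((s) => s.compensate);
      state.status = "compensating";
      return state;
    }
    state.done.push(step.name);
  }
  state.status = "completed";
  return state;
}

let chargeCalls = 0;
let inventoryAnswer: StepOutcome = "retry";

const steps: Step[] = [
  { name: "charge-card", run: () => { chargeCalls += 1; return "ok"; }, compensate: "RefundCard" },
  { name: "reserve-inventory", run: () => inventoryAnswer, compensate: "ReleaseInventory" },
  { name: "ship", run: () => "ok", compensate: "CancelShipment" },
];

const saga: SagaState = { done: [], compensations: [], status: "running" };
advance(steps, saga); // charge succeeds, inventory asks for a retry
inventoryAnswer = "fail";
advance(steps, saga); // charge is skipped, inventory fails, a refund is queued
```

The card is charged only once across both attempts, and the failure produces a RefundCard compensation rather than a rollback. In a Postgres-backed version, each compensation would be an outbox row written in the same commit that records the failure.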

Explaining lag to product and leadership

Payment in your DB can be immediate while search and analytics catch up on a separate timeline. Use this distinction when someone asks why search lags payment by thirty seconds.

  • User-facing write path: strong in your DB. “Payment recorded” is true when the API returns success.
  • Derived views: eventual. Search, analytics, and recommendations catch up in seconds or minutes; define the SLO.
  • External systems: eventual with retries. A webhook may arrive twice. A partner ledger may lag. The contract defines the maximum delay. The mechanics are the same whether the other party is another team or Stripe.
  • Failure mode: at-least-once everywhere outside ACID. Duplicates are normal. Idempotency is not optional.
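
Lag becomes easier to discuss once it is a number checked against an agreed limit. The sketch below assumes you can record when an order was paid and when search indexed it. The `LagSample` shape, the second order (9183), and the 60-second SLO are hypothetical values chosen for illustration.

```typescript
// Hypothetical timestamps: when the order was paid and when search indexed it.
type LagSample = { orderId: number; paidAt: string; indexedAt: string };

// End-to-end lag in seconds for one order (order paid → search indexed).
function lagSeconds(sample: LagSample): number {
  return (Date.parse(sample.indexedAt) - Date.parse(sample.paidAt)) / 1000;
}

// Returns the ids of orders whose lag exceeded the agreed SLO.
function sloBreaches(samples: LagSample[], sloSeconds: number): number[] {
  return samples.filter((s) => lagSeconds(s) > sloSeconds).map((s) => s.orderId);
}

const samples: LagSample[] = [
  { orderId: 9182, paidAt: "2026-07-01T12:00:00Z", indexedAt: "2026-07-01T12:00:30Z" },
  { orderId: 9183, paidAt: "2026-07-01T12:00:05Z", indexedAt: "2026-07-01T12:02:05Z" },
];

const breaches = sloBreaches(samples, 60); // assumption: a 60-second SLO
```

With these samples, order 9182 lags by thirty seconds and stays within the limit, while order 9183 lags by two minutes and is reported as a breach.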

Where this implementation fits

Match the tool to the constraint. Postgres coordinates work inside a service. Brokers and queues help at the boundary when fan-out, retention, or cross-team delivery require them.

Strong fit

  • Durable work with claims, leases, and retries
  • Postgres already in the stack (or inbox/outbox tables are acceptable)
  • Per-entity ordering or partition affinity matters
  • Producers and workers share a database, or each service boundary has its own outbox/inbox
  • Backlog depth and stuck rows should be visible in SQL

Weaker fit: consider other tools

  • Shared event log with long retention and many independent replay consumers → Kafka / Pulsar as platform infra
  • Fan-out streaming is the product, not per-service claim coordination
  • Jobs-only background work, no custom ordering → pg-boss, Graphile Worker, or River
  • No database on either end can participate in inbox/outbox
  • Postgres is saturated after tuning and coordination itself must move off-DB