The saga

This content is for 0.1. Switch to the latest version for up-to-date documentation.

A Stripe API call cannot join your database transaction. A hand-rolled erasure that deletes local rows and then calls Stripe is one network failure away from a half-erased subject nobody can account for — locally gone, externally present, with no record of which. That failure mode is the reason effaced treats erasure as a saga, not a function call.

erase_subject writes one outbox entry per matched (resolver, ref) pair through your session, in the same transaction as the local deletion:

erase_subject(...)
├── one atomic DB transaction
│   ├── delete / anonymize in FK-safe order
│   ├── skip + record legally retained fields
│   └── enqueue outbox entries for external systems
└── saga runner (your worker/cron)
    ├── Stripe: delete customer ── retry w/ backoff, "already gone" = success
    └── audit trail records every outcome, including abandonment

Either the local erasure and all its pending external follow-ups commit together, or none do. Each entry’s entry_id doubles as the idempotency key for the external call. Whatever happens afterwards — an API outage, a crashed worker — the system is always in a known, recorded state.
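The all-or-nothing property described above is the transactional outbox pattern. The following is a minimal sketch of that pattern using the standard library's sqlite3, not effaced's implementation: the table layout, the 'PENDING' state, the function name erase_locally_and_enqueue, and the sample reference cus_123 are all assumptions made for illustration.

```python
import sqlite3
import uuid


def erase_locally_and_enqueue(conn, subject_id, external_refs):
    """Delete the local row and enqueue outbox entries in ONE transaction.

    Hypothetical helper: the schema and names are illustrative only.
    """
    entry_ids = []
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute("DELETE FROM users WHERE id = ?", (subject_id,))
        for resolver, ref in external_refs:
            entry_id = str(uuid.uuid4())  # would double as the idempotency key
            conn.execute(
                "INSERT INTO outbox (entry_id, subject_id, resolver, ref, state) "
                "VALUES (?, ?, ?, ?, 'PENDING')",
                (entry_id, subject_id, resolver, ref),
            )
            entry_ids.append(entry_id)
    return entry_ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE outbox (entry_id TEXT PRIMARY KEY, subject_id TEXT, "
    "resolver TEXT NOT NULL, ref TEXT NOT NULL, state TEXT)"
)
conn.execute("INSERT INTO users VALUES ('u1', 'a@example.com')")
conn.commit()

# A failing enqueue (ref is NULL) must also undo the local deletion.
try:
    erase_locally_and_enqueue(conn, "u1", [("stripe", None)])
except sqlite3.IntegrityError:
    pass
rows_after_failure = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# A successful run removes the row and leaves exactly one pending follow-up.
ids = erase_locally_and_enqueue(conn, "u1", [("stripe", "cus_123")])
```

The failed attempt leaves the user row in place because the deletion and the enqueue share a transaction, so there is never a state in which the row is gone but no follow-up is recorded.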

runner = SagaRunner(registry, outbox, audit, max_attempts=8, batch_size=50)
processed = await runner.run_once()

run_once claims one batch of due entries, awaits the resolver calls concurrently, and books every outcome. The runner owns no event loop and no schedule — drive it from whatever you already operate: a worker process, a cron job, or a background thread (wiring examples). It makes blocking database calls between awaits, so never run it on a serving event loop.
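One way to keep the runner off a serving event loop is to give it a private loop on a dedicated thread. The sketch below uses a stand-in class, StubRunner, instead of the real SagaRunner, and assumes that run_once returns the number of entries it processed and that 0 means nothing was due. The drain helper is hypothetical and models a cron-style "run until idle" invocation.

```python
import asyncio
import threading


class StubRunner:
    """Stand-in for a saga runner; only the awaitable run_once() shape is
    taken from the text. Returning an int count is an assumption."""

    def __init__(self, batches):
        self._batches = list(batches)

    async def run_once(self):
        await asyncio.sleep(0)  # placeholder for awaiting resolver calls
        return self._batches.pop(0) if self._batches else 0


def drain(runner):
    """Call run_once() on a private event loop until no entries are due."""

    async def _drain():
        total = 0
        while True:
            processed = await runner.run_once()
            if processed == 0:
                return total
            total += processed

    # asyncio.run creates and closes a fresh loop owned by this thread.
    return asyncio.run(_drain())


result = {}
worker = threading.Thread(
    target=lambda: result.update(total=drain(StubRunner([3, 2])))
)
worker.start()
worker.join()
```

Because the loop lives on its own thread, the blocking database calls between awaits stall only the runner, not request handlers.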

Concurrent runners are safe: claiming uses FOR UPDATE SKIP LOCKED, and a crashed runner’s claims heal via a lease — an IN_FLIGHT entry whose lease expired is simply re-claimed, and the idempotent resolver call converges on the same outcome.
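The lease behaviour can be illustrated with a small in-memory model. This is a sketch of the claiming rule only: it does not model row locking (the job of FOR UPDATE SKIP LOCKED in the real query), and the Entry fields, the PENDING state name, and the claim_batch helper are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LEASE = timedelta(minutes=5)  # default claim lease quoted in the docs


@dataclass
class Entry:
    entry_id: str
    state: str = "PENDING"  # assumed initial state name
    attempts: int = 0
    due_at: Optional[datetime] = None
    lease_until: Optional[datetime] = None


def claimable(entry, now):
    """Due entries are claimable; so are IN_FLIGHT entries with an expired lease."""
    if entry.state in ("PENDING", "FAILED"):
        return entry.due_at is None or entry.due_at <= now
    if entry.state == "IN_FLIGHT":
        return entry.lease_until is not None and entry.lease_until <= now
    return False  # SUCCEEDED and ABANDONED are terminal


def claim_batch(entries, now, batch_size):
    claimed = []
    for entry in entries:
        if len(claimed) == batch_size:
            break
        if claimable(entry, now):
            entry.state = "IN_FLIGHT"
            entry.attempts += 1  # attempts counts claims, not failures
            entry.lease_until = now + LEASE
            claimed.append(entry)
    return claimed


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
entries = [Entry("a"), Entry("b")]

first = claim_batch(entries, t0, batch_size=1)    # claims "a"
second = claim_batch(entries, t0, batch_size=50)  # "a" is leased, so only "b"
# Suppose both runners crash here. Once the lease expires, the entries heal:
third = claim_batch(entries, t0 + LEASE, batch_size=50)
```

After the third call both entries are claimed again with attempts equal to 2, which is how a crashed runner's work is picked up without operator intervention.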

What happens to a claimed entry depends on how the resolver call ends:

| Outcome | Entry becomes | Audited |
| --- | --- | --- |
| ResolverErasure (incl. already_absent=True) | SUCCEEDED | ERASURE_STEP_SUCCEEDED |
| ResolverError (non-retryable by contract, or unknown resolver name) | ABANDONED immediately | ERASURE_STEP_FAILED with abandoned: true |
| Any other exception, attempts below max_attempts | FAILED, retried on the backoff schedule | not audited; last_error on the row |
| Any other exception, attempts exhausted | ABANDONED | ERASURE_STEP_FAILED with abandoned: true |
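The four outcomes above can be read as a small decision function. The sketch below defines stand-in ResolverErasure and ResolverError types locally instead of importing the library's; book_outcome and its return shape are hypothetical and only mirror the listed outcomes.

```python
from dataclasses import dataclass


@dataclass
class ResolverErasure:
    """Stand-in for a successful resolver result."""
    already_absent: bool = False


class ResolverError(Exception):
    """Stand-in for the non-retryable error type."""


def book_outcome(outcome, attempts, max_attempts):
    """Map a resolver outcome to (new entry state, audit event or None).

    `outcome` is either a ResolverErasure or the exception that was raised.
    """
    if isinstance(outcome, ResolverErasure):
        # "already gone" counts as success
        return "SUCCEEDED", "ERASURE_STEP_SUCCEEDED"
    if isinstance(outcome, ResolverError):
        # non-retryable by contract: abandon without further attempts
        return "ABANDONED", "ERASURE_STEP_FAILED"
    if attempts < max_attempts:
        # transient failure: retried later, not audited
        return "FAILED", None
    # retries exhausted
    return "ABANDONED", "ERASURE_STEP_FAILED"


ok = book_outcome(ResolverErasure(already_absent=True), 1, 8)
fatal = book_outcome(ResolverError("unknown resolver"), 1, 8)
retry = book_outcome(TimeoutError("api outage"), 3, 8)
exhausted = book_outcome(TimeoutError("api outage"), 8, 8)
```

The only path with no audit event is the retryable failure, which matches the outcome list: intermediate failures live in last_error on the row, while terminal outcomes reach the audit trail.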

BackoffPolicy is deterministic exponential doubling: by default a 30 s base, a 1 h cap, and a 5 min claim lease, with no jitter (SKIP LOCKED already spreads concurrent runners). Size the lease above your slowest resolver call: if a lease expires mid-call, another runner re-claims the entry and the call executes twice, which is idempotent but wasteful. attempts counts claims, not failures, so even an entry that crashes its runner every time converges to ABANDONED instead of being re-claimed forever.
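To make the default schedule concrete, here is a sketch of exponential doubling with the quoted 30 s base and 1 h cap. The exact indexing (whether the first retry waits 30 s or 60 s) is not stated in the text, so the formula below is an assumption, and backoff_delay is a hypothetical name and not the BackoffPolicy API.

```python
BASE_SECONDS = 30    # default base quoted in the docs
CAP_SECONDS = 3600   # default 1 h cap quoted in the docs


def backoff_delay(attempts):
    """Seconds to wait after the `attempts`-th claim failed (attempts >= 1).

    Assumes the first retry waits exactly the base delay.
    """
    return min(BASE_SECONDS * 2 ** (attempts - 1), CAP_SECONDS)


# Delays after claims 1 through 8.
schedule = [backoff_delay(n) for n in range(1, 9)]

# With max_attempts=8 there are seven waits between the eight claims.
total_wait = sum(schedule[:7])
```

Under these assumptions the delays are 30, 60, 120, 240, 480, 960 and 1920 seconds, with the eighth value capped at 3600, so an entry that fails on every attempt is abandoned roughly 64 minutes (3810 s of waiting) after its first claim, plus the time spent in the calls themselves.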

When an entry succeeds and all of the subject’s outbox entries are now SUCCEEDED, the runner emits ERASURE_COMPLETED — the audit trail’s statement that the erasure finished everywhere, local and external. The check runs under lock so exactly one runner observes the transition; a crash at the wrong instant can duplicate the event but never lose it. Runner audit events are generally at-least-once: assert on state, not exact event counts.

An ABANDONED entry blocks completion permanently — its ERASURE_STEP_FAILED is the subject’s terminal record until an operator fixes the cause, clears the abandoned row, and re-runs erase_subject (re-runs enqueue fresh entries; the completion check spans all generations).
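The completion rule and the recovery path can be sketched as a check over all of a subject's outbox rows. The Row type, the generation field, the PENDING state, the resolver name mailer, and the assumption that a re-run enqueues a fresh entry for every matched pair are all illustrative and not taken from the library.

```python
from dataclasses import dataclass


@dataclass
class Row:
    generation: int  # which erase_subject run enqueued the row (assumed field)
    resolver: str
    state: str


def erasure_completed(rows):
    """Complete only when every row, across all generations, is SUCCEEDED."""
    return all(row.state == "SUCCEEDED" for row in rows)


# First run: one resolver succeeded, one was abandoned.
rows = [Row(1, "stripe", "ABANDONED"), Row(1, "mailer", "SUCCEEDED")]
blocked_before = not erasure_completed(rows)

# Operator fixes the cause and clears the abandoned row, then re-runs
# erase_subject, which enqueues fresh entries.
rows = [row for row in rows if row.state != "ABANDONED"]
rows += [Row(2, "stripe", "PENDING"), Row(2, "mailer", "PENDING")]
pending_after_rerun = not erasure_completed(rows)

# The runner later succeeds on the fresh entries.
for row in rows:
    if row.generation == 2:
        row.state = "SUCCEEDED"
done = erasure_completed(rows)
```

Because the check spans every generation, the surviving SUCCEEDED row from the first run and the fresh rows from the second are evaluated together.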

The claim query, failure taxonomy, backoff schedule, and completion condition are all erasure semantics — changing any of them is MAJOR under widened SemVer. Full signatures: API reference.