Erasure

Erasing a subject touches your database and external systems, and those two can’t share a transaction. effaced therefore splits erasure in two: the local phase runs atomically in your session, and external calls are enqueued durably in the same transaction, then fanned out by the saga runner. The system is always in a known, recorded state — never a half-erased mystery.

planner = ErasurePlanner(
    data_map, graph, registry,
    executor=ErasureExecutor(Base.metadata),
    outbox=outbox,
    audit_sink=audit,
)
result = planner.erase_subject(session, "42", refs=(stripe_ref,))

erase_subject never commits or rolls back your session: the row changes and the outbox entries become durable together when you commit, and a rollback undoes both. If it raises, do not commit the session.
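The commit discipline above can be wrapped in a small helper. This is a minimal sketch, not part of effaced: erase_and_commit is a hypothetical name, and it assumes only that the session exposes commit() and rollback() (as a SQLAlchemy-style session does) and that erase_subject is called as in the snippet above.

```python
def erase_and_commit(planner, session, subject_id, refs=()):
    """Run the local erasure phase, then make it durable (hypothetical helper).

    Assumes `session` has commit() and rollback(), and that the planner's
    erase_subject(session, subject_id, refs=...) leaves the transaction open.
    """
    try:
        result = planner.erase_subject(session, subject_id, refs=refs)
    except Exception:
        # erase_subject does not roll back for you, and a session that saw
        # a failed erasure must not be committed.
        session.rollback()
        raise
    # Row changes and outbox entries become durable together at this point.
    session.commit()
    return result
```

Keeping the commit in the caller's hands means the same transaction can also carry the caller's own writes, such as marking the request as handled.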

planner.plan(subject_id, refs=...) computes the full programme without a session and without I/O — a pure function of the manifest, so you (and your tests) can assert exactly what an erasure will touch before anything happens. The row-level semantics (ADR 0007):

  • A whole row is deleted iff every annotated column on the table is DELETE and the table is fully PII-owned — every physical column is PII-annotated, a primary-key member, or a foreign-key member. Keys are structural plumbing; an unannotated payload column means row deletion would erase more than the manifest declares.
  • Otherwise the row survives and steps are column-level: one ANONYMIZE step for every non-RETAIN column, one RETAIN step for the retained columns. On a surviving row, even DELETE-declared columns are anonymized with a type-valid surrogate, never NULL, so NOT NULL and unique constraints keep holding, and an irreversible surrogate is content erasure. Surrogates come from the extensible SurrogateRegistry, consumed only at execution time.
  • Conflicts fail loudly before anything runs. If a surviving table’s path to the subject passes through a table planned for row deletion, the plan is unsatisfiable: RetentionViolationError when a retention duty is at stake, ManifestError when the manifest is merely incomplete.
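The row-level rule in the first two bullets can be sketched as a pure function. Everything here is illustrative: Strategy, Column, row_is_deleted and column_steps are hypothetical stand-ins, not the library's types, and the single RETAIN step covering all retained columns is one reading of the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(Enum):  # hypothetical stand-in for ErasureStrategy
    DELETE = "delete"
    ANONYMIZE = "anonymize"
    RETAIN = "retain"


@dataclass(frozen=True)
class Column:  # hypothetical description of one physical column
    name: str
    strategy: Optional[Strategy] = None  # None means "not PII-annotated"
    is_key: bool = False                 # primary-key or foreign-key member


def row_is_deleted(columns):
    """True iff every annotated column is DELETE and the table is fully PII-owned."""
    annotated = [c for c in columns if c.strategy is not None]
    # Assumption: a table with no annotated columns is never row-deleted.
    all_delete = bool(annotated) and all(
        c.strategy is Strategy.DELETE for c in annotated
    )
    # Fully PII-owned: each column is annotated or is structural key plumbing.
    fully_owned = all(c.strategy is not None or c.is_key for c in columns)
    return all_delete and fully_owned


def column_steps(columns):
    """Steps for a surviving row: DELETE and ANONYMIZE columns are both anonymized."""
    steps = [
        ("ANONYMIZE", c.name)
        for c in columns
        if c.strategy in (Strategy.DELETE, Strategy.ANONYMIZE)
    ]
    retained = tuple(c.name for c in columns if c.strategy is Strategy.RETAIN)
    if retained:
        steps.append(("RETAIN", retained))
    return steps
```

For example, a table with a key, a DELETE-annotated email and an unannotated total is not row-deleted, because the unannotated payload column means deleting the row would erase more than the manifest declares; its email is anonymized instead.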

This is the conservative direction throughout: the planner never deletes more than the manifest declares.

Local steps follow the subject graph’s deletion order — children before parents, the subject table last — so foreign keys never block a legitimate erasure. Fields declared ErasureStrategy.RETAIN are never deleted by any code path; the RETAIN step touches nothing and exists so the retention decision is recorded, not silently applied. The declaration itself requires a RetentionPolicy naming the legal reason — see annotations.
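The children-before-parents ordering is a topological sort of the foreign-key graph. The sketch below uses the standard library's graphlib to illustrate the idea; it is not how effaced computes its order, and deletion_order and the parents_of mapping are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def deletion_order(parents_of):
    """Order tables so each child precedes every parent it references.

    parents_of maps a table name to the set of tables it points at via
    foreign keys (hypothetical input shape).
    """
    # TopologicalSorter takes node -> predecessors and emits predecessors
    # first, so register each child as a predecessor of its parent.
    children_of = {}
    for table, parents in parents_of.items():
        children_of.setdefault(table, set())
        for parent in parents:
            children_of.setdefault(parent, set()).add(table)
    return list(TopologicalSorter(children_of).static_order())


order = deletion_order({
    "order_items": {"orders"},
    "orders": {"users"},
    "users": set(),  # the subject table
})
```

In this chain the subject table comes out last because every other table reaches it; a cyclic foreign-key graph would make static_order raise graphlib.CycleError.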

Each ref is routed to the resolver whose name equals the ref’s kind (ADR 0008). A ref kind matching no registered resolver raises ResolverError before any work — a typo must never silently drop an external system from an Art. 17 answer. A registered resolver with no matching ref is skipped, and that is a complete answer (“the subject has no identity in that system”), recorded in the completion event’s skipped_resolvers. Matched pairs become outbox entries written through your session; the saga takes it from there.
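The routing rule can be sketched in a few lines. The names here are hypothetical: Ref, route_refs and this ResolverError are local stand-ins, and the real library's ref and resolver types may look different.

```python
from dataclasses import dataclass


class ResolverError(Exception):
    """Local stand-in for the library exception of the same name."""


@dataclass(frozen=True)
class Ref:  # hypothetical shape: a kind plus an identifier in that system
    kind: str
    external_id: str


def route_refs(resolver_names, refs):
    """Pair each ref with the resolver named after its kind.

    Returns (matched, skipped): matched is a list of (resolver_name, ref)
    pairs, skipped is the sorted list of resolvers that received no ref.
    """
    known = set(resolver_names)
    unknown = sorted({r.kind for r in refs} - known)
    if unknown:
        # Fail before any work: a typo must not silently drop a system.
        raise ResolverError(f"no resolver registered for ref kind(s): {unknown}")
    matched = [(r.kind, r) for r in refs]
    skipped = sorted(known - {r.kind for r in refs})
    return matched, skipped
```

With resolvers "stripe" and "mailchimp" and a single stripe ref, the result is one matched pair and skipped == ["mailchimp"], while a ref of kind "strpe" raises before anything is enqueued.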

One local erasure leaves the sequence:

  1. ERASURE_REQUESTED before the first step — with the default DatabaseAuditSink each event commits independently, so the attempt stays recorded even if the erasure later rolls back.
  2. One ERASURE_STEP_SUCCEEDED per local step, including RETAIN steps — the RETAIN event is the auditable retention decision. The append is part of the step: an outcome that can’t be recorded counts as a failure.
  3. On the first failure, ERASURE_STEP_FAILED (exception class name only — messages can embed row values, and the trail stays PII-free), then the original exception re-raises.
  4. ERASURE_LOCAL_COMPLETED last, with totals. ERASURE_COMPLETED is the saga runner’s to emit, once every external call has succeeded.
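The sequence above can be sketched as a loop over local steps. This is an illustration under assumptions, not the library's executor: run_local_steps, the append_event callback, and the payload keys are all hypothetical; only the event names come from the list above.

```python
def run_local_steps(steps, append_event):
    """Run (name, action) steps in order, emitting the audit sequence.

    append_event(kind, payload) is a hypothetical audit-sink callback.
    """
    append_event("ERASURE_REQUESTED", {})
    done = 0
    for name, action in steps:
        try:
            action()
            # The append is part of the step: if recording the outcome
            # fails, the step is treated as failed too.
            append_event("ERASURE_STEP_SUCCEEDED", {"step": name})
        except Exception as exc:
            # Record the exception class name only; the message may embed
            # row values and the trail must stay PII-free.
            append_event(
                "ERASURE_STEP_FAILED",
                {"step": name, "error": type(exc).__name__},
            )
            raise
        done += 1
    append_event("ERASURE_LOCAL_COMPLETED", {"steps": done})
```

ERASURE_COMPLETED is intentionally absent: per the list above, only the saga runner emits it, once every external call has succeeded.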

Validation failures (missing wiring, plan conflicts, unmatched ref kinds) raise before any event: a malformed call never becomes a data-subject request, so it deliberately leaves no audit trace.

After an erasure commits, ErasureVerifier.verify_subject_erased(session, subject_id) reads the annotated surface back and records a verdict. It re-derives the plan’s table classification, counts the subject’s surviving rows per table with SELECT COUNT statements only — it is strictly read-only, writing no row — and appends one PII-free audit event (ERASURE_VERIFIED, or ERASURE_VERIFICATION_FAILED when a row-deleted table still holds rows). verified is true exactly when every row-deleted table is empty for the subject; the surviving anonymize/retain counts are reported for the record and never flip the verdict.
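The verdict rule reduces to a check over per-table counts. The sketch below assumes the counts have already been gathered; verdict and its return shape are hypothetical, and only the two event names come from the paragraph above.

```python
def verdict(row_deleted_counts, surviving_counts):
    """Derive the verification verdict from per-table row counts.

    row_deleted_counts: table -> rows still present in row-deleted tables.
    surviving_counts: table -> rows in anonymize/retain tables (reported only).
    """
    leftovers = {t: n for t, n in row_deleted_counts.items() if n > 0}
    verified = not leftovers
    return {
        "verified": verified,
        "event": "ERASURE_VERIFIED" if verified else "ERASURE_VERIFICATION_FAILED",
        "leftovers": leftovers,
        # Surviving counts are carried for the record; they never flip the verdict.
        "surviving": dict(surviving_counts),
    }
```

A subject with zero rows in every row-deleted table verifies even when anonymized rows survive elsewhere; a single leftover row in a row-deleted table fails the verification.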

This proves execution fidelity — that a caller trigger, an FK cascade, an ORM event, or a partial commit did not resurrect rows the plan deleted. It is strictly narrower than “everything is gone”, and three boundaries are deliberately out of scope:

  1. It re-reads the same annotated surface the plan was built from, so PII that was never annotated is invisible by construction — this is not a discovery-completeness check.
  2. A row orphaned off the subject’s path (reachable by no hop chain to the subject) is unreachable by the scoping predicate, so it is invisible here too.
  3. Anonymized cell values are not verified. Surrogates are random and never NULL, so a reader cannot distinguish a surrogate from an original without a before-state, and capturing one is out of scope. The verifier never determines that a subject is fully erased, or that anyone is compliant.

Erasure is idempotent by contract: re-running for an already-erased subject succeeds. Row-deleting tables report zero; surviving rows still match by subject id and are re-anonymized with fresh surrogates; external work re-enqueues under fresh idempotency keys and converges at the resolvers (“already gone” is success). Each attempt appends a full audit sequence — every attempt is evidence.

Full signatures: API reference.