Skip to content

Erasure

Art. 17 erasure — atomic locally, saga-driven externally.

class ErasurePlan(BaseModel):
external_steps: tuple[ErasureStep, ...]
local_steps: tuple[ErasureStep, ...]
refs: tuple[SubjectRef, ...] = ()
steps: tuple[ErasureStep, ...] = ()
subject_id: ValidatedSubjectId

The full programme for erasing one subject.

Local steps run inside one atomic transaction in FK-safe order; external steps are enqueued durably and fanned out afterwards. Plans are inspectable so callers (and tests) can assert exactly what an erasure will touch before anything happens.

Fields:

  • external_steps (tuple[ErasureStep, ...]): Steps that run through the saga/outbox after commit.
  • local_steps (tuple[ErasureStep, ...]): Steps that run inside the local database transaction.
  • refs (tuple[SubjectRef, ...]): The external-system references the erasure will hand to resolvers — recorded for inspectability; the executor matches them to resolvers at execution time.
  • steps (tuple[ErasureStep, ...]): All steps in execution order (local first, then external).
  • subject_id (ValidatedSubjectId): The subject being erased — a single-column str or a composite CompositeSubjectId, echoed back from the call.
class ErasurePlanner:
def __init__(data_map: DataMap, graph: SubjectGraph, registry: ResolverRegistry | None = None, *, executor: StepExecutor | None = None, outbox: Outbox | None = None, audit_sink: AuditSink | None = None) -> None

Plans and executes subject erasure across local and external data.

The local deletion runs in one atomic transaction in FK-safe order, honouring per-field strategies (delete / anonymize / retain). External calls cannot join that transaction, so they are enqueued durably in the same transaction and fanned out afterwards by the saga runner — the system is always in a known, recorded state, even on partial failure.

The row-level semantics (when a whole row is deleted versus anonymized in place) are defined in ADR 0007; ref→resolver routing in ADR 0008; the execution and audit semantics of erase_subject in ADR 0009. Changing any of them changes what gets deleted and is MAJOR under widened SemVer.

def erase_subject(session: Session, subject_id: SubjectIdentifier, *, refs: tuple[SubjectRef, ...] = ()) -> ErasureResult

Execute the plan: atomic local phase + durable external enqueue.

Local steps run in FK-safe order and the external steps’ outbox entries are written through the same session, so the caller’s commit makes the whole erasure durable at once — and a rollback undoes every row change and every outbox entry together. This method never commits or rolls back the session itself; after it raises, do not commit the session.

Audit semantics (ADR 0009): ERASURE_REQUESTED is appended before the first step, one ERASURE_STEP_SUCCEEDED after each local step (RETAIN included — the retention decision is the record; the append is part of the step, so a step whose outcome cannot be recorded audits as failed), ERASURE_STEP_FAILED on the first failure (then the original exception re-raises), and ERASURE_LOCAL_COMPLETED last. With the default DatabaseAuditSink each event commits independently of the caller’s transaction, so the attempt stays recorded even when the erasure rolls back. Validation failures raise before any event — a malformed call never became a data-subject request.

Each ref is routed to the resolver whose name equals the ref’s kind (ADR 0008). A registered resolver with no matching ref is skipped — recorded in the completion payload’s skipped_resolvers, absent from enqueued_external — and a ref kind matching no resolver fails loudly.

Re-running for an already-erased subject is a no-op success: row-deleting tables report zero, surviving rows (anonymized in place or retained) re-match by subject id and are reported again, and matched external work is re-enqueued under fresh idempotency keys — resolvers treat “already gone” as success, so duplicates converge.

Args:

  • session (Session): An open database session; the local phase commits or rolls back as one unit together with the outbox entries.
  • subject_id (SubjectIdentifier): Identifier on the subject table.
  • refs (tuple[SubjectRef, ...]): External-system references, routed by kind (ADR 0008).

Returns:

  • ErasureResult — The local-phase outcome with per-table counts. A surviving row
  • ErasureResult — anonymized in some columns and retained in others counts in
  • ErasureResult — both anonymized and retained. External outcomes land
  • ErasureResult — in the audit trail asynchronously.

Raises:

  • ConfigurationError — If the planner was built without an executor, outbox, or audit sink.
  • ResolverError — If a ref’s kind matches no registered resolver — a typo must not silently drop an external system from the erasure.
def plan(subject_id: SubjectIdentifier, *, refs: tuple[SubjectRef, ...] = ()) -> ErasurePlan

Compute the erasure programme without executing anything.

A pure function of the manifest and refs: no session, no I/O, and calling it twice yields equal plans.

Args:

  • subject_id (SubjectIdentifier): The subject identifier — a single-column str or a composite CompositeSubjectId; echoed back unchanged on the plan.
  • refs (tuple[SubjectRef, ...]): External-system references, recorded on the plan for the resolver steps.

Returns:

  • ErasurePlan — The ordered, inspectable plan (local steps first, FK-safe).

Raises:

  • RetentionViolationError — If a table must keep rows under a retention duty while a table on its path to the subject is planned for row deletion.
  • ManifestError — If a table survives erasure only because the manifest declares nothing erasable on it, while a table on its path to the subject is planned for row deletion.
class ErasureResult(BaseModel):
anonymized: dict[str, int] = Field(default_factory=dict)
completed_at: datetime
deleted: dict[str, int] = Field(default_factory=dict)
enqueued_external: tuple[str, ...] = ()
retained: dict[str, int] = Field(default_factory=dict)
subject_id: ValidatedSubjectId

Outcome of the local phase of an erasure.

External steps complete asynchronously; their outcomes land in the audit trail as the saga runner processes the outbox.

Fields:

  • anonymized (dict[str, int]): Record counts anonymized, by table.
  • completed_at (datetime): When the local phase finished (UTC); durable once the caller commits.
  • deleted (dict[str, int]): Record counts deleted, by table.
  • enqueued_external (tuple[str, ...]): Resolver names whose erasure was enqueued.
  • retained (dict[str, int]): Record counts left in place under a retention duty, by table.
  • subject_id (ValidatedSubjectId): The subject that was erased — echoed back from the call (single-column str or composite CompositeSubjectId).
class ErasureStep(BaseModel):
columns: tuple[str, ...] = ()
external: bool = False
strategy: ErasureStrategy
target: str = Field(min_length=1)

One action the erasure will take, in execution order.

Local DELETE steps remove whole rows; local ANONYMIZE and RETAIN steps are column-level and must name the columns they touch (or, for RETAIN, deliberately leave untouched). External steps address a whole subject in a resolver, never columns — a validator makes any other shape unrepresentable.

Fields:

  • columns (tuple[str, ...]): The column names the step touches; empty for whole-row deletion and for external steps.
  • external (bool): True when the step is a resolver call that runs through the saga/outbox after the local transaction commits.
  • strategy (ErasureStrategy): What happens to the matched records.
  • target (str): Table name (local) or resolver name (external).
class ErasureVerification(BaseModel):
residual: dict[str, int] = Field(default_factory=dict)
subject_id: ValidatedSubjectId
surviving: dict[str, int] = Field(default_factory=dict)
verified: bool
verified_at: datetime

The verdict of reading the annotated surface back after an erasure.

A verification re-runs the manifest’s subject-scoping as a read and counts what is left for the subject, per table. It proves execution fidelity — that a caller trigger, an FK cascade, an ORM event, or a partial commit did not resurrect rows the plan deleted — and nothing wider. Three boundaries are load-bearing and deliberately not covered:

  1. It re-reads the same annotated surface the plan was built from, so PII that was never annotated is invisible by construction; this is not a discovery-completeness check.
  2. A row orphaned off the subject’s path (reachable by no hop chain to the subject) is unreachable by the scoping predicate, so it is invisible here too.
  3. Anonymized cell values are not verified — surrogates are random, never NULL, so without a before-state a reader cannot distinguish a surrogate from an original. Confirming a value was rewritten needs a before-state and is out of scope.

The hard assertion is therefore narrower than “everything is gone”: verified is true iff every row-deleted table holds zero subject-scoped rows. surviving is informational only — anonymize and retain tables keep rows by design — and never flips verified.

Fields:

  • residual (dict[str, int]): Per-table leftover subject-scoped row counts for tables the plan whole-row deletes; verified is the claim that all of these are zero.
  • subject_id (ValidatedSubjectId): The subject whose surface was read back — echoed back from the call (single-column str or composite CompositeSubjectId).
  • surviving (dict[str, int]): Per-table subject-scoped row counts for tables the plan anonymizes in place or retains — expected to be non-zero, reported for the record, never a failure.
  • verified (bool): True iff every row-deleted table is empty for this subject (residual is all zero).
  • verified_at (datetime): When the read-back ran (UTC).

Protocol — implement these members in your own class; do not subclass.

class StepExecutor(Protocol):
...

Anything that can execute one local erasure step for one subject.

The storage-specific half of erase_subject: it turns a step plus the subject graph’s hop chains into subject-scoped statements in the caller’s open transaction. The SQLAlchemy implementation is ErasureExecutor.

def execute(session: Session, graph: SubjectGraph, step: ErasureStep, subject_id: SubjectIdentifier) -> int

Run one local step scoped to one subject.

Implementations must never commit or roll back the session — the step is durable exactly when the caller’s transaction is. DELETE removes the matched rows, ANONYMIZE rewrites the step’s columns with irreversible surrogates, and RETAIN touches nothing and only counts what stays.

Args:

  • session (Session): The caller’s open session.
  • graph (SubjectGraph): Resolved hop chains from each table to the subject.
  • step (ErasureStep): The local step to run.
  • subject_id (SubjectIdentifier): The subject identifier (single-column str or composite CompositeSubjectId).

Returns:

  • int — The number of rows the step covered (deleted, anonymized, or
  • int — counted as retained).

Raises:

  • ConfigurationError — If the step is external — resolver calls never run inside the local transaction.