Annotations

Core annotation models — the data map vocabulary, storage-agnostic.

Authoring helpers that attach these to concrete ORMs live in effaced.adapters (SQLAlchemy first). The models here are pure data: they validate, serialize, and never import a database library.

def canonical_subject_id(identifier: SubjectIdentifier) -> str

Serialize a subject identifier to its canonical storage string.

A bare str is returned unchanged — it is already the one-element canonical form, so single-column subjects are byte-identical in storage, audit references, and SQL to how they were before composite keys existed.

A CompositeSubjectId joins its escaped element values on a reserved separator. The escaping guarantees the result is collision-free: distinct keys (including keys that differ only in where a separator-like character falls, such as ("a", "b:c") versus ("a:b", "c")) always serialize to distinct strings. Saga completion-grouping and cross-subject isolation both depend on that distinctness.

Args:

  • identifier (SubjectIdentifier): A single-column str or a multi-column CompositeSubjectId.

Returns:

  • str — The canonical string. Round-trips through parse_canonical for the composite case; a bare str parses back to itself.
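
The escape-then-join scheme described above can be sketched as follows. This is a minimal illustration, not effaced's implementation: the separator `:` and the escape character `\` are assumptions (the reserved characters are not stated in this reference), and `escape` and `serialize` are hypothetical names.

```python
from __future__ import annotations

# Assumed reserved characters; the library's real choices are not
# documented here.
SEP = ":"
ESC = "\\"


def escape(value: str) -> str:
    # Escape the escape character first, then the separator, so the
    # second substitution never touches backslashes added by the first.
    return value.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)


def serialize(identifier: str | tuple[str, ...]) -> str:
    # A bare str is already canonical and is returned unchanged.
    if isinstance(identifier, str):
        return identifier
    return SEP.join(escape(v) for v in identifier)


left = serialize(("a", "b:c"))   # a:b\:c
right = serialize(("a:b", "c"))  # a\:b:c
```

The two keys from the paragraph above differ only in where the colon falls, yet `left` and `right` are different strings because the embedded colon is escaped and the joining colon is not.
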
class CompositeSubjectId(BaseModel):
values: tuple[str, ...] = Field(min_length=1)

A data subject identified by an ordered tuple of column values.

Subjects whose identity spans several columns — the common multi-tenant (tenant_id, user_id) shape, or any natural composite key — carry one of these instead of a bare str (ADR 0025). The values are positional: their order aligns left-to-right with the columns the manifest declares in subject_id_columns. effaced always matches the whole ordered key, never a partial one, so the arity of this tuple must equal the number of declared columns at the call boundary.

A single-column subject is just a str — this model is only for the multi-column case. See SubjectIdentifier for the union every engine entry point accepts, and canonical_subject_id for the deterministic, collision-free serialization used in storage and the audit trail.

Fields:

  • values (tuple[str, ...]): The subject’s key-column values, in declared column order; at least one, none empty.
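
The positional alignment and whole-key matching described above amount to an arity check at the call boundary. The sketch below illustrates that rule with plain tuples; `align_key` is a hypothetical helper, not part of effaced.

```python
from __future__ import annotations


def align_key(
    values: tuple[str, ...], subject_id_columns: tuple[str, ...]
) -> dict[str, str]:
    """Pair each key value with its declared column, left to right."""
    if any(v == "" for v in values):
        raise ValueError("key values must not be empty")
    if len(values) != len(subject_id_columns):
        # A partial key is never matched, so a wrong arity is an error.
        raise ValueError(
            f"expected {len(subject_id_columns)} values, got {len(values)}"
        )
    return dict(zip(subject_id_columns, values))


aligned = align_key(("acme", "42"), ("tenant_id", "user_id"))
```

Here `aligned` maps `tenant_id` to `"acme"` and `user_id` to `"42"`; passing only `("acme",)` against the same two columns raises instead of matching every user in the tenant.
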
class Correction(BaseModel):
category: PiiCategory
value: str | int | float | bool

One Art. 16 correction: a category and the value it should hold.

Corrections are keyed by PiiCategory, never by column (ADR 0013): the category is the only vocabulary shared with external resolvers, and a category-wide write keeps denormalized copies of the same fact consistent. Values are JSON scalars so a correction round-trips losslessly through the outbox payload.

The value is personal data. It lives transiently in outbox rows while external rectification is in flight — cleared the moment the entry reaches a terminal status — and never appears in any audit event.

Fields:

  • category (PiiCategory): Which kind of personal data the correction targets.
  • value (str | int | float | bool): The corrected value, applied to every matching field.
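
To illustrate why restricting values to JSON scalars gives a lossless round trip, the sketch below pushes a few corrections through `json`. The category names are invented for the example (the real PiiCategory members are not listed here), and the dict layout is an assumption, not the actual outbox payload format.

```python
import json

# Hypothetical corrections; one per allowed scalar type.
corrections = [
    {"category": "email", "value": "new@example.com"},
    {"category": "birth_year", "value": 1984},
    {"category": "height_m", "value": 1.5},
    {"category": "newsletter_opt_in", "value": False},
]

payload = json.dumps(corrections)
restored = json.loads(payload)

# Both the values and their Python types survive the round trip.
same_values = restored == corrections
same_types = all(
    type(a["value"]) is type(b["value"])
    for a, b in zip(corrections, restored)
)
```
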
def parse_canonical(serialized: str) -> CompositeSubjectId

Parse a canonical composite string back into its ordered values.

The exact inverse of canonical_subject_id for the composite case: it splits on unescaped separators and unescapes each element, so a value that itself contained the separator or escape character is restored intact.
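
The split-on-unescaped-separators step can be sketched as a single pass over the string. As with any illustration here, the separator `:` and escape character `\` are assumptions, and `parse` and `serialize` are hypothetical stand-ins rather than the library's code.

```python
from __future__ import annotations

SEP = ":"   # assumed separator
ESC = "\\"  # assumed escape character


def serialize(values: tuple[str, ...]) -> str:
    return SEP.join(
        v.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP) for v in values
    )


def parse(serialized: str) -> tuple[str, ...]:
    values: list[str] = []
    current: list[str] = []
    chars = iter(serialized)
    for ch in chars:
        if ch == ESC:
            # The character after an escape is taken literally.
            current.append(next(chars, ""))
        elif ch == SEP:
            # An unescaped separator ends the current element.
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    return tuple(values)


key = ("a:b", "c\\d")
round_tripped = parse(serialize(key))
```

In this sketch `round_tripped` equals `key`, even though the first element contains the separator and the second contains the escape character.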

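Args:

  • serialized (str): A canonical composite string, as produced by canonical_subject_id.

Returns:

  • CompositeSubjectId — The ordered values recovered from the string.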

class PiiSpec(BaseModel):
category: PiiCategory
description: str | None = None
erasure: ErasureStrategy = ErasureStrategy.DELETE
legal_basis: LegalBasis | None = None
purpose: str | None = None
retention: RetentionPolicy | None = None

Full declaration for one personal-data field.

Built by the adapter authoring helpers (e.g. effaced.adapters.sqlalchemy.pii); read back by effaced.manifest.DataMap.

Fields:

  • category (PiiCategory): What kind of personal data this is.
  • description (str | None): Optional human note for audits and the PII linter.
  • erasure (ErasureStrategy): What happens on Art. 17 erasure. Defaults to DELETE.
  • legal_basis (LegalBasis | None): Why the data is processed at all (Art. 15 metadata).
  • purpose (str | None): Processing purpose, surfaced verbatim in export bundles.
  • retention (RetentionPolicy | None): Required when erasure is RETAIN (and allowed with ANONYMIZE to document why the record itself survives).
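
The rule on the retention field (required with RETAIN, optional with ANONYMIZE) can be expressed as a small consistency check. The sketch uses a stand-in enum with only the three strategies named in this reference; `Erasure` and `retention_problem` are hypothetical, and how the library treats a retention policy combined with DELETE is not stated here, so the sketch leaves that case alone.

```python
from __future__ import annotations

from enum import Enum


class Erasure(Enum):
    # Stand-in for ErasureStrategy; only the members named in this
    # reference are modeled.
    DELETE = "delete"
    ANONYMIZE = "anonymize"
    RETAIN = "retain"


def retention_problem(erasure: Erasure, has_retention: bool) -> str | None:
    """Return a message if the combination is inconsistent, else None."""
    if erasure is Erasure.RETAIN and not has_retention:
        return "erasure=RETAIN requires a retention policy"
    # ANONYMIZE may carry a policy to document why the record survives.
    return None


missing = retention_problem(Erasure.RETAIN, has_retention=False)
documented = retention_problem(Erasure.ANONYMIZE, has_retention=True)
```
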
class RetentionPolicy(BaseModel):
anchor: str | None = None
basis: LegalBasis = LegalBasis.LEGAL_OBLIGATION
duration: timedelta | None = None
reason: str = Field(min_length=1)

Why and how long a value must outlive an erasure request.

A bounded duty needs a clock: duration is measured from the instant stored in the anchor column. Without an anchor, a duration cannot be evaluated, so the retention sweep reports such columns as indeterminate rather than guessing an expiry (see effaced.retention.RetentionSweeper).

Fields:

  • anchor (str | None): Name of a datetime column on the same table as the annotated column, holding the instant the retention clock starts (an invoiced_at, a closed_at). Cross-table anchors are out of scope. The SQLAlchemy adapter validates existence and datetime-ness at collection time (ADR 0012).
  • basis (LegalBasis): The lawful basis that overrides erasure.
  • duration (timedelta | None): How long the duty lasts, if bounded. None means indefinite / determined externally.
  • reason (str): Human-readable legal duty (e.g. "§147 AO invoice retention").
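
How anchor and duration combine can be shown with plain datetime arithmetic. The function below is a hypothetical sketch of the three outcomes described above, not the RetentionSweeper's actual logic; the status labels are invented for the example.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def retention_status(
    anchor_value: datetime | None, duration: timedelta | None
) -> tuple[str, datetime | None]:
    if duration is None:
        # No bound: indefinite, or determined outside the library.
        return ("unbounded", None)
    if anchor_value is None:
        # A duration with no starting instant cannot be evaluated.
        return ("indeterminate", None)
    return ("bounded", anchor_value + duration)


invoiced_at = datetime(2020, 1, 15, tzinfo=timezone.utc)
ten_years = timedelta(days=3650)

status, expires_at = retention_status(invoiced_at, ten_years)
missing_anchor = retention_status(None, ten_years)
```

With an anchor the duty is "bounded" and ends at `invoiced_at + ten_years`; with the same duration but no anchor the result is "indeterminate" and no expiry is invented.
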
SubjectIdentifier = str | CompositeSubjectId

What every engine entry point accepts to name a data subject.

A bare str for a single-column subject, or a CompositeSubjectId for a multi-column one. The bare-str form is the one-element canonical case, so passing a string behaves exactly as it always has (ADR 0025).

class SubjectLink(BaseModel):
is_subject_table: bool
path: str
subject_id_columns: tuple[str, ...] = Field(default=('id',), min_length=1)

How a table’s records reach the data subject.

A dotted relationship path from the annotated table to the subject table, e.g. "order.user" for an order_items table whose records belong to the user owning the parent order. The subject table itself uses the empty path "".

Fields:

  • is_subject_table (bool): Whether this link marks the subject table itself.
  • path (str): Dotted relationship path; "" marks the subject table.
  • subject_id_columns (tuple[str, ...]): Ordered identifier columns on the subject table that callers’ SubjectIdentifier aligns to. Defaults to ("id",) — one column, the single-column case. A multi-column tuple declares a composite subject key (ADR 0025); effaced always matches the whole ordered key, never a partial one.
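
A dotted path such as "order.user" can be read as a chain of attribute lookups from the annotated record to the subject. The sketch below walks such a path over plain objects; `follow_path` is a hypothetical helper, and how the adapters actually resolve paths (for example through ORM relationships or joins) is not described here.

```python
from types import SimpleNamespace


def follow_path(record: object, path: str) -> object:
    """Walk a dotted relationship path; "" returns the record itself."""
    target = record
    # "".split(".") would yield [""], so the empty path is handled
    # separately.
    for name in path.split(".") if path else []:
        target = getattr(target, name)
    return target


user = SimpleNamespace(id="42")
order = SimpleNamespace(user=user)
order_item = SimpleNamespace(order=order)

owner = follow_path(order_item, "order.user")
itself = follow_path(user, "")
```
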
class SubjectRef(BaseModel):
extra: dict[str, str] = Field(default_factory=dict)
kind: str = Field(min_length=1, max_length=255)
value: str = Field(min_length=1, max_length=255)

Opaque reference to one data subject, passed to resolvers.

Resolvers receive references (e.g. a Stripe customer id), never the subject’s rich PII — the library moves identifiers, not data.

Fields:

  • extra (dict[str, str]): Additional identifiers a resolver may need (string-typed on purpose — refs must stay loggable and PII-light).
  • kind (str): Namespace of the identifier ("stripe", "email"). Refs are routed to the resolver whose name equals the ref’s kind (ADR 0008).
  • value (str): The identifier itself.
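
Routing by kind can be pictured as a lookup from the ref's kind to a resolver of the same name. The sketch below uses a stand-in dataclass and a plain dict registry; `Ref`, `route`, and the resolver callable are hypothetical and say nothing about effaced's real resolver interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Ref:
    # Stand-in for SubjectRef with the same three fields.
    kind: str
    value: str
    extra: dict[str, str] = field(default_factory=dict)


def route(ref: Ref, resolvers: dict[str, Callable[[Ref], str]]) -> str:
    # The resolver is chosen by name: its key must equal ref.kind.
    try:
        resolver = resolvers[ref.kind]
    except KeyError:
        raise LookupError(f"no resolver named {ref.kind!r}") from None
    return resolver(ref)


resolvers = {"stripe": lambda ref: f"erase stripe customer {ref.value}"}
result = route(Ref(kind="stripe", value="cus_123"), resolvers)
```

Only the opaque identifier reaches the resolver in this sketch, which mirrors the point above that references carry identifiers rather than the subject's personal data.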