
SQLAlchemy adapter

This content is for version 0.1.

SQLAlchemy adapter — the first authoring layer for the effaced core.

The core (annotations, manifest, engine) is storage-agnostic; this package is the thin layer that knows SQLAlchemy: authoring helpers that ride the info dict, a collector that derives the manifest from metadata, a resolver that turns subject-link paths into a subject graph, the anonymizer surrogate registry, the erasure executor that runs local steps, the completeness linter that flags what the manifest does not cover, and the effaced-owned storage tables mounted via bind_tables.

def bind_tables(metadata: MetaData) -> EffacedTables

Mount the effaced-owned tables on the application’s MetaData.

Defines effaced_audit_events, effaced_consent_records and effaced_outbox so they live in your database and ride your migration tooling — no migration tool is assumed and no DDL is executed here. Calling it again on the same MetaData is a no-op returning the already-mounted tables, so module-level and app-factory setup styles both work.

With Alembic, call this where your env.py’s target_metadata is defined; alembic revision --autogenerate then picks the tables up like your own. New effaced releases may add tables, columns or indexes in MINOR versions — re-run autogenerate after upgrading. Without a migration tool, metadata.create_all(engine) creates them directly.

Example:

>>> from effaced import bind_tables
>>> tables = bind_tables(Base.metadata) # doctest: +SKIP
>>> tables.audit_events.name # doctest: +SKIP
'effaced_audit_events'

Args:

  • metadata (MetaData): The MetaData your migrations already manage (typically Base.metadata).

Returns:

  • EffacedTables — Handles to the three mounted tables.

Raises:

  • ValueError — If only some of the table names already exist on metadata — i.e. a table of your own collides with an effaced_-prefixed name.
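
The three outcomes described for bind_tables (fresh mount, repeat call, partial collision) can be modelled without SQLAlchemy. This is a minimal sketch under stated assumptions, not effaced's implementation: `mount_tables` and the dict standing in for `MetaData.tables` are hypothetical, and only the three table names come from this page.

```python
# Hypothetical model of the mount logic; a dict stands in for MetaData.tables.
EFFACED_TABLE_NAMES = (
    "effaced_audit_events",
    "effaced_consent_records",
    "effaced_outbox",
)


def mount_tables(tables: dict[str, object]) -> tuple[object, ...]:
    """Define the three tables once; return the same handles on repeat calls."""
    present = [name for name in EFFACED_TABLE_NAMES if name in tables]
    if len(present) == len(EFFACED_TABLE_NAMES):
        # Already mounted: no-op, hand back the existing objects.
        return tuple(tables[name] for name in EFFACED_TABLE_NAMES)
    if present:
        # Only some names exist: one of the application's own tables collides.
        raise ValueError(f"table name collision: {present}")
    for name in EFFACED_TABLE_NAMES:
        tables[name] = object()  # placeholder for a real Table definition
    return tuple(tables[name] for name in EFFACED_TABLE_NAMES)


app_tables: dict[str, object] = {"users": object()}
first = mount_tables(app_tables)
second = mount_tables(app_tables)  # same handles, nothing redefined
```

Treating "all present" as a no-op is what lets module-level and app-factory setups both call the function; "some present" is treated as an error because it cannot be told apart from a naming collision.
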
def collect_data_map(metadata: MetaData) -> DataMap

Collect every effaced annotation from SQLAlchemy metadata.

Args:

  • metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).

Returns:

  • DataMap — A DataMap containing only tables with at least one annotation (a pii column or a subject_link).

Raises:

  • ManifestError — If an info entry under the effaced key is not a recognised annotation object.
def default_surrogate_registry() -> SurrogateRegistry

A registry covering the common SQLAlchemy scalar types.

Strings become unique opaque tokens (anon-…), numbers and booleans become zero-values, dates collapse to the Unix epoch, and UUIDs become fresh random UUIDs. The epoch datetime is naive and also resolves (via MRO) for DateTime(timezone=True) columns — register a tz-aware factory when your dialect rejects naive values there.

Returns:

  • SurrogateRegistry — A new, independently extensible registry.
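
For dialects that reject the naive epoch on DateTime(timezone=True) columns, a tz-aware factory could look like the sketch below. It uses only the standard library; `naive_epoch` and `aware_epoch` are hypothetical names, and the registration is shown as a comment because it assumes the `register(sa_type, factory)` signature documented for SurrogateRegistry on this page.

```python
from datetime import datetime, timezone


def naive_epoch() -> datetime:
    """The Unix epoch without tzinfo, matching the described default."""
    return datetime(1970, 1, 1)


def aware_epoch() -> datetime:
    """The same instant, carrying an explicit UTC offset."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc)


# Assumed wiring, following the register(sa_type, factory) signature:
#   registry = default_surrogate_registry()
#   registry.register(sqlalchemy.DateTime, aware_epoch)
# Under the MRO lookup described on this page, this would apply to every
# DateTime column, not only those declared with timezone=True.
```
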
class EffacedTables:
def __init__(audit_events: Table, consent_records: Table, outbox: Table) -> None

Handles to the three effaced-owned tables mounted on a MetaData.

Returned by effaced.bind_tables so downstream components (audit sink, consent ledger, outbox) can reference the tables directly instead of looking them up by name.

Fields:

  • audit_events (Table): The append-only audit trail table.
  • consent_records (Table): The append-only consent event table.
  • outbox (Table): The durable external-call outbox table.
class ErasureExecutor:
def __init__(metadata: MetaData, surrogates: SurrogateRegistry | None = None) -> None

Executes one local erasure step per call, scoped to one subject.

The SQLAlchemy implementation of StepExecutor: each table’s TableAccessPlan hop chain becomes nested IN subqueries down to the subject identifier, so a step only ever touches the one subject’s rows. Statements run in the caller’s session and are never committed here (ADR 0006).

Two ADR 0007 consequences surface at this layer: a foreign-key reference into a row-deleted table from outside the subject path (e.g. another subject’s comment replying to this subject’s) fails loudly with the database’s integrity error, and ANONYMIZE rewrites rows one by one so every cell gets a fresh surrogate — unique constraints keep holding.
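
To make the "nested IN subqueries" idea concrete, here is a small sketch using the standard library's sqlite3 rather than SQLAlchemy. The schema, the `hops` tuples and `scoped_where` are illustrative assumptions, not effaced's TableAccessPlan; the point is only how a hop chain narrows a statement to one subject's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id));
    INSERT INTO users VALUES (1), (2);
    INSERT INTO posts VALUES (10, 1), (20, 2);
    INSERT INTO comments VALUES (100, 10), (101, 10), (200, 20);
    """
)

# Hop chain comments -> posts -> users: (child, child FK, parent, parent key).
hops = [("comments", "post_id", "posts", "id"), ("posts", "user_id", "users", "id")]


def scoped_where(hops: list[tuple[str, str, str, str]], subject_key: str) -> str:
    """Wrap one IN subquery per hop around the subject-identifier predicate."""
    where = f"{subject_key} = ?"  # innermost: pins the single subject row
    for _child, fk, parent, parent_key in reversed(hops):
        where = f"{fk} IN (SELECT {parent_key} FROM {parent} WHERE {where})"
    return where


# Identifiers are trusted constants here; only the subject id is a bound value.
sql = f"DELETE FROM comments WHERE {scoped_where(hops, 'id')}"
deleted = conn.execute(sql, (1,)).rowcount
remaining = [row[0] for row in conn.execute("SELECT id FROM comments ORDER BY id")]
```

Subject 2's comment (id 200) is untouched because the innermost predicate never matches its owner.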

def execute(session: Session, graph: SubjectGraph, step: ErasureStep, subject_id: str) -> int

Run one local step scoped to one subject (see StepExecutor).

Args:

  • session (Session): The caller’s open session; never committed here.
  • graph (SubjectGraph): Resolved hop chains from each table to the subject.
  • step (ErasureStep): The local step to run.
  • subject_id (str): Identifier on the subject table, coerced to the subject column’s python type for typed-parameter drivers.

Returns:

  • int — The number of rows deleted, anonymized, or counted as retained.

Raises:

  • ConfigurationError — If the step is external.
  • ManifestError — If the step targets a table or column missing from the bound metadata.
  • AnonymizationError — If an ANONYMIZE table has no primary key or a column type has no registered surrogate.
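
The row-by-row ANONYMIZE behaviour, and why it needs a primary key, can be illustrated with sqlite3. This is a sketch under assumptions: the `users` table and `string_surrogate` are hypothetical, and the token format is a stand-in rather than the library's actual surrogate format.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    """
)


def string_surrogate() -> str:
    # Stand-in for a unique opaque token; the real format is not reproduced here.
    return uuid.uuid4().hex


# A single bulk UPDATE would write the same value into every row and
# trip the UNIQUE constraint.
try:
    conn.execute("UPDATE users SET email = 'same-for-all'")
    bulk_ok = True
except sqlite3.IntegrityError:
    bulk_ok = False

# Rewriting row by row, addressed by primary key, calls the factory once
# per cell, so every row receives a distinct value.
ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
for pk in ids:
    conn.execute("UPDATE users SET email = ? WHERE id = ?", (string_surrogate(), pk))

emails = [row[0] for row in conn.execute("SELECT email FROM users ORDER BY id")]
```
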
def lint_completeness(metadata: MetaData) -> tuple[CompletenessFinding, ...]

Find every place the metadata could hold undeclared personal data.

The exact complement of collect_data_map: every table is either in the data map, returned here as a whole-table finding, or an effaced-owned effaced_* table — and within a mapped table, every column is either annotated, a primary/foreign key, or returned here as a column finding. Nothing falls through silently.

Findings are questions, not verdicts — gate on them in CI with effaced.testing.assert_data_map_complete, which lets you exempt stores and fields you have consciously judged to hold no personal data.
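
The complement contract can be read as a partition rule. The sketch below models it with plain dataclasses; `Col`, `Tbl` and `lint` are hypothetical stand-ins, and sorting tables by name is an assumption about what "deterministic table order" means, not a statement of the library's ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str
    annotated: bool = False  # carries a pii annotation
    is_key: bool = False     # primary or foreign key


@dataclass(frozen=True)
class Tbl:
    name: str
    columns: tuple[Col, ...]
    has_subject_link: bool = False


def in_data_map(table: Tbl) -> bool:
    """A table is mapped if it has a subject link or any annotated column."""
    return table.has_subject_link or any(c.annotated for c in table.columns)


def lint(tables: list[Tbl]) -> tuple[str, ...]:
    findings: list[str] = []
    for table in sorted(tables, key=lambda t: t.name):  # assumed ordering
        if table.name.startswith("effaced_"):
            continue  # effaced-owned storage is exempt
        if not in_data_map(table):
            findings.append(table.name)  # whole-table finding
            continue
        for col in table.columns:
            if not (col.annotated or col.is_key):
                findings.append(f"{table.name}.{col.name}")  # column finding
    return tuple(findings)


tables = [
    Tbl("users", (Col("id", is_key=True), Col("email", annotated=True),
                  Col("nickname")), has_subject_link=True),
    Tbl("audit_log", (Col("id", is_key=True), Col("message"))),
    Tbl("effaced_outbox", (Col("id", is_key=True), Col("payload"))),
]
findings = lint(tables)
```

Every table and column lands in exactly one bucket: mapped, exempt, or reported.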

Args:

  • metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).

Returns:

  • tuple[CompletenessFinding, ...] — All findings, in deterministic table order, then column order.

Raises:

  • ManifestError — If an info entry under the effaced key is not a recognised annotation object — exactly the metadata collect_data_map rejects, so the complement contract holds even on malformed input.
def pii(category: PiiCategory, *, erasure: ErasureStrategy = ErasureStrategy.DELETE, retention: RetentionPolicy | None = None, legal_basis: LegalBasis | None = None, purpose: str | None = None, description: str | None = None) -> dict[str, Any]

Declare a column as personal data.

Returns an info dict fragment for mapped_column(info=...) / Column(info=...). Keeping this a function (not a bare dict) lets the manifest format evolve behind a stable call signature.

Args:

  • category (PiiCategory): What kind of personal data the column holds.
  • erasure (ErasureStrategy): Erasure behaviour; defaults to deletion.
  • retention (RetentionPolicy | None): Legal retention duty; required for RETAIN.
  • legal_basis (LegalBasis | None): Lawful basis, surfaced in Art. 15 exports.
  • purpose (str | None): Processing purpose, surfaced in Art. 15 exports.
  • description (str | None): Free-text note for audits.

Returns:

  • dict[str, Any] — A dict suitable for SQLAlchemy’s info parameter.
def resolve_subject_graph(data_map: DataMap, orm_registry: registry) -> SubjectGraph

Resolve every subject-link path in a data map into a subject graph.

Each table’s dotted relationship path is walked against the ORM mappers and flattened into foreign-key column pairs; the resulting accesses are ordered FK-safely for deletion (children before parents, subject table last). Joined-table inheritance is keyed by each mapper’s local table only.
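
The "children before parents, subject table last" ordering is a topological sort over foreign-key references. A minimal sketch with the standard library's graphlib follows; the table names and `deletion_order` are illustrative assumptions, and the sketch raises graphlib's CycleError where this function is documented to raise SubjectResolutionError.

```python
from graphlib import CycleError, TopologicalSorter

# child table -> parent tables it references via foreign key
references: dict[str, set[str]] = {
    "comments": {"posts", "users"},
    "posts": {"users"},
    "users": set(),
}


def deletion_order(references: dict[str, set[str]]) -> list[str]:
    """Order tables so that referencing rows go before the rows they reference."""
    # TopologicalSorter emits a node only after all its predecessors, so we
    # register each child as a predecessor of the parents it points at.
    referenced_by: dict[str, set[str]] = {table: set() for table in references}
    for child, parents in references.items():
        for parent in parents:
            referenced_by.setdefault(parent, set()).add(child)
    return list(TopologicalSorter(referenced_by).static_order())


order = deletion_order(references)

try:
    deletion_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```
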

Args:

  • data_map (DataMap): The collected manifest (see collect_data_map).
  • orm_registry (registry): The ORM registry holding the mapped classes — for declarative styles, Base.registry.

Returns:

  • SubjectGraph — The resolved, FK-safely ordered SubjectGraph.

Raises:

  • SubjectResolutionError — If no (or more than one) table declares subject_link(""), a table holds personal data without a subject link, a table is not ORM-mapped, a path segment is not a relationship, a path joins through a many-to-many secondary table, a path does not end at the subject table, the declared subject id column does not exist or is declared on a non-subject table, or the foreign keys between resolved tables form a cycle.
def subject_link(path: str, *, subject_id_column: str = 'id') -> dict[str, Any]

Declare how a table reaches the data subject.

Attach via Table.info or the mapped class’s __table_args__ info dict. The subject table itself declares subject_link("").

Args:

  • path (str): Dotted relationship path to the subject table.
  • subject_id_column (str): Identifier column on the subject table.

Returns:

  • dict[str, Any] — A dict suitable for SQLAlchemy’s table-level info parameter.
class SurrogateRegistry:
def __init__() -> None

Maps SQLAlchemy column types to surrogate-value factories.

Anonymization replaces a value with an irreversible surrogate instead of NULL — surrogates stay valid under NOT NULL and unique constraints, which is why factories are invoked once per cell (string and UUID surrogates are unique per call). Lookup walks the column type’s MRO, so registering String also covers Text and every other String subclass.

Unlike ResolverRegistry, re-registering a type silently overrides it (last wins): replacing a default surrogate with your own is the very point of extensibility, and nothing audited depends on which factory produced a surrogate.

The registry is consumed by the erasure executor, never by plan — plans carry no values, which keeps them deterministic and side-effect-free.
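
The MRO lookup and last-wins override can be sketched with a toy registry. The classes below are stand-ins that only mirror the String/Text inheritance described above; `ToyRegistry` is hypothetical and raises LookupError where the real registry is documented to raise AnonymizationError.

```python
from typing import Callable


# Stand-in type hierarchy: Text subclasses String, as on this page.
class TypeEngine: ...
class String(TypeEngine): ...
class Text(String): ...
class Integer(TypeEngine): ...


class ToyRegistry:
    def __init__(self) -> None:
        self._factories: dict[type, Callable[[], object]] = {}

    def register(self, sa_type: type, factory: Callable[[], object]) -> None:
        self._factories[sa_type] = factory  # re-registering overrides silently

    def surrogate_for(self, column_type: object) -> object:
        # Walk the instance's class and then its bases, most specific first.
        for klass in type(column_type).__mro__:
            factory = self._factories.get(klass)
            if factory is not None:
                return factory()  # invoked per call, so one value per cell
        raise LookupError(f"no surrogate for {type(column_type).__name__}")


registry = ToyRegistry()
registry.register(String, lambda: "first")
text_value = registry.surrogate_for(Text())  # resolved through String
registry.register(String, lambda: "second")  # last registration wins
overridden = registry.surrogate_for(Text())
```

Registering a factory for Text directly would take precedence over the String entry for Text columns, since Text appears earlier in its own MRO.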

def register(sa_type: type[TypeEngine[Any]], factory: Callable[[], object]) -> None

Map one SQLAlchemy type (and its subclasses) to a factory.

Args:

  • sa_type (type[TypeEngine[Any]]): The type class to cover, e.g. sqlalchemy.String.
  • factory (Callable[[], object]): Zero-argument callable producing one surrogate value; called once per anonymized cell.
def surrogate_for(column_type: TypeEngine[Any]) -> object

Produce one surrogate value for a column of the given type.

Args:

  • column_type (TypeEngine[Any]): The column’s type instance, e.g. Text().

Returns:

  • object — A fresh, type-valid surrogate value.

Raises:

  • AnonymizationError — If neither the type nor any of its base classes has a registered factory.