SQLAlchemy adapter
This content is for 0.1. Switch to the latest version for up-to-date documentation.
SQLAlchemy adapter — the first authoring layer for the effaced core.
The core (annotations, manifest, engine) is storage-agnostic; this package
is the thin layer that knows SQLAlchemy: authoring helpers that ride the
info dict, a collector that derives the manifest from metadata, a
resolver that turns subject-link paths into a subject graph, the
anonymizer surrogate registry, the erasure executor that runs local steps,
the completeness linter that flags what the manifest does not cover, and
the effaced-owned storage tables mounted via bind_tables.
bind_tables
def bind_tables(metadata: MetaData) -> EffacedTables
Mount the effaced-owned tables on the application’s MetaData.
Defines effaced_audit_events, effaced_consent_records and
effaced_outbox so they live in your database and ride your
migration tooling — no migration tool is assumed and no DDL is executed
here. Calling it again on the same MetaData is a no-op returning the
already-mounted tables, so module-level and app-factory setup styles
both work.
With Alembic, call this where your env.py’s target_metadata is
defined; alembic revision --autogenerate then picks the tables up
like your own. New effaced releases may add tables, columns or indexes
in MINOR versions — re-run autogenerate after upgrading. Without a
migration tool, metadata.create_all(engine) creates them directly.
Example:
>>> from effaced import bind_tables
>>> tables = bind_tables(Base.metadata)  # doctest: +SKIP
>>> tables.audit_events.name  # doctest: +SKIP
'effaced_audit_events'
Args:
- metadata (MetaData): The MetaData your migrations already manage (typically Base.metadata).
Returns:
EffacedTables: Handles to the three mounted tables.
Raises:
ValueError: If only some of the table names already exist on metadata, i.e. a table of your own collides with an effaced_-prefixed name.
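The all-or-nothing behaviour described above (a repeated call is a no-op, while a partial name match raises ValueError) can be sketched in plain Python. This is an illustrative stand-in and not the effaced implementation: `FakeMetadata`, `mount_tables` and the dict-based table store are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

# Table names taken from the documentation above.
EFFACED_TABLE_NAMES = (
    "effaced_audit_events",
    "effaced_consent_records",
    "effaced_outbox",
)


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for a MetaData object: a name -> table map."""

    tables: dict = field(default_factory=dict)


def mount_tables(metadata: FakeMetadata) -> dict:
    """Mount all three tables, or return the ones already mounted."""
    present = [name for name in EFFACED_TABLE_NAMES if name in metadata.tables]
    if len(present) == len(EFFACED_TABLE_NAMES):
        # Everything is already mounted: a second call changes nothing.
        return {name: metadata.tables[name] for name in EFFACED_TABLE_NAMES}
    if present:
        # Only some names exist, so an application table must be colliding.
        raise ValueError(f"table name collision on metadata: {present}")
    for name in EFFACED_TABLE_NAMES:
        metadata.tables[name] = object()  # placeholder for a real table
    return {name: metadata.tables[name] for name in EFFACED_TABLE_NAMES}


metadata = FakeMetadata()
first = mount_tables(metadata)
second = mount_tables(metadata)  # no-op, returns the same handles
```

Because the second call returns the same handles, the sketch behaves the same whether setup runs once at module level or again inside an app factory.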
collect_data_map
def collect_data_map(metadata: MetaData) -> DataMap
Collect every effaced annotation from SQLAlchemy metadata.
Args:
- metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).
Returns:
DataMap: A DataMap containing only tables with at least one annotation (a pii column or a subject_link).
Raises:
ManifestError: If an info entry under the effaced key is not a recognised annotation object.
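The collection rule above (keep only tables with at least one annotation, reject unrecognised entries) can be sketched without SQLAlchemy. Everything here is an assumption made for illustration: the `"effaced"` key name, the `Annotation` class, the nested-dict table layout and the local `ManifestError` stand-in are not taken from the library.

```python
from dataclasses import dataclass

EFFACED_KEY = "effaced"  # hypothetical info-dict key


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a recognised annotation object."""

    kind: str  # "pii" or "subject_link"


class ManifestError(Exception):
    """Local stand-in for the error named in the documentation."""


def collect(tables: dict) -> dict:
    """Return table name -> annotations, skipping unannotated tables."""
    data_map = {}
    for table_name, table in tables.items():
        # Look at the table-level info dict first, then every column's.
        infos = [table.get("info", {})]
        infos += [col.get("info", {}) for col in table.get("columns", {}).values()]
        found = []
        for info in infos:
            if EFFACED_KEY not in info:
                continue
            entry = info[EFFACED_KEY]
            if not isinstance(entry, Annotation):
                raise ManifestError(f"{table_name}: unrecognised entry {entry!r}")
            found.append(entry)
        if found:
            data_map[table_name] = found
    return data_map


tables = {
    "users": {
        "info": {EFFACED_KEY: Annotation("subject_link")},
        "columns": {"id": {}, "email": {"info": {EFFACED_KEY: Annotation("pii")}}},
    },
    "feature_flags": {"columns": {"name": {}}},
}
data_map = collect(tables)  # only "users" carries annotations
```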
default_surrogate_registry
def default_surrogate_registry() -> SurrogateRegistry
A registry covering the common SQLAlchemy scalar types.
Strings become unique opaque tokens (anon-…), numbers and booleans
become zero-values, dates collapse to the Unix epoch, and UUIDs become
fresh random UUIDs. The epoch datetime is naive and also resolves (via
MRO) for DateTime(timezone=True) columns — register a tz-aware
factory when your dialect rejects naive values there.
Returns:
SurrogateRegistry: A new, independently extensible registry.
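As a rough illustration of the default surrogates listed above, the sketch below uses zero-argument factories keyed by a plain string. The keys, the `DEFAULT_FACTORIES` name and the hex suffix after `anon-` are assumptions; the real registry is keyed by SQLAlchemy types and its exact token format is not shown here.

```python
import uuid
from datetime import date, datetime, timezone


def string_surrogate() -> str:
    # The "anon-" prefix comes from the text above; the hex suffix is assumed.
    return f"anon-{uuid.uuid4().hex}"


# Hypothetical mapping from a kind of column to a zero-argument factory.
DEFAULT_FACTORIES = {
    "string": string_surrogate,
    "integer": lambda: 0,
    "boolean": lambda: False,
    "date": lambda: date(1970, 1, 1),
    "datetime": lambda: datetime(1970, 1, 1),  # naive Unix epoch
    "uuid": uuid.uuid4,
}


def aware_epoch() -> datetime:
    """Timezone-aware alternative for drivers that reject naive datetimes."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc)


first_token = DEFAULT_FACTORIES["string"]()
second_token = DEFAULT_FACTORIES["string"]()  # differs from first_token
```

A factory like `aware_epoch` is the kind of callable one would register for timezone-aware columns when the dialect rejects naive values.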
EffacedTables
class EffacedTables:
    def __init__(audit_events: Table, consent_records: Table, outbox: Table) -> None
Handles to the three effaced-owned tables mounted on a MetaData.
Returned by effaced.bind_tables so downstream components (audit
sink, consent ledger, outbox) can reference the tables directly instead
of looking them up by name.
Fields:
- audit_events (Table): The append-only audit trail table.
- consent_records (Table): The append-only consent event table.
- outbox (Table): The durable external-call outbox table.
ErasureExecutor
class ErasureExecutor:
    def __init__(metadata: MetaData, surrogates: SurrogateRegistry | None = None) -> None
Executes one local erasure step per call, scoped to one subject.
The SQLAlchemy implementation of StepExecutor: each table’s
TableAccessPlan hop chain becomes nested IN subqueries down to the
subject identifier, so a step only ever touches the one subject’s rows.
Statements run in the caller’s session and are never committed here
(ADR 0006).
Two ADR 0007 consequences surface at this layer: a foreign-key
reference into a row-deleted table from outside the subject path
(e.g. another subject’s comment replying to this subject’s) fails
loudly with the database’s integrity error, and ANONYMIZE rewrites
rows one by one so every cell gets a fresh surrogate — unique
constraints keep holding.
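To make the nested IN scoping concrete, here is a minimal sketch using the standard-library sqlite3 module rather than SQLAlchemy. The schema, the `HOPS` tuple layout and `scoped_where` are hypothetical and only illustrate the shape of the generated statement; they are not the executor's actual code.

```python
import sqlite3

# Hypothetical hop chain from the target table towards the subject table.
# Each hop: (table, FK column on that table, referenced column on the next table).
HOPS = [("comments", "post_id", "id"), ("posts", "author_id", "id")]


def scoped_where(hops, subject_table, subject_id_column):
    """Build nested IN subqueries that end at the subject identifier."""
    # Identifiers come from trusted metadata; only the subject id is bound.
    inner = f"SELECT {hops[-1][2]} FROM {subject_table} WHERE {subject_id_column} = ?"
    for index in range(len(hops) - 1, 0, -1):
        table, fk_column, _ = hops[index]
        outer_ref = hops[index - 1][2]
        inner = f"SELECT {outer_ref} FROM {table} WHERE {fk_column} IN ({inner})"
    return f"{hops[0][1]} IN ({inner})"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES users(id));
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES posts(id),
        body TEXT
    );
    INSERT INTO users VALUES (1), (2);
    INSERT INTO posts VALUES (10, 1), (20, 2);
    INSERT INTO comments VALUES (100, 10, 'a'), (101, 10, 'b'), (200, 20, 'c');
    """
)

where = scoped_where(HOPS, "users", "id")
# Delete only the comments that hang off subject 1; nothing is committed here.
deleted = conn.execute(f"DELETE FROM comments WHERE {where}", (1,)).rowcount
remaining = [row[0] for row in conn.execute("SELECT id FROM comments ORDER BY id")]
```

In this toy schema the statement removes the two comments under subject 1's post and leaves the other subject's comment untouched.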
ErasureExecutor.execute
def execute(session: Session, graph: SubjectGraph, step: ErasureStep, subject_id: str) -> int
Run one local step scoped to one subject (see StepExecutor).
Args:
- session (Session): The caller’s open session; never committed here.
- graph (SubjectGraph): Resolved hop chains from each table to the subject.
- step (ErasureStep): The local step to run.
- subject_id (str): Identifier on the subject table, coerced to the subject column’s Python type for typed-parameter drivers.
Returns:
int: The number of rows deleted, anonymized, or counted as retained.
Raises:
- ConfigurationError: If the step is external.
- ManifestError: If the step targets a table or column missing from the bound metadata.
- AnonymizationError: If an ANONYMIZE table has no primary key or a column type has no registered surrogate.
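The primary-key requirement for ANONYMIZE follows from rewriting rows one at a time. The sketch below, again with sqlite3 and a hypothetical `accounts` table, shows why a single bulk update with one shared surrogate would break a unique constraint while a per-row rewrite keeps it intact. The `anon-` hex token format is an assumption.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    INSERT INTO accounts VALUES
        (1, 7, 'a@example.com'), (2, 7, 'b@example.com'), (3, 8, 'c@example.com');
    """
)

# One shared surrogate for every row violates the UNIQUE constraint.
try:
    conn.execute("UPDATE accounts SET email = 'anon-same' WHERE owner_id = 7")
    bulk_update_failed = False
except sqlite3.IntegrityError:
    bulk_update_failed = True


def anonymize_emails(connection, owner_id):
    """Rewrite each matching row by primary key with a fresh surrogate."""
    pks = [
        row[0]
        for row in connection.execute(
            "SELECT id FROM accounts WHERE owner_id = ?", (owner_id,)
        )
    ]
    for pk in pks:
        connection.execute(
            "UPDATE accounts SET email = ? WHERE id = ?",
            (f"anon-{uuid.uuid4().hex}", pk),
        )
    return len(pks)


anonymized = anonymize_emails(conn, 7)
emails = dict(conn.execute("SELECT id, email FROM accounts"))
```

Addressing each row by its primary key is what lets every cell receive its own value, which is why a table without one cannot be anonymized this way.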
lint_completeness
def lint_completeness(metadata: MetaData) -> tuple[CompletenessFinding, ...]
Find every place the metadata could hold undeclared personal data.
The exact complement of collect_data_map: every table is either
in the data map, returned here as a whole-table finding, or an
effaced-owned effaced_* table — and within a mapped table, every
column is either annotated, a primary/foreign key, or returned here as
a column finding. Nothing falls through silently.
Findings are questions, not verdicts — gate on them in CI with
effaced.testing.assert_data_map_complete, which lets you exempt
stores and fields you have consciously judged to hold no personal data.
Args:
- metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).
Returns:
tuple[CompletenessFinding, ...]: All findings, in deterministic table order, then column order.
Raises:
ManifestError: If an info entry under the effaced key is not a recognised annotation object. This is exactly the metadata collect_data_map rejects, so the complement contract holds even on malformed input.
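The complement rule can be sketched as a small pure-Python function. The input layout (a dict of column kinds), the `Finding` class and the alphabetical table ordering are assumptions chosen for the example; the library's own ordering and finding type may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding: column is None for a whole-table finding."""

    table: str
    column: Optional[str] = None


def lint(tables: dict, mapped_tables: set) -> tuple:
    """tables maps name -> {column: kind}; kind is 'annotated', 'pk', 'fk' or 'plain'."""
    findings = []
    for table_name in sorted(tables):  # assumed deterministic order
        if table_name.startswith("effaced_"):
            continue  # effaced-owned tables are exempt
        if table_name not in mapped_tables:
            findings.append(Finding(table_name))  # whole-table finding
            continue
        for column, kind in tables[table_name].items():
            if kind == "plain":
                # Neither annotated nor a key: report as a column finding.
                findings.append(Finding(table_name, column))
    return tuple(findings)


tables = {
    "users": {"id": "pk", "email": "annotated", "nickname": "plain"},
    "page_views": {"id": "pk", "ip": "plain"},
    "effaced_outbox": {"id": "pk"},
}
findings = lint(tables, mapped_tables={"users"})
```

Every table ends up in exactly one bucket (mapped, reported, or effaced-owned), which is the sense in which nothing falls through silently.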
pii
def pii(category: PiiCategory, *, erasure: ErasureStrategy = ErasureStrategy.DELETE, retention: RetentionPolicy | None = None, legal_basis: LegalBasis | None = None, purpose: str | None = None, description: str | None = None) -> dict[str, Any]
Declare a column as personal data.
Returns an info dict fragment for mapped_column(info=...) /
Column(info=...). Keeping this a function (not a bare dict) lets the
manifest format evolve behind a stable call signature.
Args:
- category (PiiCategory): What kind of personal data the column holds.
- erasure (ErasureStrategy): Erasure behaviour; defaults to deletion.
- retention (RetentionPolicy | None): Legal retention duty; required for RETAIN.
- legal_basis (LegalBasis | None): Lawful basis, surfaced in Art. 15 exports.
- purpose (str | None): Processing purpose, surfaced in Art. 15 exports.
- description (str | None): Free-text note for audits.
Returns:
dict[str, Any]: A dict suitable for SQLAlchemy’s info parameter.
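One way to picture why a function is preferable to a bare dict: the function can validate its arguments and wrap them in an annotation object under a single key, so the stored shape can change without touching call sites. The sketch below is hypothetical throughout; `declare_pii`, `ColumnAnnotation`, `Strategy`, the `"effaced"` key and the ValueError check are illustrative assumptions, not the library's internals.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

INFO_KEY = "effaced"  # hypothetical key; the real key is not shown above


class Strategy(Enum):
    """Hypothetical mirror of the strategies named on this page."""

    DELETE = "delete"
    ANONYMIZE = "anonymize"
    RETAIN = "retain"


@dataclass(frozen=True)
class ColumnAnnotation:
    category: str
    erasure: Strategy
    retention: Optional[str] = None


def declare_pii(
    category: str,
    *,
    erasure: Strategy = Strategy.DELETE,
    retention: Optional[str] = None,
) -> dict:
    """Return an info-dict fragment wrapping one annotation object."""
    if erasure is Strategy.RETAIN and retention is None:
        # Assumed check: the text says retention is required for RETAIN.
        raise ValueError("RETAIN needs a retention policy")
    fragment: dict = {INFO_KEY: ColumnAnnotation(category, erasure, retention)}
    return fragment


fragment = declare_pii("email")  # erasure defaults to deletion
```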
resolve_subject_graph
def resolve_subject_graph(data_map: DataMap, orm_registry: registry) -> SubjectGraph
Resolve every subject-link path in a data map into a subject graph.
Each table’s dotted relationship path is walked against the ORM mappers and flattened into foreign-key column pairs; the resulting accesses are ordered FK-safely for deletion (children before parents, subject table last). Joined-table inheritance is keyed by each mapper’s local table only.
Args:
- data_map (DataMap): The collected manifest (see collect_data_map).
- orm_registry (registry): The ORM registry holding the mapped classes; for declarative styles, Base.registry.
Returns:
SubjectGraph: The resolved, FK-safely ordered SubjectGraph.
Raises:
SubjectResolutionError: If no (or more than one) table declares subject_link(""), a table holds personal data without a subject link, a table is not ORM-mapped, a path segment is not a relationship, a path joins through a many-to-many secondary table, a path does not end at the subject table, the declared subject id column does not exist or is declared on a non-subject table, or the foreign keys between resolved tables form a cycle.
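The FK-safe ordering (children before parents) and the cycle check can be illustrated with the standard library's graphlib. This is a sketch of the general technique, a topological sort, and not necessarily how the resolver orders tables; `deletion_order` and the example table names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def deletion_order(foreign_keys, subject_table):
    """Order tables so every child is deleted before the parent it references.

    foreign_keys is an iterable of (child_table, parent_table) pairs.
    """
    sorter = TopologicalSorter()
    sorter.add(subject_table)  # make sure the subject table is present
    for child, parent in foreign_keys:
        # The parent may only be processed after the child.
        sorter.add(parent, child)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError(f"foreign keys form a cycle: {exc.args[1]}") from exc


order = deletion_order(
    [("comments", "posts"), ("posts", "users"), ("sessions", "users")],
    subject_table="users",
)
```

In this example the subject table comes out last because every foreign-key path ends there; the sketch does not enforce that separately.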
subject_link
def subject_link(path: str, *, subject_id_column: str = 'id') -> dict[str, Any]
Declare how a table reaches the data subject.
Attach via Table.info or the mapped class’s __table_args__
info dict. The subject table itself declares subject_link("").
Args:
- path (str): Dotted relationship path to the subject table.
- subject_id_column (str): Identifier column on the subject table.
Returns:
dict[str, Any]: A dict suitable for SQLAlchemy’s table-level info parameter.
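To show what a dotted path means in practice, the sketch below walks a path segment by segment over a hand-written relationship map and flattens it into foreign-key column pairs. The `RELATIONSHIPS` table, `flatten_path` and the use of LookupError are hypothetical; the real resolver reads this information from the ORM mappers.

```python
# Hypothetical relationship map: (table, relationship name) ->
# (target table, local FK column, referenced column on the target).
RELATIONSHIPS = {
    ("comments", "post"): ("posts", "post_id", "id"),
    ("posts", "author"): ("users", "author_id", "id"),
}


def flatten_path(table, path, subject_table):
    """Turn a dotted relationship path into a list of FK hops."""
    hops = []
    current = table
    segments = path.split(".") if path else []  # "" marks the subject table
    for segment in segments:
        try:
            target, fk_column, ref_column = RELATIONSHIPS[(current, segment)]
        except KeyError:
            raise LookupError(f"{current}.{segment} is not a relationship") from None
        hops.append((current, fk_column, target, ref_column))
        current = target
    if current != subject_table:
        raise LookupError(f"path from {table} ends at {current}, not {subject_table}")
    return hops


comment_hops = flatten_path("comments", "post.author", "users")
subject_hops = flatten_path("users", "", "users")  # the subject itself: no hops
```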
SurrogateRegistry
class SurrogateRegistry:
    def __init__() -> None
Maps SQLAlchemy column types to surrogate-value factories.
Anonymization replaces a value with an irreversible surrogate instead
of NULL — surrogates stay valid under NOT NULL and unique
constraints, which is why factories are invoked once per cell (string
and UUID surrogates are unique per call). Lookup walks the column
type’s MRO, so registering String also covers
Text and every other String subclass.
Unlike ResolverRegistry, re-registering a type silently overrides it
(last wins): replacing a default surrogate with your own is the very
point of extensibility, and nothing audited depends on which factory
produced a surrogate.
The registry is consumed by the erasure executor, never by plan: plans
carry no values, which keeps them deterministic and side-effect-free.
SurrogateRegistry.register
def register(sa_type: type[TypeEngine[Any]], factory: Callable[[], object]) -> None
Map one SQLAlchemy type (and its subclasses) to a factory.
Args:
- sa_type (type[TypeEngine[Any]]): The type class to cover, e.g. sqlalchemy.String.
- factory (Callable[[], object]): Zero-argument callable producing one surrogate value; called once per anonymized cell.
SurrogateRegistry.surrogate_for
def surrogate_for(column_type: TypeEngine[Any]) -> object
Produce one surrogate value for a column of the given type.
Args:
- column_type (TypeEngine[Any]): The column’s type instance, e.g. Text().
Returns:
object: A fresh, type-valid surrogate value.
Raises:
AnonymizationError: If neither the type nor any of its base classes has a registered factory.
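The MRO-based lookup and last-wins registration can be sketched with plain classes standing in for SQLAlchemy types. `TypeBase`, `String`, `Text`, `Integer` and `MroRegistry` below are local, hypothetical stand-ins, and LookupError takes the place of AnonymizationError.

```python
from typing import Callable


class TypeBase:
    """Stand-in for the common base of column types."""


class String(TypeBase):
    pass


class Text(String):
    pass


class Integer(TypeBase):
    pass


class MroRegistry:
    """Maps type classes to zero-argument surrogate factories."""

    def __init__(self) -> None:
        self._factories: dict = {}

    def register(self, type_class: type, factory: Callable[[], object]) -> None:
        # Re-registering simply overwrites the previous factory (last wins).
        self._factories[type_class] = factory

    def surrogate_for(self, type_instance: object) -> object:
        # Walk from the most specific class up through its base classes.
        for klass in type(type_instance).__mro__:
            factory = self._factories.get(klass)
            if factory is not None:
                return factory()  # invoked once per call, so once per cell
        raise LookupError(f"no surrogate for {type(type_instance).__name__}")


registry = MroRegistry()
registry.register(String, lambda: "anon-default")
from_subclass = registry.surrogate_for(Text())  # resolved via String
registry.register(String, lambda: "anon-custom")
after_override = registry.surrogate_for(Text())
```

Registering the base class once covers its subclasses, and a later registration for the same class quietly replaces the earlier one.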