SQLAlchemy adapter

SQLAlchemy adapter — the first authoring layer for the effaced core.

The core (annotations, manifest, engine) is storage-agnostic; this package is the thin layer that knows SQLAlchemy. It provides authoring helpers that ride the info dict, a collector that derives the manifest from metadata, a resolver that turns subject-link paths into a subject graph, and the anonymizer surrogate registry. For erasure it adds the executor that runs local steps and the verifier that reads the annotated surface back afterwards. Two linters cover the gaps: the completeness linter flags what the manifest does not cover, and the reachability linter flags annotated tables the planner cannot reach. Finally, it supplies the effaced-owned storage tables mounted via bind_tables, a reflection helper that lifts a live database’s schema into MetaData, and the EffacedStack facade that wires every engine from one schema source: either an annotated base, or a serialized manifest plus a reflected database.

bind_tables

def bind_tables(metadata: MetaData) -> EffacedTables

Mount the effaced-owned tables on the application’s MetaData.

Defines effaced_audit_events, effaced_consent_records, effaced_outbox and effaced_restriction_records so they live in your database and ride your migration tooling — no migration tool is assumed and no DDL is executed here. Calling it again on the same MetaData is a no-op returning the already-mounted tables, so module-level and app-factory setup styles both work.

With Alembic, call this where your env.py’s target_metadata is defined; alembic revision --autogenerate then picks the tables up like your own. New effaced releases may add tables, columns or indexes in MINOR versions — re-run autogenerate after upgrading. Owned-table changes are additive-only, and additive columns backfill populated tables via server defaults — the caller-owned migration contract is ADR 0021 (see the Alembic guide on the docs site). Without a migration tool, metadata.create_all(engine) creates them directly.

Example:

>>> from effaced import bind_tables
>>> tables = bind_tables(Base.metadata)  # doctest: +SKIP
>>> tables.audit_events.name  # doctest: +SKIP
'effaced_audit_events'
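
The all-or-nothing mount rule (a repeat call is a no-op, a partial match is an error) can be pictured with plain dictionaries. The sketch below is illustrative only and is not the library's implementation: `mount_owned` and `OWNED_NAMES` are hypothetical names, and a dict stands in for the table collection on a MetaData.

```python
# Illustrative stand-in only: a dict plays the role of the MetaData's tables.
OWNED_NAMES = (
    "effaced_audit_events",
    "effaced_consent_records",
    "effaced_outbox",
    "effaced_restriction_records",
)


def mount_owned(tables: dict) -> dict:
    """Mount the owned names on `tables`, idempotently (hypothetical helper)."""
    present = [name for name in OWNED_NAMES if name in tables]
    if present and len(present) != len(OWNED_NAMES):
        # Only some names exist: an application table collides with an owned name.
        raise ValueError(f"partial mount, already present: {present}")
    if not present:
        for name in OWNED_NAMES:
            tables[name] = object()  # placeholder for a Table definition
    # All four are present now, whether just created or mounted earlier.
    return {name: tables[name] for name in OWNED_NAMES}
```

Calling `mount_owned` twice on the same dict returns the same placeholder objects, which mirrors why module-level and app-factory setup can both call the real function safely.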

Args:

metadata (MetaData): The MetaData your migrations already manage (typically Base.metadata).

Returns:

EffacedTables — Handles to the four mounted tables.

Raises:

ValueError — If only some of the table names already exist on metadata — i.e. a table of your own collides with an effaced_-prefixed name.

collect_data_map

def collect_data_map(metadata: MetaData) -> DataMap

Collect every effaced annotation from SQLAlchemy metadata.

Args:

metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).

Returns:

DataMap — A DataMap containing only tables with at least one annotation (a pii column or a subject_link).

Raises:

ManifestError — If an info entry under the effaced key is not a recognised annotation object, or a retention policy names an anchor column that does not exist on the table or is not datetime-typed (ADR 0012 — fail loudly at assembly, before any sweep runs).

default_surrogate_registry

def default_surrogate_registry() -> SurrogateRegistry

A registry covering the common SQLAlchemy scalar types.

Strings become unique opaque tokens (anon-…), numbers and booleans become zero-values, dates collapse to the Unix epoch, and UUIDs become fresh random UUIDs. The epoch datetime is naive and also resolves (via MRO) for DateTime(timezone=True) columns — register a tz-aware factory when your dialect rejects naive values there.
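
The difference between a naive and a timezone-aware epoch value can be shown with the standard library alone. This is a minimal sketch with hypothetical function names; it does not touch the registry itself and only illustrates the two kinds of value the paragraph above distinguishes.

```python
from datetime import datetime, timezone


def naive_epoch() -> datetime:
    """Epoch without tzinfo, like the default described above."""
    return datetime(1970, 1, 1)


def aware_epoch() -> datetime:
    """Epoch pinned to UTC, for dialects that reject naive values."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc)
```

A zero-argument callable such as `aware_epoch` has the shape that SurrogateRegistry.register expects for its factory argument.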

Returns:

SurrogateRegistry — A new, independently extensible registry.

EffacedStack

class EffacedStack:
    def __init__(metadata: MetaData, data_map: DataMap, graph: SubjectGraph, tables: EffacedTables, session_factory: sessionmaker, registry: ResolverRegistry, audit_sink: AuditSink, outbox: Outbox, exporter: Exporter, planner: ErasurePlanner, rectifier: Rectifier, consent: ConsentLedger, restriction: RestrictionLedger, sweeper: RetentionSweeper, saga_runner: SagaRunner) -> None

Every effaced engine, wired once from one schema source.

The manual integration sequence — collect the data map, resolve the subject graph, mount the owned tables, construct the audit sink, the outbox, and each engine — is mechanical and identical in every application. The two classmethods perform it in one call and return the wired components as named handles, so a web layer (or your own glue) only decides when to call them, never how to build them.

Pick the entry point by where the schema comes from. from_base derives the manifest and graph from an annotated declarative base — the in-process ORM is the source of truth. from_manifest runs the same engines from a serialized manifest paired with a reflected live database, for callers whose models live elsewhere (a separate service, another language) but who export the manifest and connect to the same database. Both wire identical engines and are proven byte-identical on the same schema.

The stack adds no behaviour of its own: each handle is exactly the component you could have constructed by hand, governed by its own documented contract. Construction executes no SQL beyond the read-only reflection from_manifest performs — the owned tables ride your migrations (see effaced.bind_tables).

Fields:

audit_sink (AuditSink): The append-only trail every engine records into.
consent (ConsentLedger): The Art. 7 consent ledger.
data_map (DataMap): The manifest collected from the annotated models.
exporter (Exporter): The Art. 15 export engine.
graph (SubjectGraph): The resolved subject graph used to scope every operation.
metadata (MetaData): The application MetaData the stack was built from.
outbox (Outbox): The durable queue for external erasure/rectification calls.
planner (ErasurePlanner): The Art. 17 erasure engine, execution-ready.
rectifier (Rectifier): The Art. 16 rectification engine, execution-ready.
registry (ResolverRegistry): The resolver registry routing external refs.
restriction (RestrictionLedger): The Art. 18 restriction-of-processing ledger.
saga_runner (SagaRunner): The outbox drainer — drive it from a worker, never on a serving event loop (ADR 0006).
session_factory (sessionmaker): The application’s session factory, as provided.
sweeper (RetentionSweeper): The Art. 5(1)(e) retention sweeper (report-only).
tables (EffacedTables): Handles to the four effaced-owned tables.

EffacedStack.from_base

def from_base(base: type[DeclarativeBase], session_factory: sessionmaker, *, resolvers: Sequence[Resolver] = (), registry: ResolverRegistry | None = None, audit_sink: AuditSink | None = None) -> EffacedStack

Wire the full stack from an annotated declarative base.

Collects the data map from base.metadata, resolves the subject graph through base.registry, mounts the owned tables, and constructs every engine with the SQLAlchemy executors. Resolver registration stays explicit (never discovered): pass the resolver instances, or a prebuilt registry — e.g. from effaced.registry_from_settings — but not both.

Args:

base (type[DeclarativeBase]): The declarative base whose models carry the effaced.pii / effaced.subject_link annotations.
session_factory (sessionmaker): Factory producing sessions on the application database; used by the components that operate outside a caller transaction (audit sink, outbox claims).
resolvers (Sequence[Resolver]): External-system resolvers to register, by instance.
registry (ResolverRegistry | None): A prebuilt registry, mutually exclusive with resolvers.
audit_sink (AuditSink | None): Trail override; defaults to an effaced.DatabaseAuditSink on the mounted effaced_audit_events table.

Returns:

EffacedStack — The wired stack.

Raises:

ConfigurationError — If both resolvers and registry are given — two sources of truth for “where is my PII” would make the registration ambiguous. Raised before any data-map collection or graph resolution.
ManifestError — If the annotations on base are invalid (propagated from effaced.collect_data_map).

EffacedStack.from_manifest

def from_manifest(session_factory: sessionmaker, engine: Engine, manifest_payload: Mapping[str, Any], *, resolvers: Sequence[Resolver] = (), registry: ResolverRegistry | None = None, audit_sink: AuditSink | None = None) -> EffacedStack

Wire the full stack from a serialized manifest and a live database.

The mapper-free counterpart of from_base, for callers whose models are not in this process. Loads (and forward-migrates) the manifest with DataMap.from_payload, reflects exactly the manifest’s tables off engine with effaced.reflect_metadata, resolves the subject graph from the reflected foreign keys with effaced.resolve_subject_graph_from_fk, mounts the owned tables, and constructs every engine with the SQLAlchemy executors — the same engines from_base builds, proven byte-identical on the same schema. Subject-link paths in the manifest name target tables, not relationship attributes (the FK resolver’s contract).

Construction performs the read-only schema reflection and no other SQL: the owned tables ride your migrations (see effaced.bind_tables), and reflection emits no DDL or DML. Resolver registration stays explicit (never discovered): pass the resolver instances, or a prebuilt registry, but not both.

Args:

session_factory (sessionmaker): Factory producing sessions on the application database; used by the components that operate outside a caller transaction (audit sink, outbox claims). Bind it to the same database engine reflects.
engine (Engine): A connectable engine on the live database to reflect the manifest’s tables from.
manifest_payload (Mapping[str, Any]): A serialized manifest, as produced by DataMap.to_payload (any schema version — older payloads migrate forward).
resolvers (Sequence[Resolver]): External-system resolvers to register, by instance.
registry (ResolverRegistry | None): A prebuilt registry, mutually exclusive with resolvers.
audit_sink (AuditSink | None): Trail override; defaults to an effaced.DatabaseAuditSink on the mounted effaced_audit_events table.

Returns:

EffacedStack — The wired stack.

Raises:

ConfigurationError — If both resolvers and registry are given. Raised before any manifest load or reflection.
ManifestError — If the payload is structurally invalid or newer than this library understands (propagated from DataMap.from_payload), or if a manifest table is not found in the reflected database (propagated from effaced.reflect_metadata).
SubjectResolutionError — If the subject graph cannot be resolved from the reflected foreign keys (propagated from effaced.resolve_subject_graph_from_fk).

EffacedTables

class EffacedTables:
    def __init__(audit_events: Table, consent_records: Table, outbox: Table, restriction_records: Table) -> None

Handles to the four effaced-owned tables mounted on a MetaData.

Returned by effaced.bind_tables so downstream components (audit sink, consent ledger, outbox, restriction ledger) can reference the tables directly instead of looking them up by name.

Fields:

audit_events (Table): The append-only audit trail table.
consent_records (Table): The append-only consent event table.
outbox (Table): The durable external-call outbox table.
restriction_records (Table): The append-only restriction event table.

ErasureExecutor

class ErasureExecutor:
    def __init__(metadata: MetaData, surrogates: SurrogateRegistry | None = None) -> None

Executes one local erasure step per call, scoped to one subject.

The SQLAlchemy implementation of StepExecutor: each table’s TableAccessPlan hop chain becomes nested IN subqueries down to the subject identifier (shared with the rectification executor via the scoping module), so a step only ever touches the one subject’s rows. Statements run in the caller’s session and are never committed here (ADR 0006).

Two ADR 0007 consequences surface at this layer: a foreign-key reference into a row-deleted table from outside the subject path (e.g. another subject’s comment replying to this subject’s) fails loudly with the database’s integrity error, and ANONYMIZE rewrites rows one by one so every cell gets a fresh surrogate — unique constraints keep holding.
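
The "nested IN subqueries down to the subject identifier" idea can be sketched in raw SQL with the standard-library sqlite3 module. The three-table schema and the function name are hypothetical, and the statement is hand-written for illustration; it is not the SQL the executor actually emits.

```python
import sqlite3

# Hypothetical schema: comments -> posts -> users (the subject table).
SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id TEXT REFERENCES users(id));
CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id));
"""

# One hop per nesting level, ending at the subject identifier parameter.
SCOPED_DELETE = """
DELETE FROM comments
WHERE post_id IN (
    SELECT id FROM posts
    WHERE user_id IN (SELECT id FROM users WHERE id = ?)
)
"""


def delete_comments_for(conn: sqlite3.Connection, subject_id: str) -> int:
    """Delete only rows on the subject's hop chain; the caller decides when to commit."""
    cursor = conn.execute(SCOPED_DELETE, (subject_id,))
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO comments VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
deleted = delete_comments_for(conn, "alice")
remaining = conn.execute("SELECT count(*) FROM comments").fetchone()[0]
```

Here `deleted` is 2 and `remaining` is 1: the comment reached through bob's post is never touched, because the predicate bottoms out at the single subject id.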

ErasureExecutor.execute

def execute(session: Session, graph: SubjectGraph, step: ErasureStep, subject_id: SubjectIdentifier) -> int

Run one local step scoped to one subject (see StepExecutor).

Args:

session (Session): The caller’s open session; never committed here.
graph (SubjectGraph): Resolved hop chains from each table to the subject.
step (ErasureStep): The local step to run.
subject_id (SubjectIdentifier): The subject identifier, either a single-column str or a composite CompositeSubjectId; each component is coerced to its subject column’s Python type for typed-parameter drivers.

Returns:

int — The number of rows deleted, anonymized, or counted as retained.

Raises:

ConfigurationError — If the step is external.
ManifestError — If the step targets a table or column missing from the bound metadata.
AnonymizationError — If an ANONYMIZE table has no primary key or a column type has no registered surrogate.

ErasureVerifier

class ErasureVerifier:
    def __init__(data_map: DataMap, graph: SubjectGraph, metadata: MetaData, *, audit_sink: AuditSink) -> None

Reads the annotated surface back after an erasure and records the verdict.

The verifier re-derives the plan’s table classification — the ADR 0007 row-delete versus anonymize/retain split — by calling ErasurePlanner internally and reading its local steps, never re-implementing the classification. It then counts, per table, the rows still scoped to the subject through the same hop-chain predicate the executor used (the shared scoping.subject_scope helper), issuing nothing but SELECT COUNT statements — it is strictly read-only and mutates no row.

The verdict proves execution fidelity, not erasure completeness, and the boundaries are stated on ErasureVerification:

it re-reads the same annotated surface the plan was built from, so un-annotated PII is invisible by construction;
a row orphaned off the subject’s path is unreachable by the scoping predicate and invisible here too;
anonymized cell values are not verified: surrogates are random and never NULL, so without a before-state they are indistinguishable from originals, and capturing that before-state is out of scope.

verified is therefore the narrow claim that every row-deleted table holds zero subject-scoped rows; surviving (anonymize/retain) counts are reported for the record and never flip the verdict. This is never a determination that the subject is fully erased or that the controller is compliant.
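
The verdict rule stated above reduces to a small function over per-table counts. The sketch below uses hypothetical names (`Verdict`, `judge`) and assumes that residual lists only the row-deleted tables with leftover rows; the exact contents of the real ErasureVerification fields are not specified here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Verdict:
    """Hypothetical stand-in for the documented ErasureVerification result."""

    verified: bool
    residual: dict = field(default_factory=dict)
    surviving: dict = field(default_factory=dict)


def judge(row_deleted: dict, kept: dict) -> Verdict:
    """Build a verdict from per-table counts of subject-scoped rows.

    `row_deleted` holds counts for tables the plan deletes rows from;
    `kept` holds counts for anonymize/retain tables.
    """
    residual = {table: n for table, n in row_deleted.items() if n > 0}
    # Only leftovers in row-deleted tables can fail the check; `kept` is
    # reported for the record and never influences `verified`.
    return Verdict(verified=not residual, residual=residual, surviving=dict(kept))
```

For example, `judge({"comments": 0}, {"invoices": 3})` is verified even though three invoice rows survive, while `judge({"comments": 2}, {})` is not.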

ErasureVerifier.verify_subject_erased

def verify_subject_erased(session: Session, subject_id: SubjectIdentifier) -> ErasureVerification

Read the subject’s annotated surface back and record the verdict.

Re-derives the plan’s table classification, counts the subject’s surviving rows per table with SELECT COUNT statements only, and appends one audit event. Counting is strictly read-only; the caller’s session is never written to or committed here.

Args:

session (Session): An open database session; used for reads only.
subject_id (SubjectIdentifier): The subject identifier, either a single-column str or a composite CompositeSubjectId; each component is coerced to its subject column’s Python type for typed-parameter drivers.

Returns:

ErasureVerification — The verdict: verified is true iff every row-deleted table is empty for this subject; residual and surviving carry the per-table counts (see ErasureVerification).

Raises:

ManifestError — If a table or hop references a name missing from the bound metadata.
SubjectResolutionError — If the id cannot carry the subject column’s type.

lint_completeness

def lint_completeness(metadata: MetaData) -> tuple[CompletenessFinding, ...]

Find every place the metadata could hold undeclared personal data.

The exact complement of collect_data_map: every table is either in the data map, returned here as a whole-table finding, or an effaced-owned effaced_* table — and within a mapped table, every column is either annotated, a primary/foreign key, or returned here as a column finding. Nothing falls through silently.

Findings are questions, not verdicts — gate on them in CI with effaced.testing.assert_data_map_complete, which lets you exempt stores and fields you have consciously judged to hold no personal data.
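
The complement rule can be sketched over a plain dictionary schema. Everything here is hypothetical (`Column`, `findings`, the sample tables), and the sketch simplifies by treating a table as mapped only when it has an annotated column, ignoring table-level subject_link annotations.

```python
from typing import NamedTuple, Optional


class Column(NamedTuple):
    """Hypothetical column description used only in this sketch."""

    name: str
    annotated: bool = False
    is_key: bool = False  # primary or foreign key


def findings(schema: dict) -> list:
    """Return (table, column) findings; column is None for a whole-table finding."""
    out: list = []
    for table, columns in schema.items():
        if table.startswith("effaced_"):
            continue  # owned tables are exempt
        if not any(col.annotated for col in columns):
            whole_table: Optional[str] = None
            out.append((table, whole_table))  # not in the data map at all
            continue
        # Mapped table: every column is annotated, a key, or a finding.
        out.extend(
            (table, col.name)
            for col in columns
            if not col.annotated and not col.is_key
        )
    return out
```

Every table and column ends up in exactly one bucket, which is the "nothing falls through silently" property described above.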

Args:

metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).

Returns:

tuple[CompletenessFinding, ...] — All findings, in deterministic table order, then column order.

Raises:

ManifestError — If an info entry under the effaced key is not a recognised annotation object — exactly the metadata collect_data_map rejects, so the complement contract holds even on malformed input.

lint_reachability

def lint_reachability(data_map: DataMap, orm_registry: registry) -> tuple[ReachabilityFinding, ...]

Find every annotated table the erasure planner cannot route to a subject.

resolve_subject_graph raises on the first unreachable table, so it answers “can this whole map be planned?” but not “which tables are the problem?”. This linter probes each concern independently and collects a finding per gap instead of raising, the way lint_completeness complements collect_data_map.

It is the exact inverse of resolution: lint_reachability(...) == () if and only if resolve_subject_graph succeeds on the same inputs. An empty result is the assurance that every annotated store has a subject path the planner can walk; any finding names a store whose data would otherwise be silently never erased.

Findings are questions, not verdicts — a table may be unreachable because its subject_link is wrong, because it is not ORM-mapped, or because the foreign keys form a cycle. effaced names the gap; the fix (and the judgement that a store needs no path at all) stays yours.
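
The relationship between a fail-fast resolver and a collecting linter can be shown in a few lines. This is a toy model with hypothetical names: reachability is given as a precomputed boolean per table, whereas the real functions derive it from the data map and the ORM registry.

```python
class Unreachable(Exception):
    """Hypothetical error raised by the fail-fast resolver in this sketch."""


def resolve(paths: dict) -> None:
    """Fail fast: raise on the first table without a walkable path."""
    for table, reachable in paths.items():
        if not reachable:
            raise Unreachable(table)


def lint(paths: dict) -> tuple:
    """Collect every table without a walkable path instead of raising."""
    return tuple(table for table, reachable in paths.items() if not reachable)
```

In this model `lint(paths) == ()` holds exactly when `resolve(paths)` returns without raising, and a non-empty result names every problem table rather than only the first.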

Args:

data_map (DataMap): The collected manifest (see collect_data_map).
orm_registry (registry): The ORM registry holding the mapped classes — for declarative styles, Base.registry.

Returns:

tuple[ReachabilityFinding, ...] — All findings, in deterministic order: the subject-anchor findings first, then one per unreachable non-subject table in manifest order, then a graph-level foreign-key-cycle finding if one remains.

Raises:

ManifestError — If an info entry under the effaced key is not a recognised annotation object — exactly the malformed metadata collect_data_map rejects. Lintable conditions (a missing or unreachable path) never raise; they become findings.

LintTarget

class LintTarget:
    def __init__(metadata: MetaData, orm_registry: registry | None) -> None

The live SQLAlchemy handles a lint run needs.

Loaded by load_lint_target from a module:attribute spec, the way Alembic and Gunicorn locate an app object. Holds live handles — not a serialized copy — so the linters walk the same metadata the application runs against.

Fields:

metadata (MetaData): The MetaData holding the mapped tables — the input to collect_data_map and lint_completeness.
orm_registry (registry | None): The ORM registry holding the mapped classes, or None when the spec resolved to a bare MetaData (no mappers). Reachability linting needs the registry; with it None the caller can lint completeness only.

load_lint_target

def load_lint_target(spec: str) -> LintTarget

Import and resolve a module.path:attribute spec into a lint target.

The attribute may be a declarative Base (anything exposing both .metadata and .registry) or a bare MetaData. A Base yields a target carrying both handles; a bare MetaData yields one with orm_registry set to None (completeness-only linting).
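
Resolving a module.path:attribute spec is a common pattern that the standard library supports directly. The sketch below is an assumption about how such a loader could look, not the library's code: `resolve_spec` and `classify` are hypothetical, and ValueError stands in for the documented ConfigurationError.

```python
import importlib


def resolve_spec(spec: str) -> object:
    """Resolve 'module.path:attribute' to the named object (sketch only)."""
    module_path, sep, attribute = spec.partition(":")
    if not sep or not module_path or not attribute:
        raise ValueError(f"expected 'module.path:attribute', got {spec!r}")
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        raise ValueError(f"cannot import module {module_path!r}") from exc
    try:
        return getattr(module, attribute)
    except AttributeError as exc:
        raise ValueError(f"{module_path!r} has no attribute {attribute!r}") from exc


def classify(target: object) -> str:
    """Duck-type the target the way the text describes a declarative Base."""
    if hasattr(target, "metadata") and hasattr(target, "registry"):
        return "base"  # both handles available
    # A real loader would also check that the object is a MetaData.
    return "metadata-or-other"
```

Each failure path raises with a message naming the offending part of the spec, in the spirit of "every failure names what to fix".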

Args:

spec (str): module.path:attribute — e.g. myapp.models:Base or myapp.db:metadata.

Returns:

LintTarget — The resolved LintTarget.

Raises:

ConfigurationError — If the spec is not module:attribute, the module cannot be imported, the attribute is missing, or the attribute is neither a declarative Base nor a MetaData — every failure names what to fix, never guesses.

pii

def pii(category: PiiCategory, *, erasure: ErasureStrategy = ErasureStrategy.DELETE, retention: RetentionPolicy | None = None, legal_basis: LegalBasis | None = None, purpose: str | None = None, description: str | None = None) -> dict[str, Any]

Declare a column as personal data.

Returns an info dict fragment for mapped_column(info=...) / Column(info=...). Keeping this a function (not a bare dict) lets the manifest format evolve behind a stable call signature.

Args:

category (PiiCategory): What kind of personal data the column holds.
erasure (ErasureStrategy): Erasure behaviour; defaults to deletion.
retention (RetentionPolicy | None): Legal retention duty; required for RETAIN.
legal_basis (LegalBasis | None): Lawful basis, surfaced in Art. 15 exports.
purpose (str | None): Processing purpose, surfaced in Art. 15 exports.
description (str | None): Free-text note for audits.

Returns:

dict[str, Any] — A dict suitable for SQLAlchemy’s info parameter.

RectificationExecutor

class RectificationExecutor:
    def __init__(metadata: MetaData) -> None

Executes one local rectification step per call, scoped to one subject.

The SQLAlchemy implementation of RectificationStepExecutor: one UPDATE per step, scoped through the same hop-chain predicate the erasure executor uses, writing the single corrected value into every one of the step’s columns. Statements run in the caller’s session and are never committed here (ADR 0006).

One shared value per cell is correct here — the correction is one value, unlike anonymization’s per-row surrogates — and category-keyed writes are deliberately blunt (ADR 0013): a step cannot fix one row but not another. A unique-constraint collision (two matched rows forced to the same corrected value) surfaces as the database’s own error and is audited as a step failure by the caller.
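
A single scoped UPDATE that writes one corrected value into every step column can be sketched with sqlite3. The schema, data and function name are hypothetical, and the SQL is hand-written for illustration rather than taken from the executor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id TEXT PRIMARY KEY);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        user_id TEXT REFERENCES users(id),
        billing_city TEXT,
        shipping_city TEXT
    );
    INSERT INTO users VALUES ('alice'), ('bob');
    INSERT INTO addresses VALUES
        (1, 'alice', 'Pariss', 'Pariss'),
        (2, 'alice', 'Pariss', 'Lyon'),
        (3, 'bob', 'Pariss', 'Pariss');
    """
)


def rectify_city(conn: sqlite3.Connection, subject_id: str, value: str) -> int:
    """One UPDATE: the same value into every step column, subject rows only."""
    cursor = conn.execute(
        "UPDATE addresses SET billing_city = ?, shipping_city = ? "
        "WHERE user_id IN (SELECT id FROM users WHERE id = ?)",
        (value, value, subject_id),
    )
    return cursor.rowcount  # rows matched; nothing is committed here


matched = rectify_city(conn, "alice", "Paris")
```

Both of alice's rows are rewritten, including the 'Lyon' cell in row 2, which shows the bluntness noted above: the step cannot fix one row but not another. Bob's row keeps its original values.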

RectificationExecutor.execute

def execute(session: Session, graph: SubjectGraph, step: RectificationStep, subject_id: SubjectIdentifier, value: str | int | float | bool) -> int

Run one local step scoped to one subject.

Args:

session (Session): The caller’s open session; never committed here.
graph (SubjectGraph): Resolved hop chains from each table to the subject.
step (RectificationStep): The value-free local step to run.
subject_id (SubjectIdentifier): The subject identifier, either a single-column str or a composite CompositeSubjectId; each component is coerced to its subject column’s Python type for typed-parameter drivers.
value (str | int | float | bool): The corrected value, written into every step column.

Returns:

int — The number of rows the UPDATE matched.

Raises:

ManifestError — If the step targets a table or column missing from the bound metadata.

reflect_metadata

def reflect_metadata(engine: Engine, *, only: Sequence[str] | None = None) -> MetaData

Reflect a database’s tables into a fresh MetaData.

Wraps MetaData.reflect. When only is given, just those tables (and the foreign keys among them) are reflected, so unrelated tables in the same database never enter the subject graph — pass the manifest’s table names to scope reflection to the declared surface. Omitting only reflects every table the engine can see.

The reflection runs read-only catalog queries against the live connection; it issues no DDL and no DML.

Args:

engine (Engine): A connectable engine bound to the live database.
only (Sequence[str] | None): Table names to reflect; None reflects all tables.

Returns:

MetaData — A fresh MetaData holding the reflected tables and their foreign-key constraints.

Raises:

ManifestError — If a name in only is not a table in the live database — the manifest names a table the database does not have, so no subject graph can be resolved against it.

resolve_subject_graph

def resolve_subject_graph(data_map: DataMap, orm_registry: registry) -> SubjectGraph

Resolve every subject-link path in a data map into a subject graph.

Each table’s dotted relationship path is walked against the ORM mappers and flattened into foreign-key column pairs; the resulting accesses are ordered FK-safely for deletion (children before parents, subject table last). Joined-table inheritance is keyed by each mapper’s local table only.
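
FK-safe deletion order is a topological sort of the foreign-key graph, which the standard-library graphlib module can compute. The sketch assumes a hypothetical three-table schema and is not the resolver's own algorithm; in particular, the subject table lands last here only because every other table references it.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical schema: each table maps to the tables it references via FK.
REFERENCES = {
    "comments": {"posts", "users"},
    "posts": {"users"},
    "users": set(),  # the subject table
}


def deletion_order(references: dict) -> list:
    """Order tables so every child is deleted before the parents it references."""
    # TopologicalSorter emits a node's predecessors before the node itself,
    # so map each parent to its children to get children first.
    children_of = {table: set() for table in references}
    for child, parents in references.items():
        for parent in parents:
            children_of.setdefault(parent, set()).add(child)
    return list(TopologicalSorter(children_of).static_order())
```

For this schema the function returns comments, posts, users. A cycle in the references raises graphlib's CycleError, which corresponds to the foreign-key-cycle failure listed under Raises below.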

Args:

data_map (DataMap): The collected manifest (see collect_data_map).
orm_registry (registry): The ORM registry holding the mapped classes — for declarative styles, Base.registry.

Returns:

SubjectGraph — The resolved, FK-safely ordered SubjectGraph.

Raises:

SubjectResolutionError — If no (or more than one) table declares subject_link(""), a table holds personal data without a subject link, a table is not ORM-mapped, a path segment is not a relationship, a path joins through a many-to-many secondary table, a path does not end at the subject table, a declared subject id column does not exist or the subject-id columns are declared on a non-subject table, or the foreign keys between resolved tables form a cycle.

resolve_subject_graph_from_fk

def resolve_subject_graph_from_fk(data_map: DataMap, metadata: MetaData) -> SubjectGraph

Resolve a subject graph from foreign-key constraints, without ORM mappers.

The mapper-free sibling of resolve_subject_graph. Where that function walks ORM relationships keyed by attribute name, this one walks the foreign-key constraints on the bound MetaData directly: each segment of a table’s dotted subject_link path names the next table on the chain to the subject, and the single foreign key between the current table and that target supplies the join columns. It is the resolver an adapter reaches for when it has table metadata but no ORM registry — reflected schemas, hand-built metadata, or non-ORM layers such as the Django adapter (which translates Model._meta into annotated tables and FK constraints).

The produced graph is identical in shape and guarantees to the ORM-resolved one — same FK-safe deletion order, same hop chains, same fully_pii_owned classification — so every downstream engine behaves identically regardless of which resolver built it.
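
The path-walking rule (each segment names the next table, and exactly one foreign key must connect the current table to it) can be sketched over a list of tuples. The foreign keys, the `hops` function and the use of LookupError are all hypothetical stand-ins for the real metadata and SubjectResolutionError.

```python
# Hypothetical foreign keys: (source_table, source_column, target_table, target_column).
FOREIGN_KEYS = [
    ("comments", "post_id", "posts", "id"),
    ("posts", "author_id", "users", "id"),
    ("posts", "editor_id", "users", "id"),
    ("profiles", "user_id", "users", "id"),
]


def hops(start: str, path: str, subject: str = "users") -> list:
    """Turn a dotted path of target-table names into FK hops (sketch only)."""
    chain = []
    current = start
    for target in filter(None, path.split(".")):
        candidates = [
            fk for fk in FOREIGN_KEYS if fk[0] == current and fk[2] == target
        ]
        if len(candidates) != 1:
            # Zero foreign keys, or several: the join columns are not determined.
            raise LookupError(f"{current} -> {target}: {len(candidates)} foreign keys")
        chain.append(candidates[0])
        current = target
    if current != subject:
        raise LookupError(f"path from {start} ends at {current}, not {subject}")
    return chain
```

With this data `hops("profiles", "users")` yields one hop, `hops("users", "")` yields none, and `hops("comments", "posts.users")` fails because posts has two foreign keys to users, the "ambiguous several" case.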

Args:

data_map (DataMap): The collected manifest (see collect_data_map).
metadata (MetaData): The MetaData holding the annotated tables, with the foreign-key constraints that link them.

Returns:

SubjectGraph — The resolved, FK-safely ordered SubjectGraph.

Raises:

SubjectResolutionError — If no (or more than one) table declares subject_link(""), a table holds personal data without a subject link, a table is absent from the metadata, a declared subject id column does not exist, a path segment names a table not in the metadata, the current table has no single foreign key to the next path segment (none, or an ambiguous several), a path does not end at the subject table, or the foreign keys between resolved tables form a cycle.

SqlStatusCountsSource

class SqlStatusCountsSource:
    ...

Counts outbox entries per status with a single GROUP BY query.

The SQLAlchemy implementation of StatusCountsSource: it aggregates in the database (SELECT status, count(*) ... GROUP BY status) rather than streaming every row back to Python, so a large outbox costs one cheap query. Stateless — share one instance freely.

The result is zero-filled over every OutboxStatus member, so it is byte-for-byte interchangeable with the Python-side count status_counts falls back to.
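
A GROUP BY aggregate followed by a zero-fill can be sketched with sqlite3 and an Enum. The `Status` members below are hypothetical placeholders (the real OutboxStatus members may differ), and the function takes a raw connection rather than a table handle and session factory.

```python
import sqlite3
from enum import Enum


class Status(Enum):
    """Hypothetical lifecycle states; the real OutboxStatus members may differ."""

    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


def status_counts(conn: sqlite3.Connection) -> dict:
    """Aggregate in the database, then zero-fill statuses with no rows."""
    counts = {status: 0 for status in Status}
    rows = conn.execute("SELECT status, count(*) FROM outbox GROUP BY status")
    for value, n in rows:
        counts[Status(value)] = n
    return counts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO outbox (status) VALUES (?)",
    [("pending",), ("pending",), ("done",)],
)
result = status_counts(conn)
```

The result has an entry for every member, including a zero for FAILED, so callers never need to guard against a missing key.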

SqlStatusCountsSource.status_counts

def status_counts(outbox: Table, session_factory: sessionmaker) -> dict[OutboxStatus, int]

Count outbox entries per lifecycle status, SQL-side.

Args:

outbox (Table): The effaced_outbox table handle to count over.
session_factory (sessionmaker): Factory producing sessions on the database holding that table; a short-lived read session is opened.

Returns:

dict[OutboxStatus, int] — A mapping with one entry per OutboxStatus, zero-filled where no rows exist for a status.

subject_link

def subject_link(path: str, *, subject_id_columns: tuple[str, ...] | str = 'id', subject_id_column: str | None = None) -> dict[str, Any]

Declare how a table reaches the data subject.

Attach via Table.info or the mapped class’s __table_args__ info dict. The subject table itself declares subject_link("").

Args:

path (str): Dotted relationship path to the subject table.
subject_id_columns (tuple[str, ...] | str): Ordered identifier column(s) on the subject table. A bare str is the single-column case (the default, "id"); a tuple declares a composite subject key whose order aligns to the caller’s CompositeSubjectId values (ADR 0025).
subject_id_column (str | None): Deprecated singular alias for the one-column case; mutually exclusive with subject_id_columns. Kept so existing single-column annotations need no edit.

Returns:

dict[str, Any] — A dict suitable for SQLAlchemy’s table-level info parameter.

Raises:

ConfigurationError — If both subject_id_columns and the subject_id_column alias are passed with non-default values.

SurrogateRegistry

class SurrogateRegistry:
    def __init__() -> None

Maps SQLAlchemy column types to surrogate-value factories.

Anonymization replaces a value with an irreversible surrogate instead of NULL — surrogates stay valid under NOT NULL and unique constraints, which is why factories are invoked once per cell (string and UUID surrogates are unique per call). Lookup walks the column type’s MRO, so registering String also covers Text and every other String subclass.

Unlike ResolverRegistry, re-registering a type silently overrides it (last wins): replacing a default surrogate with your own is the very point of extensibility, and nothing audited depends on which factory produced a surrogate.

The registry is consumed by the erasure executor, never by plan — plans carry no values, which keeps them deterministic and side-effect-free.
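
The MRO-based lookup and last-wins override can be modelled in miniature without SQLAlchemy. In the sketch below the classes `String`, `Text` and `Integer` are plain stand-ins for type classes, `MiniRegistry` is hypothetical, and LookupError stands in for the documented AnonymizationError.

```python
import uuid
from typing import Callable


class String:  # stand-in for a SQLAlchemy type class
    pass


class Text(String):  # subclass, found through the MRO walk
    pass


class Integer:
    pass


class MiniRegistry:
    """Hypothetical miniature of the lookup rule described above."""

    def __init__(self) -> None:
        self._factories: dict = {}

    def register(self, type_cls: type, factory: Callable[[], object]) -> None:
        self._factories[type_cls] = factory  # re-registering overrides (last wins)

    def surrogate_for(self, column_type: object) -> object:
        for cls in type(column_type).__mro__:
            if cls in self._factories:
                return self._factories[cls]()  # fresh value on every call
        raise LookupError(f"no surrogate registered for {type(column_type).__name__}")


registry = MiniRegistry()
registry.register(String, lambda: f"anon-{uuid.uuid4().hex}")
first = registry.surrogate_for(Text())
second = registry.surrogate_for(Text())
```

`first` and `second` are distinct tokens because the factory runs once per call, which is what keeps unique constraints satisfied when many cells are anonymized.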

SurrogateRegistry.register

def register(sa_type: type[TypeEngine[Any]], factory: Callable[[], object]) -> None

Map one SQLAlchemy type (and its subclasses) to a factory.

Args:

sa_type (type[TypeEngine[Any]]): The type class to cover, e.g. sqlalchemy.String.
factory (Callable[[], object]): Zero-argument callable producing one surrogate value; called once per anonymized cell.

SurrogateRegistry.surrogate_for

def surrogate_for(column_type: TypeEngine[Any]) -> object

Produce one surrogate value for a column of the given type.

Args:

column_type (TypeEngine[Any]): The column’s type instance, e.g. Text().

Returns:

object — A fresh, type-valid surrogate value.

Raises:

AnonymizationError — If neither the type nor any of its base classes has a registered factory.