SQLAlchemy adapter
SQLAlchemy adapter — the first authoring layer for the effaced core.
The core (annotations, manifest, engine) is storage-agnostic; this package
is the thin layer that knows SQLAlchemy: authoring helpers that ride the
info dict, a collector that derives the manifest from metadata, a
resolver that turns subject-link paths into a subject graph, the
anonymizer surrogate registry, the erasure executor that runs local steps,
the erasure verifier that reads the annotated surface back afterwards, the
completeness linter that flags what the manifest does not cover, the
reachability linter that flags annotated tables the planner cannot reach,
the effaced-owned storage tables mounted via bind_tables, a
reflection helper that lifts a live database’s schema into MetaData, and
the EffacedStack facade that wires every engine from one schema
source — an annotated base or a serialized manifest plus a reflected database.
bind_tables
def bind_tables(metadata: MetaData) -> EffacedTables
Mount the effaced-owned tables on the application’s MetaData.
Defines effaced_audit_events, effaced_consent_records,
effaced_outbox and effaced_restriction_records so they live in
your database and ride your migration tooling — no migration tool is
assumed and no DDL is executed here. Calling it again on the same
MetaData is a no-op returning the already-mounted tables, so
module-level and app-factory setup styles both work.
With Alembic, call this where your env.py’s target_metadata is
defined; alembic revision --autogenerate then picks the tables up
like your own. New effaced releases may add tables, columns or indexes
in MINOR versions — re-run autogenerate after upgrading. Owned-table
changes are additive-only, and additive columns backfill populated
tables via server defaults — the caller-owned migration contract is
ADR 0021 (see the Alembic guide on the docs site). Without a
migration tool, metadata.create_all(engine) creates them directly.
Example:
>>> from effaced import bind_tables
>>> tables = bind_tables(Base.metadata)  # doctest: +SKIP
>>> tables.audit_events.name  # doctest: +SKIP
'effaced_audit_events'
Args:
- metadata (MetaData): The MetaData your migrations already manage (typically Base.metadata).
Returns:
EffacedTables: Handles to the four mounted tables.
Raises:
ValueError: If only some of the table names already exist on metadata, i.e. a table of your own collides with an effaced_-prefixed name.
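The mount rule described above (no owned name present: define all four; all four present: no-op; only some present: ValueError) can be sketched without SQLAlchemy. This is an illustration of the rule only, not the library's code: the plain dict standing in for MetaData and the helper name mount_owned_tables are assumptions made for the example.

```python
# Hypothetical stand-in: a plain dict plays the role of MetaData.tables.
OWNED = (
    "effaced_audit_events",
    "effaced_consent_records",
    "effaced_outbox",
    "effaced_restriction_records",
)


def mount_owned_tables(tables: dict[str, object]) -> dict[str, object]:
    """Define the owned tables once; a repeat call returns the same handles."""
    present = [name for name in OWNED if name in tables]
    if present and len(present) != len(OWNED):
        # A partial match means a caller-owned table uses one of the owned names.
        raise ValueError(f"name collision with effaced-owned tables: {present}")
    for name in OWNED:
        tables.setdefault(name, object())  # object() stands in for a Table
    return {name: tables[name] for name in OWNED}


app_tables: dict[str, object] = {}
first = mount_owned_tables(app_tables)
second = mount_owned_tables(app_tables)  # idempotent: the same handles come back
```

Because the second call only reads what the first one defined, both a module-level call and an app-factory call can coexist on the same metadata.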
collect_data_map
def collect_data_map(metadata: MetaData) -> DataMap
Collect every effaced annotation from SQLAlchemy metadata.
Args:
- metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).
Returns:
DataMap: A DataMap containing only tables with at least one annotation (a pii column or a subject_link).
Raises:
ManifestError: If an info entry under the effaced key is not a recognised annotation object, or a retention policy names an anchor column that does not exist on the table or is not datetime-typed (ADR 0012: fail loudly at assembly, before any sweep runs).
default_surrogate_registry
def default_surrogate_registry() -> SurrogateRegistry
A registry covering the common SQLAlchemy scalar types.
Strings become unique opaque tokens (anon-…), numbers and booleans
become zero-values, dates collapse to the Unix epoch, and UUIDs become
fresh random UUIDs. The epoch datetime is naive and also resolves (via
MRO) for DateTime(timezone=True) columns — register a tz-aware
factory when your dialect rejects naive values there.
Returns:
SurrogateRegistry: A new, independently extensible registry.
EffacedStack
class EffacedStack:
    def __init__(metadata: MetaData, data_map: DataMap, graph: SubjectGraph, tables: EffacedTables, session_factory: sessionmaker, registry: ResolverRegistry, audit_sink: AuditSink, outbox: Outbox, exporter: Exporter, planner: ErasurePlanner, rectifier: Rectifier, consent: ConsentLedger, restriction: RestrictionLedger, sweeper: RetentionSweeper, saga_runner: SagaRunner) -> None
Every effaced engine, wired once from one schema source.
The manual integration sequence — collect the data map, resolve the subject graph, mount the owned tables, construct the audit sink, the outbox, and each engine — is mechanical and identical in every application. The two classmethods perform it in one call and return the wired components as named handles, so a web layer (or your own glue) only decides when to call them, never how to build them.
Pick the entry point by where the schema comes from. from_base
derives the manifest and graph from an annotated declarative base — the
in-process ORM is the source of truth. from_manifest runs the
same engines from a serialized manifest paired with a reflected live
database, for callers whose models live elsewhere (a separate service,
another language) but who export the manifest and connect to the same
database. Both wire identical engines and are proven byte-identical on
the same schema.
The stack adds no behaviour of its own: each handle is exactly the
component you could have constructed by hand, governed by its own
documented contract. Construction executes no SQL beyond the read-only
reflection from_manifest performs — the owned tables ride your
migrations (see effaced.bind_tables).
Fields:
- audit_sink (AuditSink): The append-only trail every engine records into.
- consent (ConsentLedger): The Art. 7 consent ledger.
- data_map (DataMap): The manifest collected from the annotated models.
- exporter (Exporter): The Art. 15 export engine.
- graph (SubjectGraph): The resolved subject graph used to scope every operation.
- metadata (MetaData): The application MetaData the stack was built from.
- outbox (Outbox): The durable queue for external erasure/rectification calls.
- planner (ErasurePlanner): The Art. 17 erasure engine, execution-ready.
- rectifier (Rectifier): The Art. 16 rectification engine, execution-ready.
- registry (ResolverRegistry): The resolver registry routing external refs.
- restriction (RestrictionLedger): The Art. 18 restriction-of-processing ledger.
- saga_runner (SagaRunner): The outbox drainer; drive it from a worker, never on a serving event loop (ADR 0006).
- session_factory (sessionmaker): The application’s session factory, as provided.
- sweeper (RetentionSweeper): The Art. 5(1)(e) retention sweeper (report-only).
- tables (EffacedTables): Handles to the four effaced-owned tables.
EffacedStack.from_base
def from_base(base: type[DeclarativeBase], session_factory: sessionmaker, *, resolvers: Sequence[Resolver] = (), registry: ResolverRegistry | None = None, audit_sink: AuditSink | None = None) -> EffacedStack
Wire the full stack from an annotated declarative base.
Collects the data map from base.metadata, resolves the subject
graph through base.registry, mounts the owned tables, and
constructs every engine with the SQLAlchemy executors. Resolver
registration stays explicit (never discovered): pass the resolver
instances, or a prebuilt registry — e.g. from
effaced.registry_from_settings — but not both.
Args:
- base (type[DeclarativeBase]): The declarative base whose models carry the effaced.pii / effaced.subject_link annotations.
- session_factory (sessionmaker): Factory producing sessions on the application database; used by the components that operate outside a caller transaction (audit sink, outbox claims).
- resolvers (Sequence[Resolver]): External-system resolvers to register, by instance.
- registry (ResolverRegistry | None): A prebuilt registry, mutually exclusive with resolvers.
- audit_sink (AuditSink | None): Trail override; defaults to an effaced.DatabaseAuditSink on the mounted effaced_audit_events table.
Returns:
EffacedStack: The wired stack.
Raises:
ConfigurationError: If both resolvers and registry are given; two sources of truth for “where is my PII” would make the registration ambiguous. Raised before any data-map collection or graph resolution.
ManifestError: If the annotations on base are invalid (propagated from effaced.collect_data_map).
EffacedStack.from_manifest
def from_manifest(session_factory: sessionmaker, engine: Engine, manifest_payload: Mapping[str, Any], *, resolvers: Sequence[Resolver] = (), registry: ResolverRegistry | None = None, audit_sink: AuditSink | None = None) -> EffacedStack
Wire the full stack from a serialized manifest and a live database.
The mapper-free counterpart of from_base, for callers whose
models are not in this process. Loads (and forward-migrates) the
manifest with DataMap.from_payload, reflects exactly
the manifest’s tables off engine with
effaced.reflect_metadata, resolves the subject graph from the
reflected foreign keys with
effaced.resolve_subject_graph_from_fk, mounts the owned
tables, and constructs every engine with the SQLAlchemy executors —
the same engines from_base builds, proven byte-identical on
the same schema. Subject-link paths in the manifest name target
tables, not relationship attributes (the FK resolver’s contract).
Construction performs the read-only schema reflection and no other
SQL: the owned tables ride your migrations (see
effaced.bind_tables), and reflection emits no DDL or DML.
Resolver registration stays explicit (never discovered): pass the
resolver instances, or a prebuilt registry, but not both.
Args:
- session_factory (sessionmaker): Factory producing sessions on the application database; used by the components that operate outside a caller transaction (audit sink, outbox claims). Bind it to the same database that engine reflects.
- engine (Engine): A connectable engine on the live database to reflect the manifest’s tables from.
- manifest_payload (Mapping[str, Any]): A serialized manifest, as produced by DataMap.to_payload (any schema version; older payloads migrate forward).
- resolvers (Sequence[Resolver]): External-system resolvers to register, by instance.
- registry (ResolverRegistry | None): A prebuilt registry, mutually exclusive with resolvers.
- audit_sink (AuditSink | None): Trail override; defaults to an effaced.DatabaseAuditSink on the mounted effaced_audit_events table.
Returns:
EffacedStack: The wired stack.
Raises:
ConfigurationError: If both resolvers and registry are given. Raised before any manifest load or reflection.
ManifestError: If the payload is structurally invalid or newer than this library understands (propagated from DataMap.from_payload), or if a manifest table is not found in the reflected database (propagated from effaced.reflect_metadata).
SubjectResolutionError: If the subject graph cannot be resolved from the reflected foreign keys (propagated from effaced.resolve_subject_graph_from_fk).
EffacedTables
class EffacedTables:
    def __init__(audit_events: Table, consent_records: Table, outbox: Table, restriction_records: Table) -> None
Handles to the four effaced-owned tables mounted on a MetaData.
Returned by effaced.bind_tables so downstream components (audit
sink, consent ledger, outbox, restriction ledger) can reference the
tables directly instead of looking them up by name.
Fields:
- audit_events (Table): The append-only audit trail table.
- consent_records (Table): The append-only consent event table.
- outbox (Table): The durable external-call outbox table.
- restriction_records (Table): The append-only restriction event table.
ErasureExecutor
class ErasureExecutor:
    def __init__(metadata: MetaData, surrogates: SurrogateRegistry | None = None) -> None
Executes one local erasure step per call, scoped to one subject.
The SQLAlchemy implementation of
StepExecutor: each table’s
TableAccessPlan hop chain becomes nested IN
subqueries down to the subject identifier (shared with the
rectification executor via the scoping module), so a step only ever
touches the one subject’s rows. Statements run in the caller’s session
and are never committed here (ADR 0006).
Two ADR 0007 consequences surface at this layer: a foreign-key
reference into a row-deleted table from outside the subject path
(e.g. another subject’s comment replying to this subject’s) fails
loudly with the database’s integrity error, and ANONYMIZE rewrites
rows one by one so every cell gets a fresh surrogate — unique
constraints keep holding.
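The hop-chain scoping described above can be illustrated with the standard library's sqlite3 module. This is a minimal sketch under stated assumptions, not the executor's actual SQL: the users/posts/comments schema is hypothetical, and the statement is hand-written rather than generated from a TableAccessPlan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id));
    INSERT INTO users VALUES (1), (2);
    INSERT INTO posts VALUES (10, 1), (20, 2);
    INSERT INTO comments VALUES (100, 10), (101, 10), (200, 20);
    """
)

# Hop chain comments -> posts -> users, written as nested IN subqueries
# that bottom out at the subject identifier.
SCOPED_DELETE = """
    DELETE FROM comments
    WHERE post_id IN (
        SELECT id FROM posts
        WHERE user_id IN (SELECT id FROM users WHERE id = ?)
    )
"""

deleted = conn.execute(SCOPED_DELETE, (1,)).rowcount
remaining = [row[0] for row in conn.execute("SELECT id FROM comments ORDER BY id")]
# No commit here: in this sketch, as in the text, the caller owns the transaction.
```

Only the two comments under subject 1's post are removed; the comment belonging to subject 2 is never matched by the predicate.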
ErasureExecutor.execute
def execute(session: Session, graph: SubjectGraph, step: ErasureStep, subject_id: SubjectIdentifier) -> int
Run one local step scoped to one subject (see StepExecutor).
Args:
- session (Session): The caller’s open session; never committed here.
- graph (SubjectGraph): Resolved hop chains from each table to the subject.
- step (ErasureStep): The local step to run.
- subject_id (SubjectIdentifier): The subject identifier: a single-column str or a composite CompositeSubjectId; each component is coerced to its subject column’s Python type for typed-parameter drivers.
Returns:
int: The number of rows deleted, anonymized, or counted as retained.
Raises:
ConfigurationError: If the step is external.
ManifestError: If the step targets a table or column missing from the bound metadata.
AnonymizationError: If an ANONYMIZE table has no primary key or a column type has no registered surrogate.
ErasureVerifier
class ErasureVerifier:
    def __init__(data_map: DataMap, graph: SubjectGraph, metadata: MetaData, *, audit_sink: AuditSink) -> None
Reads the annotated surface back after an erasure and records the verdict.
The verifier re-derives the plan’s table classification — the ADR 0007
row-delete versus anonymize/retain split — by calling
ErasurePlanner internally and reading its local steps,
never re-implementing the classification. It then counts, per table, the
rows still scoped to the subject through the same hop-chain predicate the
executor used (the shared scoping.subject_scope helper), issuing
nothing but SELECT COUNT statements — it is strictly
read-only and mutates no row.
The verdict proves execution fidelity, not erasure completeness, and
the boundaries are stated on ErasureVerification:
- it re-reads the same annotated surface the plan was built from, so un-annotated PII is invisible by construction;
- a row orphaned off the subject’s path is unreachable by the scoping predicate and invisible here too;
- anonymized cell values are not verified — surrogates are random, never NULL, so without a before-state they are indistinguishable from originals; that check needs a before-state and is out of scope.
verified is therefore the narrow claim that every row-deleted table
holds zero subject-scoped rows; surviving (anonymize/retain) counts are
reported for the record and never flip the verdict. This is never a
determination that the subject is fully erased or that the controller is
compliant.
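The narrow verdict rule above can be written out as a few lines of plain Python. This is a sketch of the rule, not the library's ErasureVerification type: the Verdict class, the judge helper, and the choice to list only non-zero counts under residual are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    verified: bool
    residual: dict[str, int]   # row-deleted tables that still hold subject rows
    surviving: dict[str, int]  # anonymize/retain tables, reported for the record


def judge(counts: dict[str, int], row_deleted: set[str]) -> Verdict:
    """Apply the rule: only row-deleted tables can flip the verdict."""
    residual = {t: n for t, n in counts.items() if t in row_deleted and n > 0}
    surviving = {t: n for t, n in counts.items() if t not in row_deleted}
    return Verdict(verified=not residual, residual=residual, surviving=surviving)


# Hypothetical per-table counts of rows still scoped to the subject.
clean = judge({"comments": 0, "invoices": 3}, row_deleted={"comments"})
dirty = judge({"comments": 2, "invoices": 3}, row_deleted={"comments"})
```

In both calls the three retained invoices are reported under surviving, but only the leftover comments rows change the outcome.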
ErasureVerifier.verify_subject_erased
def verify_subject_erased(session: Session, subject_id: SubjectIdentifier) -> ErasureVerification
Read the subject’s annotated surface back and record the verdict.
Re-derives the plan’s table classification, counts the subject’s
surviving rows per table with SELECT COUNT statements only, and
appends one audit event. Counting is strictly read-only; the caller’s
session is never written to or committed here.
Args:
- session (Session): An open database session; used for reads only.
- subject_id (SubjectIdentifier): The subject identifier: a single-column str or a composite CompositeSubjectId; each component is coerced to its subject column’s Python type for typed-parameter drivers.
Returns:
ErasureVerification: The verdict. verified is true iff every row-deleted table is empty for this subject; residual and surviving carry the per-table counts (see ErasureVerification).
Raises:
ManifestError: If a table or hop references a name missing from the bound metadata.
SubjectResolutionError: If the id cannot carry the subject column’s type.
lint_completeness
def lint_completeness(metadata: MetaData) -> tuple[CompletenessFinding, ...]
Find every place the metadata could hold undeclared personal data.
The exact complement of collect_data_map: every table is either
in the data map, returned here as a whole-table finding, or an
effaced-owned effaced_* table — and within a mapped table, every
column is either annotated, a primary/foreign key, or returned here as
a column finding. Nothing falls through silently.
Findings are questions, not verdicts — gate on them in CI with
effaced.testing.assert_data_map_complete, which lets you exempt
stores and fields you have consciously judged to hold no personal data.
Args:
- metadata (MetaData): The MetaData holding your mapped tables (for the ORM, Base.metadata).
Returns:
tuple[CompletenessFinding, ...]: All findings, in deterministic table order, then column order.
Raises:
ManifestError: If an info entry under the effaced key is not a recognised annotation object; this is exactly the metadata collect_data_map rejects, so the complement contract holds even on malformed input.
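The complement contract (every table is mapped, reported whole, or effaced-owned; every column of a mapped table is annotated, a key, or reported) can be sketched over plain data structures. This is an illustration under assumptions, not the linter itself: the dict-based schema, the lint_sketch helper, and the tuple-shaped findings are hypothetical, and owned tables are recognised here by name prefix only.

```python
def lint_sketch(schema, annotated, keys, linked):
    """Return (table, column) findings; column is None for a whole-table finding."""
    findings = []
    for table in sorted(schema):  # deterministic table order
        if table.startswith("effaced_"):
            continue  # owned tables are outside the contract
        columns = schema[table]
        in_map = table in linked or any((table, c) in annotated for c in columns)
        if not in_map:
            findings.append((table, None))
            continue
        for column in columns:  # declared column order
            if (table, column) not in annotated and (table, column) not in keys:
                findings.append((table, column))
    return findings


# Hypothetical schema: users is mapped, logs is not, effaced_outbox is owned.
schema = {
    "users": ["id", "email", "nickname"],
    "logs": ["id", "message"],
    "effaced_outbox": ["id"],
}
findings = lint_sketch(
    schema,
    annotated={("users", "email")},
    keys={("users", "id"), ("logs", "id")},
    linked={"users"},
)
```

Here logs surfaces as a whole-table finding and users.nickname as a column finding; both are questions for a reviewer rather than verdicts.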
lint_reachability
def lint_reachability(data_map: DataMap, orm_registry: registry) -> tuple[ReachabilityFinding, ...]
Find every annotated table the erasure planner cannot route to a subject.
resolve_subject_graph raises on the first unreachable table, so it
answers “can this whole map be planned?” but not “which tables are the
problem?”. This linter probes each concern independently and collects a
finding per gap instead of raising, the way lint_completeness
complements collect_data_map.
It is the exact inverse of resolution: lint_reachability(...) == () if
and only if resolve_subject_graph succeeds on the same inputs. An
empty result is the assurance that every annotated store has a subject path
the planner can walk; any finding names a store whose data would otherwise
be silently never erased.
Findings are questions, not verdicts — a table may be unreachable because
its subject_link is wrong, because it is not ORM-mapped, or because the
foreign keys form a cycle. effaced names the gap; the fix (and the judgement
that a store needs no path at all) stays yours.
Args:
- data_map (DataMap): The collected manifest (see collect_data_map).
- orm_registry (registry): The ORM registry holding the mapped classes; for declarative styles, Base.registry.
Returns:
tuple[ReachabilityFinding, ...]: All findings, in deterministic order: the subject-anchor findings first, then one per unreachable non-subject table in manifest order, then a graph-level foreign-key-cycle finding if one remains.
Raises:
ManifestError: If an info entry under the effaced key is not a recognised annotation object; this is exactly the malformed metadata collect_data_map rejects. Lintable conditions (a missing or unreachable path) never raise; they become findings.
LintTarget
class LintTarget:
    def __init__(metadata: MetaData, orm_registry: registry | None) -> None
The live SQLAlchemy handles a lint run needs.
Loaded by load_lint_target from a module:attribute spec, the
way Alembic and Gunicorn locate an app object. Holds live handles — not a
serialized copy — so the linters walk the same metadata the application
runs against.
Fields:
- metadata (MetaData): The MetaData holding the mapped tables, the input to collect_data_map and lint_completeness.
- orm_registry (registry | None): The ORM registry holding the mapped classes, or None when the spec resolved to a bare MetaData (no mappers). Reachability linting needs the registry; with it None the caller can lint completeness only.
load_lint_target
def load_lint_target(spec: str) -> LintTarget
Import and resolve a module.path:attribute spec into a lint target.
The attribute may be a declarative Base (anything exposing both
.metadata and .registry) or a bare MetaData. A Base yields a
target carrying both handles; a bare MetaData yields one with
orm_registry set to None (completeness-only linting).
Args:
- spec (str): module.path:attribute, e.g. myapp.models:Base or myapp.db:metadata.
Returns:
LintTarget: The resolved LintTarget.
Raises:
ConfigurationError: If the spec is not module:attribute, the module cannot be imported, the attribute is missing, or the attribute is neither a declarative Base nor a MetaData. Every failure names what to fix, never guesses.
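Resolving a module:attribute spec is a common pattern that needs only importlib. The sketch below shows the general technique with one named failure per cause; it is not the library's loader. The resolve_spec helper is hypothetical, ValueError stands in for ConfigurationError, and the final Base-or-MetaData type check is omitted.

```python
import importlib


def resolve_spec(spec: str) -> object:
    """Import 'module.path:attribute' and return the named attribute."""
    module_path, sep, attribute = spec.partition(":")
    if not sep or not module_path or not attribute:
        raise ValueError(f"expected 'module.path:attribute', got {spec!r}")
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        raise ValueError(f"cannot import module {module_path!r}") from exc
    try:
        return getattr(module, attribute)
    except AttributeError as exc:
        raise ValueError(f"{module_path!r} has no attribute {attribute!r}") from exc


# A stdlib target keeps the example runnable anywhere.
ordered_dict_type = resolve_spec("collections:OrderedDict")
```

Splitting on the first colon keeps dotted module paths intact, and chaining with `from exc` preserves the underlying import error for debugging.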
pii
def pii(category: PiiCategory, *, erasure: ErasureStrategy = ErasureStrategy.DELETE, retention: RetentionPolicy | None = None, legal_basis: LegalBasis | None = None, purpose: str | None = None, description: str | None = None) -> dict[str, Any]
Declare a column as personal data.
Returns an info dict fragment for mapped_column(info=...) /
Column(info=...). Keeping this a function (not a bare dict) lets the
manifest format evolve behind a stable call signature.
Args:
- category (PiiCategory): What kind of personal data the column holds.
- erasure (ErasureStrategy): Erasure behaviour; defaults to deletion.
- retention (RetentionPolicy | None): Legal retention duty; required for RETAIN.
- legal_basis (LegalBasis | None): Lawful basis, surfaced in Art. 15 exports.
- purpose (str | None): Processing purpose, surfaced in Art. 15 exports.
- description (str | None): Free-text note for audits.
Returns:
dict[str, Any]: A dict suitable for SQLAlchemy’s info parameter.
RectificationExecutor
class RectificationExecutor:
    def __init__(metadata: MetaData) -> None
Executes one local rectification step per call, scoped to one subject.
The SQLAlchemy implementation of
RectificationStepExecutor: one
UPDATE per step, scoped through the same hop-chain predicate the
erasure executor uses, writing the single corrected value into every
one of the step’s columns. Statements run in the caller’s session and
are never committed here (ADR 0006).
One shared value per cell is correct here — the correction is one value, unlike anonymization’s per-row surrogates — and category-keyed writes are deliberately blunt (ADR 0013): a step cannot fix one row but not another. A unique-constraint collision (two matched rows forced to the same corrected value) surfaces as the database’s own error and is audited as a step failure by the caller.
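A single scoped UPDATE that writes one corrected value into several columns can be shown with sqlite3 from the standard library. This is a hedged sketch, not the executor's generated statement: the users/profiles schema and its column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE profiles (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        display_name TEXT,
        billing_name TEXT
    );
    INSERT INTO users VALUES (1), (2);
    INSERT INTO profiles VALUES (10, 1, 'Jon', 'Jon'), (20, 2, 'Ann', 'Ann');
    """
)

# One UPDATE writes the single corrected value into every column of the step,
# scoped to the subject through the same kind of IN-subquery predicate.
cursor = conn.execute(
    """
    UPDATE profiles SET display_name = :value, billing_name = :value
    WHERE user_id IN (SELECT id FROM users WHERE id = :subject)
    """,
    {"value": "John", "subject": 1},
)
matched = cursor.rowcount
rows = conn.execute(
    "SELECT id, display_name, billing_name FROM profiles ORDER BY id"
).fetchall()
# No commit here: the caller owns the transaction in this sketch too.
```

Subject 1's row receives the corrected value in both columns, and subject 2's row is untouched.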
RectificationExecutor.execute
def execute(session: Session, graph: SubjectGraph, step: RectificationStep, subject_id: SubjectIdentifier, value: str | int | float | bool) -> int
Run one local step scoped to one subject.
Args:
- session (Session): The caller’s open session; never committed here.
- graph (SubjectGraph): Resolved hop chains from each table to the subject.
- step (RectificationStep): The value-free local step to run.
- subject_id (SubjectIdentifier): The subject identifier: a single-column str or a composite CompositeSubjectId; each component is coerced to its subject column’s Python type for typed-parameter drivers.
- value (str | int | float | bool): The corrected value, written into every step column.
Returns:
int: The number of rows the UPDATE matched.
Raises:
ManifestError: If the step targets a table or column missing from the bound metadata.
reflect_metadata
def reflect_metadata(engine: Engine, *, only: Sequence[str] | None = None) -> MetaData
Reflect a database’s tables into a fresh MetaData.
Wraps MetaData.reflect. When only is given, just
those tables (and the foreign keys among them) are reflected, so
unrelated tables in the same database never enter the subject graph —
pass the manifest’s table names to scope reflection to the declared
surface. Omitting only reflects every table the engine can see.
The reflection runs read-only catalog queries against the live connection; it issues no DDL and no DML.
Args:
- engine (Engine): A connectable engine bound to the live database.
- only (Sequence[str] | None): Table names to reflect; None reflects all tables.
Returns:
MetaData: A fresh MetaData holding the reflected tables and their foreign-key constraints.
Raises:
ManifestError: If a name in only is not a table in the live database; the manifest names a table the database does not have, so no subject graph can be resolved against it.
resolve_subject_graph
def resolve_subject_graph(data_map: DataMap, orm_registry: registry) -> SubjectGraph
Resolve every subject-link path in a data map into a subject graph.
Each table’s dotted relationship path is walked against the ORM mappers and flattened into foreign-key column pairs; the resulting accesses are ordered FK-safely for deletion (children before parents, subject table last). Joined-table inheritance is keyed by each mapper’s local table only.
Args:
- data_map (DataMap): The collected manifest (see collect_data_map).
- orm_registry (registry): The ORM registry holding the mapped classes; for declarative styles, Base.registry.
Returns:
SubjectGraph: The resolved, FK-safely ordered SubjectGraph.
Raises:
SubjectResolutionError: If no (or more than one) table declares subject_link(""), a table holds personal data without a subject link, a table is not ORM-mapped, a path segment is not a relationship, a path joins through a many-to-many secondary table, a path does not end at the subject table, a declared subject id column does not exist or the subject-id columns are declared on a non-subject table, or the foreign keys between resolved tables form a cycle.
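An FK-safe deletion order (children before parents, subject table last) is a topological sort, and a foreign-key cycle is exactly the case where no such order exists. The sketch below uses the standard library's graphlib to show the idea; it is not the resolver's implementation, and the table names and the deletion_order helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# child table -> parent tables it references via foreign keys (hypothetical)
references = {
    "comments": {"posts"},
    "posts": {"users"},
    "users": set(),
}


def deletion_order(refs):
    """Order tables so every child is deleted before the parents it references."""
    sorter = TopologicalSorter()
    for child, parents in refs.items():
        sorter.add(child)
        for parent in parents:
            # A node is emitted after its predecessors, so registering the
            # child as the parent's predecessor puts the child first.
            sorter.add(parent, child)
    return list(sorter.static_order())


order = deletion_order(references)

try:
    deletion_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

For this linear chain the only valid order deletes comments, then posts, then the subject table users.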
resolve_subject_graph_from_fk
def resolve_subject_graph_from_fk(data_map: DataMap, metadata: MetaData) -> SubjectGraph
Resolve a subject graph from foreign-key constraints, without ORM mappers.
The mapper-free sibling of resolve_subject_graph. Where that
function walks ORM relationships keyed by attribute name, this one walks
the foreign-key constraints on the bound MetaData
directly: each segment of a table’s dotted subject_link path names
the next table on the chain to the subject, and the single foreign
key between the current table and that target supplies the join columns.
It is the resolver an adapter reaches for when it has table metadata but
no ORM registry — reflected schemas, hand-built metadata, or non-ORM
layers such as the Django adapter (which translates Model._meta into
annotated tables and FK constraints).
The produced graph is identical in shape and guarantees to the
ORM-resolved one — same FK-safe deletion order, same hop chains, same
fully_pii_owned classification — so every downstream engine behaves
identically regardless of which resolver built it.
Args:
- data_map (DataMap): The collected manifest (see collect_data_map).
- metadata (MetaData): The MetaData holding the annotated tables, with the foreign-key constraints that link them.
Returns:
SubjectGraph: The resolved, FK-safely ordered SubjectGraph.
Raises:
SubjectResolutionError: If no (or more than one) table declares subject_link(""), a table holds personal data without a subject link, a table is absent from the metadata, a declared subject id column does not exist, a path segment names a table not in the metadata, the current table has no single foreign key to the next path segment (none, or an ambiguous several), a path does not end at the subject table, or the foreign keys between resolved tables form a cycle.
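The path walk described above (each segment names the next table, and exactly one foreign key must link the current table to it) can be sketched over a plain dictionary of foreign keys. This is an illustration only: the ForeignKeys alias and the resolve_path helper are hypothetical, LookupError stands in for SubjectResolutionError, and foreign keys are modelled as single-column pairs.

```python
# (child_table, parent_table) -> list of (child_column, parent_column) pairs
ForeignKeys = dict[tuple[str, str], list[tuple[str, str]]]


def resolve_path(table: str, path: str, fks: ForeignKeys, subject: str):
    """Walk a dotted table path and return one hop per segment."""
    hops = []
    current = table
    for target in (path.split(".") if path else []):
        candidates = fks.get((current, target), [])
        if len(candidates) != 1:  # none, or an ambiguous several
            raise LookupError(
                f"{current!r} needs exactly one foreign key to {target!r}, "
                f"found {len(candidates)}"
            )
        child_col, parent_col = candidates[0]
        hops.append((current, child_col, target, parent_col))
        current = target
    if current != subject:
        raise LookupError(f"path from {table!r} ends at {current!r}, not {subject!r}")
    return hops


fks: ForeignKeys = {
    ("comments", "posts"): [("post_id", "id")],
    ("posts", "users"): [("user_id", "id")],
    ("messages", "users"): [("sender_id", "id"), ("recipient_id", "id")],
}
hops = resolve_path("comments", "posts.users", fks, subject="users")
```

The messages table in this example has two foreign keys to users, so resolving its path would fail as ambiguous rather than silently picking one.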
SqlStatusCountsSource
class SqlStatusCountsSource: ...
Counts outbox entries per status with a single GROUP BY query.
The SQLAlchemy implementation of
StatusCountsSource: it aggregates in the database
(SELECT status, count(*) ... GROUP BY status) rather than streaming
every row back to Python, so a large outbox costs one cheap query.
Stateless — share one instance freely.
The result is zero-filled over every OutboxStatus
member, so it is byte-for-byte interchangeable with the Python-side
count status_counts falls back to.
SqlStatusCountsSource.status_counts
def status_counts(outbox: Table, session_factory: sessionmaker) -> dict[OutboxStatus, int]
Count outbox entries per lifecycle status, SQL-side.
Args:
- outbox (Table): The effaced_outbox table handle to count over.
- session_factory (sessionmaker): Factory producing sessions on the database holding that table; a short-lived read session is opened.
Returns:
dict[OutboxStatus, int]: A mapping with one entry per OutboxStatus, zero-filled where no rows exist for a status.
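Aggregating in the database and zero-filling in Python can be sketched with sqlite3 and an Enum. This is a minimal illustration, not the library's query: the Status enum and its three members are a hypothetical stand-in for OutboxStatus, and the outbox table here has only the columns the example needs.

```python
import enum
import sqlite3


class Status(enum.Enum):  # hypothetical stand-in for OutboxStatus
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO outbox (status) VALUES (?)",
    [("pending",), ("pending",), ("done",)],
)


def status_counts(connection: sqlite3.Connection) -> dict[Status, int]:
    counts = {status: 0 for status in Status}  # zero-fill every member first
    rows = connection.execute("SELECT status, count(*) FROM outbox GROUP BY status")
    for value, n in rows:  # one row per status that actually occurs
        counts[Status(value)] = n
    return counts


counts = status_counts(conn)
```

Starting from a zero-filled mapping means a status with no rows still appears in the result, so callers never need to handle a missing key.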
subject_link
def subject_link(path: str, *, subject_id_columns: tuple[str, ...] | str = 'id', subject_id_column: str | None = None) -> dict[str, Any]
Declare how a table reaches the data subject.
Attach via Table.info or the mapped class’s __table_args__
info dict. The subject table itself declares subject_link("").
Args:
- path (str): Dotted relationship path to the subject table.
- subject_id_columns (tuple[str, ...] | str): Ordered identifier column(s) on the subject table. A bare str is the single-column case (the default, "id"); a tuple declares a composite subject key whose order aligns to the caller’s CompositeSubjectId values (ADR 0025).
- subject_id_column (str | None): Deprecated singular alias for the one-column case; mutually exclusive with subject_id_columns. Kept so existing single-column annotations need no edit.
Returns:
dict[str, Any]: A dict suitable for SQLAlchemy’s table-level info parameter.
Raises:
ConfigurationError: If both subject_id_columns and the subject_id_column alias are passed non-default.
SurrogateRegistry
class SurrogateRegistry:
    def __init__() -> None
Maps SQLAlchemy column types to surrogate-value factories.
Anonymization replaces a value with an irreversible surrogate instead
of NULL — surrogates stay valid under NOT NULL and unique
constraints, which is why factories are invoked once per cell (string
and UUID surrogates are unique per call). Lookup walks the column
type’s MRO, so registering String also covers
Text and every other String subclass.
Unlike ResolverRegistry, re-registering a
type silently overrides it (last wins): replacing a default surrogate
with your own is the very point of extensibility, and nothing audited
depends on which factory produced a surrogate.
The registry is consumed by the erasure executor, never by
plan — plans carry no values,
which keeps them deterministic and side-effect-free.
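The MRO-based lookup and last-wins registration described above can be sketched in plain Python. This is an illustration of the technique, not the library's class: MroRegistry is hypothetical, the String/Text/Integer/Date classes are stand-ins for SQLAlchemy types, and LookupError stands in for AnonymizationError.

```python
import itertools
from typing import Callable


class String: ...          # hypothetical stand-ins for SQLAlchemy type classes
class Text(String): ...
class Integer: ...
class Date: ...


class MroRegistry:
    def __init__(self) -> None:
        self._factories: dict[type, Callable[[], object]] = {}

    def register(self, type_class: type, factory: Callable[[], object]) -> None:
        self._factories[type_class] = factory  # last registration wins

    def surrogate_for(self, column_type: object) -> object:
        # Walk the instance's class hierarchy, most specific class first.
        for klass in type(column_type).__mro__:
            factory = self._factories.get(klass)
            if factory is not None:
                return factory()  # one call per cell, so values can be unique
        raise LookupError(f"no surrogate registered for {type(column_type).__name__}")


_counter = itertools.count(1)
registry = MroRegistry()
registry.register(String, lambda: f"anon-{next(_counter)}")
registry.register(Integer, lambda: 0)
```

Registering String alone is enough for Text() to resolve, because Text's MRO reaches String before it reaches object; a type with no registered ancestor, such as Date here, fails loudly instead of falling back to a default.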
SurrogateRegistry.register
def register(sa_type: type[TypeEngine[Any]], factory: Callable[[], object]) -> None
Map one SQLAlchemy type (and its subclasses) to a factory.
Args:
- sa_type (type[TypeEngine[Any]]): The type class to cover, e.g. sqlalchemy.String.
- factory (Callable[[], object]): Zero-argument callable producing one surrogate value; called once per anonymized cell.
SurrogateRegistry.surrogate_for
def surrogate_for(column_type: TypeEngine[Any]) -> object
Produce one surrogate value for a column of the given type.
Args:
- column_type (TypeEngine[Any]): The column’s type instance, e.g. Text().
Returns:
object: A fresh, type-valid surrogate value.
Raises:
AnonymizationError: If neither the type nor any of its base classes has a registered factory.