Retention
Art. 5(1)(e) storage limitation — a report-only retention-expiry sweep.
RetentionSweeper evaluates every bounded retention duty in the
manifest (a RetentionPolicy with a
duration, clocked by its anchor column) against one instant and
reports lapsed windows per subject, audited. It deletes nothing, and it
determines nothing: whether a lapsed duty permits — or requires — erasure
is the controller’s determination, never effaced’s.
RetentionReport
Section titled “RetentionReport”class RetentionReport(BaseModel): entries: tuple[RetentionReportEntry, ...] = () swept_at: datetimeEverything one retention sweep found, evaluated at a single instant.
The report is a mechanism’s output, not a determination: it says which declared retention windows have lapsed, never whether the data may — or must — now be erased.
Fields:
- entries (
tuple[RetentionReportEntry, ...]): One entry per column with a bounded retention duty (a policy carrying aduration), in manifest order. - swept_at (
datetime): The single cutoff instant the whole run was evaluated against (UTC).
RetentionReportEntry
Section titled “RetentionReportEntry”class RetentionReportEntry(BaseModel): anchor: str | None = None column: str = Field(min_length=1) expired: dict[str, int] = Field(default_factory=dict) indeterminate_rows: int = 0 reason: str = Field(min_length=1) table: str = Field(min_length=1)One annotated column’s retention-expiry findings.
The entry names the subjects whose declared retention window for this
column has lapsed — and is honest about what it could not evaluate:
rows without an anchor value are counted, never guessed. What a
lapsed duty permits is the controller’s determination; in particular,
erasure retains RETAIN columns by construction, so acting on a
lapsed RETAIN duty means changing the annotation first (flip the
strategy or drop the policy) and then erasing.
Fields:
- anchor (
str | None): The policy’s anchor column, orNonewhen the duration has no clock and the whole column is indeterminate. - column (
str): The annotated column whose retention policy was evaluated. - expired (
dict[str, int]): Expired row count per subject id; empty when nothing has lapsed (or nothing could be evaluated). - indeterminate_rows (
int): Rows the sweep could not evaluate — every row when the policy has no anchor, otherwise the rows whose anchor column is NULL. - reason (
str): The policy’s declared legal duty, verbatim. - table (
str): The table holding the annotated column.
RetentionSweeper
Section titled “RetentionSweeper”class RetentionSweeper: def __init__(data_map: DataMap, graph: SubjectGraph, metadata: MetaData, audit_sink: AuditSink) -> NoneFinds data whose declared retention window has lapsed — and reports it.
The sweep is read-only by construction: it builds nothing but SELECT statements,
writes no rows, and the erasure planner stays time-free — a lapsed
window changes the report, never any plan. Whether a lapsed duty
permits erasure is the controller’s determination (ADR 0012); acting on
a lapsed RETAIN duty means changing the annotation first, because
erasure retains RETAIN columns by construction.
A column participates iff its policy declares a duration; it is
sweepable iff the policy also names an anchor. Durations without
an anchor — and rows whose anchor is NULL — are reported as
indeterminate, never guessed.
RetentionSweeper.sweep
Section titled “RetentionSweeper.sweep”def sweep(session: Session, *, now: datetime | None = None) -> RetentionReportEvaluate every bounded retention duty against one instant.
Per sweepable column, cutoff = now - duration is computed in
Python so the database sees a portable anchor <= :cutoff
comparison; rows are attributed to subjects through the manifest’s
hop chains. One RETENTION_EXPIRED event is appended per subject
with expired rows — table/column names and counts only, never
values or anchor timestamps. Repeated sweeps re-emit for
still-expired data: each run is evidence.
Counting fetches the matched rows and counts in Python — O(rows); a COUNT pushdown is a future adapter optimization.
Args:
- session (
Session): An open database session; reads only, never writes. - now (
datetime | None): The cutoff instant for the whole run; defaults to the current UTC time. Pass it explicitly for determinism.
Returns:
RetentionReport— The report, one entry per column with a declaredduration.