Retention

Art. 5(1)(e) storage limitation — a report-only retention-expiry sweep.

RetentionSweeper evaluates every bounded retention duty in the manifest (a RetentionPolicy with a duration, clocked by its anchor column) against one instant and reports lapsed windows per subject, appending an audit event for each subject with expired rows. It deletes nothing and determines nothing: whether a lapsed duty permits, or requires, erasure is the controller’s determination, never effaced’s.

class RetentionReport(BaseModel):
    entries: tuple[RetentionReportEntry, ...] = ()
    swept_at: datetime

Everything one retention sweep found, evaluated at a single instant.

The report is a mechanism’s output, not a determination: it says which declared retention windows have lapsed, never whether the data may — or must — now be erased.

Fields:

  • entries (tuple[RetentionReportEntry, ...]): One entry per column with a bounded retention duty (a policy carrying a duration), in manifest order.
  • swept_at (datetime): The single instant the whole run was evaluated against (UTC); each column's cutoff is derived from it.

class RetentionReportEntry(BaseModel):
    anchor: str | None = None
    column: str = Field(min_length=1)
    expired: dict[str, int] = Field(default_factory=dict)
    indeterminate_rows: int = 0
    reason: str = Field(min_length=1)
    table: str = Field(min_length=1)

One annotated column’s retention-expiry findings.

The entry names the subjects whose declared retention window for this column has lapsed — and is honest about what it could not evaluate: rows without an anchor value are counted, never guessed. What a lapsed duty permits is the controller’s determination; in particular, erasure retains RETAIN columns by construction, so acting on a lapsed RETAIN duty means changing the annotation first (flip the strategy or drop the policy) and then erasing.

Fields:

  • anchor (str | None): The policy’s anchor column, or None when the duration has no clock and the whole column is indeterminate.
  • column (str): The annotated column whose retention policy was evaluated.
  • expired (dict[str, int]): Expired row count per subject id; empty when nothing has lapsed (or nothing could be evaluated).
  • indeterminate_rows (int): Rows the sweep could not evaluate — every row when the policy has no anchor, otherwise the rows whose anchor column is NULL.
  • reason (str): The policy’s declared legal duty, verbatim.
  • table (str): The table holding the annotated column.

class RetentionSweeper:
    def __init__(data_map: DataMap, graph: SubjectGraph, metadata: MetaData, audit_sink: AuditSink) -> None

Finds data whose declared retention window has lapsed — and reports it.

The sweep is read-only by construction: it builds nothing but SELECT statements, writes no rows, and the erasure planner stays time-free — a lapsed window changes the report, never any plan. Whether a lapsed duty permits erasure is the controller’s determination (ADR 0012); acting on a lapsed RETAIN duty means changing the annotation first, because erasure retains RETAIN columns by construction.

A column participates iff its policy declares a duration; it is sweepable iff the policy also names an anchor. Durations without an anchor — and rows whose anchor is NULL — are reported as indeterminate, never guessed.

def sweep(session: Session, *, now: datetime | None = None) -> RetentionReport

Evaluate every bounded retention duty against one instant.

Per sweepable column, cutoff = now - duration is computed in Python so the database sees a portable anchor <= :cutoff comparison; rows are attributed to subjects through the manifest’s hop chains. One RETENTION_EXPIRED event is appended per subject with expired rows — table/column names and counts only, never values or anchor timestamps. Repeated sweeps re-emit for still-expired data: each run is evidence.
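
A minimal sketch of the cutoff comparison, using the standard-library `sqlite3` module instead of the library's own session and metadata types. The `orders` table, its `closed_at` anchor, and the helper names are assumptions made for the example; timestamps are stored as UTC ISO 8601 text, which compares correctly as strings only because every value uses the same format and offset.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def utc_text(year, month, day):
    """Format a UTC midnight as ISO 8601 text (assumed storage format)."""
    return datetime(year, month, day, tzinfo=timezone.utc).isoformat()


def expired_row_ids(conn, now, duration):
    """Select rows whose anchor is at or before now - duration."""
    cutoff = now - duration  # computed in Python, not in SQL
    rows = conn.execute(
        "SELECT id FROM orders WHERE closed_at <= :cutoff ORDER BY id",
        {"cutoff": cutoff.isoformat()},
    )
    return [row[0] for row in rows]


def null_anchor_rows(conn):
    """Count rows with no anchor value; these cannot be evaluated."""
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE closed_at IS NULL"
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, closed_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, utc_text(2020, 1, 1)), (2, utc_text(2024, 6, 1)), (3, None)],
)

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
expired_ids = expired_row_ids(conn, now, timedelta(days=365))
indeterminate = null_anchor_rows(conn)
```

Because the database only ever sees a bound parameter, no dialect-specific date arithmetic is needed, and a NULL anchor never satisfies the comparison, so such rows fall out of the expired set and are counted separately.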

Counting fetches the matched rows and counts in Python — O(rows); a COUNT pushdown is a future adapter optimization.
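
To make the trade-off concrete, the sketch below contrasts counting fetched rows in Python with a pushed-down `COUNT(*) ... GROUP BY`. It uses the standard-library `sqlite3` module and a simplified, invented schema in which `subject_id` lives on the same table; in the library itself, rows are attributed to subjects through hop chains, and the pushdown shown here is an assumption about what such an optimization could look like, not its actual design.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, subject_id TEXT, closed_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "s1", "2020-01-01T00:00:00+00:00"),
        (2, "s1", "2021-01-01T00:00:00+00:00"),
        (3, "s2", "2019-01-01T00:00:00+00:00"),
        (4, "s2", "2024-06-01T00:00:00+00:00"),  # not yet lapsed
        (5, "s3", None),                         # NULL anchor: never matches
    ],
)
cutoff = "2024-01-01T00:00:00+00:00"


def count_in_python(conn, cutoff):
    """Fetch every matched row and count per subject: O(rows) transferred."""
    rows = conn.execute(
        "SELECT subject_id FROM orders WHERE closed_at <= ?", (cutoff,)
    ).fetchall()
    return dict(Counter(subject for (subject,) in rows))


def count_pushed_down(conn, cutoff):
    """Let the database aggregate: one result row per subject."""
    rows = conn.execute(
        "SELECT subject_id, COUNT(*) FROM orders "
        "WHERE closed_at <= ? GROUP BY subject_id",
        (cutoff,),
    ).fetchall()
    return dict(rows)


in_python = count_in_python(conn, cutoff)
pushed_down = count_pushed_down(conn, cutoff)
```

Both functions return the same mapping; the difference is only in how many rows cross the connection.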

Args:

  • session (Session): An open database session; reads only, never writes.
  • now (datetime | None): The instant the whole run is evaluated against (each column's cutoff is now minus its duration); defaults to the current UTC time. Pass it explicitly for determinism.

Returns:

  • RetentionReport — The report, one entry per column with a declared duration.