effaced-s3

effaced-s3 — first-party S3 resolver for effaced.

The resolver itself is S3Resolver. The object-store machinery it rides on is public and stable, so S3-compatible stores (Supabase Storage, MinIO, R2) can build their own resolvers on the same parts: the client protocol S3ObjectClient, the prefix guard checked_prefix, the export collector collect_object_records, the listing helpers iter_current_objects and collect_version_identifiers, the batched delete delete_in_batches, and the error taxonomy (error_code, is_nonretryable, NONRETRYABLE_CODES).

def checked_prefix(ref: SubjectRef) -> str

The ref’s key prefix, validated before any object-store call.

Object-store prefixes are matched as literal leading strings, so a prefix that is not delimiter-terminated also matches sibling subjects (users/4 matches users/42/avatar.png). That is cross-subject bleed, the one thing a resolver must never do. Both guards (non-blank and "/"-terminated) run before any object-store call.

Args:

  • ref (SubjectRef): The subject reference whose value is the key prefix.

Returns:

  • str — The validated prefix, unchanged.

Raises:

  • ResolverError — The prefix is blank (it would address the whole bucket) or does not end with "/" (it would match sibling subjects).
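
The two guards can be sketched as follows. This is a minimal illustration, not the library's code: it takes a plain string instead of a SubjectRef, and `check_prefix_sketch` and the local `ResolverError` are hypothetical stand-ins. Treating a whitespace-only prefix as blank is also an assumption.

```python
class ResolverError(Exception):
    """Stand-in for the library's ResolverError (defined here for illustration only)."""


def check_prefix_sketch(prefix: str) -> str:
    """Return the prefix unchanged if it is safe to use as a subject prefix."""
    # Guard 1: a blank prefix would address the whole bucket.
    # (Assumption: whitespace-only counts as blank.)
    if not prefix.strip():
        raise ResolverError("prefix is blank")
    # Guard 2: without a trailing "/", "users/4" also matches "users/42/...".
    if not prefix.endswith("/"):
        raise ResolverError('prefix does not end with "/"')
    return prefix


# Why the delimiter matters: prefix matching is a plain startswith test.
keys = ["users/4/avatar.png", "users/42/avatar.png"]
unsafe = [k for k in keys if k.startswith("users/4")]
safe = [k for k in keys if k.startswith(check_prefix_sketch("users/4/"))]
```

Here `unsafe` contains both keys, while `safe` contains only the key that belongs to subject 4.
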
def collect_object_records(client: S3ObjectClient, bucket: str, prefix: str, *, source: str, include_content: bool, max_object_bytes: int | None) -> tuple[ExportRecord, ...]

Map every current object under the prefix to an ExportRecord; an object over the size cap raises instead of being skipped.

Args:

  • client (S3ObjectClient): The object-store client to list and fetch with.
  • bucket (str): The bucket holding the subject’s objects.
  • prefix (str): The subject’s key prefix.
  • source (str): The ExportRecord.source label every produced record carries — the resolver’s name.
  • include_content (bool): Fetch each object’s body (GET) or only its metadata (HEAD).
  • max_object_bytes (int | None): Refuse (loudly) to export any object larger than this; None means no cap.

Returns:

  • tuple[ExportRecord, ...] — The records for every current object under the prefix, in listing order. Empty when nothing lives under the prefix.

Raises:

  • ResolverError — An object under the prefix exceeds max_object_bytes — the export fails whole, never a silently thinned bundle.
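
The "fails whole" behaviour can be illustrated with a small sketch. It assumes listing entries shaped like S3 ListObjectsV2 items (`Key`, `Size`); `collect_sketch`, the record shape, and the local `ResolverError` are hypothetical, and the real collector may perform the check at a different point.

```python
from typing import Optional


class ResolverError(Exception):
    """Stand-in for the library's ResolverError (illustration only)."""


def collect_sketch(listing: list, max_object_bytes: Optional[int]) -> tuple:
    """Build one record per listing entry, or raise if any entry is over the cap."""
    records = []
    for entry in listing:
        if max_object_bytes is not None and entry["Size"] > max_object_bytes:
            # Raising discards the records built so far, so no thinned bundle escapes.
            # The message carries no key, since keys are user content.
            raise ResolverError("an object exceeds max_object_bytes")
        records.append({"key": entry["Key"], "size": entry["Size"]})
    return tuple(records)


listing = [{"Key": "users/42/a", "Size": 5}, {"Key": "users/42/b", "Size": 50}]
uncapped = collect_sketch(listing, None)
```

With `max_object_bytes=10` the same call raises instead of returning only the first record.
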
def collect_version_identifiers(client: S3ObjectClient, bucket: str, prefix: str) -> list[ObjectIdentifierTypeDef]

Every (key, version) pair under the prefix — delete markers included.

Args:

  • client (S3ObjectClient): The S3 client to list with.
  • bucket (str): The bucket holding the subject’s objects.
  • prefix (str): The subject’s key prefix.

Returns:

  • list[ObjectIdentifierTypeDef] — Identifiers for all object versions and delete markers, in listing order, ready for delete_objects batches.
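
A rough sketch of how such a collector might page through list_object_versions, using the S3 response fields `Versions`, `DeleteMarkers`, `IsTruncated`, `NextKeyMarker`, and `NextVersionIdMarker`. The function name and the fake client are hypothetical, and emitting versions before delete markers within a page is an assumption about ordering.

```python
from typing import Any


def collect_versions_sketch(client: Any, bucket: str, prefix: str) -> list:
    """Gather {"Key", "VersionId"} pairs for versions and delete markers."""
    identifiers = []
    markers: dict = {}
    while True:
        page = client.list_object_versions(Bucket=bucket, Prefix=prefix, **markers)
        # Delete markers are listed separately from versions but must be erased too.
        for entry in page.get("Versions", []) + page.get("DeleteMarkers", []):
            identifiers.append({"Key": entry["Key"], "VersionId": entry["VersionId"]})
        if not page.get("IsTruncated"):
            return identifiers
        markers = {
            "KeyMarker": page["NextKeyMarker"],
            "VersionIdMarker": page["NextVersionIdMarker"],
        }


class TwoPageClient:
    """Hypothetical fake that returns two canned pages."""

    def list_object_versions(self, *, Bucket, Prefix, KeyMarker=None, VersionIdMarker=None):
        if KeyMarker is None:
            return {
                "Versions": [{"Key": "users/42/a", "VersionId": "v2"}],
                "DeleteMarkers": [{"Key": "users/42/a", "VersionId": "v3"}],
                "IsTruncated": True,
                "NextKeyMarker": "users/42/a",
                "NextVersionIdMarker": "v2",
            }
        return {"Versions": [{"Key": "users/42/a", "VersionId": "v1"}], "IsTruncated": False}


ids = collect_versions_sketch(TwoPageClient(), "bucket", "users/42/")
```
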
def delete_in_batches(client: S3ObjectClient, bucket: str, identifiers: list[ObjectIdentifierTypeDef]) -> list[str]

Delete every identifier in bounded batches; collect per-key error codes.

Args:

  • client (S3ObjectClient): The object-store client to delete with.
  • bucket (str): The bucket holding the subject’s objects.
  • identifiers (list[ObjectIdentifierTypeDef]): The (key, optional version) pairs to delete, ready for delete_objects batches.

Returns:

  • list[str] — The per-key error codes the store reported, across all batches; empty when every deletion succeeded. Batches keep running past failures, so the codes accumulate without aborting the rest.
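
The batching described above might look roughly like this. The batch size of 1000 follows the delete_objects limit stated in this reference; the function name, the `Quiet` flag, and the recording fake client are assumptions made for illustration.

```python
from typing import Any


def delete_in_batches_sketch(client: Any, bucket: str, identifiers: list, batch_size: int = 1000) -> list:
    """Delete identifiers in bounded batches and collect per-key error codes."""
    codes = []
    for start in range(0, len(identifiers), batch_size):
        batch = identifiers[start:start + batch_size]
        response = client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": batch, "Quiet": True},  # Quiet: report errors only
        )
        # A failed key does not stop the loop; its code is recorded and we move on.
        codes.extend(err.get("Code", "") for err in response.get("Errors", []))
    return codes


class RecordingClient:
    """Hypothetical fake: fails one key, records how large each batch was."""

    def __init__(self, failing_key: str) -> None:
        self.failing_key = failing_key
        self.batch_sizes = []

    def delete_objects(self, *, Bucket, Delete):
        objects = Delete["Objects"]
        self.batch_sizes.append(len(objects))
        errors = [
            {"Key": o["Key"], "Code": "InternalError"}
            for o in objects
            if o["Key"] == self.failing_key
        ]
        return {"Errors": errors}


identifiers = [{"Key": f"users/42/{i}", "VersionId": "null"} for i in range(2500)]
client = RecordingClient(failing_key="users/42/3")
codes = delete_in_batches_sketch(client, "bucket", identifiers)
```

The failure lands in the first batch, yet all three batches still run.
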
def error_code(error: ClientError) -> str

The S3 error code of a ClientError, or "" when absent.

Args:

  • error (ClientError): The ClientError botocore raised.

Returns:

  • str — The Error.Code field of the error response body.
def is_nonretryable(error: ClientError) -> bool

Whether a ClientError should abandon instead of retry.

Args:

  • error (ClientError): The ClientError botocore raised.

Returns:

  • bool — True for credential, permission, missing-bucket, and wrong-endpoint failures; False for everything else: throttling, server faults, and codes this taxonomy does not know.
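
As a sketch of how error_code and is_nonretryable could fit together: botocore's ClientError exposes the parsed error body as a `response` dict, which the stand-in exception below imitates so the example runs without botocore. The code set is copied from NONRETRYABLE_CODES in this reference; the `_sketch` functions and `FakeClientError` are hypothetical.

```python
NONRETRYABLE_CODES = frozenset({
    "AccessDenied", "AllAccessDisabled", "AccountProblem", "InvalidAccessKeyId",
    "SignatureDoesNotMatch", "InvalidBucketName", "NoSuchBucket", "PermanentRedirect",
})


class FakeClientError(Exception):
    """Stand-in for botocore's ClientError: carries a `response` dict."""

    def __init__(self, response: dict) -> None:
        super().__init__("client error")
        self.response = response


def error_code_sketch(error) -> str:
    """Return Error.Code from the response body, or "" when it is absent."""
    return error.response.get("Error", {}).get("Code", "")


def is_nonretryable_sketch(error) -> bool:
    """True only for known permanent failures; unknown codes stay retryable."""
    return error_code_sketch(error) in NONRETRYABLE_CODES


denied = FakeClientError({"Error": {"Code": "AccessDenied"}})
throttled = FakeClientError({"Error": {"Code": "SlowDown"}})
malformed = FakeClientError({})
```

Defaulting to retryable for unknown codes means a new or unexpected code is retried by the runner instead of being abandoned.
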
def iter_current_objects(client: S3ObjectClient, bucket: str, prefix: str) -> Iterator[ObjectTypeDef]

The current (non-deleted) objects under the prefix, page by page.

Args:

  • client (S3ObjectClient): The S3 client to list with.
  • bucket (str): The bucket holding the subject’s objects.
  • prefix (str): The subject’s key prefix.

Yields:

  • ObjectTypeDef — One listing entry per current object.
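
A page-by-page iterator over list_objects_v2 might be sketched like this, using the S3 response fields `Contents`, `IsTruncated`, and `NextContinuationToken`. The function name and the canned-page client are hypothetical.

```python
from typing import Any, Iterator


def iter_current_objects_sketch(client: Any, bucket: str, prefix: str) -> Iterator[dict]:
    """Yield each listing entry, fetching the next page only when needed."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        # "Contents" is missing entirely when a page holds no objects.
        yield from page.get("Contents", [])
        if not page.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class PagedClient:
    """Hypothetical fake that serves two pages keyed by continuation token."""

    def list_objects_v2(self, *, Bucket, Prefix, ContinuationToken=None):
        if ContinuationToken is None:
            return {
                "Contents": [{"Key": "users/42/a", "Size": 1}],
                "IsTruncated": True,
                "NextContinuationToken": "page-2",
            }
        return {"Contents": [{"Key": "users/42/b", "Size": 2}], "IsTruncated": False}


class EmptyClient:
    """Hypothetical fake for a prefix with nothing under it."""

    def list_objects_v2(self, *, Bucket, Prefix, ContinuationToken=None):
        return {"IsTruncated": False}


listed_keys = [o["Key"] for o in iter_current_objects_sketch(PagedClient(), "bucket", "users/42/")]
empty = list(iter_current_objects_sketch(EmptyClient(), "bucket", "users/0/"))
```
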
NONRETRYABLE_CODES = frozenset({'AccessDenied', 'AllAccessDisabled', 'AccountProblem', 'InvalidAccessKeyId', 'SignatureDoesNotMatch', 'InvalidBucketName', 'NoSuchBucket', 'PermanentRedirect'})

Error codes that can never succeed on retry — they abandon immediately.

class PartialEraseError(Exception):
...

Some object versions under the prefix could not be deleted this attempt.

Deliberately not a ResolverError: the saga runner retries any other exception, and a partial batch failure is exactly that case — the keys that did delete stay deleted, the survivors are re-listed and re-deleted on the next attempt, and re-deleting an already-gone version is a no-op, so retries converge.

Messages carry counts and S3 error codes only — never keys or prefixes, which are user content.
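
One way to build a message that carries counts and codes only is sketched below. The exact wording of the library's messages is not documented here, so the format, `partial_erase_message`, and `PartialEraseErrorSketch` are all hypothetical.

```python
from collections import Counter


class PartialEraseErrorSketch(Exception):
    """Stand-in for PartialEraseError (illustration only)."""


def partial_erase_message(codes: list) -> str:
    """Summarise failures by error code; keys and prefixes never appear."""
    counts = Counter(codes)
    summary = ", ".join(f"{code} x{n}" for code, n in sorted(counts.items()))
    return f"{len(codes)} deletion(s) failed: {summary}"


message = partial_erase_message(["SlowDown", "InternalError", "SlowDown"])
error = PartialEraseErrorSketch(message)
```

Because the function only ever receives error codes, there is no path by which a key or prefix could leak into a log line.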

Protocol — implement these members in your own class; do not subclass.

class S3ObjectClient(Protocol):
...

What the resolver requires of an S3 client (structural).

def delete_objects(*, Bucket: str, Delete: DeleteTypeDef) -> DeleteObjectsOutputTypeDef

Batch-delete up to 1000 (key, version) pairs.

def get_object(*, Bucket: str, Key: str) -> GetObjectOutputTypeDef

Fetch one object’s body and metadata.

def head_object(*, Bucket: str, Key: str) -> HeadObjectOutputTypeDef

Fetch one object’s metadata without the body.

def list_object_versions(*, Bucket: str, Prefix: str, KeyMarker: str = ..., VersionIdMarker: str = ...) -> ListObjectVersionsOutputTypeDef

Page every object version and delete marker under a prefix.

def list_objects_v2(*, Bucket: str, Prefix: str, ContinuationToken: str = ...) -> ListObjectsV2OutputTypeDef

Page the current objects under a prefix.
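
To illustrate what "structural" means here, the sketch below declares a loosely typed mirror of the five members and a toy in-memory store that satisfies it without subclassing. `ObjectClientSketch` and `InMemoryClient` are hypothetical, the return types are simplified to plain dicts, and the toy store is unversioned and returns everything in a single page.

```python
import io
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ObjectClientSketch(Protocol):
    """Loosely typed mirror of the five members listed above."""

    def delete_objects(self, *, Bucket: str, Delete: dict) -> dict: ...
    def get_object(self, *, Bucket: str, Key: str) -> dict: ...
    def head_object(self, *, Bucket: str, Key: str) -> dict: ...
    def list_object_versions(self, *, Bucket: str, Prefix: str,
                             KeyMarker: str = ..., VersionIdMarker: str = ...) -> dict: ...
    def list_objects_v2(self, *, Bucket: str, Prefix: str,
                        ContinuationToken: str = ...) -> dict: ...


class InMemoryClient:
    """Toy unversioned store; note that it does not subclass the protocol."""

    def __init__(self, objects: dict) -> None:
        self.objects = dict(objects)

    def _keys(self, prefix: str) -> list:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def delete_objects(self, *, Bucket: str, Delete: dict) -> dict:
        for item in Delete["Objects"]:
            self.objects.pop(item["Key"], None)  # deleting a missing key is a no-op
        return {}

    def get_object(self, *, Bucket: str, Key: str) -> dict:
        body = self.objects[Key]
        return {"Body": io.BytesIO(body), "ContentLength": len(body)}

    def head_object(self, *, Bucket: str, Key: str) -> dict:
        return {"ContentLength": len(self.objects[Key])}

    def list_object_versions(self, *, Bucket: str, Prefix: str,
                             KeyMarker: str = "", VersionIdMarker: str = "") -> dict:
        # Unversioned stores report the version id "null".
        versions = [{"Key": k, "VersionId": "null"} for k in self._keys(Prefix)]
        return {"Versions": versions, "IsTruncated": False}

    def list_objects_v2(self, *, Bucket: str, Prefix: str,
                        ContinuationToken: str = "") -> dict:
        contents = [{"Key": k, "Size": len(self.objects[k])} for k in self._keys(Prefix)]
        return {"Contents": contents, "IsTruncated": False}


client = InMemoryClient({"users/42/a.txt": b"hello", "users/7/b.txt": b"x"})
conforms = isinstance(client, ObjectClientSketch)  # checks member names only
listed = [o["Key"] for o in client.list_objects_v2(Bucket="b", Prefix="users/42/")["Contents"]]
```

The runtime `isinstance` check only confirms that the methods exist; a static type checker is what verifies the signatures.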

class S3Resolver:
def __init__(bucket: str, *, client: S3ObjectClient | None = None, region_name: str | None = None, include_content: bool = True, max_object_bytes: int | None = None) -> None

Exports and erases a subject’s objects held under an S3 key prefix.

Expects refs of kind "s3" (refs are routed to the resolver whose name equals their kind — ADR 0008) whose value is the subject’s key prefix, e.g. "users/42/"; the bucket is fixed at construction. The prefix must be non-blank and end with "/" — anything else raises ResolverError before any S3 call, because an unterminated prefix also matches sibling subjects (users/4 matches users/42/...) and a blank one is the whole bucket.

Erasure deletes every object version and delete marker under the prefix: a plain delete on a versioned bucket only hides data behind a delete marker, which is not erasure. Unversioned buckets take the same path (S3 reports their versions as "null"). Exports cover current versions and, by default, include each object’s content base64-encoded — for user-generated objects the bytes usually are the personal data. include_content=False is appropriate only when the controller provides the files through another complete channel.
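
The difference between a plain delete and erasure on a versioned bucket can be shown with a toy model of a single key. This is not an S3 client: `VersionedKeySketch` and its version ids are hypothetical, and it models only the behaviour described above (a plain delete adds a delete marker, while version-targeted deletes remove stored entries).

```python
from itertools import count


class VersionedKeySketch:
    """Toy model of one key's version stack in a versioned bucket."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.entries = []  # oldest first, newest last

    def put(self, body: bytes) -> None:
        self.entries.append(
            {"VersionId": f"v{next(self._ids)}", "Body": body, "DeleteMarker": False}
        )

    def plain_delete(self) -> None:
        # A delete without a version id only stacks a marker on top.
        self.entries.append(
            {"VersionId": f"v{next(self._ids)}", "Body": None, "DeleteMarker": True}
        )

    def is_visible(self) -> bool:
        return bool(self.entries) and not self.entries[-1]["DeleteMarker"]

    def stored_bodies(self) -> list:
        return [e["Body"] for e in self.entries if not e["DeleteMarker"]]

    def delete_version(self, version_id: str) -> None:
        # Targeting a version id removes that entry; a missing id is a no-op.
        self.entries = [e for e in self.entries if e["VersionId"] != version_id]


key = VersionedKeySketch()
key.put(b"personal data")
key.plain_delete()
hidden_but_stored = (key.is_visible(), key.stored_bodies())

# Erasure: remove every version and every delete marker.
for version_id in [e["VersionId"] for e in key.entries]:
    key.delete_version(version_id)
remaining = key.entries
```

After the plain delete the object is no longer visible, yet its bytes are still stored; only the per-version pass leaves nothing behind.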

Idempotency: a prefix under which S3 holds nothing yields already_absent=True, which counts as success, never an error. A partially failed batch delete keeps deleting the rest, then raises PartialEraseError so the saga retries; re-deletes are no-ops, so retries converge.

Error taxonomy (see effaced_s3.errors): credential, permission, missing-bucket, and wrong-endpoint failures raise ResolverError; throttling, connection faults, S3-side errors, and unknown codes propagate so the saga runner retries. SDK-internal retries are disabled — the saga runner owns retry and backoff (ADR 0010).

Fields:

  • covered_surface (CoveredSurface): The S3 object PII this resolver claims to reach (AttestingResolver). Returns: S3_COVERED_SURFACE, built from the exporter’s object-field tuple so it cannot drift.
  • name (str): Stable resolver name recorded in manifests and audits.
async def erase_subject(ref: SubjectRef) -> ResolverErasure

Delete every object version under the subject’s prefix (Art. 17).

Args:

  • ref (SubjectRef): kind="s3", value=<key prefix>.

Returns:

  • ResolverErasure — The outcome; already_absent=True if S3 already held nothing under the prefix.

Raises:

  • ResolverError — The credentials are invalid or lack a permission, the bucket does not exist, the prefix is blank or not "/"-terminated, or S3 refused every failed deletion for non-retryable reasons — retrying cannot succeed.
  • PartialEraseError — Some versions failed transiently this attempt; propagates so the saga retries to convergence.
async def export_subject(ref: SubjectRef) -> ResolverExport

Collect the objects held under the subject’s prefix (Art. 15).

Args:

  • ref (SubjectRef): kind="s3", value=<key prefix>.

Returns:

  • ResolverExport — Per object: key, size, content type, last-modified, user metadata, and (unless disabled) the base64-encoded body. Empty when nothing lives under the prefix.

Raises:

  • ResolverError — The credentials are invalid or lack a permission, the bucket does not exist, the prefix is blank or not "/"-terminated, or an object exceeds max_object_bytes — retrying cannot succeed.