tasks: decompose review #004 findings into 4 fix tasks + review gate

W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into CallAdapter handle_stream for EVENT_ABORTED. W2 (core/endpoint-client-fingerprint): extract TLS client cert fingerprint in dispatch_quinn/dispatch_iroh. W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug) with redacting impl. W4 (core/auth-apikey-resources, level: research): decide whether ApiKeyEntry should carry resources, then implement or drop from spec. review-post-impl-fixes gates on all four. Graph: 33 tasks, 12 gens.
2026-06-24 10:02:03 +00:00
parent d904dfc243
commit d149932e2a
5 changed files with 571 additions and 0 deletions
--- a/tasks/core/auth-apikey-resources.md
+++ b/tasks/core/auth-apikey-resources.md
@@ -0,0 +1,117 @@
+---
+id: core/auth-apikey-resources
+name: Reconcile ApiKeyEntry.resources — add field to type and populate in resolve_api_key, or drop from spec
+status: pending
+depends_on: []
+scope: narrow
+risk: low
+impact: component
+level: research
+---
+
+## Description
+
+Three-way mismatch between spec, type, and implementation for
+resource-scoped ACLs on API-key-authenticated identities:
+
+- **Spec** (`docs/architecture/crates/core/auth.md:153`):
+  > "Token: ... return `Identity { id: prefix, scopes: entry.scopes,
+  > resources: entry.resources }`."
+  The spec references `entry.resources`.
+
+- **Type** (`crates/alknet-core/src/config.rs:55–62`): `ApiKeyEntry` has
+  fields `prefix, hash, scopes, description, expires_at` — there is no
+  `resources` field. So `entry.resources` in the spec cannot be
+  implemented as written.
+
+- **Implementation** (`config.rs:113–117`): `resolve_api_key` constructs
+  the resolved `Identity` with `resources: std::collections::HashMap::new()`
+  — resources are always empty, regardless of what the API key grants.
+
+The same gap exists in `resolve_identity_from_fingerprint`
+(config.rs:69–79), which also returns `resources: HashMap::new()`.
+
+### Impact
+
+Latent today: no operation in the workspace uses resource-based ACLs
+against a token- or fingerprint-resolved identity. The
+`AccessControl::resource_type` / `resource_action` fields exist in
+`OperationSpec` (spec.rs:32–37) and are tested (spec.rs:284–303), but
+those tests always hand-construct `Identity.resources` directly —
+never via the resolver path. The moment an operation declares a
+resource-scoped ACL and a caller authenticates via API key, the ACL
+check will fail with "missing resource" even if the key was granted
+that resource in config — because `resources` is always empty.
+
+### This is a research/decision task, not an implementation task
+
+The decomposer rule applies: **the architecture is ambiguous** on
+whether API keys should grant resource-scoped access. Two valid designs
+exist; pick one and document it before implementing. Do not implement
+until the decision is made.
+
+**Option A — add `resources` to `ApiKeyEntry`** (matches current spec):
+- Add `pub resources: HashMap<String, Vec<String>>` to `ApiKeyEntry`.
+- Update `resolve_api_key` to populate `Identity.resources` from
+  `entry.resources`.
+- Update `resolve_identity_from_fingerprint` similarly — either add a
+  `resources` field to the fingerprint config path, or document that
+  fingerprint auth grants scopes only (resources empty).
+- Update `auth.md`'s token resolution example to match the new field.
+- Define the TOML schema for `resources` in `AuthPolicy` (when a TOML
+  schema is added — currently config is built in code, not parsed).
+- Resource-scoped ACLs then work for both auth paths.
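+
+A minimal sketch of Option A, assuming simplified stand-ins for the
+real types. The `Vec<String>` scopes, the `identity_from_entry` and
+`has_resource` helpers, and the omission of `hash`, `description`, and
+`expires_at` are illustrative assumptions, not the code in `config.rs`
+or `spec.rs`.
+
+```rust
+use std::collections::HashMap;
+
+// Hypothetical, trimmed-down stand-in for the real `ApiKeyEntry`;
+// `hash`, `description`, and `expires_at` are omitted for brevity.
+#[derive(Debug, Clone, Default)]
+pub struct ApiKeyEntry {
+    pub prefix: String,
+    pub scopes: Vec<String>,
+    // Option A: resource type -> allowed actions (the proposed field).
+    pub resources: HashMap<String, Vec<String>>,
+}
+
+// Hypothetical stand-in for the resolved identity.
+#[derive(Debug, Clone, PartialEq)]
+pub struct Identity {
+    pub id: String,
+    pub scopes: Vec<String>,
+    pub resources: HashMap<String, Vec<String>>,
+}
+
+// Assumed shape of the resolver's final step, after hash and expiry
+// checks have already passed.
+pub fn identity_from_entry(entry: &ApiKeyEntry) -> Identity {
+    Identity {
+        id: entry.prefix.clone(),
+        scopes: entry.scopes.clone(),
+        // Previously `HashMap::new()`; now carried over from the entry.
+        resources: entry.resources.clone(),
+    }
+}
+
+// Assumed ACL predicate: the identity must hold `action` on
+// `resource_type`. An empty `resources` map always yields `false`.
+pub fn has_resource(identity: &Identity, resource_type: &str, action: &str) -> bool {
+    identity
+        .resources
+        .get(resource_type)
+        .is_some_and(|actions| actions.iter().any(|a| a == action))
+}
+```
+
+Under these assumptions, a key granted `read` on a resource type
+resolves to an identity that passes the predicate for that pair and
+fails it for any other, whereas today's always-empty map fails for all.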
+
+**Option B — drop `resources` from the spec for API keys**:
+- Remove `entry.resources` from `auth.md:153`.
+- Document that API keys grant scopes only; resource-scoped access
+  requires a different identity source (e.g., a future OAuth/JWT
+  provider that carries resource claims).
+- `Identity.resources` stays in the type (it's used by hand-constructed
+  identities in tests and by `CompositionAuthority::as_identity` for
+  internal calls) but token/fingerprint resolvers always return empty.
+- Resource-scoped ACLs against token identities return `Forbidden` —
+  this becomes a documented limitation, not a bug.
+
+### Deliverable
+
+Produce a short decision note (a paragraph in `auth.md` under
+"Identity Resolution" — or a new ADR if the decision feels
+consequential enough) that picks A or B and justifies it. Then either
+implement the chosen option in the same task (if small) or split a
+follow-up `level: implementation` task gated on this one.
+
+The decision should consider: do any planned operations (in the
+upcoming alknet-ssh, alknet-fs, alknet-git crates) need resource-scoped
+ACLs on API-key identities? If yes, A. If resource ACLs are only ever
+applied to handler-internal composition identities
+(`CompositionAuthority`), B is fine and simpler.
+
+## Acceptance Criteria
+
+- [ ] Decision made: Option A or Option B
+- [ ] Decision documented in `auth.md` (or a new ADR if consequential)
+- [ ] If Option A: `ApiKeyEntry.resources` added, `resolve_api_key` populates `Identity.resources`, `resolve_identity_from_fingerprint` handling decided and documented, `auth.md:153` matches the new shape
+- [ ] If Option B: `auth.md:153` corrected to drop `entry.resources`, limitation documented
+- [ ] Either way: a test covering the chosen behavior (token resolves with resources, or token resolves with empty resources + documented limitation)
+- [ ] `cargo test -p alknet-core` succeeds
+- [ ] `cargo clippy -p alknet-core --all-targets` succeeds with no warnings
+
+## References
+
+- docs/reviews/004-post-implementation-sanity-check.md — W4 (full finding)
+- docs/architecture/crates/core/auth.md:152–153 — spec text referencing `entry.resources`
+- crates/alknet-core/src/config.rs:55–62 — `ApiKeyEntry` (missing `resources`)
+- crates/alknet-core/src/config.rs:69–118 — both resolvers returning empty `resources`
+- crates/alknet-call/src/registry/spec.rs:77–103 — `AccessControl::check` resource path (the consumer that would fail)
+- crates/alknet-call/src/registry/context.rs:58–65 — `CompositionAuthority::as_identity` (the internal-call path that does populate `resources`)
+
+## Notes
+
+> This is a `level: research` task because the fix is small but the
+> decision is not. The decomposer principle: if architecture is
+> ambiguous, do not proceed with implementation — escalate. Make the
+> decision first, then implement. If the decision is A and the
+> implementation is more than ~30 lines, split a follow-up
+> `level: implementation` task (`core/auth-apikey-resources-impl`)
+> depending on this one.
--- a/tasks/core/endpoint-client-fingerprint.md
+++ b/tasks/core/endpoint-client-fingerprint.md
@@ -0,0 +1,118 @@
+---
+id: core/endpoint-client-fingerprint
+name: Extract TLS client certificate fingerprint in endpoint dispatch (ADR-004)
+status: pending
+depends_on: []
+scope: narrow
+risk: medium
+impact: component
+level: implementation
+---
+
+## Description
+
+Both dispatch functions in `crates/alknet-core/src/endpoint.rs` hard-code
+`tls_client_fingerprint: None` when calling `build_auth_context`
+(endpoint.rs:306 and 396). As a result, `AuthContext.identity` (the
+endpoint-resolved identity) is always `None` at the endpoint layer, and
+all identity resolution is deferred to handler-level code. The
+endpoint-level auth resolution path described in
+`docs/architecture/crates/core/auth.md:159–171` is non-functional:
+
+> "QUIC connection arrives → TLS handshake → Extract TLS client
+> certificate fingerprint (if presented) → If fingerprint present:
+> `IdentityProvider::resolve_from_fingerprint()` → `auth.identity =
+> Some(identity)` → Construct `AuthContext { identity, alpn, remote_addr,
+> tls_client_fingerprint }`"
+
+This matters most for P2P nodes using RFC 7250 raw Ed25519 keys (the
+"default for most alknet nodes" per OQ-12), where the connection-level
+identity *is* the TLS client cert — there is no separate protocol-level
+credential to extract. Without endpoint-level fingerprint extraction,
+a raw-key peer connecting to an `alknet/call` endpoint cannot be
+identified by fingerprint at the endpoint layer.
+
+### Quinn path
+
+`extract_quinn_alpn` (endpoint.rs:316–326) already downcasts
+`connection.handshake_data()` to `quinn::crypto::rustls::HandshakeData`.
+In current quinn releases that struct carries only the ALPN protocol
+and server name; the peer's certificate chain comes from
+`connection.peer_identity()`, which downcasts the same way to a
+`Vec<CertificateDer>` (confirm against the pinned quinn version).
+Extract the chain when one was presented, hash the leaf cert's DER to a
+`SHA256:`-prefixed fingerprint string (matching the format
+`AuthPolicy::resolve_identity_from_fingerprint` expects; see
+auth.md:152), and pass it to `build_auth_context` in place of `None`.
+
+Note: `rustls::ServerConfig` is currently built with
+`with_no_client_auth()` (endpoint.rs:450, 463, 473), so the server does
+not *request* client certs. In TLS 1.3 a client sends a certificate
+only in response to the server's `CertificateRequest`, so under
+`with_no_client_auth()` no client cert ever arrives. To receive one,
+the server config must use `with_client_cert_verifier(...)` with a
+verifier that requests but does not require client certs
+(raw-public-key peers present their Ed25519 key as the "client cert" in
+RFC 7250 mode). This is the one design decision to make in this task:
+whether to switch to a "request-but-don't-require" mode now, or to
+leave `with_no_client_auth()` in place and accept that extraction
+yields `None` until the server config changes. The RFC 7250 raw-key
+path (the `RawKeyCertResolver` at endpoint.rs:565–595) already
+advertises `only_raw_public_keys() -> true`, which is the server-side
+half of RFC 7250; the client-side presentation is set by the client's
+`rustls::ClientConfig`, not by the server. Read ADR-004 and OQ-12
+before deciding.
+
+### Iroh path
+
+iroh's `Connection` exposes the peer's `NodeId` (the raw Ed25519
+public key) via the connection's TLS session metadata. In iroh's model
+the `NodeId` *is* the fingerprint — it's the raw-public-key identity.
+Extract it and format as a `NodeId:`-prefixed string (or `SHA256:` of
+the public key bytes — match whatever `AuthPolicy`'s fingerprint set
+is expected to contain). Look at `iroh::endpoint::Connection` methods
+and the `iroh::tls::Lts` / peer-certificate accessor for the exact API.
+
+### Fingerprint format
+
+`AuthPolicy::resolve_identity_from_fingerprint` (config.rs:69–79) does
+a literal `HashSet::contains()` check — it does not normalize. So
+whatever format the extractor produces must be the same format the
+operator configures in `authorized_fingerprints`. The existing
+fingerprint test (auth.rs:145–153) uses `"SHA256:abc123"` as a
+placeholder. Pick a concrete format and document it in `auth.md` (the
+spec is currently silent on the exact string format). Suggested:
+`SHA256:<hex of leaf cert DER>` for X.509, `ed25519:<base64 of pub key>`
+for raw keys — but confirm against any existing fingerprint producer
+in the codebase before committing.
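+
+A small sketch of why the format must match exactly, assuming the
+suggested `SHA256:<lowercase hex>` rendering. Both helpers are
+hypothetical; the digest is taken as an input because hashing the leaf
+cert DER needs a crate outside `std`.
+
+```rust
+use std::collections::HashSet;
+
+// Hypothetical helper: render an already-computed SHA-256 digest as
+// `SHA256:<lowercase hex>`.
+pub fn format_sha256_fingerprint(digest: &[u8; 32]) -> String {
+    let mut out = String::with_capacity(7 + 64);
+    out.push_str("SHA256:");
+    for byte in digest {
+        out.push_str(&format!("{byte:02x}"));
+    }
+    out
+}
+
+// Mirrors the literal membership check described above: no trimming,
+// no case folding, no prefix handling.
+pub fn is_authorized(authorized: &HashSet<String>, fingerprint: &str) -> bool {
+    authorized.contains(fingerprint)
+}
+```
+
+With a literal check like this, an operator who configures the same
+digest in uppercase hex, or with a lowercase `sha256:` prefix, gets no
+match, which is why the chosen format needs to be written down in
+`auth.md`.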
+
+## Acceptance Criteria
+
+- [ ] `dispatch_quinn` extracts the client cert fingerprint from the connection's TLS handshake state when a cert was presented
+- [ ] `dispatch_iroh` extracts peer `NodeId` (or equivalent raw-key fingerprint) when present
+- [ ] `build_auth_context` receives `Some(fingerprint)` when a client cert was presented, `None` otherwise
+- [ ] `AuthContext.identity` is `Some(identity)` when the fingerprint resolves via `IdentityProvider`, `None` otherwise (no regression for the no-cert case)
+- [ ] Server config decision (request-but-don't-require vs. no-client-auth) is made and documented
+- [ ] Fingerprint string format is chosen, documented in `auth.md`, and consistent between extractor and `AuthPolicy::authorized_fingerprints` config
+- [ ] Unit test: quinn path with a presented client cert → `auth.tls_client_fingerprint` is `Some(...)`
+- [ ] Unit test: quinn path with no client cert → `auth.tls_client_fingerprint` is `None` (existing behavior preserved)
+- [ ] Unit test: iroh path → `auth.tls_client_fingerprint` is `Some(NodeId-format)` when peer identity is available
+- [ ] `cargo test -p alknet-core --all-features` succeeds
+- [ ] `cargo clippy -p alknet-core --all-features --all-targets` succeeds with no warnings
+
+## References
+
+- docs/reviews/004-post-implementation-sanity-check.md — W2 (full finding)
+- docs/architecture/crates/core/auth.md:159–171 — endpoint-level resolution flow spec
+- docs/architecture/crates/core/auth.md:152 — fingerprint format used by `resolve_identity_from_fingerprint`
+- docs/architecture/decisions/004-auth-as-shared-core.md — ADR-004 (hybrid resolution)
+- docs/architecture/open-questions.md — OQ-12 (TLS identity provisioning)
+- crates/alknet-core/src/endpoint.rs:306, 396 — the two `None` sites to fix
+- crates/alknet-core/src/endpoint.rs:316–326 — `extract_quinn_alpn` (pattern to follow for `HandshakeData` downcast)
+- crates/alknet-core/src/endpoint.rs:565–595 — `RawKeyCertResolver` (RFC 7250 server-side half)
+
+## Notes
+
+> If the server-config decision (request-but-don't-require client auth)
+> is too large for this task's scope, split it: implement extraction
+> first (this task, gated on the cert being presented *if* one arrives),
+> then a follow-up task switches the server config to actually request
+> client certs. The extraction code is correct either way — it returns
+> `None` when no cert was presented, which is the current behavior, so
+> landing extraction first is a safe no-op until the server config
+> changes.