Phase 0 exploration for alknet-ssh: confirms SSH-over-QUIC-bistream via
tokio::io::join (no custom adapter needed, per reference impl), russh 0.60.2
generic run_stream/connect_stream, and channel-into-bistream multiplexing.
Surfaces 9 decision points for Phase 1: host key sourcing (vault-derived vs
config), channel policy v1 surface, client + SOCKS5 crate split, crypto
backend, auth method coverage, and a stream-handling POC to close russh's
upstream test gap.
First dedicated coverage pass (cargo-llvm-cov --workspace --all-features).
Workspace at 87.1% line coverage (5759/6615), all 224 tests pass. Vault
and registry layers are essentially fully covered; gaps concentrate in
endpoint.rs (56%), types.rs (57%), and connection.rs (54%), all stemming
from tests using MockConnection whose open_bi/accept_bi return Err.
Eight suggestions (S1-S8) ordered by leverage: pure-function tests for
dispatch_envelope / map_*_connection_error / error Display+Debug (S1-S3),
Tier A directly-callable TLS/rustls helpers in endpoint.rs (S4), one
loopback quinn integration test as the real unlock across four files (S5),
ACME event-loop extraction via synthetic stream (S6, the flagged research
item), and two small remaining gaps (S7-S8). No critical or warning
findings — this is a testing-infrastructure gap, not a logic gap.
Three tasks implementing ADR-027:
1. core/rawkey-decouple-from-iroh: TlsIdentity::RawKey now uses
Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
instead of iroh::SecretKey. RawKeyCertResolver and Ed25519SigningKey
un-gated from #[cfg(all(quinn, iroh))] to #[cfg(quinn)] only.
Quinn-only builds (default) now support RFC 7250 raw-key identity.
iroh transport converts via iroh::SecretKey::from_bytes.
2. core/endpoint-request-client-cert: replaced with_no_client_auth()
with AcceptAnyCertVerifier — a custom ClientCertVerifier that
requests client certs but doesn't require them or verify against
a CA. alknet's identity model is fingerprint-based (the
authorized_fingerprints set is the trust anchor), not PKI-based.
Peer certs are extracted at the TLS layer for fingerprinting;
peers without certs connect normally.
3. core/acme-integration: TlsIdentity::Acme variant (domains,
cache_dir, directory, contact) + AcmeDirectory enum. TlsSetup
two-phase construction: synchronous for X509/RawKey/SelfSigned,
async for Acme (spawns AcmeState event loop, builds ServerConfig
with ResolvesServerCertAcme). acme-tls/1 ALPN added when ACME is
active; dispatch_quinn guard closes challenge connections
gracefully (challenge is TLS-layer-handled). acme feature gate
keeps rustls-acme out of non-ACME builds.
Workspace: build/test/clippy green across all 3 feature configs
(quinn-only, quinn+iroh, quinn+acme, all-features). 331 tests, 0
failures, 0 warnings.
ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:
1. TlsIdentity::Acme variant — static config data (domains, cache_dir,
directory, contact) with async AcmeState constructed at endpoint
setup via two-phase TlsSetup (not stuffed into the Clone-able enum).
2. TlsIdentity::RawKey decoupled from the iroh feature — uses
Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
default for most alknet nodes) now works in quinn-only builds.
iroh transport converts via SecretKey::from_bytes.
3. ACME feature-gated behind new acme feature (rustls-acme optional
dep). Non-ACME builds don't compile it.
4. dispatch_quinn guard for acme-tls/1 challenge connections — TLS-ALPN-01
is handled at the rustls cert resolver layer during the handshake;
the guard closes challenge connections gracefully instead of logging
a misleading "no handler" warning.
Research confirmed QUIC (quinn) handles ACME challenges differently than
TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the
challenge is fully answered at the cert resolution step before the
connection surfaces to the application. No handler registration needed.
Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).
Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey). Graph: 36 tasks.
W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into
CallAdapter handle_stream for EVENT_ABORTED. Cascades with
AbortPolicy::AbortDependents, aborts root, no descendant frames on
wire (ADR-016 Decision 2). Two integration tests added.
W2 (core/endpoint-client-fingerprint): extract TLS client cert
fingerprint in dispatch_quinn (SHA256:<hex> of leaf cert DER via
peer_identity) and dispatch_iroh (ed25519:<hex> of peer NodeId).
Fingerprint format documented in auth.md. Server config change
(with_no_client_auth → request-but-don't-require) deferred to new
follow-up task core/endpoint-request-client-cert.
W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug)
with manual redacting impl (phrase: "[REDACTED]"). Seed confirmed
no Debug impl. Redaction test added.
W4 (core/auth-apikey-resources): Option B — drop entry.resources from
spec. External identities (token/fingerprint) grant scopes only;
resource-scoped ACLs are composition-internal (ADR-015/022). auth.md
corrected + limitation documented. Two tests confirm empty resources.
review-post-impl-fixes: all 4 verified, workspace green (326 tests,
0 failures, 0 clippy warnings). Review #004 status → resolved.
Graph: 34 tasks, 12 gens.
W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into CallAdapter
handle_stream for EVENT_ABORTED. W2 (core/endpoint-client-fingerprint):
extract TLS client cert fingerprint in dispatch_quinn/dispatch_iroh.
W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug) with
redacting impl. W4 (core/auth-apikey-resources, level: research): decide
whether ApiKeyEntry should carry resources, then implement or drop from
spec. review-post-impl-fixes gates on all four. Graph: 33 tasks, 12 gens.
Implement AbortCascade in protocol/abort.rs per ADR-016: PendingEntry stores
parent_request_id (Call & Subscribe) and a started flag for tree indexing.
AbortCascade::cascade_abort walks the call tree by parent_request_id and aborts
descendants per AbortPolicy (AbortDependents aborts all; ContinueRunning aborts
only unstarted via mark_started()). Returns sorted list of aborted IDs; unknown
root silently discarded. 20 unit tests covering depth-3 cascade, mixed
Call/Subscribe, determinism, both policies.
Refs: docs/architecture/crates/call/call-protocol.md
Implements: ADR-016
- PendingEntry stores parent_request_id (Call and Subscribe) and started flag
for abort-cascade tree indexing
- register_call/register_subscribe accept optional parent_request_id
- AbortCascade::cascade_abort walks the call tree by parent_request_id and
aborts descendants per AbortPolicy (AbortDependents: all; ContinueRunning:
unstarted only). Returns sorted list of aborted request IDs
- call.aborted for unknown request_id silently discarded (empty result)
- Composed child request_ids stay internal (not sent as call.requested)
- mark_started() tracks dispatch state for ContinueRunning decisions
- 20 unit tests covering AbortDependents/ContinueRunning, depth-3 tree,
unknown root, mixed Call/Subscribe, determinism
Review of alknet-core found one issue: RawKeyCertResolver/Ed25519SigningKey/
std::path::Path were gated on #[cfg(feature = "iroh")] but only used in the
quinn TLS path — caused clippy -D warnings failures on iroh-only builds.
Re-gated to #[cfg(all(feature = "quinn", feature = "iroh"))]. All 4 feature
combinations now pass clippy -D warnings; 55 tests pass.
Review confirmed: all core types, config, auth, and endpoint implementations
are spec-conformant.
The RawKeyCertResolver, Ed25519SigningKey, and std::path::Path imports
were gated on #[cfg(feature = "iroh")] but are only used in the quinn
TLS server-config path (build_rustls_server_config RawKey arm). With
iroh-only builds (--no-default-features --features iroh), these became
dead code and triggered clippy -D warnings failures.
Re-gated to #[cfg(all(feature = "quinn", feature = "iroh"))] so they
only compile when both features are active (the combination that
actually uses raw-key TLS via quinn). std::path::Path is now
#[cfg(feature = "quinn")] since it is only used by quinn's
load_cert_chain/load_private_key helpers.
Verified: cargo clippy passes with -D warnings across all four feature
combinations (none, quinn, iroh, quinn+iroh). cargo test --all-features
passes 55 tests. cargo fmt --check clean.
Implement CallConnection in protocol/connection.rs with Layer 2 imported-ops
overlay (Arc<RwLock<HashMap>>), register_imported/register_imported_all,
overlay_env() returning an OperationEnv that dispatches to imported ops,
and call()/subscribe()/abort() methods that open a stream, send call.requested,
register in PendingRequestMap, spawn a stream reader, and correlate responses
by ID. Connection drop drops the overlay. Exposed MockConnection +
Connection::from_mock in alknet-core for cross-crate testing. 9 new connection
tests (102 total in alknet-call).
Refs: docs/architecture/crates/call/call-protocol.md
Implements: ADR-012, ADR-017, ADR-024
Implements CallConnection in src/protocol/connection.rs representing an
established alknet/call connection (either direction). Holds the Layer 2
imported-ops overlay (ADR-024) as Arc<RwLock<HashMap>>.
- register_imported / register_imported_all add to the connection overlay
- overlay_env returns an OperationEnv dispatching to imported ops; contains()
returns true only for ops in the overlay
- call() opens a stream, sends call.requested, registers in PendingRequestMap,
spawns a stream reader, resolves on first call.responded
- subscribe() sends call.requested and yields call.responded until
call.completed/call.aborted via a SubscriptionStream wrapping the mpsc receiver
- abort() sends call.aborted for the request ID and removes the pending entry
- connection drop drops the overlay (no explicit deregistration needed)
Exposes MockConnection trait and Connection::from_mock in alknet-core so
cross-crate tests can construct mock connections without real QUIC. Removes
two unused test helpers in env.rs that triggered dead-code warnings under
-D warnings. Adds parking_lot dep for the overlay RwLock and pending Mutex.
9 new connection tests (102 total in alknet-call). Clippy clean.
Implement the ALPN router and endpoint in endpoint.rs: AlknetEndpoint with
quinn+iroh accept loops (both feature-gated, both Option), HandlerRegistry
(new/register/get/alpn_strings with panic-on-duplicate), dispatch via
tokio::spawn by ALPN, AuthContext construction from connection
(alpn/remote_addr/fingerprint/identity), TLS identity modes (RawKey RFC 7250
via on-the-fly cert resolver, X509 from files, SelfSigned via rcgen),
EndpointError enum, graceful shutdown with drain timeout + force close.
ACME deferred as TODO per task spec. 55 tests (--all-features), 52 (default),
47 (no-default); clippy clean across all 3 feature combos.
Refs: docs/architecture/crates/core/endpoint.md
Implements: ADR-010
Implement services/list and services/schema in registry/discovery.rs: spec
constructors, factory handlers taking Arc<OperationRegistry>, JSON serialization
of OperationSpec (incl. error_schemas per ADR-023), leading-slash normalization
for services/schema, NOT_FOUND for unknown ops, INVALID_INPUT for missing name.
Both registered as Local provenance with empty authority/env/caps and empty
AccessControl.
Refs: docs/architecture/crates/call/operation-registry.md
Implements: ADR-023
Expand the minimal OperationEnv trait from the operation-context task with
concrete dispatch implementations per ADR-024:
- LocalOperationEnv (Layer 0): wraps Arc<OperationRegistry>. invoke_with_policy
runs the scoped_env reachability check (ADR-015/022), looks up the
registration, then constructs a child OperationContext with internal: true,
identity = parent.handler_identity.as_identity() (the ADR-015 authority
switch), fresh metadata (HashMap::new() — ADR-014 security constraint, no
parent metadata propagation), inherited deadline (parent.deadline, not a
fresh 30s), inherited env (parent.env.clone() — Arc::clone per ADR-024), and
the child's own composition_authority + scoped_env from its registration.
contains() uses the default impl (returns true — curated registry contains
everything it can dispatch).
- CompositeOperationEnv (per-call, ADR-024): composes session (Layer 1),
connection (Layer 2), and base (Layer 0) trait objects. invoke_with_policy
runs the same reachability check, then probes overlays in order via
contains() (the overlay-dispatch contract from review #003 C9), dispatching
to the first overlay that contains the op. contains() aggregates all layers.
The trait-object design is load-bearing: making OperationEnv concrete would
close the session-overlay and connection-overlay patterns. Same integration-
point pattern as IdentityProvider (ADR-004).
Tests cover: allowed/disallowed reachability, internal-flag propagation,
authority switch (child identity = parent handler_identity), fresh metadata,
inherited deadline, composite session-overlay dispatch, composite fall-through
to base, composite connection-overlay dispatch when session lacks op, and
composite contains aggregation.
Implements the operation context types in registry/context.rs (ADR-015,
ADR-022, ADR-024): OperationContext with all 10 fields (internal is
pub(crate) for writes, read via is_internal()), AbortPolicy enum with
AbortDependents default, CompositionAuthority with synthetic Identity
projection for ACL, ScopedOperationEnv reachability set, and
generate_request_id() (UUID v4). Adds a minimal OperationEnv trait
forward-declaration in registry/env.rs so the context env field compiles;
the operation-env task will expand it.
Implement PendingRequestMap in protocol/pending.rs with Call (oneshot) and
Subscribe (mpsc) entries, ID-based correlation (ADR-012), timeout-based eviction,
fail_all for connection close, and silent discard of unknown request IDs. 17 unit
tests.
Refs: docs/architecture/crates/call/call-protocol.md
Implements: ADR-012
Correlates call.responded events back to call.requested by request ID
(stream-agnostic per ADR-012). Manages Call (oneshot) and Subscribe
(mpsc) entries with timeout-based eviction and fail_all on connection
close. Unknown request IDs are silently discarded.
Remove the Known Source Drift table from vault/README.md. Remove all 'known
drift'/'current source uses X' prose from Security Constraints in README,
encryption.md, service.md (constraint statements preserved). Remove stale
ADR-025/postcard notes in protocol.md. Bump all 5 vault doc frontmatter to
status: stable. Update architecture/README.md vault doc statuses to stable
and Current State to remove 'pending ADR-025/026 refactor' language.
Refs: docs/architecture/crates/vault/README.md (drift cleanup)
The vault spec-to-implementation sync is complete. Remove the drift
tracking tools that were only needed during sync:
- Remove the Known Source Drift table from vault/README.md
- Remove 'known drift' / 'current source uses X' prose from Security
Constraints sections in vault/README.md, encryption.md, and service.md.
The permanent constraint statements (OsRng for IVs, zeroized drop,
no unwrap, etc.) are preserved.
- Remove the drift paragraph in encryption.md Key Versioning.
- Remove stale 'to be updated per ADR-025' / 'postcard tests to be
removed' notes in protocol.md References.
- Bump status: draft -> stable in the frontmatter of all vault docs
(README, mnemonic-derivation, encryption, service, protocol).
- Update architecture/README.md: vault doc status entries to stable,
Current State paragraph reflects vault implementation complete (no
'pending ADR-025/026 refactor' language).
Review of vault crate against all architecture specs. Fixed 5 deviations:
1. EncryptionKey: removed Clone (now move-only per spec), added redacting Debug
2. EncryptionKey::new made private (cfg(test)), added pub(crate) key_bytes()
3. encrypt/decrypt made pub(crate) per encryption.md, low-level crypto tests
moved from integration to unit tests
4. CachedKey refactored to wrap DerivedKey with cached_at/last_accessed fields
per service.md, with key_type()/private_key()/public_key() accessors
5. Mnemonic::to_seed() unwrap() eliminated by storing validated Bip39Mnemonic
(enabled bip39 zeroize feature for proper zeroization)
All 10 drift items verified resolved. 105 tests pass; clippy clean.
Refs: docs/architecture/crates/vault/README.md (review checklist)