The VaultProtocol is a remote-capable irpc service by construction —
#[rpc_requests] generates both Service (local) and RemoteService (remote)
trait impls. DerivedKey's dual serialization (JSON redacts, postcard
preserves) was designed for this. Enabling remote vault access is a
server-setup change, not a protocol change.
OQ-21 enriched with full context:
- What's already in place (protocol, serialization, actor, auth transport)
- What's not in place (IrohProtocol handler forwards all messages without
auth checks; needs NodeId allowlist + message filtering in assembly layer)
- Operation access policy: Unlock/Lock local-only; Derive/Encrypt/Decrypt
remote-capable
- Use case: machine node → workers (workers don't hold mnemonics)
- Per-machine-node vaults, not shared (compartmentalization)
- Breaking vs non-breaking analysis (enabling = non-breaking; protocol
evolution = wire break, manageable via ALPN versioning)
The auth-wrapping handler lives in the assembly layer (or a dedicated
vault-server crate depending on both alknet-core and alknet-vault), not in
the vault crate itself — the vault is standalone (ADR-018) and can't
import alknet-core's auth model.
OQ-21 remains deferred — no commitment to implement, but the door is open
and the design space is mapped.
Key rotation uses version-indexed derivation paths: each key version maps
to a distinct SLIP-0010 path (m/74'/2'/0'/{version-2}'). v2 is at index 0
(PATHS::ENCRYPTION), v3 at index 1, etc.
Mechanism:
- encryption_path_for_version(version) constructs the path
- decrypt derives the key at the version-indicated path (not always
PATHS::ENCRYPTION)
- rotate(blob, to_version) decrypts with old key, re-encrypts with new
- No new mnemonic needed — same seed, different path
- Partial rotation is safe — old keys remain derivable
- The vault does not self-rotate; the assembly layer iterates blobs
Source drift flagged:
- decrypt currently ignores key_version for path selection (always uses
PATHS::ENCRYPTION) — must use version-indexed paths
- rotate method does not exist in source — must be added
- CURRENT_KEY_VERSION must bump from 1 to 2 (per ADR-020, reinforced here)
OQ-22 resolved. Only OQ-21 (remote vault admin, deferred) remains.
The vault uses SLIP-0010 HD derivation from the BIP39 seed for the
AES-256-GCM encryption key, not PBKDF2. This replaces the TypeScript
predecessor's (@alkdev/storage/src/graphs/crypto.ts) PBKDF2-based
approach.
Key decisions:
- HD derivation at m/74'/2'/0'/0' produces the encryption key
- PBKDF2 is not implemented in the vault; no password-based derivation
- salt field is unused in v2 (wire-format compat only)
- key_version=1 reserved for TS PBKDF2 data; key_version=2 for vault HD
- TS-encrypted data requires one-time migration to v2
- CURRENT_KEY_VERSION changes from 1 to 2 (source drift flagged)
OQ-20 resolved: the encryption key derivation method is locked. OQ-22
(key rotation workflow) remains open but does not block implementation.
Spec the vault crate from its existing implementation. The vault is
stable (implementation exists); this spec documents what IS so the
implementation-sync agent can reconcile source drift.
New spec documents (crates/vault/):
- README.md — crate index, security constraints, public API
- mnemonic-derivation.md — BIP39, SLIP-0010, BIP-0032, derivation paths
- encryption.md — AES-256-GCM, EncryptedData, key versioning, salt
- service.md — VaultServiceHandle lifecycle, actor dispatch, cache
- protocol.md — VaultProtocol irpc messages, DerivedKey redaction
New ADRs:
- ADR-018: Vault as standalone crate (zero alknet deps; own types/errors)
- ADR-019: Vault assembly-layer-only access (CLI is sole caller)
New open questions:
- OQ-20: Salt/KDF Phase B (open, low priority — salt field reserved)
- OQ-21: Remote vault administration (deferred — needs ADR if ever needed)
- OQ-22: Key rotation mechanism (open, low priority — workflow not specced)
Spec-vs-source drift explicitly flagged (for the sync agent):
- rand::random() used for IVs instead of OsRng (security-critical)
- unwrap() on every RwLock acquisition (must use unwrap_or_else)
- ADR-038 / OQ-SVC-03 references in source comments are stale (old numbering)
- VaultServiceActor::spawn returns a non-functional second actor (source bug)
- KeyVersionMismatch error variant is defined but unused in v1
OQ-11 (handler-level auth observability): Option B — handlers store
resolved identity on Connection via set_identity. Two identity scopes:
connection-level (observability, write-once-read-many) and per-request
(ACL, on OperationContext). Per-request takes precedence for ACL;
connection-level is for logging/audit only.
OQ-19 (session-scoped registries): Protocol doesn't need changes.
OperationEnv must remain a trait (not concrete) to enable session-overlay
pattern. Three-tier registry: core (static, External+Internal), session
(dynamic, Internal-only), promotion (curated review). Documented as
implementation guard in operation-registry.md.
All 19 open questions are now resolved. No open one-way or two-way doors
remain. The architecture is ready for review and implementation.
ADR-017 locks the client/adapter architecture:
- CallClient opens QUIC connections, shares dispatch loop with CallAdapter
- Connection direction independent of call direction (both sides can call)
- from_call adapter: discovers remote ops via services/list + services/schema,
registers with forwarding handlers (same pattern as from_openapi/from_mcp)
- to_openapi/to_mcp: project local ops to external protocols
- OperationAdapter trait: produces (OperationSpec, Handler) pairs
- Cross-node call tree: abort cascade propagates through from_call handlers
- Credentials from capabilities (ADR-014), adapter ops Internal by default (ADR-015)
The dispatch POC at /workspace/@alkdev/dispatch demonstrated head/worker over
SSH+axum; under the call protocol it's cross-node composition via from_call.
Connection topology (who advertises, who opens) is independent of call
direction — runner pattern, dispatch pattern, and P2P all work.
Document the three-tier registry model (core/session/promotion) and the
self-improving agent workflow where agents write their own operations in
a quickjs sandbox. The POC at /workspace/toolEnv demonstrated the sandbox
mechanism (quickjs in Deno web workers, proxy-based env bridge via
postMessage) but exposed the full registry to the sandbox — the security
gap that OQ-18's scoped composition env addresses.
The call protocol doesn't need changes: the OperationEnv trait is the
composition point, and a session-scoped env wraps the global env (session
registry first, fall through to global). The one-way door this OQ guards
against: making OperationEnv concrete instead of a trait, or hardcoding
the global registry into the dispatch path, would close the session-overlay
pattern. Session-scoped operations are always Internal, run under the
handler's identity, and are ephemeral. Promotion to core requires curation
review (architect role with promote scope).
The abort cascade and privilege model are call protocol semantics that
every consumer inherits — NAPI adapter, Python adapter, agent service, and
any future service speaking the EventEnvelope wire format. Framing them as
'needs agent crate in view' let a single consumer's timeline gate a
protocol-level decision. The agent use case is a useful test case for edge
cases, but the decisions belong to the call protocol.
The 'trusted' flag on OperationContext was the wrong word — it implies a
trust decision was made, but what actually happens is the call originated
internally (from composition) not externally (from the wire). Renamed to
'internal' with clarified semantics: internal calls switch authority
context to the handler's identity, not skip ACL. This prevents the
privilege escalation vector where composition with 'trusted: true' bypassed
all access control (buggy handler + parameterized dispatch).
- Rename trusted -> internal across operation-registry.md, ADR-014
- Update OperationContext field description and LocalOperationEnv code
- Add OQ-17: abort cascade for nested calls (call.aborted cascades to
descendants, default abort-dependents, continue-running opt-in). One-way
door on the protocol event schema; mechanism is a two-way door.
- Add OQ-18: privilege model and authority context (internal = authority
switch not ACL skip, External/Internal operation visibility, scoped
composition env + handler identity). Needs agent crate in view.
- Add abort cascade section and constraint to call-protocol.md
- Update crates/call/README.md with OQ-17, OQ-18, and two new design principles
- Update architecture README.md with OQ-17, OQ-18
Resolve the contradiction between ADR-008's "capability source" model
and operation-registry.md showing vault operations on the wire. ADR-014
establishes: vault is assembly-layer only, capabilities carry outbound
credentials (distinct from inbound identity), call protocol carries no
secret material, adapters take credential sources not static tokens.
- Add ADR-014 (Secret Material Flow and Capability Injection)
- Remove vault/derive, vault/unlock, vault/decrypt from call protocol
registration examples and all spec examples
- Add Capabilities field to OperationContext, propagate through
LocalOperationEnv nested calls
- Add Capability Injection section to operation-registry.md
- Add no-secret-material wire constraint to call-protocol.md
- Add streaming subscribe example (LLM chat with Vercel UI chunks)
- Add Security Model section to overview.md (identity vs capabilities)
- Trim WASM treatment from ~20 lines to a design-constraint note
- Add OQ-16 (resolved: no vault ops on wire), update OQ-08, OQ-15
- Update ADR-003, ADR-008, ADR-013 to remove stale "via call protocol"
vault references
- Rewrite OQ-12: separate two distinct TLS identity use cases (RFC 7250
raw keys as default for P2P, X.509 for domain-hosted/browsers) instead
of conflating them as 'file paths now, ACME later'. ACME is a proven
pattern from the reverse-proxy project, not speculative future work.
- Resolve OQ-13 and OQ-14: remove 'Phase 1' framing from core crate
specs. /{service}/{op} is the correct design for alknet-call, not a
simplification. Batch as correlated call.requested events is the correct
protocol design. Core crates need to be done right from the start.
- Add ADR-013: Rust as canonical implementation language. TypeScript
@alkdev/operations is a reference that informed the design, not a
parallel implementation. The only JS use case is browser SDK adaptation.
Five reasons: memory safety, LLM competence, supply chain attacks,
performance, browser-only JS.
- Add alknet-agent crate to the crate graph (depends on alknet-call, not
alknet-core). Agent service uses call protocol client for tool dispatch
and vault/derive for provider keys — no env vars for secrets. ALPN
alknet/agent added to the registry.
- Add OQ-15: call protocol client and adapter contract. alknet-call needs
both server (CallAdapter) and client (remote invocation over QUIC), plus
the adapter traits (from_*, to_*) that enable composition.
- Clarify alknet-napi as thin NAPI projection layer, not business logic.
- Fix bugs: ProtocolController → ProtocolHandler typo, OperationEnv
invoke() path format inconsistency, RateLimitConfig comment confusion.
- Update endpoint.md TLS section: comprehensive identity model comparison
table, RFC 7250 as default mode, ACME as proven pattern.
Add architecture specs for the alknet-call crate:
- call-protocol.md: CallAdapter, EventEnvelope wire format, bidirectional
stream model with ID-based correlation, PendingRequestMap, protocol
operations (call/subscribe/batch/schema), per-request identity resolution,
connection/stream lifecycle, error codes
- operation-registry.md: OperationSpec, async Handler type, OperationRegistry,
AccessControl with trusted call bypass, OperationEnv with context
propagation (parent_request_id, identity inheritance), service discovery,
irpc integration layering, naming convention (no leading slash in names)
- ADR-012: Call protocol uses bidirectional QUIC streams with EventEnvelope
framing and ID-based correlation. Protocol is stream-agnostic and symmetric.
Resolves OQ-07.
Key design decisions:
- Handler type is async (Fn returning Pin<Box<dyn Future>>)
- OperationEnv::invoke propagates parent context (identity, metadata,
parent_request_id)
- Identity resolution is per-request, not per-connection
- Operation names without leading slash (fs/readFile, not /fs/readFile)
- Batch is a client-side pattern, not a protocol primitive (OQ-14)
- Phase 1 uses service/op paths, node prefix added later (OQ-13)
Also: promote ADR-010 and ADR-011 from Proposed to Accepted, add OQ-13
and OQ-14 to open-questions.md.
Correct the conflation of quinn/TLS/iroh as interchangeable transports.
They are complementary connectivity modes serving different deployment
contexts: quinn (public IP + TLS), iroh (NAT traversal via relay), TCP
(handler-specific, not core). Clarify that TLS cert = network identity,
not auth identity. Map stealth mode to HTTP handler on standard ALPNs
instead of byte-peeking. Resolve OQ-05 as one-way door. SendStream/
RecvStream now use internal enum dispatch for both quinn and iroh
streams.
Rename the crate from alknet-secret to alknet-vault to better reflect its
purpose as a local key vault (seed management, key derivation, encryption)
rather than a network service.
Symbol renames:
- SecretService → VaultService
- SecretServiceHandle → VaultServiceHandle
- SecretServiceActor → VaultServiceActor
- SecretServiceError → VaultServiceError
- SecretProtocol → VaultProtocol
- SecretMessage → VaultMessage
- ServiceLocked → VaultLocked
- alknet_secret → alknet_vault (crate name)
Update ADR-008 with vault access pattern: the vault is a capability source,
not a service endpoint. The CLI injects derived/decrypted material into
operation contexts — handlers never hold vault references.
Create the alknet-secret crate with BIP39 mnemonic generation, SLIP-0010
Ed25519 HD key derivation, AES-256-GCM encryption, and SecretProtocol
irpc service definition. This is Phase 3.1 from the integration plan.
Architecture changes:
- Promote secret-service.md to reviewed status with full spec format
(crate structure, public API, security model, phase progression,
ADR/OQ cross-references, wire format compatibility section)
- Add ADR-038 (seed lifecycle and memory security): zeroize for v1,
mlock deferred to Phase B
- Add OQ-SEC-01 (mlock/VirtualLock for seed RAM) to open-questions.md
- Update README.md with ADR-038 and secret-service status
Crate structure:
- src/mnemonic.rs: BIP39 phrase generation, validation, seed derivation
- src/derivation.rs: SLIP-0010 HD key derivation, path constants (74')
- src/encryption.rs: AES-256-GCM encrypt/decrypt, EncryptedData type
- src/protocol.rs: SecretProtocol irpc enum, DerivedKey, KeyType
- src/service.rs: SecretServiceHandle with Unlock/Lock lifecycle
- 40 passing tests (unit + integration + doc)
Update four existing specs (overview, server, napi-and-pubsub, call-protocol) to
reflect Phase 0 decisions: three-layer model, IdentityProvider, ForwardingPolicy,
OperationEnv, static/dynamic config split. Review all 9 Phase 0a ADRs (026-034)
for consistency. Fix 4 critical issues from architecture review: missing OQ-SVC-05
in open-questions.md, deprecated hub terminology, undefined AuthService and noq
terms. Replace inline OQ text with cross-references per format rules. Add
ConfigServiceImpl definition to configuration.md. Port absolute workspace paths
to project-relative links by copying referenced docs (feasibility, certbot,
fail2ban, event_source_types) into docs/research/.
Unified authentication (ADR-023): SSH and WebTransport auth share the same
Ed25519 key material. Token auth uses signed timestamps verified against the
same authorized_keys set. IdentityProvider trait decouples core from identity
storage.
Bidirectional call protocol (ADR-024): Generalizes control channel (ADR-018)
to support hub→spoke and spoke→hub calls. Operation paths use /{spoke}/{service}/{op}
format for three-level routing. EventEnvelope wire format, five call events,
PendingRequestMap for correlation.
Handler/spec separation (ADR-025): Downstream consumers register operations
without modifying core. OperationRegistry maps paths to specs + handlers.
Service discovery via /services/list and /services/schema.
Resolves OQ-17 (transport-aware auth), OQ-21 (spoke routing), OQ-CFG-04 and
OQ-CFG-06 (WebTransport auth and transport-aware auth layer). Adds OQ-18
through OQ-22 for remaining open questions.
Address 5 critical and 7 warning issues from architecture review:
- Fix duplicate sentence in napi-and-pubsub.md server side section
- Add wraith- namespace reservation to server.md constraints (ADR-018)
- Document stealth mode TLS-only requirement in server.md
- Create ADR-019 for --proxy dual semantics (client vs server)
- Clarify NAPI connect() vs CLI wraith connect distinction
- Add SOCKS5h default as privacy design decision in client.md
- Expand reconnection section (always-on, re-register port forwards)
- Add graceful shutdown sections to client.md and server.md
- Specify OpenSSH key format for path-or-buffer inputs across all docs
- Resolve pubsub alternative approach ambiguity (ADR-018 is primary)
- Replace server.md handler impl block with behavioral description
- Standardize iroh endpoint ID terminology (base58-encoded)
- Remove iroh API implementation details from transport.md/server.md
- Add error handling pattern as cross-cutting concern in overview.md
- Update all document statuses from draft to reviewed