Phase 0 exploration for alknet-ssh: confirms SSH-over-QUIC-bistream via
tokio::io::join (no custom adapter needed, per reference impl), russh 0.60.2
generic run_stream/connect_stream, and channel-into-bistream multiplexing.
Surfaces 9 decision points for Phase 1: host key sourcing (vault-derived vs
config), channel policy v1 surface, client + SOCKS5 crate split, crypto
backend, auth method coverage, and a stream-handling POC to close russh's
upstream test gap.
First dedicated coverage pass (cargo-llvm-cov --workspace --all-features).
Workspace at 87.1% line coverage (5759/6615), all 224 tests pass. Vault
and registry layers are essentially fully covered; gaps concentrate in
endpoint.rs (56%), types.rs (57%), and connection.rs (54%), all stemming
from tests using MockConnection whose open_bi/accept_bi return Err.
Eight suggestions (S1-S8) ordered by leverage: pure-function tests for
dispatch_envelope / map_*_connection_error / error Display+Debug (S1-S3),
Tier A directly-callable TLS/rustls helpers in endpoint.rs (S4), one
loopback quinn integration test as the real unlock across four files (S5),
ACME event-loop extraction via synthetic stream (S6, the flagged research
item), and two small remaining gaps (S7-S8). No critical or warning
findings — this is a testing-infrastructure gap, not a logic gap.
Three tasks implementing ADR-027:
1. core/rawkey-decouple-from-iroh: TlsIdentity::RawKey now uses
Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
instead of iroh::SecretKey. RawKeyCertResolver and Ed25519SigningKey
un-gated from #[cfg(all(quinn, iroh))] to #[cfg(quinn)] only.
Quinn-only builds (default) now support RFC 7250 raw-key identity.
iroh transport converts via iroh::SecretKey::from_bytes.
2. core/endpoint-request-client-cert: replaced with_no_client_auth()
with AcceptAnyCertVerifier — a custom ClientCertVerifier that
requests client certs but doesn't require them or verify against
a CA. alknet's identity model is fingerprint-based (the
authorized_fingerprints set is the trust anchor), not PKI-based.
Peer certs are extracted at the TLS layer for fingerprinting;
peers without certs connect normally.
3. core/acme-integration: TlsIdentity::Acme variant (domains,
cache_dir, directory, contact) + AcmeDirectory enum. TlsSetup
two-phase construction: synchronous for X509/RawKey/SelfSigned,
async for Acme (spawns AcmeState event loop, builds ServerConfig
with ResolvesServerCertAcme). acme-tls/1 ALPN added when ACME is
active; dispatch_quinn guard closes challenge connections
gracefully (challenge is TLS-layer-handled). acme feature gate
keeps rustls-acme out of non-ACME builds.
Workspace: build/test/clippy green across all 4 feature configs
(quinn-only, quinn+iroh, quinn+acme, all-features). 331 tests, 0
failures, 0 warnings.
ADR-027 resolves the architectural gap surfaced when ACME integration
became a concrete target:
1. TlsIdentity::Acme variant — static config data (domains, cache_dir,
directory, contact) with async AcmeState constructed at endpoint
setup via two-phase TlsSetup (not stuffed into the Clone-able enum).
2. TlsIdentity::RawKey decoupled from the iroh feature — uses
Ed25519SecretKey (alknet-core-owned wrapper over ed25519_dalek)
instead of iroh::SecretKey. Raw-key TLS identity (RFC 7250, the
default for most alknet nodes) now works in quinn-only builds.
iroh transport converts via SecretKey::from_bytes.
3. ACME feature-gated behind new acme feature (rustls-acme optional
dep). Non-ACME builds don't compile it.
4. dispatch_quinn guard for acme-tls/1 challenge connections — TLS-ALPN-01
is handled at the rustls cert resolver layer during the handshake;
the guard closes challenge connections gracefully instead of logging
a misleading "no handler" warning.
Research confirmed QUIC (quinn) handles ACME challenges differently than
TCP (reverse-proxy): quinn gives no ClientHello peek hook, but the
challenge is fully answered at the cert resolution step before the
connection surfaces to the application. No handler registration needed.
Spec updates: config.md, endpoint.md, open-questions.md (OQ-12),
overview.md + README.md (ADR index), ADR-010 (cross-ref).
Tasks: core/rawkey-decouple-from-iroh (gen 1, no deps),
core/acme-integration (gen 2, depends on rawkey). Graph: 36 tasks.
W1 (call/protocol/abort-cascade-wiring): wire AbortCascade into
CallAdapter handle_stream for EVENT_ABORTED. Cascades with
AbortPolicy::AbortDependents, aborts root, no descendant frames on
wire (ADR-016 Decision 2). Two integration tests added.
W2 (core/endpoint-client-fingerprint): extract TLS client cert
fingerprint in dispatch_quinn (SHA256:<hex> of leaf cert DER via
peer_identity) and dispatch_iroh (ed25519:<hex> of peer NodeId).
Fingerprint format documented in auth.md. Server config change
(with_no_client_auth → request-but-don't-require) deferred to new
follow-up task core/endpoint-request-client-cert.
W3 (vault/mnemonic-debug-redaction): replace Mnemonic derive(Debug)
with manual redacting impl (phrase: "[REDACTED]"). Seed confirmed
no Debug impl. Redaction test added.
W4 (core/auth-apikey-resources): Option B — drop entry.resources from
spec. External identities (token/fingerprint) grant scopes only;
resource-scoped ACLs are composition-internal (ADR-015/022). auth.md
corrected + limitation documented. Two tests confirm empty resources.
review-post-impl-fixes: all 4 verified, workspace green (326 tests,
0 failures, 0 clippy warnings). Review #004 status → resolved.
Graph: 34 tasks, 12 gens.
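The W3 redaction can be sketched roughly as below. This is a minimal sketch, assuming a Mnemonic struct with a single phrase field; the real type's layout is not shown in this log, so the field and the new()/phrase() helpers are hypothetical.

```rust
use std::fmt;

/// Hypothetical stand-in for the vault's Mnemonic type (field layout assumed).
pub struct Mnemonic {
    phrase: String,
}

impl Mnemonic {
    /// Hypothetical constructor, for illustration only.
    pub fn new(phrase: &str) -> Self {
        Self { phrase: phrase.to_owned() }
    }

    /// Explicit accessor: the only way to read the phrase in this sketch.
    pub fn phrase(&self) -> &str {
        &self.phrase
    }
}

// Manual impl instead of #[derive(Debug)], so `{:?}` never prints the phrase.
impl fmt::Debug for Mnemonic {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Mnemonic")
            .field("phrase", &"[REDACTED]")
            .finish()
    }
}
```

With this shape, formatting a Mnemonic with {:?} yields the placeholder, while code that really needs the phrase has to call the accessor on purpose.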
The vault spec-to-implementation sync is complete. Remove the drift
tracking tools that were only needed during sync:
- Remove the Known Source Drift table from vault/README.md
- Remove 'known drift' / 'current source uses X' prose from Security
Constraints sections in vault/README.md, encryption.md, and service.md.
The permanent constraint statements (OsRng for IVs, zeroized drop,
no unwrap, etc.) are preserved.
- Remove the drift paragraph in encryption.md Key Versioning.
- Remove stale 'to be updated per ADR-025' / 'postcard tests to be
removed' notes in protocol.md References.
- Bump status: draft -> stable in the frontmatter of all vault docs
(README, mnemonic-derivation, encryption, service, protocol).
- Update architecture/README.md: vault doc status entries to stable,
Current State paragraph reflects vault implementation complete (no
'pending ADR-025/026 refactor' language).
Review #003 found 11 critical, 14 warning, and 6 suggestion findings
after reviews #001 (governance/security) and #002 (cross-document
consistency/two-way-door audit) were resolved. The theme: types and
APIs that were *referenced* but never *defined*, and stale ADR sketches
that didn't match the now-updated spec docs.
Critical fixes (11):
- C1: DerivedKey #[derive(Deserialize)] contradicted the custom
Deserialize that rejects "[REDACTED]" — dropped the derive, added
explicit manual Serialize/Deserialize impls (protocol.md).
- C2: encrypt prose said "derived at PATHS::ENCRYPTION" but the
signature takes key_version — updated to encryption_path_for_version
(service.md).
- C3: derive_encryption_key returned DerivedKey while
derive_encryption_key_for_version returned EncryptionKey (same cache);
unified on DerivedKey, defined CachedKey (service.md).
- C4: tokio vs std::sync::RwLock contradiction — specified
std::sync::RwLock, dropped tokio from vault deps (ADR-018, ADR-025,
service.md).
- C5: Missing drift rows in vault README — added #9 (key_version
ignored) and #10 (rotate not implemented).
- C6: ADR-022 build_root_context and invoke() sketches omitted
abort_policy (9 fields vs 10) — added the field to both sketches.
- C7: Capabilities type referenced 20+ times, never defined — added
struct definition to core-types.md with Clone+Send+Sync, Zeroize,
sealed builder API, immutability guard.
- C8: SessionOverlaySource on CallAdapter but never defined, crate
violation (alknet-call can't depend on alknet-agent) — defined the
trait in alknet-call (call-protocol.md), matching the IdentityProvider
pattern.
- C9: CompositeOperationEnv dispatch fall-through was "a two-way door"
— added contains() to OperationEnv trait, made the composite probe
before dispatching, eliminating the sentinel ambiguity.
- C10: No API for Layer 2 (connection overlay) registration, CallConnection
undefined — defined CallConnection struct + register_imported() API
(call-protocol.md).
- C11: with_local signature diverged between two examples (4 args vs 5)
— added capabilities as the 5th arg, made both examples consistent.
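The C9 change (probe with contains() before dispatching) might look like the sketch below. It is a simplified, synchronous stand-in: the trait shape, the MapEnv helper, the string-based error, and the first-match-wins layer order are all assumptions for illustration, not the spec's definitions.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for the OperationEnv trait.
pub trait OperationEnv {
    /// True if this env can dispatch `op`.
    fn contains(&self, op: &str) -> bool;
    /// Dispatch `op`; Err carries an error string in this sketch.
    fn invoke(&self, op: &str, input: &str) -> Result<String, String>;
}

/// Hypothetical env backed by a map of operation name -> reply prefix.
pub struct MapEnv(pub HashMap<String, String>);

impl OperationEnv for MapEnv {
    fn contains(&self, op: &str) -> bool {
        self.0.contains_key(op)
    }

    fn invoke(&self, op: &str, input: &str) -> Result<String, String> {
        match self.0.get(op) {
            Some(prefix) => Ok(format!("{prefix}:{input}")),
            None => Err(format!("NOT_FOUND: {op}")),
        }
    }
}

/// Composite over several layers (order is an assumption: first match wins).
pub struct CompositeEnv {
    pub layers: Vec<Box<dyn OperationEnv>>,
}

impl OperationEnv for CompositeEnv {
    fn contains(&self, op: &str) -> bool {
        self.layers.iter().any(|layer| layer.contains(op))
    }

    fn invoke(&self, op: &str, input: &str) -> Result<String, String> {
        // Probe first, then dispatch exactly once. A handler's own error is
        // returned as-is and is never read as "try the next layer".
        for layer in &self.layers {
            if layer.contains(op) {
                return layer.invoke(op, input);
            }
        }
        Err(format!("NOT_FOUND: {op}"))
    }
}
```

The point of the probe is that "this layer does not have the operation" and "the operation ran and failed" are answered by different calls, so no error value has to double as a fall-through sentinel.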
Warning fixes (14):
- W1: invoke_with_policy restructured as required method, invoke gets a
default impl delegating to it — eliminates duplication across impls.
- W2: CachedKey defined (service.md).
- W3: EncryptionKey constructor/glue specified, added to re-export list.
- W4: Secp256k1ExtendedPrivKey defined, derive_ethereum_key glue shown.
- W5: encryption_path_for_version rejects version < 2 (v1 is TS PBKDF2).
- W6: Wire payload schemas for all event types + ResponseEnvelope →
EventEnvelope conversion table (call-protocol.md).
- W7: Timeout section — deadline on OperationContext, composed calls
inherit parent's deadline, CallAdapter::with_timeout().
- W8: Request ID generation spec — UUID v4 for composed calls, wire ID
vs internal ID relationship for abort cascade.
- W9: unlock_new already-unlocked behavior specified (returns
AlreadyUnlocked).
- W10: KeyType Serialize/Deserialize justification corrected (stale
irpc reference removed).
- W11: OperationProvenance and CompositionAuthority defined inline in
operation-registry.md (were only in ADR-022).
- W12: encrypt/decrypt free functions marked pub(crate), relationship
to VaultServiceHandle methods stated.
- W13: rotate signature removed from encryption.md (it's a
VaultServiceHandle method, not a free function).
- W14: CallAdapter::new() + with_session_source() + with_timeout()
constructors shown.
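The W1 restructuring (one required method, one provided method that delegates) follows a common Rust trait pattern. A minimal sketch, assuming a synchronous trait, a String result, and an Option<AbortPolicy> parameter; the real signatures are in the spec docs and may differ. EchoEnv is a hypothetical impl.

```rust
/// Abort behaviour for child calls (variant names taken from the notes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbortPolicy {
    AbortDependents,
    ContinueRunning,
}

pub trait OperationEnv {
    /// Required method: each impl writes its dispatch logic here only.
    /// `None` means no explicit policy was given (parameter shape assumed).
    fn invoke_with_policy(&self, op: &str, policy: Option<AbortPolicy>) -> String;

    /// Provided method: delegates, so impls never duplicate dispatch code.
    fn invoke(&self, op: &str) -> String {
        self.invoke_with_policy(op, None)
    }
}

/// Hypothetical impl that only reports what it was asked to do.
pub struct EchoEnv;

impl OperationEnv for EchoEnv {
    fn invoke_with_policy(&self, op: &str, policy: Option<AbortPolicy>) -> String {
        format!("{op} with {policy:?}")
    }
}
```

An implementor such as EchoEnv writes invoke_with_policy once and gets invoke for free.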
Suggestion fixes (6): Seed: Clone note, VaultServiceInner invariant,
ExtendedPrivKey accessor signatures, CURRENT_KEY_VERSION location, ADR-018
stale actor text, derivation helpers re-export note.
Add ADR-026 (vault key model — HD derivation) recording the foundational
HD-derivation decision, 74' coin type reservation, SLIP-0010/Ed25519
default, secp256k1 feature-gating, and AES-256-GCM cipher choice. These
were previously inline rationale with no ADR (W9).
Extend ADR-018 with an explicit EncryptedData wire format lock — fields,
encoding, and semantics are frozen; no removal without a format-version
migration (W10).
Resolve the remaining guard clauses and spec decisions:
- W2: Capabilities must be immutable after construction (no interior
mutability). Makes the Arc vs deep-copy clone semantics genuinely
two-way.
- W5: Published to_* specs are compatibility contracts — best-effort
mappings are two-way before first publication, one-way after. Version
generated specs.
- W6: Salt field clarification — v2 salt is permanently unused; a future
KDF is a different derivation family, not a version-indexed path; the
field saves a wire-format change only.
- W7: unlock_new returns Zeroizing<String> — the mnemonic is the root of
trust and must not linger in freed memory.
- W17: OQ-09 WASM — server-side dispatch door is honestly closed
(Connection is concrete, tokio-bound), not implicitly preserved.
- W18: OQ-10 git — composability fork (raw smart protocol vs call-protocol
projection) is a separate decision from ERC721 scope.
- W20: from_openapi must prefix imported error codes (HTTP_404) to avoid
collision with protocol-level codes (NOT_FOUND). Normative rule, not
naming convention.
- W21: ScopedOperationEnv field is private — construction via new()/
empty(), query via allows(). Makes the future subgraph refactor
non-breaking.
- C13: Connection::set_identity — the endpoint does not read identity()
after handle() returns (Connection is moved into the spawned task).
Observability is handler-side logging. Simplest honest answer.
- W1: OperationAdapter trait is async, returns Vec<HandlerRegistration>.
from_call requires async discovery; ADR-022 changed the return type.
- W11: CompositionAuthority::as_identity() defined — constructs a
synthetic Identity (label as id, scopes, resources) not resolvable via
IdentityProvider. Second Identity construction path, acknowledged.
- W14: SecretKey is iroh::SecretKey (Ed25519) — consistent with the
endpoint's iroh dependency.
- W19: Grandchild abort propagation is inherit-by-default (option a) —
invoke() with no explicit policy inherits parent's policy. ContinueRunning
auto-propagates to grandchildren unless explicitly overridden.
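The W19 inherit-by-default rule can be illustrated with a small sketch. It assumes a cut-down OperationContext holding only abort_policy, hypothetical root()/child() helpers, and AbortDependents as the Default variant (the notes do not say which variant Default picks).

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum AbortPolicy {
    /// Assumed default; not stated in the notes.
    #[default]
    AbortDependents,
    ContinueRunning,
}

/// Cut-down context: only the field relevant to this sketch.
pub struct OperationContext {
    pub abort_policy: AbortPolicy,
}

impl OperationContext {
    /// Hypothetical root constructor using the default policy.
    pub fn root() -> Self {
        Self { abort_policy: AbortPolicy::default() }
    }

    /// Child context: an explicit policy wins, otherwise the parent's
    /// policy is inherited unchanged.
    pub fn child(&self, explicit: Option<AbortPolicy>) -> Self {
        Self { abort_policy: explicit.unwrap_or(self.abort_policy) }
    }
}
```

Under these assumptions, a child created with ContinueRunning passes it on to a grandchild created with no explicit policy, and an explicit override at any level replaces it from there down.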
The password-manager pattern (deterministic per-site passwords from HD
derivation) is not relevant to an RPC system's vault. Handlers call APIs
(using API keys, OAuth tokens, mTLS), not websites with passwords. The
vault is for cryptographic key derivation and credential encryption.
Removes:
- derive_password, derive_password_string from service.md
- site_password_path from mnemonic-derivation.md
- m/74'/1'/0'/{hash}' path from PATHS module and path semantics table
- derive_password row from the cache table
Resolves review #002 C9 (site_password_path hash mapping underspecified)
by removing the feature rather than specifying the non-standard
string→u32 mapping and Ed25519-as-password-entropy construction.
If deterministic password generation is ever needed (browser-automation
edge case), it can be re-added — the cost is near-zero. Removing it now
eliminates permanent API surface inherited from a prior project's
password-manager pattern.
Drops irpc from alknet-vault entirely. The vault's dispatch is now direct
method calls on VaultServiceHandle — no VaultProtocol enum, no
VaultMessage, no VaultServiceActor, no mpsc channel, no Service trait, no
RemoteService trait, no postcard serialization. The vault is local-only by
construction.
The core security argument: irpc made the vault remote-capable by default
(RemoteService generated unless no_rpc is passed). The IrohProtocol handler
forwards all messages without auth. The docs framed 'register an ALPN' as a
server-setup change. This is the default-insecure anti-pattern — security
should be opt-in, not opt-out. ADR-025 inverts the default: local-only is
the only mode, and remote access requires building a separate vault-server
crate (a visible architectural act, not a flag flip).
The actor path was already dead code — service.md said 'prefer
VaultServiceHandle directly — no channel, no serialization.' The actor
existed only to make irpc's Service trait work, which existed only to make
RemoteService work, which was the footgun. VaultServiceHandle's
Arc<RwLock> provides concurrent reads and exclusive writes — better
throughput than the actor's sequential processing.
DerivedKey serialization simplifies: always redact on serialize (for
logging safety), reject '[REDACTED]' on deserialize with an error. No
'postcard preserves bytes' path. This resolves review #002 W8 (silent
corruption on JSON-deserialized DerivedKey).
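The redact-on-serialize, reject-on-deserialize rule can be sketched without any serialization framework. Everything below is illustrative: the spec uses manual Serialize/Deserialize impls, whereas this sketch uses plain string functions, and the hex encoding for non-redacted input, the KeyParseError type, and the method names are assumptions.

```rust
const REDACTED: &str = "[REDACTED]";

/// Hypothetical stand-in for DerivedKey (real fields not shown in this log).
pub struct DerivedKey {
    bytes: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum KeyParseError {
    /// Input was the redaction placeholder, not key material.
    Redacted,
    /// Input was not valid hex (hex is an assumption of this sketch).
    InvalidHex,
}

impl DerivedKey {
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        Self { bytes }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }

    /// The serialized form is always the placeholder, never key bytes.
    pub fn to_serialized(&self) -> &'static str {
        REDACTED
    }

    /// Parsing fails loudly on the placeholder instead of silently
    /// producing a corrupt key.
    pub fn from_serialized(s: &str) -> Result<Self, KeyParseError> {
        if s == REDACTED {
            return Err(KeyParseError::Redacted);
        }
        let valid = !s.is_empty()
            && s.len() % 2 == 0
            && s.bytes().all(|b| b.is_ascii_hexdigit());
        if !valid {
            return Err(KeyParseError::InvalidHex);
        }
        let mut bytes = Vec::with_capacity(s.len() / 2);
        for i in (0..s.len()).step_by(2) {
            let byte = u8::from_str_radix(&s[i..i + 2], 16)
                .map_err(|_| KeyParseError::InvalidHex)?;
            bytes.push(byte);
        }
        Ok(Self { bytes })
    }
}
```

The design choice this shows is that a round trip through the logging-safe form is an error, not a key made of the placeholder's bytes.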
Resolves:
- OQ-21: remote vault access — resolved (not deferred). Not a vault crate
feature; if needed, a separate vault-server crate with its own ADR.
- C7: vault-server-crate question decided — not created now, not precluded.
- C8: operation access policy table dissolved — all operations local-only
by default; if a vault-server crate exposes some remotely, that crate
defines the policy.
- W8: DerivedKey JSON deserialization — resolved (reject redacted payloads).
Amends ADR-005 (irpc remains for alknet-call, not for alknet-vault),
ADR-018 (vault is even more standalone — zero RPC framework deps),
ADR-019 (vault is the only layer, not just the only direct-caller layer),
ADR-008 (vault integration point unchanged, but now local-only by
construction).
Diagnoses a conflation in the pre-ADR-024 spec: the OperationRegistry
inherited immutability by analogy from ADR-010's HandlerRegistry (ALPN-level),
but the TLS-config argument that justifies HandlerRegistry immutability does
not apply to the operation registry, which lives behind a single ALPN
(alknet/call). This made from_call (which discovers ops over a live connection
at runtime) structurally incompatible with the blanket immutability claim.
ADR-024 layers the operation registry by trust boundary: curated (Local) ops
are static and immutable — the startup trust boundary is where their
composition authority is granted; session (Session) and imported (FromCall
etc.) ops are dynamic at their respective scopes (per-session, per-connection)
— their trust boundaries are per-scope, not per-startup. The principle:
immutability follows the trust boundary. Immutability is the security control
for composing ops (can escalate privilege); provenance + composition authority
are the controls for non-composing ops (can't escalate).
The OperationEnv trait becomes the integration point (Arc<dyn OperationEnv>),
following the IdentityProvider precedent (ADR-004): the CallAdapter composes
the root OperationContext.env per incoming call from the active layers
(curated base + connection overlay + session overlay). Children inherit the
parent's composite env by Arc::clone — overlay composition happens once at
the root and propagates through the composition tree.
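The "compose once at the root, inherit by Arc::clone" behaviour can be sketched as follows. The trait here is reduced to a single describe() method, and CompositeEnv, the layer names, root(), child(), and the depth field are hypothetical; only the Arc<dyn OperationEnv + Send + Sync> field type comes from the notes.

```rust
use std::sync::Arc;

/// Reduced stand-in for the OperationEnv trait.
pub trait OperationEnv {
    fn describe(&self) -> String;
}

/// Hypothetical composite that only records which layers are active.
pub struct CompositeEnv {
    layers: Vec<&'static str>,
}

impl OperationEnv for CompositeEnv {
    fn describe(&self) -> String {
        self.layers.join("+")
    }
}

pub struct OperationContext {
    pub env: Arc<dyn OperationEnv + Send + Sync>,
    /// Hypothetical field, used here to tell contexts apart.
    pub depth: u32,
}

impl OperationContext {
    /// Root: the composite is built once from the active layers.
    pub fn root(layers: Vec<&'static str>) -> Self {
        Self { env: Arc::new(CompositeEnv { layers }), depth: 0 }
    }

    /// Child: shares the parent's env; nothing is re-composed.
    pub fn child(&self) -> Self {
        Self { env: Arc::clone(&self.env), depth: self.depth + 1 }
    }
}
```

Every descendant points at the same allocation, so the cost of composing overlays is paid once per incoming call rather than once per composed call.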
Resolves review #002 C6 (OperationContext.env type identity crisis): the
field is split into scoped_env: ScopedOperationEnv (reachability data, from
the registration bundle) and env: Arc<dyn OperationEnv + Send + Sync>
(dispatch trait object). One field was being used as two different types
(reachability set with .allows() and dispatch trait with .invoke()).
Localizes W4 (hot-swap ↔ registry mutability coupling) to the connection
scope: no global mutable registry to hot-swap; overlays replace naturally
with connect/disconnect and session start/end. Schema-drift on reconnect is
a per-connection overlay-rebuild concern, not a global hot-swap protocol.
Partially addresses W3 (CallClient registry security): the registry-shape
sub-question is resolved by the overlay model; the capability-exposure
sub-question (what capabilities a remote peer can trigger) remains for
ADR-017 — ADR-024 does not overclaim resolution there.
Amends OQ-04 to scope its immutability claim to the HandlerRegistry and
cross-reference ADR-024 for the operation registry. Generalizes OQ-19's
session-overlay mechanism to also cover connection-scoped remote imports —
both are per-scope dynamic overlays on the static curated base, using the
same trait-layering mechanism.
Governance (Tier 2):
- Advance ADR-022 and ADR-023 from Proposed to Accepted (specs already
depend on their types as source of truth)
- Amend ADR-015: mark Decision 3 and Assumption 6 as superseded by ADR-022;
update handler_identity type to CompositionAuthority
- Amend ADR-002: note handle() signature revised by ADR-007 (BiStream → Connection)
- Amend ADR-004: note 'enrich/replace' AuthContext language superseded by
ADR-011's immutability model; update to describe set_identity on Connection
- Update main README ADR table to show ADR-022/023 as Accepted
Spec-ADR consistency (Tier 3):
- Add abort_policy: AbortPolicy field to OperationContext struct (ADR-016
Decision 6 mandated this but the spec omitted it)
- Define AbortPolicy enum (AbortDependents | ContinueRunning) with Default impl
- Add abort_policy to build_root_context and LocalOperationEnv::invoke()
- Define the OperationEnv trait explicitly with invoke() and
invoke_with_policy() methods (was referenced as 'must remain a trait'
but never defined)
- Specify From<StreamError> for HandlerError impl with exact variant mapping
- Add Connection::from_quinn() / from_iroh() constructors (was referenced
as Connection::new() but never defined)
- Remove undefined CertAuthorityEntry placeholder from AuthPolicy v1 (will
be added additively when alknet-ssh lands)
- Fix config.md key-differences table: rate limits are in DynamicConfig,
not StaticConfig
Mechanical fixes (Tier 1):
- overview.md: 'closes the QUIC stream' → 'closes the connection' (stale
from pre-ADR-007 model)
- overview.md: OQ-04 entry updated from stale 'defer to implementation'
to 'resolved: static at startup'
- mnemonic-derivation.md: remove duplicate helper functions block (incomplete
first copy, complete second copy)
- ADR-003: add iroh (feature-gated) to alknet-core dependency list, added
by ADR-010
- ADR-021: fix ambiguous 'W1 drift issue from the vault review' cross-reference
- ADR-022: rephrase FromCall 'leaf locally' to 'leaf in the local registry'
- ADR-017: add error_schemas to from_call mirror list and services/schema
step (inconsistency with ADR-023)
- ADR-016: fix self-referential citation ('ADR-016 Assumption 5' → 'Assumption 5')
- Add ScopedOperationEnv::empty(), allows(), new() and
CompositionAuthority::none(), new() impl blocks (referenced but undefined)
- Add call.completed clarification for non-subscription calls
- Add services/schema leading-slash normalization note
- Crate README ADR tables: add missing ADR-013 (call), ADR-015 (core),
ADR-006 + ADR-010 (vault)
- Vault README: add consolidated 'Known Source Drift' table tracking all
four drift items (OsRng, unwrap, CURRENT_KEY_VERSION, spawn bug) in one
place, including the two previously missing from README
Second pre-implementation review. Goes wider than #001 on cross-document
consistency and the two-way-door framing from ADR-009.
Finds 13 critical, 21 warning, 12 suggestion issues:
- Governance: ADR-022/023 are Proposed but specs treat them as binding;
ADR-015/002/004 (Accepted) contradict later refinements without supersession
markers
- Abort policy (ADR-016) missing from OperationContext struct; OperationEnv
trait never defined
- OperationContext.env type identity crisis (reachability set vs dispatch
trait)
- ADR-017 from_call mirror list missing error_schemas; OperationAdapter trait
stale vs ADR-022 bundle
- OQ-21 remote vault 'non-breaking' framing conflicts with ADR-019 and hides
a crate-decomposition decision; RemoteService path unvalidated
- Vault operation access policy table incomplete for security-sensitive methods
- site_password_path string-to-index mapping breaks determinism guarantee
- Two-way-door audit: ADR-022 narrowed several doors without updating OQ
classifications; 'published artifact is a contract' blind spot in framework
Includes recommended 5-pass resolution order.
ADR-015 L171 said the scoped env API was 'a two-way door for
implementation.' ADR-022 has now resolved it: ScopedOperationEnv with
operation-level granularity (HashSet<String>), not namespace-level.
Update the stale text to point to the resolution.
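A minimal sketch of operation-level scoping over a HashSet<String>. The new()/empty()/allows() names appear elsewhere in these notes; their exact signatures and the field name are assumptions here.

```rust
use std::collections::HashSet;

pub struct ScopedOperationEnv {
    // Private on purpose: callers cannot depend on the HashSet
    // representation, so it can change later without breaking them.
    allowed: HashSet<String>,
}

impl ScopedOperationEnv {
    /// Build a scope from full operation names (signature assumed).
    pub fn new<I, S>(ops: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self { allowed: ops.into_iter().map(Into::into).collect() }
    }

    /// A scope that reaches nothing.
    pub fn empty() -> Self {
        Self { allowed: HashSet::new() }
    }

    /// Exact-match check: operation-level, not namespace-level.
    pub fn allows(&self, op: &str) -> bool {
        self.allowed.contains(op)
    }
}
```

Because matching is exact, allowing "fs/read" says nothing about "fs/write" or about the "fs" namespace as a whole.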
ADR-016 Decision 6 specifies that the abort policy (abort-dependents vs
continue-running) is set on OperationContext and propagated through
OperationEnv::invoke() — the composing handler decides the child's
policy, not the wire caller. The call.requested payload does not carry
an abort policy field. This resolves the TBD that was masquerading as a
two-way door: two of the three options ADR-016 floated (wire payload,
per-operation declaration) were inconsistent with the ADR's own
assumptions.
Also marks review #001 as resolved — all 5 critical, 4 warning, and 4
suggestion findings are now addressed.
ADR-023 adds error_schemas to OperationSpec so operations can declare
their domain-level failure modes (FILE_NOT_FOUND, RATE_LIMITED, etc.)
distinct from protocol-level codes (NOT_FOUND, FORBIDDEN, etc.). The
call.error payload gains an optional 'details' field carrying the typed
error payload conforming to the declared schema. from_openapi/to_openapi
map OpenAPI response status codes to/from ErrorDefinitions, making the
adapter contract from ADR-017 faithful on the error axis.
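To illustrate the split between protocol-level codes and imported domain-level codes, here is a small sketch that uses the HTTP_ prefix form (HTTP_404) mentioned elsewhere in these notes. The function names and the two-entry protocol code list are hypothetical; the real ErrorDefinition type is not modelled.

```rust
/// Protocol-level codes named in the notes (illustrative subset only).
const PROTOCOL_CODES: [&str; 2] = ["NOT_FOUND", "FORBIDDEN"];

/// Map an imported OpenAPI response status to a domain-level error code.
/// The HTTP_ prefix keeps it out of the protocol-level namespace.
pub fn imported_error_code(status: u16) -> String {
    format!("HTTP_{status}")
}

/// True if `code` would be mistaken for a protocol-level code.
pub fn collides_with_protocol(code: &str) -> bool {
    PROTOCOL_CODES.contains(&code)
}
```

A 404 from an imported API then surfaces as HTTP_404, which a caller can tell apart from the protocol's own NOT_FOUND.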
Also fixes W2 (KeyVersionMismatch stale comment in encryption.md —
ADR-021 implements rotation without this variant) and W4
(derive_encryption_key_for_version missing from service.md method list).
Spec updates: operation-registry.md (OperationSpec, ErrorDefinition,
Handler error mapping, services/schema), call-protocol.md (call.error
payload, CallError, ResponseEnvelope), README.md, overview.md,
open-questions.md (OQ-24), call/README.md, encryption.md, service.md.
ADR-022 wires the three controls ADR-015 specified but left without
registration paths (C1-C4 from review #001): composition authority,
scoped env, and capabilities now enter through a HandlerRegistration
bundle. Provenance (Local, FromOpenAPI, FromMCP, FromCall, Session)
determines which ops can compose — leaves don't get composition
authority. CompositionAuthority replaces handler_identity: Identity
(it's a declared authority bundle, not a peer identity). Capabilities
are per-request from the bundle (resolves closure-capture vs context
ambiguity). Kernel/user analogy: user's authority checked at External
gate; handler's composition authority used inside; scoped env bounds
reachability.
Also fixes W1 (stale ADR-020 path example) and W3 (from_mcp missing
from adapter lists in operation-registry.md).
Spec updates: operation-registry.md (OperationRegistry,
HandlerRegistration, OperationContext, OperationEnv, registration
example, capability injection), call-protocol.md (build_root_context),
README.md, overview.md, open-questions.md (OQ-23), call/README.md.
Third POC iteration (alknet-fs-sync-poc, 9/9 tests) proves multi-node
path-tree sync:
- Path tree modeled as automerge CRDT document, synced via automerge's
sync protocol over iroh QUIC connections
- Each node has a local replica; writes are local + immediate (no
network latency); sync is async, gossip-style, eventually consistent
- Concurrent writes to different paths converge cleanly; concurrent
writes to same path resolve via LWW (NFS-equivalent semantics)
- Content (blobs) and metadata (path tree) sync separately — automerge
for path edges, iroh-blobs for file bytes
- Branch inheritance works through automerge sync
Key finding: automerge concurrent put_object on same key creates a
conflict, not a merge. Root structures must be created by one node and
synced before other nodes write. This is a design constraint for the
spec.
24 total tests pass across both POC crates. All remaining unknowns are
implementation-scope, not feasibility blockers.
Validates the three-layer architecture for a content-addressed, branch-aware,
mountable filesystem:
- SQLite path tree over iroh-blobs MemStore (15/15 tests pass)
- Fossil-style branching with free content dedup via BLAKE3 content addressing
- honker-core for notify-on-commit inside the same transaction as path-tree
mutations (transactional outbox pattern)
- Write path: "branch on write, merge on close" reconciles BLAKE3-must-hash-
complete-file with chunked filesystem writes; concurrent readers see old
version until close commits atomically; crash/abort leaves old version intact
- Multi-tenancy via bucket_id column (free isolation, auth is an adapter problem)
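The "branch on write, merge on close" write path can be pictured with an in-memory toy. This is only an analogy under loud assumptions: a HashMap stands in for the SQLite path tree, no BLAKE3 hashing or blob store is involved, and PathTree, WriteHandle, and their methods are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Toy path tree: path -> committed content.
#[derive(Default)]
pub struct PathTree {
    committed: HashMap<String, Arc<Vec<u8>>>,
}

/// An in-progress write: chunks are buffered off to the side.
pub struct WriteHandle {
    path: String,
    staged: Vec<u8>,
}

impl WriteHandle {
    pub fn write_chunk(&mut self, chunk: &[u8]) {
        self.staged.extend_from_slice(chunk);
    }
}

impl PathTree {
    /// Readers only ever see committed content.
    pub fn read(&self, path: &str) -> Option<Arc<Vec<u8>>> {
        self.committed.get(path).cloned()
    }

    /// Start a write; nothing in the tree changes yet.
    pub fn open_write(&self, path: &str) -> WriteHandle {
        WriteHandle { path: path.to_owned(), staged: Vec::new() }
    }

    /// Close: the whole file is known here (the point at which it could be
    /// hashed), and the path entry is swapped in one step.
    pub fn close(&mut self, handle: WriteHandle) {
        self.committed.insert(handle.path, Arc::new(handle.staged));
    }
}
```

Dropping a WriteHandle without calling close() discards the staged bytes, which mirrors the crash/abort case: the previously committed version stays in place.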
Remaining unknowns (FsStore/redb coexistence, distributed incomplete-blob reads,
SFTP wiring, GC/tag management, branch chain depth) are implementation-scope,
not feasibility blockers.
Documents the metatensor format: a binary data format where a TypeBox/jsonschema
schema describes the layout of binary data at schema-computed offsets. Extends
safetensors (fixed TensorRef schema) to arbitrary schemas, enabling struct tensors
(records), blob tensors (variable-length via indirection), and nested layouts.
Key points:
- TypeBox schemas render to standard JSON Schema; the jsonschema Rust crate
validates them with zero translation. Custom typedef.ts kinds (TFloat32,
TInt32, TStruct) map to jsonschema custom keywords via with_keyword().
- This eliminates typebox-rs as a schema engine — replaced by jsonschema +
a small offset-computation module + ~50 lines of custom keyword impls.
- Three tensor kinds: flat (safetensor today), struct (record of typed fields),
blob (struct tensor as index + flat tensor as data store, for variable-length)
- Memory-mappable: parse header, compute offsets, mmap data, typed views per
schema. No copy, no deserialization.
- QUIC-streamable: header is one small JSON message, each tensor is a separate
stream. Lazy loading, parallel transfer, incremental compute.
- ujsx-authorable: <Tensor>, <Struct>, <Field> as layout components, same
reconciler that diffs UI trees diffs model schemas. Model versioning is
tree diffing.
- Category-theory foundation: ujsx as universal typed-tree IR, HostConfig as
interpreter. <Tensor> is no stranger than <div>.
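To make "schema-computed offsets" concrete, here is a toy offset computation for a struct tensor. It assumes a packed layout (fields in declaration order, no alignment padding) and only two 4-byte scalar kinds; the actual metatensor layout rules are not given in these notes, and all names below are hypothetical.

```rust
/// Two scalar kinds, loosely mirroring TFloat32 / TInt32.
#[derive(Clone, Copy)]
pub enum Scalar {
    Float32,
    Int32,
}

impl Scalar {
    /// Size in bytes of one value.
    pub fn size(self) -> usize {
        match self {
            Scalar::Float32 | Scalar::Int32 => 4,
        }
    }
}

/// Returns (field name, byte offset within one record) pairs plus the
/// total record size. Packed layout is an assumption of this sketch.
pub fn struct_layout(fields: &[(&str, Scalar)]) -> (Vec<(String, usize)>, usize) {
    let mut offset = 0;
    let mut out = Vec::with_capacity(fields.len());
    for (name, ty) in fields {
        out.push((name.to_string(), offset));
        offset += ty.size();
    }
    (out, offset)
}

/// Byte offset of one field of the n-th record in the data section.
pub fn field_offset(record_index: usize, record_size: usize, offset_in_record: usize) -> usize {
    record_index * record_size + offset_in_record
}
```

With offsets derived purely from the schema, a reader can index into a memory-mapped data section without deserializing anything.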
Adds a major section documenting how @alkdev/flowgraph (already npm-published,
uses ujsx) becomes the compute graph authoring and execution layer for
alknet-tensor, replacing webgpu-torch's imperative nn.Module hierarchy and
autograd recording with declarative ujsx templates and reactive DAG execution.
Key points documented:
- The ujsx tree IS the compute graph (CUDA-graphs-shaped but declarative)
- flowgraph's two HostConfigs: GraphologyHostConfig (compile/validate) and
ReactiveHostConfig (execute with signal-driven status propagation)
- nn modules become ujsx components, autograd becomes reverse tree walk
- Conditional/Map components enable dynamic structure CUDA graphs can't express
- Network-callable compute graphs (mix local + remote ops in one template)
- TSX authoring via standard JSX→h transform (ujsx jsx-runtime as target)
- graphology → petgraph port: ~15 API methods map 1:1, removes ~5400 lines of JS
- Updated POC priorities: end-to-end skeleton now includes flowgraph integration,
petgraph host port as a separate POC
Documents the architectural direction for a PyTorch-shaped tensor computation
library built on Rust + wgpu, where QuickJS is a thin API/composition layer
and Rust owns memory, dispatch, and WGSL codegen. Derived from webgpu-torch
as the reference design (op_spec → opgen → WGSL shader pipeline) but not a
port of its code — webgpu-torch is the reference, alknet-tensor is the
production architecture.
Key decisions: JS holds handles (BufferId), Rust owns wgpu::Buffers; ~4-5
high-level Rust ops (create_tensor/dispatch_kernel/register_kernel/read/write)
not ~20 low-level GPU API calls; WgslGenerator as a third handlebars backend
in typebox-rs codegen alongside RustGenerator and TypeScriptGenerator; tensor
ops as OperationSpecs on the registry (network-callable over irpc, verified
protocol-compatible on quickjs by POC 2).
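The handle-ownership split (the script side holds a BufferId, Rust owns the buffer) can be sketched with a plain table. In this sketch Vec<f32> stands in for wgpu::Buffer, no GPU is involved, and BufferTable and its method signatures are hypothetical; only the BufferId name and the create_tensor/read/write op names come from the notes.

```rust
use std::collections::HashMap;

/// Opaque handle: the only thing the scripting layer would hold.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BufferId(u64);

/// Owner of the actual storage (Vec<f32> stands in for a GPU buffer).
#[derive(Default)]
pub struct BufferTable {
    next: u64,
    buffers: HashMap<BufferId, Vec<f32>>,
}

impl BufferTable {
    /// Allocate zeroed storage and hand back a handle to it.
    pub fn create_tensor(&mut self, len: usize) -> BufferId {
        let id = BufferId(self.next);
        self.next += 1;
        self.buffers.insert(id, vec![0.0; len]);
        id
    }

    /// Overwrite a buffer; the length must match the allocation.
    pub fn write(&mut self, id: BufferId, data: &[f32]) -> Result<(), String> {
        let buf = self
            .buffers
            .get_mut(&id)
            .ok_or_else(|| format!("unknown buffer {id:?}"))?;
        if buf.len() != data.len() {
            return Err(format!("length mismatch: {} vs {}", buf.len(), data.len()));
        }
        buf.copy_from_slice(data);
        Ok(())
    }

    /// Read a buffer back by handle.
    pub fn read(&self, id: BufferId) -> Option<&[f32]> {
        self.buffers.get(&id).map(|v| v.as_slice())
    }
}
```

Because the caller only ever sees an id, buffer lifetime and memory stay entirely on the Rust side of the boundary.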
Documents the downstream problems this solves as a side effect: distributed
compute over irpc, LLM-authored model code (toolEnv pattern), edge/embedded
tensor compute, the compositing problem sidestepped (compute has no surface),
and cross-platform by construction (wgpu's many backends).
The quickjs-reactive-probe was extended to load @alkdev/operations (registry,
call protocol, response envelopes, ACL, buildCallHandler) alongside the
reactive core. All five operations assertions pass on QuickJS-NG via rquickjs:
registry/execute/envelope/acl/callHandler. 271 modules loaded total.
This closes the third highest-leverage unknown: the operations protocol is
runtime-agnostic in practice, not just in theory. Adds a new section on the
QuickJS UDF host convergence — a minimal isolate speaking the same bidirectional
operations protocol as the TypeScript reference, the Rust alknet-call port,
and the planned NAPI/Python adapters, without needing Node/Deno/Bun. Connects
to the toolEnv WASM-QuickJS sandbox precedent at /workspace/toolEnv.
Captures 5 critical, 4 warning, 4 suggestion findings from a sanity
check of the core, call, and vault crate specs against ADRs 001-021
and the OQ tracker. Criticals cluster on one tangle: the registration
API surface in operation-registry.md doesn't carry the handler
identity, scoped env, or capabilities that ADR-014/015 lock as 'set at
registration' — plus a missing error-schema concept for adapters.
Captures the two completed POCs that resolve the highest-leverage unknowns
around the alknet-desktop direction (Rust + wgpu + rquickjs + ujsx over three.js):
- ui-spoke-poc: headless WebGPU rendering in Deno, three.js WebGPURenderer via
device-capture, MSDF text (the '2D UI is rocket surgery' subproblem)
- quickjs-reactive-probe: @preact/signals-core + @alkdev/typebox + @alkdev/ujsx
reconciler verified compatible with QuickJS-NG via rquickjs
Documents the rejected deno-desktop alternative, the established architectural
direction (head-worker over irpc/ALPN, two HostConfigs over one wgpu surface),
headless/headed parity via llvmpipe, the supply-chain surface reduction, and
the open unknowns that remain before SDD can begin.
The VaultProtocol is a remote-capable irpc service by construction —
#[rpc_requests] generates both Service (local) and RemoteService (remote)
trait impls. DerivedKey's dual serialization (JSON redacts, postcard
preserves) was designed for this. Enabling remote vault access is a
server-setup change, not a protocol change.
OQ-21 enriched with full context:
- What's already in place (protocol, serialization, actor, auth transport)
- What's not in place (IrohProtocol handler forwards all messages without
auth checks; needs NodeId allowlist + message filtering in assembly layer)
- Operation access policy: Unlock/Lock local-only; Derive/Encrypt/Decrypt
remote-capable
- Use case: machine node → workers (workers don't hold mnemonics)
- Per-machine-node vaults, not shared (compartmentalization)
- Breaking vs non-breaking analysis (enabling = non-breaking; protocol
evolution = wire break, manageable via ALPN versioning)
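The access policy above can be sketched as a small gate in the assembly
layer. This is a minimal illustration, not the actual handler: VaultOp,
Origin, is_remote_capable, and authorize are hypothetical names, and the
32-byte array is an assumed stand-in for iroh's NodeId.

```rust
/// Hypothetical mirror of the vault operations named in the policy above;
/// the real VaultProtocol message types are not reproduced here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VaultOp {
    Unlock,
    Lock,
    Derive,
    Encrypt,
    Decrypt,
}

/// Where a request entered the assembly layer (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    Local,
    /// Remote peer, identified by a 32-byte node id.
    Remote([u8; 32]),
}

/// Unlock/Lock stay local-only; Derive/Encrypt/Decrypt may be served remotely.
pub fn is_remote_capable(op: VaultOp) -> bool {
    matches!(op, VaultOp::Derive | VaultOp::Encrypt | VaultOp::Decrypt)
}

/// Local callers may do anything; remote callers need both a
/// remote-capable operation and an allowlisted node id.
pub fn authorize(op: VaultOp, origin: Origin, allowlist: &[[u8; 32]]) -> bool {
    match origin {
        Origin::Local => true,
        Origin::Remote(id) => is_remote_capable(op) && allowlist.contains(&id),
    }
}
```

Keeping the check in a plain function like this would let the vault crate
stay free of any auth types, which matches the standalone constraint.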
The auth-wrapping handler lives in the assembly layer (or a dedicated
vault-server crate depending on both alknet-core and alknet-vault), not in
the vault crate itself — the vault is standalone (ADR-018) and can't
import alknet-core's auth model.
OQ-21 remains deferred — no commitment to implement, but the door is open
and the design space is mapped.
Key rotation uses version-indexed derivation paths: each key version maps
to a distinct SLIP-0010 path (m/74'/2'/0'/{version-2}'). v2 is at index 0
(PATHS::ENCRYPTION), v3 at index 1, etc.
Mechanism:
- encryption_path_for_version(version) constructs the path
- decrypt derives the key at the version-indicated path (not always
PATHS::ENCRYPTION)
- rotate(blob, to_version) decrypts with old key, re-encrypts with new
- No new mnemonic needed — same seed, different path
- Partial rotation is safe — old keys remain derivable
- The vault does not self-rotate; the assembly layer iterates blobs
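The version-to-path mapping can be sketched as follows. The function name
comes from the list above; the Option return type, the FIRST_HD_VERSION
constant, and the String path representation are assumptions made for
illustration, not the crate's actual signature.

```rust
/// First key version backed by HD derivation (assumed constant name).
/// Versions below it have no derivation path.
pub const FIRST_HD_VERSION: u32 = 2;

/// Maps a key version to its path m/74'/2'/0'/{version-2}'.
/// Returns None for versions without an HD path or for an index that
/// would not fit in a hardened child number (must be below 2^31).
pub fn encryption_path_for_version(version: u32) -> Option<String> {
    let index = version.checked_sub(FIRST_HD_VERSION)?;
    if index >= 1 << 31 {
        return None;
    }
    Some(format!("m/74'/2'/0'/{index}'"))
}
```

Under this sketch, version 2 maps to index 0 and version 3 to index 1, so
decrypt can select the path from the blob's key_version, and rotate only
needs the source and target versions.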
Source drift flagged:
- decrypt currently ignores key_version for path selection (always uses
PATHS::ENCRYPTION) — must use version-indexed paths
- rotate method does not exist in source — must be added
- CURRENT_KEY_VERSION must bump from 1 to 2 (per ADR-020, reinforced here)
OQ-22 resolved. Only OQ-21 (remote vault admin, deferred) remains.
The vault uses SLIP-0010 HD derivation from the BIP39 seed for the
AES-256-GCM encryption key, not PBKDF2. This replaces the TypeScript
predecessor's (@alkdev/storage/src/graphs/crypto.ts) PBKDF2-based
approach.
Key decisions:
- HD derivation at m/74'/2'/0'/0' produces the encryption key
- PBKDF2 is not implemented in the vault; no password-based derivation
- salt field is unused in v2 (wire-format compat only)
- key_version=1 reserved for TS PBKDF2 data; key_version=2 for vault HD
- TS-encrypted data requires one-time migration to v2
- CURRENT_KEY_VERSION changes from 1 to 2 (source drift flagged)
OQ-20 resolved: the encryption key derivation method is locked. OQ-22
(key rotation workflow) remains open but does not block implementation.
Spec the vault crate from its existing implementation. The vault is
stable (implementation exists); this spec documents what IS so the
implementation-sync agent can reconcile source drift.
New spec documents (crates/vault/):
- README.md — crate index, security constraints, public API
- mnemonic-derivation.md — BIP39, SLIP-0010, BIP-0032, derivation paths
- encryption.md — AES-256-GCM, EncryptedData, key versioning, salt
- service.md — VaultServiceHandle lifecycle, actor dispatch, cache
- protocol.md — VaultProtocol irpc messages, DerivedKey redaction
New ADRs:
- ADR-018: Vault as standalone crate (zero alknet deps; own types/errors)
- ADR-019: Vault assembly-layer-only access (CLI is sole caller)
New open questions:
- OQ-20: Salt/KDF Phase B (open, low priority — salt field reserved)
- OQ-21: Remote vault administration (deferred — needs ADR if ever needed)
- OQ-22: Key rotation mechanism (open, low priority — workflow not specced)
Spec-vs-source drift explicitly flagged (for the sync agent):
- rand::random() used for IVs instead of OsRng (security-critical)
- unwrap() on every RwLock acquisition (must use unwrap_or_else)
- ADR-038 / OQ-SVC-03 references in source comments are stale (old numbering)
- VaultServiceActor::spawn returns a non-functional second actor (source bug)
- KeyVersionMismatch error variant is defined but unused in v1
Critical:
- operation-registry: remove stale duplicate OperationEnv impl that
propagated parent.metadata through composition (violated ADR-014);
collapse to one canonical block with metadata: HashMap::new()
- operation-registry: fix request_id collision — format!("env-{name}")
produced identical IDs across concurrent invocations, corrupting
PendingRequestMap correlation and the abort-cascade tree (ADR-016)
- operation-registry + ADR-015: fix OperationContext.internal visibility —
pub field let handlers mark their own call internal (privilege
escalation per ADR-015); change to pub(crate) with pub fn is_internal
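One way to avoid the request_id collision described above is to append a
process-wide counter to the name-based prefix. This is a hedged sketch
only: next_env_request_id and the counter are hypothetical, and the spec
may well use a different uniqueness source (for example a UUID).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter; guarantees uniqueness within one process only.
static NEXT_ENV_REQUEST: AtomicU64 = AtomicU64::new(0);

/// Builds an id such as "env-fs/readFile-0". Two concurrent invocations
/// of the same operation receive different suffixes, so they no longer
/// share a slot in a correlation map keyed by request id.
pub fn next_env_request_id(name: &str) -> String {
    let n = NEXT_ENV_REQUEST.fetch_add(1, Ordering::Relaxed);
    format!("env-{name}-{n}")
}
```

Relaxed ordering is sufficient here because only the uniqueness of the
counter value matters, not its ordering relative to other memory.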
Warnings:
- core-types: add Connection::set_identity/identity (OQ-11) to the
Connection type spec — was specified in auth.md but missing from the
type definition
- operation-registry: add Capabilities: Clone design note — invoke()
clones capabilities through composition; explicit security implication
- call-protocol: add CallAdapter root OperationContext construction
example showing internal: false wire path, complementing
OperationEnv::invoke() internal: true composition path
- overview: remove alknet/agent from ALPN registry — agent is a future
consumer of alknet-call (call-protocol operations), not a separate ALPN
- call-protocol: clarify call.requested payload schema and the
leading-slash convention (wire operationId has slash, registry name
does not)
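The leading-slash convention can be illustrated with a pair of conversion
helpers. Both function names are hypothetical, and rejecting an operationId
that lacks the slash is an assumption of this sketch rather than something
the spec text states.

```rust
/// Converts a wire operationId ("/fs/readFile") into a registry name
/// ("fs/readFile"). Returns None when the leading slash is missing,
/// when nothing follows it, or when the remainder starts with a slash.
pub fn registry_name(operation_id: &str) -> Option<&str> {
    match operation_id.strip_prefix('/') {
        Some(name) if !name.is_empty() && !name.starts_with('/') => Some(name),
        _ => None,
    }
}

/// Converts a registry name back into its wire form.
pub fn wire_operation_id(name: &str) -> String {
    format!("/{name}")
}
```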
Suggestions:
- operation-registry: cross-reference ResponseEnvelope definition
- core-types: add StreamError to HandlerError mapping table
Address security review findings by adding explicit constraints to the specs
and to the implementation specialist role:
Architectural constraints (spec updates):
- metadata does not propagate through OperationEnv::invoke() — fresh
HashMap for nested calls, closes the back-door leak channel where a
handler that puts a secret in metadata would leak it to children and
across from_call to remote nodes (ADR-014)
- Config reload must be authenticated/local-only — malicious reload =
root-equivalent privilege grant (config.md)
- from_call trust is transitive — scoped env bounds reachability, not
what the remote op does (operation-registry.md)
- Token entropy ≥128 bits — prefix is lookup aid not secret, offline
hash verification requires high-entropy tokens (auth.md)
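The metadata constraint above can be sketched as a child-context
constructor. The struct below is a simplified, hypothetical stand-in for
OperationContext (the real type carries more fields), and the child method
name is an assumption.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the per-request context.
#[derive(Debug, Clone)]
pub struct OperationContext {
    pub request_id: String,
    pub parent_request_id: Option<String>,
    pub metadata: HashMap<String, String>,
}

impl OperationContext {
    /// Builds the context for a nested call. The parent's request id is
    /// recorded for correlation, but metadata starts empty, so anything a
    /// handler stored in its own metadata is not visible to children.
    pub fn child(&self, request_id: String) -> OperationContext {
        OperationContext {
            request_id,
            parent_request_id: Some(self.request_id.clone()),
            metadata: HashMap::new(),
        }
    }
}
```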
Implementation constraints (auth.md security constraints section + role spec):
- OsRng for cryptographic nonces (AES-GCM IV reuse is catastrophic)
- CachedKey derives Zeroize/ZeroizeOnDrop (no secrets in freed heap)
- No unwrap()/expect() outside tests (poisoned lock recovery, not crash)
- Implementation specialist role spec updated with all four constraints
OQ-11 (handler-level auth observability): Option B — handlers store
resolved identity on Connection via set_identity. Two identity scopes:
connection-level (observability, write-once-read-many) and per-request
(ACL, on OperationContext). Per-request takes precedence for ACL;
connection-level is for logging/audit only.
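The write-once-read-many behavior of the connection-level identity could
be modeled with std::sync::OnceLock. This is a sketch under assumptions:
the Identity newtype and the Result return type are illustrative, and the
real Connection type holds transport state that is omitted here.

```rust
use std::sync::OnceLock;

/// Illustrative identity type; the real one is defined by the auth spec.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity(pub String);

/// Connection reduced to the one field relevant to OQ-11.
#[derive(Debug, Default)]
pub struct Connection {
    identity: OnceLock<Identity>,
}

impl Connection {
    /// Stores the resolved identity. A second call fails and hands the
    /// rejected value back, leaving the first identity in place.
    pub fn set_identity(&self, identity: Identity) -> Result<(), Identity> {
        self.identity.set(identity)
    }

    /// Read side, intended for logging and audit only; ACL decisions use
    /// the per-request identity on the operation context instead.
    pub fn identity(&self) -> Option<&Identity> {
        self.identity.get()
    }
}
```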
OQ-19 (session-scoped registries): Protocol doesn't need changes.
OperationEnv must remain a trait (not concrete) to enable session-overlay
pattern. Three-tier registry: core (static, External+Internal), session
(dynamic, Internal-only), promotion (curated review). Documented as
implementation guard in operation-registry.md.
All 19 open questions are now resolved. No open one-way or two-way doors
remain. The architecture is ready for review and implementation.
ADR-017 locks the client/adapter architecture:
- CallClient opens QUIC connections, shares dispatch loop with CallAdapter
- Connection direction independent of call direction (both sides can call)
- from_call adapter: discovers remote ops via services/list + services/schema,
registers with forwarding handlers (same pattern as from_openapi/from_mcp)
- to_openapi/to_mcp: project local ops to external protocols
- OperationAdapter trait: produces (OperationSpec, Handler) pairs
- Cross-node call tree: abort cascade propagates through from_call handlers
- Credentials from capabilities (ADR-014), adapter ops Internal by default (ADR-015)
The dispatch POC at /workspace/@alkdev/dispatch demonstrated head/worker over
SSH+axum; under the call protocol it's cross-node composition via from_call.
Connection topology (who advertises, who opens) is independent of call
direction — runner pattern, dispatch pattern, and P2P all work.
Document the three-tier registry model (core/session/promotion) and the
self-improving agent workflow where agents write their own operations in
a quickjs sandbox. The POC at /workspace/toolEnv demonstrated the sandbox
mechanism (quickjs in Deno web workers, proxy-based env bridge via
postMessage) but exposed the full registry to the sandbox — the security
gap that OQ-18's scoped composition env addresses.
The call protocol doesn't need changes: the OperationEnv trait is the
composition point, and a session-scoped env wraps the global env (session
registry first, fall through to global). The one-way door this OQ guards
against: making OperationEnv concrete instead of a trait, or hardcoding
the global registry into the dispatch path, would close the session-overlay
pattern. Session-scoped operations are always Internal, run under the
handler's identity, and are ephemeral. Promotion to core requires curation
review (architect role with promote scope).
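The session-overlay pattern described above can be sketched with a trait
object. Everything here is simplified for illustration: the real
OperationEnv exposes invoke rather than the hypothetical resolve method,
and the Handler alias is a plain function pointer instead of the async
handler type.

```rust
use std::collections::HashMap;

/// Simplified synchronous handler (assumption; the real one is async).
pub type Handler = fn(&str) -> String;

/// Composition point. Keeping this a trait is what allows overlays.
pub trait OperationEnv {
    fn resolve(&self, name: &str) -> Option<Handler>;
}

/// Stand-in for the global (core) registry.
#[derive(Default)]
pub struct GlobalEnv {
    pub ops: HashMap<String, Handler>,
}

impl OperationEnv for GlobalEnv {
    fn resolve(&self, name: &str) -> Option<Handler> {
        self.ops.get(name).copied()
    }
}

/// Session-scoped env: consults the session registry first, then falls
/// through to whatever env it wraps.
pub struct SessionEnv<'a> {
    pub session_ops: HashMap<String, Handler>,
    pub global: &'a dyn OperationEnv,
}

impl OperationEnv for SessionEnv<'_> {
    fn resolve(&self, name: &str) -> Option<Handler> {
        self.session_ops
            .get(name)
            .copied()
            .or_else(|| self.global.resolve(name))
    }
}
```

If the dispatch path took a concrete registry type instead of
&dyn OperationEnv, SessionEnv could not be substituted, which is the
one-way door the paragraph above warns about.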
The abort cascade and privilege model are call protocol semantics that
every consumer inherits — NAPI adapter, Python adapter, agent service, and
any future service speaking the EventEnvelope wire format. Framing them as
'needs agent crate in view' let a single consumer's timeline gate a
protocol-level decision. The agent use case is a useful test case for edge
cases, but the decisions belong to the call protocol.
The 'trusted' flag on OperationContext was the wrong word — it implies a
trust decision was made, but what actually happens is the call originated
internally (from composition) not externally (from the wire). Renamed to
'internal' with clarified semantics: internal calls switch the authority
context to the handler's identity rather than skipping ACL. This prevents the
privilege escalation vector where composition with 'trusted: true' bypassed
all access control (buggy handler + parameterized dispatch).
- Rename trusted -> internal across operation-registry.md, ADR-014
- Update OperationContext field description and LocalOperationEnv code
- Add OQ-17: abort cascade for nested calls (call.aborted cascades to
descendants, default abort-dependents, continue-running opt-in). One-way
door on the protocol event schema; mechanism is a two-way door.
- Add OQ-18: privilege model and authority context (internal = authority
switch not ACL skip, External/Internal operation visibility, scoped
composition env + handler identity). Needs agent crate in view.
- Add abort cascade section and constraint to call-protocol.md
- Update crates/call/README.md with OQ-17, OQ-18, and two new design principles
- Update architecture README.md with OQ-17, OQ-18
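The abort cascade from OQ-17 can be illustrated with a small call tree.
Only the protocol event schema is a one-way door, so the mechanism below
is purely a hypothetical sketch: CallTree, record, mark_continue_running,
and abort are invented names, and request ids are plain strings.

```rust
use std::collections::{HashMap, HashSet};

/// Tracks parent-to-child request relationships for nested calls.
#[derive(Default)]
pub struct CallTree {
    children: HashMap<String, Vec<String>>,
    continue_running: HashSet<String>,
}

impl CallTree {
    /// Records that `child` was invoked while handling `parent`.
    pub fn record(&mut self, parent: &str, child: &str) {
        self.children
            .entry(parent.to_string())
            .or_default()
            .push(child.to_string());
    }

    /// Opts a request (and therefore its subtree) out of the cascade.
    pub fn mark_continue_running(&mut self, id: &str) {
        self.continue_running.insert(id.to_string());
    }

    /// Returns the ids to abort when `root` is aborted: the root first,
    /// then its descendants depth-first, skipping opted-out subtrees.
    pub fn abort(&self, root: &str) -> Vec<String> {
        let mut aborted = Vec::new();
        let mut stack = vec![root.to_string()];
        while let Some(id) = stack.pop() {
            if let Some(kids) = self.children.get(&id) {
                // Push in reverse so the first child is visited first.
                for kid in kids.iter().rev() {
                    if !self.continue_running.contains(kid) {
                        stack.push(kid.clone());
                    }
                }
            }
            aborted.push(id);
        }
        aborted
    }
}
```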
Resolve the contradiction between ADR-008's "capability source" model
and operation-registry.md showing vault operations on the wire. ADR-014
establishes: vault is assembly-layer only, capabilities carry outbound
credentials (distinct from inbound identity), call protocol carries no
secret material, adapters take credential sources not static tokens.
- Add ADR-014 (Secret Material Flow and Capability Injection)
- Remove vault/derive, vault/unlock, vault/decrypt from call protocol
registration examples and all spec examples
- Add Capabilities field to OperationContext, propagate through
LocalOperationEnv nested calls
- Add Capability Injection section to operation-registry.md
- Add no-secret-material wire constraint to call-protocol.md
- Add streaming subscribe example (LLM chat with Vercel UI chunks)
- Add Security Model section to overview.md (identity vs capabilities)
- Trim WASM treatment from ~20 lines to a design-constraint note
- Add OQ-16 (resolved: no vault ops on wire), update OQ-08, OQ-15
- Update ADR-003, ADR-008, ADR-013 to remove stale "via call protocol"
vault references
- Rewrite OQ-12: separate two distinct TLS identity use cases (RFC 7250
raw keys as default for P2P, X.509 for domain-hosted/browsers) instead
of conflating them as 'file paths now, ACME later'. ACME is a proven
pattern from the reverse-proxy project, not speculative future work.
- Resolve OQ-13 and OQ-14: remove 'Phase 1' framing from core crate
specs. /{service}/{op} is the correct design for alknet-call, not a
simplification. Batch as correlated call.requested events is the correct
protocol design. Core crates need to be done right from the start.
- Add ADR-013: Rust as canonical implementation language. TypeScript
@alkdev/operations is a reference that informed the design, not a
parallel implementation. The only JS use case is browser SDK adaptation.
Five reasons: memory safety, LLM competence, supply chain attacks,
performance, browser-only JS.
- Add alknet-agent crate to the crate graph (depends on alknet-call, not
alknet-core). Agent service uses call protocol client for tool dispatch
and vault/derive for provider keys — no env vars for secrets. ALPN
alknet/agent added to the registry.
- Add OQ-15: call protocol client and adapter contract. alknet-call needs
both server (CallAdapter) and client (remote invocation over QUIC), plus
the adapter traits (from_*, to_*) that enable composition.
- Clarify alknet-napi as thin NAPI projection layer, not business logic.
- Fix bugs: ProtocolController → ProtocolHandler typo, OperationEnv
invoke() path format inconsistency, RateLimitConfig comment confusion.
- Update endpoint.md TLS section: comprehensive identity model comparison
table, RFC 7250 as default mode, ACME as proven pattern.
Add architecture specs for the alknet-call crate:
- call-protocol.md: CallAdapter, EventEnvelope wire format, bidirectional
stream model with ID-based correlation, PendingRequestMap, protocol
operations (call/subscribe/batch/schema), per-request identity resolution,
connection/stream lifecycle, error codes
- operation-registry.md: OperationSpec, async Handler type, OperationRegistry,
AccessControl with trusted call bypass, OperationEnv with context
propagation (parent_request_id, identity inheritance), service discovery,
irpc integration layering, naming convention (no leading slash in names)
- ADR-012: Call protocol uses bidirectional QUIC streams with EventEnvelope
framing and ID-based correlation. Protocol is stream-agnostic and symmetric.
Resolves OQ-07.
Key design decisions:
- Handler type is async (Fn returning Pin<Box<dyn Future>>)
- OperationEnv::invoke propagates parent context (identity, metadata,
parent_request_id)
- Identity resolution is per-request, not per-connection
- Operation names without leading slash (fs/readFile, not /fs/readFile)
- Batch is a client-side pattern, not a protocol primitive (OQ-14)
- Phase 1 uses service/op paths, node prefix added later (OQ-13)
Also: promote ADR-010 and ADR-011 from Proposed to Accepted, add OQ-13
and OQ-14 to open-questions.md.
iroh uses RFC 7250 raw Ed25519 public keys for TLS instead of X.509
certificates. rustls already supports this. This means the quinn
endpoint can also use raw public keys — same key-based identity model
as iroh, but with direct QUIC over UDP. X.509 is optional, needed
only for domain-facing identity (browser/WebTransport clients).
Update StaticConfig with TlsIdentity enum (X509, RawKey, SelfSigned)
and add iroh_relay field. Remove 'iroh deferred' language — iroh is
a first-class connectivity mode.
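A rough sketch of the config shape follows. The three variant names and
the iroh_relay field come from the text above; every variant payload, the
field types, and the is_domain_facing helper are assumptions added for
illustration.

```rust
use std::path::PathBuf;

/// TLS identity modes for the quinn endpoint.
pub enum TlsIdentity {
    /// X.509 certificate chain and key (payload fields are assumed).
    X509 { cert_path: PathBuf, key_path: PathBuf },
    /// RFC 7250 raw Ed25519 key; raw bytes stand in for the key type.
    RawKey { secret_key: [u8; 32] },
    /// Generated self-signed certificate.
    SelfSigned,
}

impl TlsIdentity {
    /// True only for the mode meant for domain-facing identity, such as
    /// browser or WebTransport clients (hypothetical helper).
    pub fn is_domain_facing(&self) -> bool {
        matches!(self, TlsIdentity::X509 { .. })
    }
}

/// Static configuration, reduced to the two fields discussed above.
pub struct StaticConfig {
    pub tls: TlsIdentity,
    /// Relay URL for iroh connectivity; None is assumed to mean default.
    pub iroh_relay: Option<String>,
}
```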
iroh's Endpoint natively supports ALPN negotiation and set_alpns(). Our
HandlerRegistry dispatches exactly like iroh's own ProtocolMap/Router
pattern, but shared across both quinn and iroh connection sources. We
use iroh::Endpoint directly (not iroh::Router) because our HandlerRegistry
and AuthContext are shared across sources.
Correct the conflation of quinn/TLS/iroh as interchangeable transports.
They are complementary connectivity modes serving different deployment
contexts: quinn (public IP + TLS), iroh (NAT traversal via relay), TCP
(handler-specific, not core). Clarify that TLS cert = network identity,
not auth identity. Map stealth mode to HTTP handler on standard ALPNs
instead of byte-peeking. Resolve OQ-05 as a one-way door. SendStream/
RecvStream now use internal enum dispatch for both quinn and iroh
streams.
Rename the crate from alknet-secret to alknet-vault to better reflect its
purpose as a local key vault (seed management, key derivation, encryption)
rather than a network service.
Symbol renames:
- SecretService → VaultService
- SecretServiceHandle → VaultServiceHandle
- SecretServiceActor → VaultServiceActor
- SecretServiceError → VaultServiceError
- SecretProtocol → VaultProtocol
- SecretMessage → VaultMessage
- ServiceLocked → VaultLocked
- alknet_secret → alknet_vault (crate name)
Update ADR-008 with vault access pattern: the vault is a capability source,
not a service endpoint. The CLI injects derived/decrypted material into
operation contexts — handlers never hold vault references.