docs(architecture): add Phase 0 architecture specs for ALPN-as-service model

Foundational architecture documents following the SDD process:

ADRs:
- 001: ALPN-based protocol dispatch (one endpoint, ALPN negotiation)
- 002: ProtocolHandler trait (replaces StreamInterface/MessageInterface)
- 003: Crate decomposition (one crate per handler, core provides shared infra)
- 004: Auth as shared core (IdentityProvider, hybrid resolution model)
- 005: irpc as call protocol foundation
- 006: ALPN string convention and connection model (alknet/ prefix, one ALPN per connection)

Docs:
- overview.md: crate graph, shared types, ALPN registry, failure modes
- README.md: index with doc table, ADR table, lifecycle definitions
- open-questions.md: 10 OQs across 7 themes (3 resolved, 7 open)

Crate spec stubs for all 11 planned crates (alknet-core through alknet CLI).

Key decisions resolved during self-review:
- AuthContext resolution is hybrid: endpoint resolves TLS-level auth,
  handlers resolve protocol-level auth (resolves OQ-02)
- ALPN is per-connection not per-stream, corrected ADR-001 (resolves OQ-06)
- ALPN naming uses alknet/ prefix without versions (resolves OQ-03)
- HandlerError return type on ProtocolHandler trait
- alknet/secret removed from ALPN registry until OQ-08 resolved
2026-06-15 22:14:58 +00:00
parent b5a4600d74
commit f77b515968
20 changed files with 1017 additions and 0 deletions


@@ -0,0 +1,46 @@
# ADR-001: ALPN-Based Protocol Dispatch
## Status
Accepted
## Context
The previous architecture used a three-layer model: transports produced byte streams, interfaces defined how to interpret those streams (StreamInterface, MessageInterface), and OperationEnv dispatched operations through local, irpc, or remote paths. This required a ListenerConfig enum with three variants (Stream, Http, Dns), a server accept loop handling three different listener types, and a complex dispatch model that mixed concerns across layers.
Protocol detection was done by byte-peeking — the server read the first bytes of an incoming connection and guessed which protocol the client was speaking. This is fragile, limits protocol extensibility, and cannot work with encrypted transports where the payload is opaque.
ALPN (Application-Layer Protocol Negotiation) is a TLS extension where the client advertises supported protocols during the handshake and the server selects one. QUIC builds on this natively — every QUIC connection has an ALPN. This is the same pattern iroh uses: `Router` dispatches incoming QUIC connections to `ProtocolHandler` implementations based on the ALPN string. Hickory DNS registers ALPN protocols (`dot`, `doq`, `h2`, `h3`). The reverse-proxy project at `@alkdev/reverse-proxy` uses the same pattern for TLS.
The core insight: **a service IS an ALPN**. Every protocol handler registers an ALPN string on a shared QUIC+TLS endpoint. The ALPN negotiation during the handshake routes the connection to the correct handler before any application bytes are read.
## Decision
All protocol dispatch in alknet is ALPN-based. A single QUIC+TLS endpoint accepts connections, and the ALPN string selected during the handshake determines which `ProtocolHandler` receives the connection. There is no byte-peeking, no ListenerConfig enum, and no three-layer dispatch model.
The endpoint advertises the union of all registered handlers' ALPN strings. When a client connects, the TLS/QUIC handshake negotiates the ALPN. If the client's offered ALPNs and the server's advertised ALPNs have no intersection, the handshake fails — this is the correct behavior, not an error to work around.
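The dispatch rule above can be sketched as a lookup table keyed by ALPN bytes. This is a minimal illustration, not the alknet-core router: `AlpnRouter`, `HandlerId`, and the selection policy (first client-offered ALPN that a handler claims) are assumptions made for the example. A real TLS stack performs the selection inside the handshake, and its preference order may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a registered handler; only its name matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HandlerId(pub &'static str);

/// Hypothetical router: maps ALPN byte strings to handlers.
#[derive(Default)]
pub struct AlpnRouter {
    handlers: HashMap<&'static [u8], HandlerId>,
}

impl AlpnRouter {
    /// Registers a handler. Returns false if the ALPN is already claimed.
    pub fn register(&mut self, alpn: &'static [u8], handler: HandlerId) -> bool {
        if self.handlers.contains_key(alpn) {
            return false;
        }
        self.handlers.insert(alpn, handler);
        true
    }

    /// The union of all registered ALPNs, i.e. what the endpoint advertises.
    /// Sorted only to make the output deterministic.
    pub fn advertised(&self) -> Vec<&'static [u8]> {
        let mut alpns: Vec<&'static [u8]> = self.handlers.keys().copied().collect();
        alpns.sort();
        alpns
    }

    /// Picks the first client-offered ALPN that a handler claims.
    /// `None` models the handshake failing because there is no intersection.
    pub fn select(&self, offered: &[&[u8]]) -> Option<&HandlerId> {
        offered.iter().find_map(|alpn| self.handlers.get(*alpn))
    }
}
```

Registering a second handler for an already claimed ALPN is rejected in this sketch, so each negotiated ALPN maps to exactly one handler.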
## Consequences
**Positive:**
- Single dispatch mechanism replaces three separate listener types
- Protocol detection happens at the TLS layer, not application layer — no byte-peeking
- Adding a new protocol is registering a new ALPN string — no server code changes
- Each handler owns its entire wire format — no shared framing layer
- QUIC connections are cheap — a client that needs multiple protocols opens one connection per ALPN, all multiplexed over the same UDP flow
- Stealth mode (byte-peek protocol detection on port 443) is unnecessary — ALPN negotiation handles this cleanly
- WASM story is clean: handlers receive byte streams, protocol parsers that operate on bytes compile to WASM
**Negative:**
- ALPN is negotiated per-connection, not per-stream — a client that wants to use multiple ALPNs (e.g., SSH and call protocol) opens separate QUIC connections for each. QUIC connections are cheap (multiplexed over the same UDP flow), so this is acceptable, but it means `alknet/call` cannot serve as a multiplexer for other ALPNs within a single connection unless explicitly designed to do so (see ADR-006).
- All protocols must be registered at endpoint creation time (or use hot-reload via ArcSwap for dynamic addition)
- Custom protocols require reserving ALPN strings — we own the `alknet/` namespace
- Debugging requires knowing which ALPN was negotiated (mitigated by logging at the endpoint level)
## References
- Pivot proposal: `docs/research/pivot/alpn-service-architecture.md`
- ADR-002: ProtocolHandler trait
- ADR-003: Crate decomposition
- iroh reference: `docs/research/references/iroh/` (ALPN dispatch, ProtocolHandler pattern)
- Replaces the old three-layer model (StreamInterface/MessageInterface/OperationEnv)


@@ -0,0 +1,57 @@
# ADR-002: ProtocolHandler Trait
## Status
Accepted
## Context
The previous architecture had two separate interface traits: `StreamInterface` (for byte-stream protocols like SSH, raw TCP) and `MessageInterface` (for message-based protocols like DNS, HTTP). This split created complexity — each interface type needed its own listener configuration, its own dispatch path, and its own framing assumptions. The `ListenerConfig` enum had three variants. The server accept loop handled three different listener types.
In practice, the distinction between "stream" and "message" protocols is artificial at the handler level. SSH starts as a byte stream but internally multiplexes channels and messages. DNS over QUIC is message-based but arrives as a stream of frames. HTTP/2 is both — bidirectional streams with message semantics. Every protocol can be modeled as "receive a byte stream, manage your own wire format."
iroh's `ProtocolHandler` trait demonstrates this: its `accept` method receives the accepted QUIC connection, and the handler is responsible for its own protocol from there. One trait, one dispatch point.
## Decision
A single `ProtocolHandler` trait replaces both `StreamInterface` and `MessageInterface`:
```rust
use async_trait::async_trait;

// `BiStream` and `AuthContext` are shared types from alknet-core (see ADR-003).
#[async_trait]
pub trait ProtocolHandler: Send + Sync + 'static {
    /// The ALPN string this handler claims (e.g. b"alknet/ssh")
    fn alpn(&self) -> &'static [u8];

    /// Handle an incoming bidirectional QUIC stream
    async fn handle(&self, stream: BiStream, auth: &AuthContext) -> Result<(), HandlerError>;
}
```
- `alpn()` returns a static byte string — the handler's ALPN identifier
- `handle()` receives a `BiStream` (a joined `(SendStream, RecvStream)` implementing `AsyncRead + AsyncWrite`) and an `AuthContext` carrying the authenticated identity, and returns `HandlerError` on failure
- Every handler manages its own wire format — no shared framing, no StreamInterface/MessageInterface split
- The `ListenerConfig` enum is eliminated — ALPN advertisement configuration replaces it
**AuthContext resolution is hybrid** (see ADR-004, OQ-02 resolution): the endpoint resolves what it can before calling `handle()` (e.g., TLS client certificate fingerprint), and the handler resolves what it must inside `handle()` (e.g., AuthToken in the first frame of a call stream). The `AuthContext` passed to `handle()` may contain partial identity information — the handler is responsible for completing authentication if the endpoint didn't have enough information.
## Consequences
**Positive:**
- One trait, one dispatch point — eliminates the StreamInterface/MessageInterface split and ListenerConfig enum
- Each handler owns its wire format — no shared framing assumptions that constrain protocol design
- Adding a new protocol is implementing one trait with two methods
- Testable in isolation — give a handler a mock BiStream and AuthContext
- WASM-compatible in principle — handlers that don't need tokio runtime features compile to WASM
**Negative:**
- Every handler must implement its own framing — no shared "read a length-prefixed message" utility (mitigated: common utilities can live in alknet-core without mandating their use)
- Handlers that want message semantics must build them (mitigated: alknet-call provides this as a handler, not a mandatory layer)
- AuthContext resolution is hybrid — the endpoint resolves what it can (TLS-level auth), but handlers that need protocol-level credential extraction must do so inside handle(). This means AuthContext may be partial when handle() is called. Handlers must not assume AuthContext is fully resolved.
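To illustrate the "testable in isolation" consequence, the sketch below drives a toy handler with an in-memory stream. It is deliberately simplified and rests on several assumptions: the trait is a synchronous stand-in (`SyncProtocolHandler`) rather than the async `ProtocolHandler`, and `MockStream`, the `AuthContext` and `HandlerError` shapes, and the `alknet/echo` ALPN are all hypothetical names invented for the example.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical in-memory stand-in for `BiStream`: reads scripted input, captures writes.
pub struct MockStream {
    input: Cursor<Vec<u8>>,
    pub output: Vec<u8>,
}

impl MockStream {
    pub fn new(input: &[u8]) -> Self {
        Self { input: Cursor::new(input.to_vec()), output: Vec::new() }
    }
}

impl Read for MockStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.input.read(buf)
    }
}

impl Write for MockStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.output.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Assumed shape: the identity is optional because the endpoint may not have resolved it.
#[derive(Debug, Default)]
pub struct AuthContext {
    pub identity: Option<String>,
}

/// Assumed error shape for the example.
#[derive(Debug)]
pub enum HandlerError {
    Io(io::Error),
    Unauthenticated,
}

/// Synchronous simplification of the trait, so the sketch needs no async runtime.
pub trait SyncProtocolHandler {
    fn alpn(&self) -> &'static [u8];
    fn handle<S: Read + Write>(&self, stream: &mut S, auth: &AuthContext) -> Result<(), HandlerError>;
}

/// Toy handler: echoes the input back, but only for peers with a resolved identity.
pub struct EchoHandler;

impl SyncProtocolHandler for EchoHandler {
    fn alpn(&self) -> &'static [u8] {
        b"alknet/echo" // hypothetical ALPN, not part of the registry
    }

    fn handle<S: Read + Write>(&self, stream: &mut S, auth: &AuthContext) -> Result<(), HandlerError> {
        if auth.identity.is_none() {
            return Err(HandlerError::Unauthenticated);
        }
        let mut buf = Vec::new();
        stream.read_to_end(&mut buf).map_err(HandlerError::Io)?;
        stream.write_all(&buf).map_err(HandlerError::Io)?;
        Ok(())
    }
}
```

A test constructs a `MockStream` with scripted bytes, calls `handle()` with a full or partial `AuthContext`, and inspects `output`; no endpoint or network is involved.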
## References
- Pivot proposal: `docs/research/pivot/alpn-service-architecture.md`
- ADR-001: ALPN-based protocol dispatch
- ADR-004: Auth as shared core (IdentityProvider)
- iroh ProtocolHandler pattern: `docs/research/references/iroh/`
- Replaces StreamInterface, MessageInterface, and ListenerConfig


@@ -0,0 +1,68 @@
# ADR-003: Crate Decomposition
## Status
Accepted
## Context
The previous alknet-core crate was a monolith containing transport, interface, server, client, call, auth, config, socks5, credentials, and HTTP — all in one crate with interdependent modules. This created coupling (interface types depended on auth, server depended on call, everything depended on config) and made it impossible to use individual components independently.
The new ALPN dispatch model eliminates the need for a shared interface layer. Each handler is self-contained — it receives a byte stream and manages its own protocol. This naturally decomposes into separate crates.
Key constraints:
- Protocol crates must depend on alknet-core for auth/identity/config — but not on each other
- alknet-secret is already standalone (no alknet-core dependency) and must remain so
- The CLI binary assembles everything — it's the only crate that depends on all handler crates
- Some handlers (SFTP, call protocol) need to compile to WASM for browser/client use
- irpc is the foundation for the call protocol — it provides the operation registry, framing, and pub/sub patterns
## Decision
The workspace decomposes into the following crates:
| Crate | Responsibility | Depends on |
|-------|---------------|------------|
| `alknet-core` | ProtocolHandler trait, ALPN router, endpoint, BiStream, AuthContext, IdentityProvider, config, ArcSwap dynamic config | tokio, quinn, rustls, irpc |
| `alknet-secret` | BIP39/SLIP-0010/AES-GCM key derivation and encryption, SecretProtocol service | (standalone, no alknet-core) |
| `alknet-ssh` | SshAdapter (russh, SOCKS5, port forwarding) | alknet-core, russh |
| `alknet-call` | CallAdapter (JSON-RPC via irpc, operation registry, pub/sub, access control) | alknet-core, irpc |
| `alknet-git` | GitAdapter (gix, pkt-line protocol) | alknet-core, gix |
| `alknet-sftp` | SftpAdapter (russh-sftp protocol core) | alknet-core, russh-sftp |
| `alknet-msg` | MessageAdapter (E2E encryption, mixnet) | alknet-core |
| `alknet-http` | HttpAdapter (axum, REST API, MCP endpoint) | alknet-core, axum |
| `alknet-dns` | DnsAdapter (hickory-proto, pkarr, service discovery) | alknet-core, hickory-proto |
| `alknet-napi` | Node.js native addon (call protocol client) | alknet-call, napi-rs |
| `alknet` | CLI binary — registers handlers, starts endpoint | all handler crates |
Dependency flow:
```
alknet-secret (standalone)
alknet-core ← all handler crates ← alknet (CLI)
alknet-call ← alknet-napi
```
No handler crate depends on another handler crate. Cross-handler communication goes through the call protocol (alknet-call) or through alknet-core's endpoint.
## Consequences
**Positive:**
- Each handler can be developed, tested, and versioned independently
- WASM-compatible handlers (sftp, call) don't pull in heavy dependencies (russh, axum)
- alknet-secret remains standalone — no circular dependency risk
- New handlers are added by creating a crate and registering it with the endpoint
- Clean separation of concerns — each crate has one job
**Negative:**
- More crates to manage in the workspace — workspace Cargo.toml and version coordination
- Shared types (AuthContext, BiStream) must live in alknet-core — if they change, all handlers recompile
- The CLI binary has a large dependency tree (all handlers) — but this is expected for a binary that assembles everything
- Testing cross-handler behavior requires integration tests in the CLI or a test utility crate
## References
- Pivot proposal: `docs/research/pivot/alpn-service-architecture.md`
- ADR-001: ALPN-based protocol dispatch
- ADR-002: ProtocolHandler trait
- ADR-004: Auth as shared core (IdentityProvider)
- ADR-005: irpc as call protocol foundation


@@ -0,0 +1,69 @@
# ADR-004: Auth as Shared Core (IdentityProvider)
## Status
Accepted
## Context
The previous architecture had authentication spread across multiple layers: `CredentialProvider` with four phases (A-D), `AuthProtocol` as an irpc service, `server_auth` and `client_auth` as separate modules, and `IdentityProvider` as a trait in alknet-core. Different interface types presented credentials differently (SSH used key fingerprints, HTTP used Bearer tokens, DNS used query labels), but the resolution was ad hoc and tied to the three-layer model.
The ALPN dispatch model simplifies this: every handler receives the same `AuthContext`, but the credential extraction (how a handler learns who the peer is) differs per ALPN. The resolution (turning a credential into an `Identity`) should be shared across all handlers.
## Decision
Authentication and identity resolution live in `alknet-core` as shared infrastructure. Each handler presents credentials differently, but all resolve through the same `IdentityProvider`:
```rust
pub trait IdentityProvider: Send + Sync + 'static {
fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
```
Credential presentation per handler:
| Handler | Credential presentation | Resolves via |
|---------|------------------------|-------------|
| SshAdapter | SSH public key handshake | `resolve_from_fingerprint()` |
| CallAdapter | AuthToken in first frame | `resolve_from_token()` |
| HttpAdapter | `Authorization: Bearer` header | `resolve_from_token()` |
| DnsAdapter | AuthToken in query labels | `resolve_from_token()` |
| WebTransportAdapter | AuthToken in CONNECT headers | `resolve_from_token()` |
| GitAdapter | Signed push certificate | `resolve_from_fingerprint()` |
Auth resolution is **hybrid** — the endpoint resolves what it can, and handlers resolve what they must:
1. **Endpoint-level resolution** (before `handle()` is called): If the TLS handshake provides a client certificate, the endpoint resolves the fingerprint to an `Identity` and passes it in `AuthContext`. This can apply to SSH as well: the SSH key exchange still happens at the protocol level inside the handler, but a TLS client certificate, when present, gives the endpoint an identity before the handler runs.
2. **Handler-level resolution** (inside `handle()`): For protocols that carry credentials in application frames (AuthToken in the first call frame, Bearer header in HTTP), the handler extracts the credential from the stream and calls `IdentityProvider` to resolve it. The handler then enriches or replaces the partial `AuthContext` with the fully resolved `Identity`.
The `AuthContext` passed to `handle()` may be partial — containing only transport-level information if no TLS client certificate was provided. Handlers must not assume `AuthContext` contains a fully resolved `Identity`. Each handler knows its own credential extraction protocol and is responsible for completing authentication.
The `CredentialProvider` concept from the previous architecture is simplified: there is no phase progression (A-D). The `IdentityProvider` has two resolution paths, fingerprint and token, and a `ConfigIdentityProvider` implementation that draws from static and dynamic config.
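The hybrid model can be sketched with a map-backed provider and a handler-side completion step. This is an illustration under assumptions, not the alknet-core implementation: the field layouts of `Identity`, `AuthToken`, and `AuthContext` are invented, this `ConfigIdentityProvider` covers static config only, and `complete_auth` (a hypothetical helper) arbitrarily prefers an endpoint-resolved identity over a token, a policy the ADR does not fix.

```rust
use std::collections::HashMap;

/// Assumed shapes for the shared types; the real definitions live in alknet-core.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity(pub String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

/// Static-config-only sketch: two lookup tables, one per resolution path.
#[derive(Default)]
pub struct ConfigIdentityProvider {
    pub by_fingerprint: HashMap<String, Identity>,
    pub by_token: HashMap<AuthToken, Identity>,
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.by_fingerprint.get(fingerprint).cloned()
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        self.by_token.get(token).cloned()
    }
}

/// Possibly partial context handed to `handle()`.
#[derive(Debug, Clone, Default)]
pub struct AuthContext {
    pub identity: Option<Identity>,
}

/// Handler-level step (hypothetical helper): keep the endpoint-resolved identity
/// if there is one, otherwise resolve the token extracted from the first frame.
pub fn complete_auth(
    provider: &dyn IdentityProvider,
    partial: AuthContext,
    first_frame_token: Option<&AuthToken>,
) -> Option<Identity> {
    partial
        .identity
        .or_else(|| first_frame_token.and_then(|t| provider.resolve_from_token(t)))
}
```

A `None` result means neither the endpoint nor the handler could establish an identity; what a handler does in that case (reject, or allow anonymous access) is left to the handler.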
`alknet-secret` remains independent. It does not depend on `alknet-core` or `IdentityProvider`. The secret service provides derived keys on request; identity resolution is a separate concern.
## Consequences
**Positive:**
- Unified identity model — every handler resolves identities the same way through `IdentityProvider`
- Handlers own their credential extraction — SSH reads key fingerprints, call reads AuthTokens, HTTP reads Bearer headers
- Endpoint provides what it can for free (TLS-level auth), handlers complete what they need
- Adding a new credential type is adding a method to `IdentityProvider`, not a new phase
- alknet-secret stays standalone — no coupling between key derivation and identity resolution
- `AuthContext` is a value type — easy to construct in tests, can be partial for handler-level testing
**Negative:**
- `IdentityProvider` is in alknet-core — any change to it recompiles all handlers (mitigated: the trait should be stable; implementation changes don't force recompiles)
- Two resolution paths (fingerprint, token) may not cover all future auth schemes (mitigated: the trait can be extended, or a handler can do custom resolution after the initial AuthContext)
- Handlers must handle partial AuthContext — the endpoint may not have resolved an Identity, so handlers must be prepared to do credential extraction themselves
- WebTransport and browser-based auth needs careful design — AuthToken in CONNECT headers requires the token to be available before the stream is established
## References
- Pivot proposal: `docs/research/pivot/alpn-service-architecture.md`
- ADR-002: ProtocolHandler trait
- ADR-003: Crate decomposition
- ADR-005: irpc as call protocol foundation
- The previous architecture had equivalent decisions in ADR-023 (unified auth) and ADR-029 (identity as core type), which are archived in the reference implementation at `/workspace/@alkdev/alknet-main/`.


@@ -0,0 +1,56 @@
# ADR-005: irpc as Call Protocol Foundation
## Status
Accepted
## Context
The call protocol (alknet-call) provides structured RPC — operations, request/response, streaming subscriptions, and pub/sub. This is the primary interface for programmatic interaction with an alknet node. It needs to work across platforms: Rust clients, TypeScript/JavaScript clients (via NAPI), WASM targets, and any language that can speak the wire format.
The previous implementation used `irpc` for the call protocol's operation registry, framing, and service patterns. irpc provides:
- An operation registry with schema-based discovery
- Length-prefixed JSON framing (EventEnvelope)
- Request/response and streaming patterns
- Type-safe operation definitions via derive macros
The call protocol is derived from a TypeScript implementation of "operations" and "pub/sub" that can wholesale import OpenAPI schemas, wrap MCP servers, and go the other direction — exposing operations as HTTP endpoints, MCP tools, etc. This bidirectional capability is strategically important.
## Decision
alknet-call uses irpc as its foundation. The `CallAdapter` implements `ProtocolHandler` on ALPN `alknet/call` and delegates to irpc's operation registry, framing, and dispatch.
irpc is not replaced or wrapped in an abstraction layer — it IS the call protocol's core. The relationship is:
- irpc provides: operation registry, schema discovery, frame encoding/decoding, request/response routing, streaming
- alknet-call provides: the ProtocolHandler adapter (BiStream → irpc), AuthContext integration, access control checks, the ALPN registration
This means:
- The wire format is irpc's EventEnvelope framing — length-prefixed JSON
- Operation schemas follow irpc's schema model — JSON Schema compatible
- The TypeScript "operations" and "pub/sub" patterns that can import OpenAPI schemas and expose MCP tools are supported at the protocol level
- Future NAPI and WASM clients speak the same wire format
The `SecretProtocol` in alknet-secret also uses irpc as its service protocol. This is consistent — alknet-secret's irpc service is an independent service that happens to use the same framing, not a dependency on alknet-call.
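For readers unfamiliar with length-prefixed framing, the sketch below shows the general technique over any byte stream. It is not irpc's codec: the 4-byte big-endian prefix, the 1 MiB limit, and the function names are assumptions made for the example, and the JSON payload is treated as an opaque UTF-8 string rather than a typed `EventEnvelope`.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound, so a corrupt length prefix cannot trigger a huge allocation.
const MAX_FRAME_LEN: usize = 1 << 20;

/// Writes one frame: a 4-byte big-endian length followed by the JSON bytes.
pub fn write_frame<W: Write>(w: &mut W, json: &str) -> io::Result<()> {
    if json.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame exceeds limit"));
    }
    let len = json.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    w.write_all(&len.to_be_bytes())?;
    w.write_all(json.as_bytes())
}

/// Reads one frame and returns its payload as a UTF-8 string.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds limit"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    String::from_utf8(payload).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Because each frame carries its own length, a reader can pull several frames off one stream in sequence without any delimiter scanning, which is what makes the format straightforward to reimplement in other languages.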
## Consequences
**Positive:**
- Proven operation registry and framing — irpc is already tested in production (iroh uses it)
- JSON Schema compatible — OpenAPI import, MCP tool exposure, cross-language client generation
- No need to design a custom RPC wire format — irpc's is already battle-tested
- The call protocol inherits irpc's streaming and subscription patterns
- Consistency with alknet-secret's service model — both use irpc
**Negative:**
- alknet-call depends on irpc — if irpc has limitations or bugs, we're affected (mitigated: irpc is lightweight and we can fork if needed)
- JSON framing is not the most compact binary format — for high-throughput scenarios, a binary codec could be added later as an irpc extension
- irpc's derive macros add a compilation dependency — but this is standard for Rust RPC frameworks
- The call protocol's cross-language story depends on irpc's wire format being documented and stable (mitigated: it's length-prefixed JSON, which is inherently cross-language)
## References
- Pivot proposal: `docs/research/pivot/alpn-service-architecture.md`
- ADR-003: Crate decomposition
- ADR-004: Auth as shared core (IdentityProvider)
- irpc reference: `docs/research/references/iroh/irpc/` (see individual docs in that directory)
- The previous architecture had an equivalent decision in ADR-024 (bidirectional call protocol with EventEnvelope framing), which is archived in the reference implementation at `/workspace/@alkdev/alknet-main/`.


@@ -0,0 +1,71 @@
# ADR-006: ALPN String Convention and Connection Model
## Status
Accepted
## Context
ADR-001 establishes ALPN-based protocol dispatch. Two questions arise:
1. **ALPN string naming**: What format do custom ALPN strings follow? Should they include version numbers? How do standard ALPNs (`h2`, `http/1.1`, `h3`) coexist with custom ones?
2. **Connection model**: ALPN is negotiated per-connection in QUIC/TLS, not per-stream. A client that wants to speak both SSH and call protocol must open two separate QUIC connections, each with its own ALPN. This is different from the claim in earlier drafts that "a single connection can carry multiple protocols via additional streams" — it cannot. However, QUIC connections are cheap (multiplexed over the same UDP flow), so opening multiple connections is acceptable.
The iroh reference project uses the same model: each `ProtocolHandler` claims an ALPN, and each incoming connection is dispatched to exactly one handler based on the negotiated ALPN.
## Decision
### ALPN String Convention
Custom ALPN strings use the `alknet/` prefix:
| ALPN | Handler | Type |
|------|---------|------|
| `alknet/ssh` | SshAdapter | Custom |
| `alknet/call` | CallAdapter | Custom |
| `alknet/git` | GitAdapter | Custom |
| `alknet/sftp` | SftpAdapter | Custom |
| `alknet/msg` | MessageAdapter | Custom |
| `alknet/http` | HttpAdapter | Custom |
| `alknet/dns` | DnsAdapter | Custom |
| `h3` | WebTransport → alknet/http | Standard (IANA) |
| `h2` | HTTP/2 → alknet/http | Standard (IANA) |
| `http/1.1` | HTTP/1.1 → alknet/http | Standard (IANA) |
Rules:
- Custom ALPNs use the format `alknet/<name>` — lowercase, no version number
- Standard ALPNs (`h2`, `http/1.1`, `h3`) use their IANA-registered strings and are handled by the HTTP adapter
- No version numbers in ALPN strings initially. If protocol compatibility breaks, a new ALPN string is registered (e.g., `alknet/call/v2`). This is simpler than version negotiation and follows the QUIC convention that ALPN mismatch means connection failure
- ALPN strings are compile-time constants in each handler's `alpn()` method — no runtime registration of new ALPN strings
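The naming rules can be checked mechanically. The sketch below is one possible reading of them, with assumptions stated up front: `classify_alpn` and `AlpnKind` are hypothetical names, the allowed character set (lowercase ASCII letters, digits, hyphens) is a guess at what "lowercase" permits, and additional `/`-separated segments are accepted so that a successor string such as `alknet/call/v2` still validates.

```rust
/// Standard IANA-registered ALPNs that the HTTP adapter handles.
const STANDARD_ALPNS: [&str; 3] = ["h2", "h3", "http/1.1"];

#[derive(Debug, PartialEq, Eq)]
pub enum AlpnKind {
    Custom,
    Standard,
}

/// Classifies an ALPN string, or returns `None` if it breaks the convention.
pub fn classify_alpn(alpn: &str) -> Option<AlpnKind> {
    if STANDARD_ALPNS.iter().any(|s| *s == alpn) {
        return Some(AlpnKind::Standard);
    }
    // Custom ALPNs must start with the project-owned prefix.
    let rest = alpn.strip_prefix("alknet/")?;
    // Every segment must be non-empty and use the assumed character set.
    let valid = rest.split('/').all(|segment| {
        !segment.is_empty()
            && segment
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
    });
    valid.then_some(AlpnKind::Custom)
}
```

A check like this could run in a unit test over every handler's `alpn()` value, catching a stray uppercase letter or a missing prefix before the endpoint ever advertises it.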
### Connection Model
**One ALPN per connection.** A client that wants to use multiple ALPNs opens one QUIC connection per ALPN. All connections from the same client are multiplexed over the same UDP flow (QUIC's natural connection multiplexing), so the overhead is minimal.
This means:
- `alknet/call` is a distinct ALPN with its own connection — not a multiplexer for other ALPNs
- A client interacting with both SSH and call protocol has two QUIC connections
- Within an `alknet/call` connection, multiple QUIC streams can carry independent operations (see ADR-005)
- The endpoint logs the negotiated ALPN for each connection for observability
## Consequences
**Positive:**
- Simple model: one connection, one protocol — no multiplexing layer needed inside a connection
- ALPN strings are predictable and discoverable — `alknet/<name>` is a clear namespace
- No version negotiation complexity — incompatible versions get new ALPN strings
- QUIC connection multiplexing means multiple ALPN connections share the same UDP flow
**Negative:**
- Multiple ALPNs require multiple connections — a full-featured client might have 3-5 QUIC connections open simultaneously
- No version negotiation — an incompatible change requires a new ALPN string, which means old and new clients can coexist only if the server registers both ALPNs
- The `alknet/` namespace is owned by this project — third-party extensions need their own prefix
## References
- ADR-001: ALPN-based protocol dispatch
- ADR-002: ProtocolHandler trait
- OQ-03: ALPN string naming convention (resolved by this ADR)
- OQ-06: Server-side ALPN vs client-side ALPN (resolved by this ADR)
- iroh reference: `docs/research/references/iroh/`