docs(architecture): untangle TLS identity use cases, remove phase framing, add ADR-013 Rust canonical + agent crate
- Rewrite OQ-12: separate two distinct TLS identity use cases (RFC 7250
  raw keys as default for P2P, X.509 for domain-hosted/browsers) instead
  of conflating them as 'file paths now, ACME later'. ACME is a proven
  pattern from the reverse-proxy project, not speculative future work.
- Resolve OQ-13 and OQ-14: remove 'Phase 1' framing from core crate
  specs. /{service}/{op} is the correct design for alknet-call, not a
  simplification. Batch as correlated call.requested events is the correct
  protocol design. Core crates need to be done right from the start.
- Add ADR-013: Rust as canonical implementation language. TypeScript
  @alkdev/operations is a reference that informed the design, not a
  parallel implementation. The only JS use case is browser SDK adaptation.
  Five reasons: memory safety, LLM competence, supply chain attacks,
  performance, browser-only JS.
- Add alknet-agent crate to the crate graph (depends on alknet-call, not
  alknet-core). Agent service uses call protocol client for tool dispatch
  and vault/derive for provider keys — no env vars for secrets. ALPN
  alknet/agent added to the registry.
- Add OQ-15: call protocol client and adapter contract. alknet-call needs
  both server (CallAdapter) and client (remote invocation over QUIC), plus
  the adapter traits (from_*, to_*) that enable composition.
- Clarify alknet-napi as thin NAPI projection layer, not business logic.
- Fix bugs: ProtocolController → ProtocolHandler typo, OperationEnv
  invoke() path format inconsistency, RateLimitConfig comment confusion.
- Update endpoint.md TLS section: comprehensive identity model comparison
  table, RFC 7250 as default mode, ACME as proven pattern.
@@ -33,8 +33,8 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
 | OQ | Title | Status | Relevance |
 |----|-------|--------|-----------|
 | OQ-07 | Call protocol scope within a connection | resolved (ADR-012) | Stream model, multiplexing, scope |
-| OQ-13 | Operation path format and routing scope | open | Namespace paths: `/{service}/{op}` vs `/{node}/{service}/{op}` |
-| OQ-14 | Batch operation semantics | open | Whether batch is a protocol primitive or client-side pattern |
+| OQ-13 | Operation path format and routing scope | resolved | `/{service}/{op}` is the correct design; remote dispatch is a separate layer |
+| OQ-14 | Batch operation semantics | resolved | Correlated `call.requested` events is the correct protocol design |

 ## Key Design Principles
@@ -43,4 +43,4 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
 3. **Stream-agnostic correlation**: PendingRequestMap correlates by request ID, not by stream. The protocol works with any stream arrangement.
 4. **Operation registry is dynamic**: Operations are registered at startup by the CLI binary. The registry supports JSON Schema discovery.
 5. **irpc is one dispatch backend**: Local operations dispatch directly. irpc service calls (vault, auth) are internal. The call protocol is the external interface.
-6. **Phase 1 is local dispatch only**: The operation registry dispatches to local handlers. Remote dispatch (head/worker routing) and irpc service dispatch are contracted but not built yet.
+6. **Local dispatch only**: The operation registry dispatches to local handlers. Remote dispatch (federation, head/worker routing) would be a separate mechanism at a different layer, not a modification to alknet-call's path format.
@@ -85,7 +85,7 @@ The `Value` type is `serde_json::Value`. The envelope is JSON because it must be

 Binary payloads (postcard, protobuf) are base64-encoded as a JSON string within the `payload` field. The convention is: if an operation's output schema specifies a binary field, the handler encodes it as a base64 string and the client decodes it. The `EventEnvelope` structure is not aware of this convention — it carries a `serde_json::Value` and does not interpret the payload. This is a handler-level concern, not a protocol-level concern.

-This is the same framing used by irpc and by the `@alkdev/pubsub` TypeScript adapters. The wire format is identical — an `EventEnvelope` flowing from a Rust handler through core, out over a QUIC stream, can be consumed by a JavaScript `@alkdev/operations` call handler with zero translation at the wire level.
+This is the same framing used by irpc. The Rust implementation in alknet-call is canonical — the `@alkdev/pubsub` TypeScript adapters serve as a reference and browser adaptation, not a parallel implementation (see ADR-013).

 ### Event Types
@@ -115,7 +115,7 @@ The `id` field carries the `requestId` for correlation.
 }
 ```

-Error codes use an extensible string enum. Phase 1 defines the following codes:
+Error codes use an extensible string enum. The protocol defines the following codes:
 - `NOT_FOUND` — operation not in registry
 - `FORBIDDEN` — access denied (insufficient scopes or unauthenticated)
 - `INVALID_INPUT` — input doesn't match the operation's JSON Schema
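Review note on the hunk above: one way the "extensible string enum" could be modeled in Rust is sketched here. This is an illustration, not the crate's actual type. The name `ErrorCode`, the `Other` variant, and both methods are hypothetical, and only the three codes visible in this hunk are modeled (the full list continues outside the diff context).

```rust
/// Hypothetical model of an extensible string enum for call protocol errors.
/// Known codes get dedicated variants; anything else is preserved verbatim
/// so that newer peers can introduce codes without breaking older ones.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ErrorCode {
    NotFound,
    Forbidden,
    InvalidInput,
    /// A code this build does not recognize (assumption: kept as-is).
    Other(String),
}

impl ErrorCode {
    /// Map a wire string to a variant, falling back to `Other`.
    pub fn parse(code: &str) -> Self {
        match code {
            "NOT_FOUND" => Self::NotFound,
            "FORBIDDEN" => Self::Forbidden,
            "INVALID_INPUT" => Self::InvalidInput,
            other => Self::Other(other.to_string()),
        }
    }

    /// Return the wire string for this code.
    pub fn as_str(&self) -> &str {
        match self {
            Self::NotFound => "NOT_FOUND",
            Self::Forbidden => "FORBIDDEN",
            Self::InvalidInput => "INVALID_INPUT",
            Self::Other(code) => code.as_str(),
        }
    }
}
```

Keeping unknown codes in `Other` rather than rejecting them is what makes the enum extensible: parsing followed by `as_str()` round-trips any string unchanged.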
@@ -254,7 +254,7 @@ Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `
 - Operation specs use JSON Schema. The envelope is always JSON. Binary payloads may be base64-encoded in the `payload` field.
 - Batch is not a protocol primitive — multiple `call.requested` events with correlated IDs provide equivalent semantics. See OQ-14.
 - The call protocol is transport-agnostic at the envelope level. The `EventEnvelope` framing can run over QUIC streams, WebSocket frames, or Worker `postMessage`. The `CallAdapter` is the QUIC-specific implementation.
-- Phase 1 is local dispatch only. The operation registry dispatches to handlers in the same process. Remote dispatch (head/worker routing) and irpc service dispatch are contracted but not built. See ADR-005 and OQ-13.
+- `OperationEnv::invoke()` dispatches through the local registry. Remote dispatch (federation, head/worker routing) would be a separate mechanism at a different layer. See ADR-005 and OQ-13.

 ## Design Decisions
@@ -267,8 +267,10 @@ Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `

 ## Open Questions

-- **OQ-13**: What is the operation path format for the alknet-call crate? The reference implementation used `/{node}/{service}/{op}` for head/worker routing. Phase 1 is single-node, so `/{service}/{op}` may be sufficient. The node prefix can be added later when remote dispatch is implemented.
-- **OQ-14**: Should batch be a distinct protocol primitive with its own event types, or is the "multiple call.requested with correlated IDs" pattern sufficient? The reference implementation treats batch as a client-side pattern. This is a two-way door — batch-specific event types can be added later without breaking existing clients.
+See [open-questions.md](../../open-questions.md) for full details.
+
+- **OQ-13** (resolved): Operation path format is `/{service}/{op}`. Remote dispatch is a separate mechanism, not a path prefix.
+- **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.

 ## References
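Review note on the OQ-14 resolution: the "batch as correlated `call.requested` events" pattern might look roughly like the sketch below. Everything here is an assumption made for illustration. The simplified `EventEnvelope` (a `String` payload instead of `serde_json::Value`), the `batch_id:index` ID scheme, the `operation`/`input` payload shape, and the `PendingBatch` helper are all hypothetical and are not taken from alknet-call.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for the envelope; the real payload is a JSON value.
#[derive(Debug, Clone, PartialEq)]
pub struct EventEnvelope {
    pub event_type: String,
    pub id: String,      // request ID used for correlation
    pub payload: String, // JSON text in this sketch
}

/// Expand a batch into one `call.requested` event per call.
/// Assumption: correlation is expressed through a shared ID prefix.
pub fn batch_requests(batch_id: &str, calls: &[(&str, &str)]) -> Vec<EventEnvelope> {
    calls
        .iter()
        .enumerate()
        .map(|(index, (path, input))| EventEnvelope {
            event_type: "call.requested".to_string(),
            id: format!("{batch_id}:{index}"),
            payload: format!(r#"{{"operation":"{path}","input":{input}}}"#),
        })
        .collect()
}

/// Client-side bookkeeping: correlates responses by request ID only,
/// independent of which stream they arrive on.
pub struct PendingBatch {
    pending: HashSet<String>,
    responses: HashMap<String, String>,
}

impl PendingBatch {
    pub fn new(requests: &[EventEnvelope]) -> Self {
        Self {
            pending: requests.iter().map(|r| r.id.clone()).collect(),
            responses: HashMap::new(),
        }
    }

    /// Record a response; returns false for unknown or already-resolved IDs.
    pub fn resolve(&mut self, request_id: &str, payload: String) -> bool {
        if self.pending.remove(request_id) {
            self.responses.insert(request_id.to_string(), payload);
            true
        } else {
            false
        }
    }

    pub fn response(&self, request_id: &str) -> Option<&str> {
        self.responses.get(request_id).map(String::as_str)
    }

    pub fn is_complete(&self) -> bool {
        self.pending.is_empty()
    }
}
```

Because the batch lives entirely in the client's bookkeeping, the server sees ordinary independent calls and the protocol needs no batch-specific event types, which matches the resolution recorded above.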
@@ -19,9 +19,9 @@ The operation registry provides:
 - **Discoverability**: Clients can query `/services/list` and `/services/schema` to learn what operations exist before calling them
 - **Access control**: Each operation declares its required scopes and resources; the registry enforces ACL before invoking the handler
 - **Type safety**: JSON Schema for input and output enables validation and client code generation
-- **Composability**: Handlers can invoke other operations through `OperationEnv` (local dispatch in Phase 1)
+- **Composability**: Handlers can invoke other operations through `OperationEnv` (local dispatch — remote dispatch is a separate architectural concern, see Constraints)

-The registry design is derived from the `@alkdev/operations` TypeScript package, which provides the same capabilities in JavaScript runtimes. The Rust implementation preserves the behavioral contract: namespace + operation name → invoke with input, return output.
+The registry design is informed by the `@alkdev/operations` TypeScript package, which demonstrated the same capabilities in JavaScript runtimes. The Rust implementation in alknet-call is canonical — it preserves the behavioral contract (namespace + operation name → invoke with input, return output) while defining the adapter contract (from_*, to_*) in Rust (see ADR-013).

 ## Architecture
@@ -146,7 +146,7 @@ pub trait OperationEnv: Send + Sync {

 The `parent` parameter propagates the calling context: the nested call gets `parent_request_id: Some(parent.request_id)`, inherits `parent.identity`, and is marked `trusted: true`.

-**Phase 1: Local dispatch only.** The initial `OperationEnv` implementation dispatches directly through the local `OperationRegistry`:
+**Local dispatch only.** The initial `OperationEnv` implementation dispatches directly through the local `OperationRegistry`:

 ```rust
 pub struct LocalOperationEnv {
@@ -156,7 +156,7 @@ pub struct LocalOperationEnv {
 #[async_trait]
 impl OperationEnv for LocalOperationEnv {
     async fn invoke(&self, namespace: &str, operation: &str, input: Value, parent: &OperationContext) -> ResponseEnvelope {
-        let name = format!("/{namespace}/{operation}");
+        let name = format!("{namespace}/{operation}");
         let context = OperationContext {
             request_id: format!("env-{name}"),
             parent_request_id: Some(parent.request_id.clone()),
@@ -170,7 +170,7 @@ impl OperationEnv for LocalOperationEnv {
 }
 ```

-Future phases add irpc service dispatch and remote call protocol dispatch as additional backends. The handler-facing API stays the same.
+Future work may add irpc service dispatch and remote call protocol dispatch as additional backends. The handler-facing API stays the same.

 ### Service Discovery
@@ -239,7 +239,7 @@ The registry is immutable after construction. Adding operations requires restart

 - The registry is immutable after construction. No runtime registration or deregistration. Two-way door — `ArcSwap<OperationRegistry>` can be added later.
 - Operation specs use JSON Schema. The call protocol's external interface is always JSON. irpc's postcard serialization is internal only.
-- Phase 1 is local dispatch only. `OperationEnv::invoke()` goes through the local registry. irpc service dispatch and remote call protocol dispatch are contracted but not built.
+- `OperationEnv::invoke()` dispatches through the local registry. Remote dispatch (federation, head/worker routing) would be a separate mechanism at a different layer — not a prefix added to operation paths. irpc service dispatch is contracted but not built.
 - The call protocol does not depend on any database. Operation specs are in-memory, populated at startup.
 - `OperationContext.trusted` is set by `OperationEnv`, not by callers. A handler cannot mark its own call as trusted.
@@ -254,8 +254,10 @@ The registry is immutable after construction. Adding operations requires restart

 ## Open Questions

-- **OQ-13**: Operation path format — `/{service}/{op}` for Phase 1 (single-node), with the node prefix `/{node}/{service}/{op}` added when remote dispatch is implemented. Two-way door — the prefix can be added later without breaking existing operations.
-- **OQ-14**: Batch operation semantics — whether to add batch-specific event types or rely on the "multiple call.requested with correlated IDs" pattern. Two-way door — can be added later.
+See [open-questions.md](../../open-questions.md) for full details.
+
+- **OQ-13** (resolved): Operation path format is `/{service}/{op}`. Remote dispatch is a separate mechanism, not a path prefix.
+- **OQ-14** (resolved): Batch is a client-side pattern of correlated `call.requested` events, not a protocol primitive.

 ## References
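Review note on the OQ-13 resolution: a minimal sketch of what "`/{service}/{op}`, with no node prefix" could mean for path handling. Both function names are hypothetical. The assumption that the registry key is the path without its leading slash is inferred from the updated `format!("{namespace}/{operation}")` line in `LocalOperationEnv` and is not confirmed elsewhere in this diff.

```rust
/// Split an external operation path of the form `/{service}/{op}`.
/// Returns None for anything else, including the three-segment
/// `/{node}/{service}/{op}` form that OQ-13 rules out.
pub fn parse_operation_path(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix('/')?;
    let (service, op) = rest.split_once('/')?;
    if service.is_empty() || op.is_empty() || op.contains('/') {
        return None;
    }
    Some((service, op))
}

/// Build the key used for local registry lookup.
/// Assumption: same shape as the `name` built in `LocalOperationEnv::invoke`.
pub fn registry_key(namespace: &str, operation: &str) -> String {
    format!("{namespace}/{operation}")
}
```

Rejecting a third segment outright, rather than treating it as an optional node prefix, keeps routing concerns out of the path, which is the point of the resolution.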
@@ -51,7 +51,9 @@ pub enum TlsIdentity {

 ### Why `TlsIdentity` instead of `tls_cert`/`tls_key` options

-The original `tls_cert: Option<PathBuf>` / `tls_key: Option<PathBuf>` assumed X.509 was the only TLS identity model. RFC 7250 raw public keys (used by iroh, supported by rustls) provide an alternative: Ed25519 key as identity, no X.509, no CA, no domain. This is a separate mode, not just "no cert."
+TLS identity in alknet has two distinct use cases, not one. The original `tls_cert: Option<PathBuf>` / `tls_key: Option<PathBuf>` assumed X.509 was the only TLS identity model. RFC 7250 raw public keys (used by iroh, supported by rustls) provide a fundamentally different mode: Ed25519 key as identity, no X.509, no CA, no domain. This is the default for most alknet nodes — it works natively with SSH auth and git. X.509 certs are for domain-hosted services and browser/WebTransport clients, which don't support RFC 7250.
+
+The `TlsIdentity` enum captures both use cases plus a development mode. See OQ-12 for the full rationale.

 ### Key differences from reference implementation
@@ -70,10 +72,23 @@ The reference `StaticConfig` (in `alknet-main/crates/alknet-core/src/config/stat
 // The CLI binary constructs StaticConfig from its own options/config.
 // StartupOptions is NOT a core type — it belongs to the alknet CLI binary.
 // alknet-core receives a fully populated StaticConfig.
-let static_config = StaticConfig {
-    listen_addr: "0.0.0.0:4433".parse()?,
-    tls_cert: Some("/path/to/cert.pem".into()),
-    tls_key: Some("/path/to/key.pem".into()),
+
+// P2P / key-based identity (default for most nodes)
+let p2p_config = StaticConfig {
+    listen_addr: Some("0.0.0.0:4433".parse()?),
+    tls_identity: Some(TlsIdentity::RawKey(SecretKey::generate())),
+    iroh_relay: None,
+    drain_timeout: Duration::from_secs(2),
+};
+
+// Domain-hosted service (relays, public services, browsers)
+let domain_config = StaticConfig {
+    listen_addr: Some("0.0.0.0:4433".parse()?),
+    tls_identity: Some(TlsIdentity::X509 {
+        cert: "/path/to/cert.pem".into(),
+        key: "/path/to/key.pem".into(),
+    }),
+    iroh_relay: None,
     drain_timeout: Duration::from_secs(2),
 };
 ```
@@ -148,12 +163,7 @@ pub struct RateLimitConfig {
 }
 ```

-Carries forward from the reference implementation. Note: `max_connections_per_ip` and `max_auth_attempts` appear in both `StaticConfig` and `RateLimitConfig`. The relationship is:
-
-- `StaticConfig` does NOT contain rate limit fields. Rate limits are entirely dynamic.
-- `RateLimitConfig` in `DynamicConfig` is the authoritative source at runtime.
-- The CLI binary sets initial `RateLimitConfig` values when creating the initial `DynamicConfig`.
-- Hot-reloading `DynamicConfig` via `ConfigReloadHandle` replaces rate limits immediately — no restart needed.
+Carries forward from the reference implementation. Rate limits are entirely dynamic — `StaticConfig` does not contain rate limit fields. The CLI binary sets initial `RateLimitConfig` values when constructing the initial `DynamicConfig`. Hot-reloading via `ConfigReloadHandle` replaces rate limits immediately without restart.

 ## ArcSwap Pattern
@@ -54,7 +54,7 @@ pub struct HandlerRegistry {

 impl HandlerRegistry {
     pub fn new() -> Self;
-    pub fn register(&mut self, handler: Arc<dyn ProtocolController>);
+    pub fn register(&mut self, handler: Arc<dyn ProtocolHandler>);
     pub fn get(&self, alpn: &[u8]) -> Option<&Arc<dyn ProtocolHandler>>;
     pub fn alpn_strings(&self) -> Vec<Vec<u8>>;
 }
@@ -170,34 +170,58 @@ This matches the reference implementation: the TLS cert encrypts and camouflages

 ## RFC 7250: Raw Public Keys in TLS

-iroh uses RFC 7250 raw public keys instead of X.509 certificates for TLS. The implementation is ~100 lines (see `iroh/iroh/src/tls/resolver.rs`): take an Ed25519 key, wrap its SPKI public key as a `CertificateDer`, tell rustls `only_raw_public_keys() -> true`. No X.509, no CAs, no domain names, no cert renewal.
+RFC 7250 raw public keys are the **default TLS identity mode** for most alknet nodes. They eliminate the need for domain names, CAs, and certificate renewal — the Ed25519 public key IS the node's identity.

-rustls already supports RFC 7250. This means the quinn endpoint can also use raw Ed25519 public keys instead of X.509 certs:
+iroh uses this model with its `NodeId`. The implementation is ~100 lines (see `iroh/iroh/src/tls/resolver.rs`): take an Ed25519 key, wrap its SPKI public key as a `CertificateDer`, tell rustls `only_raw_public_keys() -> true`. No X.509, no CAs, no domain names, no cert renewal.

-- **No domain required**: A node without a domain name can use raw public keys for the quinn path — key-based identity, but with direct QUIC over UDP instead of relay-assisted connections.
+Key implications:
+
+- **Default for alknet-native clients**: SSH, git, and alknet-native clients all work with raw Ed25519 keys out of the box. The same key type used for SSH auth can serve as the TLS identity. This is the most common deployment mode.
+- **No domain required**: A node without a domain name uses raw public keys for the quinn path — key-based identity with direct QUIC over UDP.
 - **Key = identity**: The Ed25519 public key IS the node's identity. No CA trust chain, no cert expiry. The key can be derived from alknet-vault.
-- **X.509 is optional**: Domain-facing identity (replicators, public services) uses X.509 certs. Key-based identity (personal nodes, P2P) uses raw public keys. Both work with the same quinn endpoint.
-- **Browser limitation**: Browsers don't support RFC 7250. For browser/WebTransport clients, X.509 certs are needed. For alknet-native clients, raw public keys work fine.
+- **X.509 is for domain-hosted services**: Domain-facing identity (replicators, public services, browsers) uses X.509 certs. This is a separate use case, not the default.
+- **Browser limitation**: Browsers don't support RFC 7250. For browser/WebTransport clients, X.509 certs are needed. For all other clients, raw public keys work fine.

-This reframes the connectivity model. The quinn and iroh paths share the same key-based identity model via RFC 7250. They're distinguished by **connection establishment** (direct UDP vs relay-assisted), not by identity:
+The quinn and iroh paths share the same key-based identity model via RFC 7250. They're distinguished by **connection establishment** (direct UDP vs relay-assisted), not by identity:

-| Path | Connection establishment | Identity (domain-facing) | Identity (key-facing) |
-|------|------------------------|------------------------|---------------------|
-| quinn | Direct UDP, public IP | X.509 cert (domain name) | RFC 7250 raw key |
-| iroh | Relay-assisted P2P | N/A | RFC 7250 raw key (NodeId) |
+| Path | Connection establishment | Default identity | Alternative identity |
+|------|------------------------|-----------------|---------------------|
+| quinn | Direct UDP, public IP | RFC 7250 raw key (most nodes) | X.509 cert (domain-hosted, browsers) |
+| iroh | Relay-assisted P2P | RFC 7250 raw key (NodeId) | N/A |

-## TLS Certificate Provisioning
+## TLS Identity

-For the quinn endpoint, `StaticConfig` provides TLS configuration via file paths:
+TLS identity in alknet has two distinct use cases, each with a different trust model and provisioning mechanism. See OQ-12 for the full rationale.

-- **Manual**: `tls_cert` and `tls_key` file paths. Required for production use.
-- **Self-signed**: For development. The endpoint can generate a self-signed cert on startup.
+### Use case 1: P2P / Key-based identity (default)

-The `rustls::ServerConfig` is built from cert + key + ALPN list at startup.
+Most alknet nodes use RFC 7250 raw Ed25519 public keys for TLS identity. No domain name, no CA, no certificate renewal. The Ed25519 public key IS the node's identity — the same key model as iroh's `NodeId`, but for direct QUIC connections.

-ACME auto-provisioning (Let's Encrypt) is not in scope for v1. It will be added as a feature later (see OQ-12).
+`TlsIdentity::RawKey` in `StaticConfig` configures this mode. The endpoint builds a `rustls::ServerConfig` with `only_raw_public_keys() -> true` and a `ResolvesServerCert` that generates the certificate on-the-fly from the key, exactly as iroh does (see `iroh/iroh/src/tls/resolver.rs`).

-The iroh endpoint does not need TLS certs — it uses `NodeId` for identity.
+This mode works natively with SSH auth (same key type) and git (SSH key-based auth). It is the default for alknet-native clients. **Browser/WebTransport clients do not support RFC 7250** — they require X.509 certificates.
+
+### Use case 2: Domain-hosted services
+
+Nodes that serve browser/WebTransport clients, or nodes with public domain names, use X.509 certificates. This has two sub-cases:
+
+- **Manual**: Provide cert/key file paths via `TlsIdentity::X509`. The endpoint loads them at startup and builds a standard `rustls::ServerConfig`.
+- **ACME auto-provisioning**: Let's Encrypt via `rustls-acme`. The reverse-proxy project (`/workspace/@alkdev/reverse-proxy`) demonstrates the complete pattern: per-listener ACME state machine, `ResolvesServerCertAcme` rustls integration, TLS-ALPN-01 challenge handling, automatic renewal. This is a proven, solved implementation pattern. It will be adapted to alknet's `AlknetEndpoint` context as an additional `TlsIdentity` variant or `ResolvesServerCert` implementation.
+
+`TlsIdentity::SelfSigned` is for development only — the endpoint generates a self-signed cert on startup. External clients will not trust it.
+
+### iroh endpoint identity
+
+The iroh endpoint does not need TLS certificate configuration — it uses `NodeId` (Ed25519) for identity, which is RFC 7250 raw key identity built into the iroh endpoint.
+
+### Identity model comparison
+
+| Path | Identity model | Client compatibility | Use case |
+|------|---------------|---------------------|----------|
+| quinn + `TlsIdentity::RawKey` | RFC 7250 Ed25519 raw key | alknet-native, SSH, git | Personal nodes, P2P, most deployments |
+| quinn + `TlsIdentity::X509` | X.509 domain certificate | All clients including browsers | Relays, public services, WebTransport |
+| quinn + `TlsIdentity::SelfSigned` | X.509 self-signed cert | None (dev only) | Local development |
+| iroh | NodeId (Ed25519, RFC 7250 built-in) | alknet-native, iroh clients | NAT traversal, home servers |

 ## Graceful Shutdown
@@ -270,4 +294,4 @@ See [open-questions.md](../../open-questions.md) for full details.

 - **OQ-04**: Resolved — HandlerRegistry is static at startup.
 - **OQ-05**: Resolved — multi-connectivity endpoint with quinn + iroh, both feature-gated.
-- **OQ-12**: Resolved — start with file paths in StaticConfig, add ACME later.
+- **OQ-12**: Resolved — two distinct TLS identity use cases: RFC 7250 raw keys (default, P2P) and X.509 certs (domain-hosted, browsers). ACME is a proven pattern from the reverse-proxy project, not speculative future work.
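
Review note on the OQ-12 resolution: the quinn rows of the identity model comparison table in the endpoint diff can be restated as a small sketch. This is illustrative only. The `[u8; 32]` stand-in for the secret key type, the unit shape of `SelfSigned`, the `ClientKind` enum, and the `accepted_by` method are assumptions and not part of alknet-core as shown in this commit.

```rust
use std::path::PathBuf;

/// Sketch of the three quinn identity modes named in the diff.
pub enum TlsIdentity {
    /// RFC 7250 raw Ed25519 key (stand-in for the real secret key type).
    RawKey([u8; 32]),
    /// X.509 certificate and key loaded from file paths.
    X509 { cert: PathBuf, key: PathBuf },
    /// Self-signed certificate generated at startup, development only.
    SelfSigned,
}

/// Hypothetical client classification, used only for this illustration.
#[derive(Clone, Copy)]
pub enum ClientKind {
    AlknetNative,
    Browser,
}

impl TlsIdentity {
    /// Whether a client of the given kind is expected to accept this
    /// identity, following the "Client compatibility" column of the table.
    pub fn accepted_by(&self, client: ClientKind) -> bool {
        match (self, client) {
            // Raw keys work for alknet-native clients but not for browsers.
            (TlsIdentity::RawKey(_), ClientKind::AlknetNative) => true,
            (TlsIdentity::RawKey(_), ClientKind::Browser) => false,
            // A domain certificate is accepted by all clients.
            (TlsIdentity::X509 { .. }, _) => true,
            // A self-signed certificate is not trusted by external clients.
            (TlsIdentity::SelfSigned, _) => false,
        }
    }
}
```

Writing the table as an exhaustive `match` makes the two use cases explicit: a node that must serve browsers cannot rely on `RawKey`, and `SelfSigned` never satisfies an external client.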