alknet/docs/architecture/decisions/028-auth-irpc-service.md
glm-5.1 19b3d3a078 docs: write Phase 0 architecture foundation — ADRs 026-034, spec docs, and task updates
Phase 0a — ADRs (9 new):
- ADR-026: Transport/interface separation (three-layer model)
- ADR-027: Crate decomposition (core, secret, storage, flowgraph, napi, CLI)
- ADR-028: Auth as irpc service (AuthProtocol behind feature flag)
- ADR-029: Identity as core type (Identity + IdentityProvider in alknet-core)
- ADR-030: Static/dynamic config split (ArcSwap, ConfigReloadHandle)
- ADR-031: Forwarding policy (rule-based allow/deny, TransportKind-aware)
- ADR-032: Event boundary discipline (domain, irpc, call protocol boundaries)
- ADR-033: OperationEnv universal composition (three dispatch paths)
- ADR-034: Head/worker terminology (replace hub/spoke)

Phase 0b — New spec documents (7):
- identity.md, services.md, interface.md, configuration.md,
  storage.md, flowgraph.md, secret-service.md

Updated existing docs:
- auth.md: reference identity.md for canonical definitions, add AuthProtocol
- open-questions.md: resolve OQ-12, OQ-16, OQ-18, OQ-22, OQ-23-25
- README.md: add all new docs, ADRs 026-034

Marked 19 architecture tasks as completed.
2026-06-07 09:32:58 +00:00


ADR-028: Auth as irpc Service

Status

Accepted

Context

For head nodes serving many users, in-memory key lookup via ArcSwap<DynamicConfig> doesn't scale. Loading every authorized key into RAM and atomically swapping the entire set on each reload works for small deployments. For production deployments with hundreds or thousands of users, auth verification should instead query a database on demand.

The current ArcSwap<DynamicConfig> approach works for CLI and single-node setups. What's needed is an async boundary that allows auth verification to go through a service — locally via channels for minimal deployments, or via irpc for production deployments where auth runs on a separate process or node.

The critical design point: callers go through the IdentityProvider trait (ADR-029). The irpc service is one way to satisfy the trait. Both paths produce the same result — an Identity or rejection. The trait is the contract; the service is an implementation path.

Decision

Auth verification is provided via an irpc service protocol, with IdentityProvider as the interface contract and ConfigIdentityProvider (ArcSwap-backed) as the default implementation.

IdentityProvider Trait (ADR-029) — The Contract

Callers depend on IdentityProvider, not on any concrete implementation:

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

ConfigIdentityProvider — Default Implementation

Reads the auth section of the current ArcSwap<DynamicConfig> snapshot. No database is needed. Every authorized key gets a default scope set. This is the implementation used by CLI and single-node deployments.
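
To make the snapshot-and-swap behavior concrete, here is a minimal sketch of a config-backed provider. It is not the alknet-core implementation: the fields of Identity, AuthToken, and AuthConfig are hypothetical stand-ins, the "default" scope name is invented, and RwLock<Arc<_>> replaces ArcSwap so the example needs only the standard library.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins: the real Identity and AuthToken are defined in ADR-029.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

// Assumed shape of the auth section of DynamicConfig.
#[derive(Default)]
pub struct AuthConfig {
    pub keys: HashMap<String, String>,   // fingerprint -> identity id
    pub tokens: HashMap<String, String>, // token -> identity id
}

// RwLock<Arc<_>> stands in for ArcSwap in this std-only sketch.
pub struct ConfigIdentityProvider {
    auth: RwLock<Arc<AuthConfig>>,
}

impl ConfigIdentityProvider {
    pub fn new(auth: AuthConfig) -> Self {
        Self { auth: RwLock::new(Arc::new(auth)) }
    }

    // Replace the whole snapshot; readers holding the old Arc keep using it.
    pub fn reload(&self, auth: AuthConfig) {
        *self.auth.write().unwrap() = Arc::new(auth);
    }

    fn snapshot(&self) -> Arc<AuthConfig> {
        Arc::clone(&self.auth.read().unwrap())
    }

    // Every authorized key gets the same default scope set.
    fn with_default_scopes(id: &str) -> Identity {
        Identity { id: id.to_string(), scopes: vec!["default".to_string()] }
    }
}

impl IdentityProvider for ConfigIdentityProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.snapshot().keys.get(fingerprint).map(|id| Self::with_default_scopes(id))
    }

    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity> {
        self.snapshot().tokens.get(&token.0).map(|id| Self::with_default_scopes(id))
    }
}
```

Each lookup clones the current Arc before reading, so a concurrent reload never changes the data a lookup is already working with.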

AuthProtocol irpc Service — Behind Feature Flag

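// Assumed imports for the irpc macro, its oneshot channel, and the serde derives.
use irpc::{channel::oneshot, rpc_requests};
use serde::{Deserialize, Serialize};

#[rpc_requests(message = AuthMessage)]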
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CheckAccess)]
    CheckAccess { identity: Identity, operation: String },
}

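// AuthResult is sent back over the irpc boundary, so it needs the same
// serde derives as the protocol enum (Identity must be serializable too).
#[derive(Debug, Serialize, Deserialize)]
enum AuthResult {
    Ok(Identity),
    Denied(String),
}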

The AuthProtocol is behind the irpc feature flag in alknet-core. Nodes that only do SSH tunneling don't need the service layer overhead. When the feature is disabled, auth goes through IdentityProvider directly.
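
As a rough illustration of the gating, the sketch below assumes a Cargo feature named irpc (for example declared as `irpc = ["dep:irpc"]` in the crate's manifest). The module name auth_service and the helper auth_path are hypothetical; only the feature name comes from this ADR.

```rust
// Compiled only when the `irpc` feature is enabled (hypothetical module layout).
#[cfg(feature = "irpc")]
pub mod auth_service {
    // AuthProtocol, AuthMessage, and the service implementations would live here.
    pub const NAME: &str = "AuthProtocol over irpc";
}

/// Reports which auth path this build was compiled with.
pub fn auth_path() -> &'static str {
    if cfg!(feature = "irpc") {
        "irpc service"
    } else {
        "direct IdentityProvider"
    }
}
```

Because the service module disappears entirely when the feature is off, a build without it should be exercised in CI alongside the build with it.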

AuthServiceImpl

Two implementations of the AuthProtocol service exist:

  • ConfigAuthService — backed by ConfigIdentityProvider (ArcSwap path). Wraps the trait in an irpc service for deployments that use the service layer but don't have SQLite.
  • StorageAuthService — backed by SQLite peer_credentials and api_keys tables (in alknet-storage). Queries on demand. Can maintain an LRU cache for hot fingerprints. This is the production implementation.

Both produce the same AuthResult — an Identity or a denial. Callers don't know or care which backend is running.
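
The idea of wrapping the trait in a service can be sketched with standard-library channels standing in for irpc. The names spawn_auth_service, verify, and StaticProvider are hypothetical, Identity is reduced to a single field, and only the VerifyPubkey message is modeled.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread;

// Hypothetical stand-ins for the alknet-core types.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
}
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

#[derive(Debug, Clone, PartialEq)]
pub enum AuthResult {
    Ok(Identity),
    Denied(String),
}

// A request plus its reply channel, mirroring the VerifyPubkey variant.
pub enum AuthMessage {
    VerifyPubkey { fingerprint: String, reply: Sender<AuthResult> },
}

// Runs the service loop on a thread; any IdentityProvider can be the backend.
pub fn spawn_auth_service(provider: Arc<dyn IdentityProvider>) -> Sender<AuthMessage> {
    let (tx, rx) = mpsc::channel::<AuthMessage>();
    thread::spawn(move || {
        for msg in rx {
            match msg {
                AuthMessage::VerifyPubkey { fingerprint, reply } => {
                    let result = match provider.resolve_from_fingerprint(&fingerprint) {
                        Some(identity) => AuthResult::Ok(identity),
                        None => AuthResult::Denied(format!("unknown key {fingerprint}")),
                    };
                    // The caller may have gone away; ignore a failed reply.
                    let _ = reply.send(result);
                }
            }
        }
    });
    tx
}

// Client side: send a request and wait for the single reply.
pub fn verify(service: &Sender<AuthMessage>, fingerprint: &str) -> AuthResult {
    let (reply, response) = mpsc::channel();
    let msg = AuthMessage::VerifyPubkey { fingerprint: fingerprint.to_string(), reply };
    if service.send(msg).is_err() {
        return AuthResult::Denied("auth service unavailable".to_string());
    }
    response
        .recv()
        .unwrap_or_else(|_| AuthResult::Denied("auth service dropped the request".to_string()))
}

// Hypothetical in-memory backend used to exercise the service.
pub struct StaticProvider(pub HashMap<String, String>);

impl IdentityProvider for StaticProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.0.get(fingerprint).map(|id| Identity { id: id.clone() })
    }
    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None // token path omitted from this sketch
    }
}
```

Swapping StaticProvider for a storage-backed provider would not change spawn_auth_service or verify, which is the property the two service implementations rely on.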

Integration with IdentityProvider

The irpc service and the trait compose. A caller goes through IdentityProvider, which may internally delegate to the irpc service, or may satisfy the request locally via ConfigIdentityProvider. The deployment topology determines the path:

  • Minimal (CLI, single-node): ConfigIdentityProvider reads from ArcSwap<DynamicConfig>. No irpc overhead.
  • Production with local auth: AuthServiceImpl wraps StorageIdentityProvider locally. The handler calls IdentityProvider, which routes to the local irpc service.
  • Distributed auth: A handler on a worker node calls IdentityProvider, which routes to a remote auth irpc service over QUIC.
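
A small sketch of why the topology stays invisible to handlers: one provider answers in-process, another forwards each lookup over a channel to a service thread, and the handler function is the same for both. LocalProvider, ServiceProvider, and authenticate are hypothetical names, and a std channel stands in for the irpc (local or QUIC) transport.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-ins for the alknet-core types.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
}
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

// Minimal topology: answers from an in-process map.
pub struct LocalProvider(pub HashMap<String, String>);

impl IdentityProvider for LocalProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.0.get(fingerprint).map(|id| Identity { id: id.clone() })
    }
    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None // token path omitted from this sketch
    }
}

// Request with a reply channel; stands in for an irpc call.
pub struct VerifyPubkey {
    pub fingerprint: String,
    pub reply: Sender<Option<Identity>>,
}

// Service topology: satisfies the trait by delegating to a service thread.
pub struct ServiceProvider {
    requests: Mutex<Sender<VerifyPubkey>>,
}

impl ServiceProvider {
    // Spawns a thread that plays the role of the auth service.
    pub fn spawn(keys: HashMap<String, String>) -> Self {
        let (tx, rx) = mpsc::channel::<VerifyPubkey>();
        thread::spawn(move || {
            for req in rx {
                let identity = keys.get(&req.fingerprint).map(|id| Identity { id: id.clone() });
                let _ = req.reply.send(identity);
            }
        });
        Self { requests: Mutex::new(tx) }
    }
}

impl IdentityProvider for ServiceProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let (reply, response) = mpsc::channel();
        let request = VerifyPubkey { fingerprint: fingerprint.to_string(), reply };
        // Any transport failure is treated as a rejection.
        self.requests.lock().ok()?.send(request).ok()?;
        response.recv().ok().flatten()
    }
    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None // token path omitted from this sketch
    }
}

// Handler code depends only on the trait, whatever the topology.
pub fn authenticate(provider: &dyn IdentityProvider, fingerprint: &str) -> String {
    match provider.resolve_from_fingerprint(fingerprint) {
        Some(identity) => format!("accepted {}", identity.id),
        None => "rejected".to_string(),
    }
}
```

One design choice worth noting: this sketch maps transport failures to a rejection, which fails closed. Whether the real providers should distinguish "denied" from "service unreachable" is not specified here.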

ConfigService Integration

AuthProtocol::ReloadKeys triggers a reload of the auth data. For the ConfigIdentityProvider path, this reloads the dynamic config's auth section and is equivalent to ConfigReloadHandle::reload(). For the StorageIdentityProvider path, it refreshes the LRU cache. Both updates are atomic: ongoing connections are unaffected, and new connections pick up the changes.
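
For the storage path, one plausible reading of "refreshes the LRU cache" is that the cache is emptied so the next lookup goes back to the database. The sketch below assumes that reading. LruCache and CachedStore are hypothetical, a HashMap stands in for the SQLite tables, and the linear-time recency tracking is only suitable for illustration.

```rust
use std::collections::{HashMap, VecDeque};

// Tiny LRU keyed by fingerprint; `order` runs from least to most recently used.
pub struct LruCache {
    capacity: usize,
    order: VecDeque<String>,
    entries: HashMap<String, String>,
}

impl LruCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), entries: HashMap::new() }
    }

    pub fn get(&mut self, key: &str) -> Option<String> {
        let value = self.entries.get(key).cloned()?;
        self.touch(key);
        Some(value)
    }

    pub fn put(&mut self, key: String, value: String) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.insert(key.clone(), value).is_some() {
            self.touch(&key); // existing entry: only refresh its recency
            return;
        }
        self.order.push_back(key);
        if self.order.len() > self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
    }

    pub fn clear(&mut self) {
        self.order.clear();
        self.entries.clear();
    }

    // Move `key` to the most-recently-used end.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
    }
}

// A HashMap stands in for the database; `db_reads` counts cache misses.
pub struct CachedStore {
    db: HashMap<String, String>,
    cache: LruCache,
    pub db_reads: usize,
}

impl CachedStore {
    pub fn new(db: HashMap<String, String>, cache_capacity: usize) -> Self {
        Self { db, cache: LruCache::new(cache_capacity), db_reads: 0 }
    }

    pub fn resolve(&mut self, fingerprint: &str) -> Option<String> {
        if let Some(id) = self.cache.get(fingerprint) {
            return Some(id);
        }
        self.db_reads += 1;
        let id = self.db.get(fingerprint).cloned()?;
        self.cache.put(fingerprint.to_string(), id.clone());
        Some(id)
    }

    // Assumed ReloadKeys behavior: drop cached entries so lookups hit the store again.
    pub fn reload_keys(&mut self) {
        self.cache.clear();
    }
}
```

After reload_keys, the first lookup for each fingerprint pays one database read, and later lookups are served from the cache again.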

Consequences

  • Positive: Minimal deployments use ArcSwap without irpc overhead. No database dependency for CLI users.
  • Positive: Production deployments wire StorageIdentityProvider behind the irpc service. Auth scales to thousands of users without loading all keys into memory.
  • Positive: The IdentityProvider trait is the only contract callers depend on. This keeps alknet-core lean and testable.
  • Positive: Feature flag (irpc) keeps core lean for deployments that don't need the service layer.
  • Positive: Both paths produce results of the same Identity type, so callers handle them identically. The shared type fixes the shape of the result; value-level parity still has to be tested.
  • Negative: Two implementations must be kept in sync. ConfigIdentityProvider and StorageIdentityProvider must produce the same Identity for the same input. Integration tests should verify this.
  • Negative: The irpc feature flag adds conditional compilation complexity. The core must compile and work without it, and the service layer must work with it enabled.
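
The parity requirement between the two provider implementations could be checked with a helper along these lines. This is a sketch under assumptions: parity_mismatches and MapProvider are hypothetical names, the Identity fields are stand-ins, and a real integration test would run the actual config-backed and storage-backed providers over the same fixture data.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the alknet-core types.
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub scopes: Vec<String>,
}
pub struct AuthToken(pub String);

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

// Hypothetical provider used for both sides of the comparison.
pub struct MapProvider {
    pub keys: HashMap<String, String>,
    pub scopes: Vec<String>,
}

impl IdentityProvider for MapProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.keys
            .get(fingerprint)
            .map(|id| Identity { id: id.clone(), scopes: self.scopes.clone() })
    }
    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None // token path omitted from this sketch
    }
}

// Returns every fingerprint for which the two providers disagree,
// including cases where one accepts and the other rejects.
pub fn parity_mismatches(
    a: &dyn IdentityProvider,
    b: &dyn IdentityProvider,
    fingerprints: &[&str],
) -> Vec<String> {
    fingerprints
        .iter()
        .copied()
        .filter(|fp| a.resolve_from_fingerprint(fp) != b.resolve_from_fingerprint(fp))
        .map(str::to_string)
        .collect()
}
```

Including fingerprints that neither backend knows in the fixture list is useful, since a mismatch in rejection behavior is as much a parity failure as a mismatch in scopes.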

References