alknet/docs/architecture/decisions/028-auth-irpc-service.md
glm-5.1 e7941da04a docs: clarify phase boundaries — Phase 1 vs downstream concerns
The architecture specs were implying that StorageIdentityProvider, irpc
service implementations, and application services (agent, Docker, etc.)
already exist. This commit makes the phasing explicit:

- services.md: deployment topology now clearly labels 'Current (Phase 1)'
  vs 'Future (Phase 2+)', notes that application services are downstream
- identity.md: StorageIdentityProvider labeled 'Future — Phase 2+',
  clarifying alknet-storage doesn't exist yet
- storage.md: adds phase note that the crate hasn't been built yet,
  StorageIdentityProvider is a future impl
- ADR-028: ConfigAuthService is Phase 1 path, StorageAuthService is
  Phase 2+ contract
- call-protocol.md: Agent Service Pattern section explicitly framed as
  a downstream application concern, not a core requirement
2026-06-07 10:29:52 +00:00


ADR-028: Auth as irpc Service

Status

Accepted

Context

For head nodes serving many users, in-memory key lookup via ArcSwap<DynamicConfig> doesn't scale. Every authorized key must be held in RAM, and each reload rebuilds and atomically swaps the entire set. That is acceptable for small deployments, but with hundreds or thousands of users, auth verification should query a database on demand instead.

The current ArcSwap<DynamicConfig> approach works for CLI and single-node setups. What's needed is an async boundary that lets auth verification go through a service: in-process over local channels when the auth service runs on the same node, or over irpc when auth runs in a separate process or on another node.

The critical design point: callers go through the IdentityProvider trait (ADR-029). The irpc service is one way to satisfy the trait. Both paths produce the same result — an Identity or rejection. The trait is the contract; the service is an implementation path.

Decision

Auth verification is provided via an irpc service protocol, with IdentityProvider as the interface contract and ConfigIdentityProvider (ArcSwap-backed) as the default implementation.

IdentityProvider Trait (ADR-029) — The Contract

Callers depend on IdentityProvider, not on any concrete implementation:

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}
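To illustrate what "callers depend on IdentityProvider" means in practice, here is a minimal sketch of a connection handler written only against the trait. The fields of `Identity`, the representation of `AuthToken`, the `accept_ssh_key` function, and the `StaticProvider` test double are hypothetical; only the trait signature comes from this ADR.

```rust
// Hypothetical stand-ins for alknet-core's Identity and AuthToken types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub principal: String,
    pub scopes: Vec<String>,
}

#[derive(Debug, Clone)]
pub struct AuthToken(pub Vec<u8>);

// The contract from this ADR, unchanged.
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
    fn resolve_from_token(&self, token: &AuthToken) -> Option<Identity>;
}

// A handler that only knows the trait. Whether the provider reads an
// ArcSwap or delegates to an irpc service is invisible at this level.
pub fn accept_ssh_key(provider: &dyn IdentityProvider, fingerprint: &str) -> Result<Identity, String> {
    provider
        .resolve_from_fingerprint(fingerprint)
        .ok_or_else(|| format!("unknown key {fingerprint}"))
}

// Test double: exactly one known fingerprint, no valid tokens.
pub struct StaticProvider;

impl IdentityProvider for StaticProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        (fingerprint == "SHA256:alice").then(|| Identity {
            principal: "alice".to_string(),
            scopes: vec!["tunnel".to_string()],
        })
    }

    fn resolve_from_token(&self, _token: &AuthToken) -> Option<Identity> {
        None
    }
}
```

Because the handler takes `&dyn IdentityProvider`, tests can pass a trivial double like this one while production passes whichever provider the deployment wires up.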

ConfigIdentityProvider — Default Implementation

Reads the auth section of ArcSwap<DynamicConfig>; no database is needed. Every authorized key gets a default scope set. This is the default for CLI and single-node deployments.
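A minimal sketch of the ConfigIdentityProvider lookup path, assuming the auth section maps SSH key fingerprints to principal names and that the default scope set is a fixed list. To stay dependency-free it uses `RwLock<Arc<DynamicConfig>>` as a stand-in for `ArcSwap<DynamicConfig>`: both let a reader take an `Arc` snapshot and do the lookup without holding a lock. `AuthSection`, `authorized_keys`, and `DEFAULT_SCOPES` are illustrative names, not alknet-core's.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub principal: String,
    pub scopes: Vec<String>,
}

// Hypothetical shape of the auth section: fingerprint -> principal name.
#[derive(Debug, Default)]
pub struct AuthSection {
    pub authorized_keys: HashMap<String, String>,
}

#[derive(Debug, Default)]
pub struct DynamicConfig {
    pub auth: AuthSection,
}

// Assumed default scopes; the ADR only says each key gets "a default scope set".
const DEFAULT_SCOPES: &[&str] = &["tunnel", "exec"];

pub struct ConfigIdentityProvider {
    // Stand-in for ArcSwap<DynamicConfig>.
    config: RwLock<Arc<DynamicConfig>>,
}

impl ConfigIdentityProvider {
    pub fn new(config: DynamicConfig) -> Self {
        Self { config: RwLock::new(Arc::new(config)) }
    }

    // Clone the Arc and release the lock immediately; lookups run on the snapshot.
    fn snapshot(&self) -> Arc<DynamicConfig> {
        Arc::clone(&*self.config.read().unwrap())
    }

    pub fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        let cfg = self.snapshot();
        cfg.auth.authorized_keys.get(fingerprint).map(|principal| Identity {
            principal: principal.clone(),
            scopes: DEFAULT_SCOPES.iter().map(|s| s.to_string()).collect(),
        })
    }
}
```

A real implementation would implement `IdentityProvider` for this type and read through `ArcSwap::load` rather than a read lock.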

AuthProtocol irpc Service — Behind Feature Flag

#[rpc_requests(message = AuthMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum AuthProtocol {
    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyPubkey)]
    VerifyPubkey { fingerprint: String, key_data: Vec<u8> },

    #[rpc(tx=oneshot::Sender<AuthResult>)]
    #[wrap(VerifyToken)]
    VerifyToken { token_bytes: Vec<u8>, timestamp: u64 },

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(ReloadKeys)]
    ReloadKeys,

    #[rpc(tx=oneshot::Sender<bool>)]
    #[wrap(CheckAccess)]
    CheckAccess { identity: Identity, operation: String },
}

#[derive(Debug, Serialize, Deserialize)]
enum AuthResult {
    Ok(Identity),
    Denied(String),
}
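The `#[rpc(tx=oneshot::Sender<...>)]` attributes pair each request with a one-shot reply channel. The sketch below reproduces that pattern by hand with `std::sync::mpsc`, so it shows the shape of the request/reply exchange rather than the code irpc actually generates. Only `VerifyPubkey` and `ReloadKeys` are modeled, and `spawn_auth_service`, `verify_pubkey`, and the key-loader closure are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub principal: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum AuthResult {
    Ok(Identity),
    Denied(String),
}

// Each message carries its payload plus a reply channel (mpsc used as a oneshot).
pub enum AuthMessage {
    VerifyPubkey { fingerprint: String, reply: Sender<AuthResult> },
    ReloadKeys { reply: Sender<()> },
}

// Service loop: owns the key table and answers until all senders are dropped.
pub fn spawn_auth_service<F>(load_keys: F) -> Sender<AuthMessage>
where
    F: Fn() -> HashMap<String, String> + Send + 'static,
{
    let (tx, rx) = channel::<AuthMessage>();
    thread::spawn(move || {
        let mut keys = load_keys();
        for msg in rx {
            match msg {
                AuthMessage::VerifyPubkey { fingerprint, reply } => {
                    let result = match keys.get(&fingerprint) {
                        Some(p) => AuthResult::Ok(Identity { principal: p.clone() }),
                        None => AuthResult::Denied(format!("unknown key {fingerprint}")),
                    };
                    // The caller may have given up; a closed reply channel is not an error here.
                    let _ = reply.send(result);
                }
                AuthMessage::ReloadKeys { reply } => {
                    keys = load_keys();
                    let _ = reply.send(());
                }
            }
        }
    });
    tx
}

// Client helper: send one request and block on its reply.
pub fn verify_pubkey(service: &Sender<AuthMessage>, fingerprint: &str) -> AuthResult {
    let (reply, rx) = channel();
    service
        .send(AuthMessage::VerifyPubkey { fingerprint: fingerprint.to_string(), reply })
        .expect("auth service stopped");
    rx.recv().expect("auth service dropped the reply")
}
```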

AuthProtocol is gated behind the irpc feature flag in alknet-core. Nodes that only do SSH tunneling don't pay for the service layer. When the feature is disabled, auth goes through IdentityProvider directly.
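One way to keep the core compiling and working without the feature is to leave the trait path unconditional and put everything irpc-related behind `#[cfg(feature = "irpc")]`. The module layout and the `authenticate` and `auth_path` helpers below are assumptions for illustration; only the feature name comes from this ADR.

```rust
// Hypothetical stand-in for alknet-core's Identity type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub principal: String,
}

// The ADR's contract, trimmed to one method for brevity.
pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

// Compiled in every build: the direct trait path is always available.
pub fn authenticate(provider: &dyn IdentityProvider, fingerprint: &str) -> Option<Identity> {
    provider.resolve_from_fingerprint(fingerprint)
}

// Compiled only with `--features irpc`. AuthProtocol, ConfigAuthService and a
// service-backed IdentityProvider would live here.
#[cfg(feature = "irpc")]
pub mod service {
    // AuthProtocol, AuthMessage, ConfigAuthService, ...
}

// Reports which auth path this build was compiled with.
pub fn auth_path() -> &'static str {
    if cfg!(feature = "irpc") {
        "irpc service layer"
    } else {
        "direct IdentityProvider"
    }
}
```

Building and testing both with and without the feature in CI is the usual way to catch code that accidentally depends on the gated module.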

AuthServiceImpl

Two implementations are planned; only the first ships in Phase 1:

  • ConfigAuthService — backed by ConfigIdentityProvider (ArcSwap path). Wraps the trait in an irpc service for deployments that use the service layer but don't have SQLite. This is the Phase 1 path: it ships with alknet-core.
  • StorageAuthService — backed by SQLite peer_credentials and api_keys tables (in alknet-storage, not yet built). Queries on demand. Can maintain an LRU cache for hot fingerprints. This is a Phase 2+ implementation — the contract is defined here so alknet-storage can implement it later.

Both produce the same AuthResult — an Identity or a denial. Callers don't know or care which backend is running.
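The StorageAuthService path (Phase 2+) can be sketched as a cache-then-query lookup. Here the SQLite query against peer_credentials is replaced by a closure, the LRU cache is a small hand-rolled one (a production build would more likely use an existing crate), and `backend_queries` exists only to make cache hits observable. All of these names are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub principal: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AuthResult {
    Ok(Identity),
    Denied(String),
}

// Minimal LRU (capacity >= 1): HashMap for lookups, VecDeque for recency
// (front = most recent). Touch is O(n), acceptable for a sketch.
pub struct LruCache {
    capacity: usize,
    map: HashMap<String, Identity>,
    order: VecDeque<String>,
}

impl LruCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    pub fn get(&mut self, key: &str) -> Option<Identity> {
        let hit = self.map.get(key).cloned()?;
        self.touch(key);
        Some(hit)
    }

    pub fn put(&mut self, key: String, value: Identity) {
        if self.map.contains_key(&key) {
            self.map.insert(key.clone(), value);
            self.touch(&key);
            return;
        }
        if self.map.len() == self.capacity {
            // Evict the least recently used fingerprint.
            if let Some(oldest) = self.order.pop_back() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key.clone(), value);
        self.order.push_front(key);
    }

    pub fn clear(&mut self) {
        self.map.clear();
        self.order.clear();
    }

    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_front(k);
        }
    }
}

pub struct StorageAuthService<Q> {
    query: Q, // stand-in for the peer_credentials lookup
    cache: LruCache,
    pub backend_queries: usize,
}

impl<Q: Fn(&str) -> Option<Identity>> StorageAuthService<Q> {
    pub fn new(query: Q, cache_capacity: usize) -> Self {
        Self { query, cache: LruCache::new(cache_capacity), backend_queries: 0 }
    }

    pub fn verify_pubkey(&mut self, fingerprint: &str) -> AuthResult {
        if let Some(id) = self.cache.get(fingerprint) {
            return AuthResult::Ok(id);
        }
        self.backend_queries += 1;
        match (self.query)(fingerprint) {
            Some(id) => {
                self.cache.put(fingerprint.to_string(), id.clone());
                AuthResult::Ok(id)
            }
            None => AuthResult::Denied(format!("unknown key {fingerprint}")),
        }
    }

    // What ReloadKeys would do on this path: drop cached entries.
    pub fn reload(&mut self) {
        self.cache.clear();
    }
}
```

Only successful lookups are cached, so unknown keys always reach the backend, while a revoked key stays cached until the next reload. That is why ReloadKeys refreshing the cache matters on this path.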

Integration with IdentityProvider

The irpc service and the trait compose. A caller goes through IdentityProvider, which may internally delegate to the irpc service, or may satisfy the request locally via ConfigIdentityProvider. The deployment topology determines the path:

  • Minimal (CLI, single-node): ConfigIdentityProvider reads from ArcSwap<DynamicConfig>. No irpc overhead.
  • Production with local auth (Phase 2+): StorageAuthService wraps StorageIdentityProvider in the same process. The handler calls IdentityProvider, which routes to the in-process irpc service.
  • Distributed auth: Handler on a worker node calls IdentityProvider which routes to a remote auth irpc service over QUIC.
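The composition described above can be sketched as a single IdentityProvider whose variants correspond to the routing choice. This is an illustration under stated assumptions: `RoutedProvider` and `spawn_service` are hypothetical, and a std `mpsc` channel to a thread stands in for the irpc client, whether that client is local or talks QUIC. The point is that handler code calling `resolve_from_fingerprint` is identical in both cases.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;
use std::thread;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub principal: String,
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

// Request plus reply channel, standing in for an irpc VerifyPubkey call.
type Request = (String, Sender<Option<Identity>>);

pub enum RoutedProvider {
    // Minimal deployments: answer from an in-memory table.
    Local(HashMap<String, String>),
    // Service deployments: forward to an auth service. A remote client
    // would replace this handle without changing callers.
    Service(Mutex<Sender<Request>>),
}

impl IdentityProvider for RoutedProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        match self {
            RoutedProvider::Local(keys) => {
                keys.get(fingerprint).map(|p| Identity { principal: p.clone() })
            }
            RoutedProvider::Service(tx) => {
                let (reply, rx) = channel();
                // A stopped service is treated as "no identity" (reject).
                tx.lock().unwrap().send((fingerprint.to_string(), reply)).ok()?;
                rx.recv().ok().flatten()
            }
        }
    }
}

// Starts a stand-in auth service thread with its own key table.
pub fn spawn_service(keys: HashMap<String, String>) -> RoutedProvider {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        for (fingerprint, reply) in rx {
            let _ = reply.send(keys.get(&fingerprint).map(|p| Identity { principal: p.clone() }));
        }
    });
    RoutedProvider::Service(Mutex::new(tx))
}
```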

ConfigService Integration

AuthProtocol::ReloadKeys triggers reload of the dynamic config's auth section. For the ConfigIdentityProvider path, this is equivalent to ConfigReloadHandle::reload(). For the StorageIdentityProvider path, this refreshes the LRU cache. Both update atomically — ongoing connections are unaffected, new connections pick up changes.
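The "ongoing connections are unaffected" property follows from swapping a pointer rather than mutating the key set in place. Here is a minimal sketch, with `RwLock<Arc<...>>` again standing in for ArcSwap and a `HashSet` of fingerprints standing in for the auth section (`KeyStore` is a hypothetical name):

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

// Stand-in for ArcSwap<DynamicConfig>: a pointer to an immutable key set
// that reload replaces wholesale.
pub struct KeyStore {
    current: RwLock<Arc<HashSet<String>>>,
}

impl KeyStore {
    pub fn new(keys: HashSet<String>) -> Self {
        Self { current: RwLock::new(Arc::new(keys)) }
    }

    // Taken when a connection is accepted; that connection keeps this Arc.
    pub fn snapshot(&self) -> Arc<HashSet<String>> {
        Arc::clone(&*self.current.read().unwrap())
    }

    // ReloadKeys on the config path: build the new set off to the side,
    // then publish it with a single pointer swap.
    pub fn reload(&self, keys: HashSet<String>) {
        *self.current.write().unwrap() = Arc::new(keys);
    }
}
```

Old snapshots stay valid until the last holder drops them, so a reload never changes the key set under an in-flight connection.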

Consequences

  • Positive: Minimal deployments use ArcSwap without irpc overhead. No database dependency for CLI users.
  • Positive: Production deployments (Phase 2+) can wire StorageIdentityProvider behind the irpc service, so auth scales to thousands of users without loading every key into memory.
  • Positive: The IdentityProvider trait is the only contract callers depend on, so alknet-core carries no storage dependency and handlers can be tested against in-memory providers.
  • Positive: Feature flag (irpc) keeps core lean for deployments that don't need the service layer.
  • Positive: Both paths return the same Identity type, so callers handle one result shape regardless of which backend is running.
  • Negative: Two implementations must be kept in sync. ConfigIdentityProvider and StorageIdentityProvider must produce the same Identity for the same input. Integration tests should verify this.
  • Negative: The irpc feature flag adds conditional compilation complexity. The core must compile and work without it, and the service layer must work with it enabled.
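The integration tests called for above could be built around a helper that runs the same inputs through both providers and reports any disagreement. `parity_mismatches` and the map-backed `MapProvider` are hypothetical; in a real suite the two sides would be ConfigIdentityProvider and StorageIdentityProvider loaded from the same fixture.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub principal: String,
    pub scopes: Vec<String>,
}

pub trait IdentityProvider: Send + Sync + 'static {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity>;
}

// Returns every fingerprint on which the two providers disagree,
// including cases where only one of them recognizes the key.
pub fn parity_mismatches(
    a: &dyn IdentityProvider,
    b: &dyn IdentityProvider,
    fingerprints: &[&str],
) -> Vec<String> {
    fingerprints
        .iter()
        .filter(|fp| a.resolve_from_fingerprint(fp) != b.resolve_from_fingerprint(fp))
        .map(|fp| fp.to_string())
        .collect()
}

// Map-backed provider used to exercise the helper.
pub struct MapProvider(pub HashMap<String, Identity>);

impl IdentityProvider for MapProvider {
    fn resolve_from_fingerprint(&self, fingerprint: &str) -> Option<Identity> {
        self.0.get(fingerprint).cloned()
    }
}
```

Including unknown fingerprints in the input list checks that both backends also agree on rejections, not just on successful lookups.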

References