docs(architecture): add ADR-023, resolve OQ-24 — operation error schemas

ADR-023 adds error_schemas to OperationSpec so operations can declare
their domain-level failure modes (FILE_NOT_FOUND, RATE_LIMITED, etc.)
distinct from protocol-level codes (NOT_FOUND, FORBIDDEN, etc.). The
call.error payload gains an optional 'details' field carrying the typed
error payload conforming to the declared schema. from_openapi/to_openapi
map OpenAPI response status codes to/from ErrorDefinitions, making the
adapter contract from ADR-017 faithful on the error axis.

Also fixes W2 (KeyVersionMismatch stale comment in encryption.md —
ADR-021 implements rotation without this variant) and W4
(derive_encryption_key_for_version missing from service.md method list).

Spec updates: operation-registry.md (OperationSpec, ErrorDefinition,
Handler error mapping, services/schema), call-protocol.md (call.error
payload, CallError, ResponseEnvelope), README.md, overview.md,
open-questions.md (OQ-24), call/README.md, encryption.md, service.md.
2026-06-21 10:26:18 +00:00
parent 1cedc4eeba
commit 3e238a471b
9 changed files with 478 additions and 26 deletions


@@ -7,7 +7,7 @@ last_updated: 2026-06-20
## Current State
**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable; implementation exists) and research/reference material. Foundational ADRs (001–022) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), abort cascade for nested calls (ADR-016), call protocol client and adapter contract (ADR-017), vault standalone crate (ADR-018), vault assembly-layer-only access (ADR-019), HD derivation for encryption keys (ADR-020), key rotation via version-indexed paths (ADR-021), and handler registration, provenance, and composition authority (ADR-022). The alknet-core, alknet-call, and alknet-vault crate specs are in draft.
**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable; implementation exists) and research/reference material. Foundational ADRs (001–023) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), abort cascade for nested calls (ADR-016), call protocol client and adapter contract (ADR-017), vault standalone crate (ADR-018), vault assembly-layer-only access (ADR-019), HD derivation for encryption keys (ADR-020), key rotation via version-indexed paths (ADR-021), handler registration, provenance, and composition authority (ADR-022), and operation error schemas (ADR-023). The alknet-core, alknet-call, and alknet-vault crate specs are in draft.
**Next step**: Review the vault spec documents, then begin implementation. All open questions for the core and call crates are resolved; the vault crate has one deferred OQ (OQ-21, remote vault administration) that does not block implementation.
@@ -57,6 +57,7 @@ last_updated: 2026-06-20
| [020](decisions/020-hd-derivation-for-encryption-keys.md) | HD Derivation for Encryption Keys | Accepted |
| [021](decisions/021-key-rotation-via-version-indexed-paths.md) | Key Rotation via Version-Indexed Paths | Accepted |
| [022](decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Proposed |
| [023](decisions/023-operation-error-schemas.md) | Operation Error Schemas | Proposed |
## Open Questions
@@ -85,6 +86,7 @@ See [open-questions.md](open-questions.md) for the full tracker.
- **OQ-20**: Encryption key derivation — HD derivation from BIP39 seed, not PBKDF2; salt field unused in v2 (wire-format compat) (ADR-020)
- **OQ-22**: Key rotation — version-indexed derivation paths; `rotate` method re-encrypts (ADR-021)
- **OQ-23**: Handler identity registration path — registration bundle with provenance, composition authority, scoped env, capabilities (ADR-022)
- **OQ-24**: Operation error schemas — declared domain errors with typed `details` payload; adapter fidelity for `from_openapi`/`to_openapi` (ADR-023)
**Deferred (not active):**
- **OQ-09**: WASM target boundaries — design constraint, not deliverable


@@ -33,6 +33,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
| [016](../../decisions/016-abort-cascade-for-nested-calls.md) | Abort Cascade for Nested Calls | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
| [017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | Call Protocol Client and Adapter Contract | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction |
| [022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities |
| [023](../../decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity |
## Relevant Open Questions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-21
last_updated: 2026-06-22
---
# Call Protocol
@@ -127,19 +127,28 @@ The `payload` of a `call.requested` event has this shape:
```json
{
"code": "NOT_FOUND",
"message": "operation not found: /fs/readFile",
"retryable": false
"code": "FILE_NOT_FOUND",
"message": "file not found: /etc/nonexistent",
"retryable": false,
"details": { "path": "/etc/nonexistent", "errno": 2 }
}
```
Error codes use an extensible string enum. The protocol defines the following codes:
- `NOT_FOUND` — operation not in registry
Error codes use an extensible string enum. The protocol defines the following **protocol-level codes** (emitted by the dispatch machinery, not by handlers):
- `NOT_FOUND` — operation not in registry (or Internal op called from wire)
- `FORBIDDEN` — access denied (insufficient scopes or unauthenticated)
- `INVALID_INPUT` — input doesn't match the operation's JSON Schema
- `INTERNAL` — handler error
- `INTERNAL` — handler error, panic, connection failure
- `TIMEOUT` — request timed out (retryable: true)
Operations may also declare **operation-level domain codes** in their `error_schemas` (ADR-023) — e.g., `FILE_NOT_FOUND`, `RATE_LIMITED`, `INSUFFICIENT_CREDITS`. These are emitted by handlers and carry a `details` payload conforming to the declared `ErrorDefinition.schema`. Protocol-level errors omit `details` or carry protocol-specific context (e.g., the operation name for `NOT_FOUND`).
Fields:
- `code` — the error code (protocol-level or operation-level)
- `message` — human-readable error message. For logging and debugging, not for programmatic handling. Clients should switch on `code`, not parse `message`.
- `retryable` — whether the caller should retry. `true` for transient failures, `false` for permanent ones.
- `details` — optional. When the code matches a declared `ErrorDefinition`, `details` conforms to that definition's schema. This is the typed error payload — it makes errors structured instead of string-matched. See ADR-023.
New error codes may be added in future versions. Clients should treat unknown error codes as `INTERNAL` with `retryable: false`.
### Protocol Operations
@@ -304,13 +313,14 @@ pub struct ResponseEnvelope {
}
pub struct CallError {
pub code: String,
pub message: String,
pub code: String, // protocol-level (NOT_FOUND, FORBIDDEN, ...) or operation-level (ADR-023)
pub message: String, // human-readable, for logging — not for programmatic handling
pub retryable: bool,
pub details: Option<Value>, // typed error payload, conforms to ErrorDefinition.schema (ADR-023)
}
```
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire.
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire. When a handler returns a `CallError` whose `code` matches a declared `ErrorDefinition`, the `details` field carries the typed error payload. See ADR-023.
### Connection and Stream Lifecycle
@@ -356,6 +366,7 @@ Handlers clean up resources when their call is cancelled (in Rust, the future is
| Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction |
| Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle |
| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details` |
## Open Questions


@@ -37,6 +37,7 @@ pub struct OperationSpec {
pub visibility: Visibility, // External (wire-callable) or Internal (composition-only)
pub input_schema: Value, // JSON Schema for input
pub output_schema: Value, // JSON Schema for output
pub error_schemas: Vec<ErrorDefinition>, // Declared domain errors (ADR-023)
pub access_control: AccessControl,
}
@@ -50,6 +51,14 @@ pub enum Visibility {
External, // Callable from the wire (call.requested from a client)
Internal, // Composition-only (env.invoke from a handler)
}
/// A declared operation-level error. See ADR-023.
pub struct ErrorDefinition {
pub code: String, // e.g., "FILE_NOT_FOUND", "RATE_LIMITED"
pub description: String, // Human-readable description
pub schema: Value, // JSON Schema for the error detail payload
pub http_status: Option<u16>, // HTTP status for adapter projection (from_openapi/to_openapi)
}
```
Operation names use slash-based paths without a leading slash, aligned with URL path conventions: `fs/readFile`, `agent/chat`, `services/list`. The leading slash is added when needed for display (`spec.path()` returns `/fs/readFile`) and for wire format (the `call.requested` payload uses `/fs/readFile`). See OQ-13 for the path format decision (single-node `service/op` vs head/worker `node/service/op`).
@@ -94,6 +103,8 @@ A handler receives:
And returns a `ResponseEnvelope` containing the result or an error. `ResponseEnvelope` is defined in [call-protocol.md](call-protocol.md#responseenvelope) — it carries the request ID and a `Result<Value, CallError>`. Local dispatch produces it with no serialization overhead; the `CallAdapter` converts it to `EventEnvelope` for the wire.
When a handler returns an error, the `CallError.code` is matched against the operation's declared `error_schemas` (ADR-023). If the code matches a declared `ErrorDefinition`, the `call.error` event carries that code and the error's detail payload. If it doesn't match, the `call.error` carries `INTERNAL`, with the original code preserved in `details` (ADR-023, handler error mapping). This is how handler failures become typed errors on the wire instead of string-matched messages.
### OperationContext
```rust
@@ -272,7 +283,7 @@ These are read-only — no admin operations are exposed through the call protoco
}
```
`services/schema` accepts `{ "name": "fs/readFile" }` and returns the full `OperationSpec` including input/output JSON Schemas.
`services/schema` accepts `{ "name": "fs/readFile" }` and returns the full `OperationSpec` including input/output JSON Schemas and declared `error_schemas` (ADR-023). This enables client code generation: a client reading the schema can produce typed error enums instead of generic error handling.
### irpc Integration
@@ -392,6 +403,7 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe
| Secret material flow and capability injection | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Capabilities carry outbound credentials; call protocol carries no secret material |
| Privilege model and authority context | [ADR-015](../../decisions/015-privilege-model-and-authority-context.md) | `internal` = authority switch not ACL skip; External/Internal visibility; composition authority + scoped env |
| Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle |
| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` |
## Open Questions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-19
last_updated: 2026-06-20
---
# Encryption
@@ -194,7 +194,7 @@ pub enum EncryptionError {
Encryption(String), // encryption failed
Decryption(String), // decryption failed (wrong key, tampered data, bad UTF-8)
Decoding(String), // base64 decoding failed
KeyVersionMismatch { expected: u32, actual: u32 }, // reserved for future rotation (OQ-22)
KeyVersionMismatch { expected: u32, actual: u32 }, // unused — see note below
}
```
@@ -202,12 +202,17 @@ Decryption failures are intentionally generic — they don't distinguish
"wrong key" from "tampered data" from "corrupted storage" to avoid
leaking information to an attacker.
`KeyVersionMismatch` is **defined but unused in v2** — neither `encrypt()`
nor `decrypt()` returns it. It is reserved for future key rotation
enforcement (OQ-22), where the vault may enforce version matching before
decrypting. In v2, the `key_version` is stamped onto `EncryptedData` and
`EncryptionKey` for forward compatibility but does not gate decryption. An
implementer should not expect this variant to fire in v2.
`KeyVersionMismatch` is **defined but unused.** ADR-021 implements key
rotation via version-indexed derivation paths — `decrypt` derives the key
at the path indicated by `encrypted.key_version`, so there is no
version-mismatch to detect at the error level (every blob carries its own
version, and every version has a derivable key). This variant predates
ADR-021's rotation mechanism and is retained in the enum for source
compatibility but is not emitted by any code path in v2. An implementer
should not wire it up or expect it to fire. If a future use case requires
enforcing version constraints (e.g., "refuse to decrypt blobs older than
v3"), this variant could be repurposed — but that would be a new decision,
not part of ADR-021's rotation scheme.
## Design Decisions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-19
last_updated: 2026-06-20
---
# Service
@@ -126,6 +126,23 @@ Derive an AES-256-GCM encryption key at the given path. Same cache
behavior as `derive_ed25519`. Returns a `DerivedKey` with
`KeyType::Aes256Gcm`.
### derive_encryption_key_for_version(version) → EncryptionKey
```rust
pub fn derive_encryption_key_for_version(&self, version: u32) -> Result<EncryptionKey, VaultServiceError>;
```
Derive the encryption key for a specific key version. Maps the version to
its derivation path via `encryption_path_for_version(version)` (ADR-021):
v2 → `m/74'/2'/0'/0'`, v3 → `m/74'/2'/0'/1'`, etc. Cached by path. This is
the version-aware method that `decrypt` uses to select the correct key for
each blob — see [encryption.md](encryption.md) and ADR-021.
`derive_encryption_key(path)` (above) remains as the path-based API for
deriving at arbitrary paths. `derive_encryption_key_for_version(version)`
is the version-aware API used by `encrypt` and `decrypt`. The two share
the same cache (keyed by derivation path).
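The version-to-path mapping described above can be sketched as a small pure function. This is a minimal illustration, not the vault's actual code: the `Option<String>` return type, the `FIRST_KEY_VERSION` constant, and the choice to reject versions below 2 are assumptions of this sketch; only the name `encryption_path_for_version` and the `m/74'/2'/0'/{version-2}'` pattern come from the spec.

```rust
/// First key version that has a derivation path (v2 maps to index 0).
/// The constant name is hypothetical.
const FIRST_KEY_VERSION: u32 = 2;

/// Map a key version to its derivation path, following the
/// `m/74'/2'/0'/{version-2}'` pattern from ADR-021.
/// Returning `None` for versions below 2 is an assumption of this sketch.
pub fn encryption_path_for_version(version: u32) -> Option<String> {
    // checked_sub avoids underflow for versions 0 and 1.
    let index = version.checked_sub(FIRST_KEY_VERSION)?;
    Some(format!("m/74'/2'/0'/{index}'"))
}
```

Because the cache is keyed by derivation path, two callers that ask for the same version would resolve to the same cache entry under this mapping.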
### derive_ethereum_key(path) → DerivedKey (feature-gated)
```rust
@@ -173,10 +190,10 @@ pub fn decrypt(&self, encrypted: &EncryptedData) -> Result<String, VaultServiceE
```
Decrypt an `EncryptedData` blob. Derives (and caches) the encryption key
at the version-indexed path indicated by `encrypted.key_version` (ADR-021).
Each version maps to a distinct path (`m/74'/2'/0'/{version-2}'`), so old
and new keys can coexist during partial rotation. See
[encryption.md](encryption.md).
at the version-indexed path indicated by `encrypted.key_version` via
`derive_encryption_key_for_version` (ADR-021). Each version maps to a
distinct path (`m/74'/2'/0'/{version-2}'`), so old and new keys can
coexist during partial rotation. See [encryption.md](encryption.md).
### rotate(encrypted, to_version) → EncryptedData


@@ -0,0 +1,394 @@
# ADR-023: Operation Error Schemas
## Status
Proposed
## Context
The `OperationSpec` in alknet-call has `input_schema` and `output_schema` but
no `error_schemas`. The `call.error` payload (call-protocol.md L128–134)
carries a `code` and `message`, where `code` is one of five infrastructure
codes: `NOT_FOUND`, `FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`.
These five codes cover **protocol-level failures** — the call protocol
itself can always fail to find an operation, deny access, reject bad input,
time out, or hit an internal error. They are emitted by the dispatch
machinery (the registry, the adapter), not by operation handlers.
But operations also have **domain-level failures** that are not covered:
- `/fs/readFile` can fail because the file doesn't exist, the path is
invalid, or the caller lacks OS-level read permission. These are
operation-specific failures distinct from the protocol-level
`INVALID_INPUT` (schema mismatch) or `FORBIDDEN` (scope mismatch).
- `/vastai/createMachine` can fail because the account has insufficient
credits, the machine type is unavailable in the requested region, or the
upstream API rate-limited the request.
- `/agent/chat` can fail because the LLM provider returned an error, the
context window overflowed, or the model refused the request.
Today, these failures collapse into `INTERNAL` with a `message` string.
A client calling `/fs/readFile` has no way to know from the schema that it
might return `FILE_NOT_FOUND` vs `PERMISSION_DENIED` vs `INVALID_PATH`. The
caller has to parse `message` strings — the exact anti-pattern that typed
RPC is meant to avoid. This is a **type safety gap**: inputs and outputs are
typed, but errors are untyped strings.
### Why this matters for adapters
OpenAPI specs naturally include error information — response status codes
with schemas (e.g., `404: { schema: NotFoundError }`, `422: { schema:
ValidationError }`). MCP tool definitions carry error descriptions. The
`from_openapi` adapter (ADR-017 L113–124) imports operations and mirrors
"the remote operation's name, namespace, type, schemas, and access control"
— but with no error schema field, error responses from the OpenAPI source
are dropped on import. `to_openapi` has nowhere to project error information
to. The same gap applies to `from_mcp`/`to_mcp`.
An OpenAPI operation that declares:
```yaml
responses:
'200': { schema: MachineList }
'401': { schema: AuthError }
'429': { schema: RateLimitError }
```
cannot be faithfully represented in alknet's `OperationSpec` today. The
adapter would import the `200` output schema and drop the error schemas —
a lossy import that silently discards the operation's failure contract.
### Prior art
The TypeScript reference (`/workspace/@alkdev/operations/src/types.ts`
L38–47, L94, L112) defines `ErrorDefinitionSchema` and an optional
`errorSchemas?: ErrorDefinition[]` on `OperationSpec`:
```typescript
export const ErrorDefinitionSchema = Type.Object({
code: Type.String({ description: "Error Code e.g., INVALID_INPUT, NOT_FOUND, UNAUTHORIZED" }),
description: Type.String(),
schema: Type.Unknown(),
httpStatus: Type.Optional(Type.Number()),
});
```
The `mapError()` function (`error.ts` L25–51) matches thrown errors against
the declared error schemas by code prefix — if a handler throws an error
whose message starts with a declared code, `mapError` rewrites it to a
typed `CallError` with that code. This is a proven pattern: operations
declare their error contract, the dispatch machinery maps runtime failures
to the declared codes, and clients get typed errors instead of string
parsing.
The translator agent omitted `errorSchemas` from the Rust spec, likely
because it's `Optional` in the TS schema (so dropping it doesn't break the
happy path) and because error schemas are semantically different from
input/output schemas (an operation returns one output but could return any
of several errors). That's a reasonable judgment call for a first
translation pass, but it leaves a real gap for adapters and clients.
### The general principle
This is the same principle as the Safe Exit protocol in the SDD process
(docs/sdd_process.md L19, L423): **make failure a typed, declared thing
rather than an untyped exception that crashes into whoever's listening.**
An operation that declares "I can fail with `FILE_NOT_FOUND`" is the same
shape as an agent that declares "I can fail with `TASK_AMBIGUOUS`" — both
turn an unknown unknown into a known known that the caller can handle
deliberately.
Complex systems survive not because every component is reliable, but
because failure is expected and typed. Cells have apoptosis (a declared
failure mode that protects the organism). Operations have error schemas (a
declared failure mode that lets the caller handle it). The alternative —
components that fail with untyped strings — is how you get brittle clients
that string-match error messages and break when the message wording
changes.
## Decision
### 1. `OperationSpec` gains an `error_schemas` field
```rust
pub struct OperationSpec {
pub name: String,
pub namespace: String,
pub op_type: OperationType,
pub visibility: Visibility,
pub input_schema: Value,
pub output_schema: Value,
pub access_control: AccessControl,
pub error_schemas: Vec<ErrorDefinition>, // NEW — empty vec = no declared errors
}
pub struct ErrorDefinition {
/// Machine-readable error code. e.g., "FILE_NOT_FOUND", "RATE_LIMITED",
/// "INSUFFICIENT_CREDITS". Distinct from the protocol-level codes
/// (NOT_FOUND, FORBIDDEN, etc.) — these are operation-level domain codes.
pub code: String,
/// Human-readable description of when this error occurs.
pub description: String,
/// JSON Schema for the error detail payload. The `call.error` event's
/// `details` field conforms to this schema when this error code is
/// returned. `Value` (serde_json::Value) carrying a JSON Schema, same
/// as input_schema/output_schema.
pub schema: Value,
/// HTTP status code for adapter projection. `from_openapi` maps OpenAPI
/// response status codes to error definitions; `to_openapi` projects
/// error definitions back to response status codes. Optional — not all
/// error sources are HTTP-backed.
pub http_status: Option<u16>,
}
```
`error_schemas` is a `Vec<ErrorDefinition>`, not `Option<Vec<...>>`. An
empty vec means "this operation declares no specific domain errors" (it may
still fail with protocol-level codes like `INTERNAL`). This avoids the
`None` vs `Some([])` ambiguity: an omitted `errorSchemas` array in the
TypeScript reference corresponds to an empty vec here.
### 2. The `call.error` payload gains an optional `details` field
```json
{
"code": "FILE_NOT_FOUND",
"message": "file not found: /etc/nonexistent",
"retryable": false,
"details": { "path": "/etc/nonexistent", "errno": 2 }
}
```
- `code` — the error code. Either a protocol-level code (`NOT_FOUND`,
`FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`) or an
operation-level domain code from `error_schemas` (e.g.,
`FILE_NOT_FOUND`, `RATE_LIMITED`).
- `message` — human-readable error message. Unstructured — for logging and
debugging, not for programmatic handling. Clients should switch on
`code`, not parse `message`.
- `retryable` — whether the caller should retry. `true` for transient
failures (`TIMEOUT`, `RATE_LIMITED`), `false` for permanent ones
(`NOT_FOUND`, `FORBIDDEN`, `FILE_NOT_FOUND`).
- `details` — optional. When the error code matches a declared
`ErrorDefinition`, `details` conforms to that definition's `schema`. When
the error is protocol-level (`NOT_FOUND`, `FORBIDDEN`, etc.), `details`
is absent or carries protocol-specific context (e.g., the operation name
for `NOT_FOUND`). This field is the typed error payload — it's what
makes errors structured instead of string-matched.
### 3. Protocol-level vs operation-level error codes
The five existing codes are **protocol-level** — emitted by the dispatch
machinery, not by handlers:
| Code | Emitted by | Meaning |
|------|-----------|---------|
| `NOT_FOUND` | Registry | Operation not registered (or Internal op called from wire) |
| `FORBIDDEN` | Registry / ACL | Caller lacks required scopes, or unauthenticated |
| `INVALID_INPUT` | Registry | Input doesn't match `input_schema` |
| `INTERNAL` | Registry / Adapter | Handler panic, unhandled error, connection failure |
| `TIMEOUT` | Adapter | Request timed out |
Operation-level domain codes are emitted by **handlers** — the operation's
own logic determines what went wrong. They are declared in `error_schemas`
and appear in the `code` field of `call.error`. Examples: `FILE_NOT_FOUND`,
`PERMISSION_DENIED`, `RATE_LIMITED`, `INSUFFICIENT_CREDITS`,
`CONTEXT_OVERFLOW`.
The two namespaces are distinct but share the `code` field. Clients
should handle protocol-level codes uniformly (they mean the same thing
regardless of operation) and operation-level codes per-operation (they
mean what the operation's `error_schemas` says they mean). Unknown codes
— whether a future protocol code or an undeclared operation code — should
be treated as `INTERNAL` with `retryable: false` (same as the current
guidance in call-protocol.md L143).
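A client-side view of the two code namespaces might look like the following sketch. The names `CodeKind`, `classify`, and `should_retry` are hypothetical and not part of the spec; the sketch only assumes that the client knows the five protocol-level codes and the codes declared in the called operation's `error_schemas`.

```rust
/// The five protocol-level codes listed in the table above.
const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

/// How a client might classify the `code` of a `call.error`.
#[derive(Debug, PartialEq)]
pub enum CodeKind<'a> {
    /// Means the same thing for every operation.
    Protocol(&'a str),
    /// Declared in the called operation's `error_schemas`.
    Operation(&'a str),
    /// Neither: handled like `INTERNAL` with `retryable: false`.
    Unknown,
}

/// Classify `code` given the codes the operation declares.
pub fn classify<'a>(code: &'a str, declared: &[&str]) -> CodeKind<'a> {
    if PROTOCOL_CODES.iter().any(|c| *c == code) {
        CodeKind::Protocol(code)
    } else if declared.iter().any(|d| *d == code) {
        CodeKind::Operation(code)
    } else {
        CodeKind::Unknown
    }
}

/// Unknown codes are never retried; otherwise the wire `retryable` flag decides.
pub fn should_retry(kind: &CodeKind<'_>, retryable: bool) -> bool {
    match kind {
        CodeKind::Unknown => false,
        _ => retryable,
    }
}
```

Checking protocol codes first means that a colliding operation-level declaration would not shadow the protocol meaning in this sketch; that ordering is a design choice of the example, not something the text above mandates.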
### 4. Handler error mapping
When a handler returns an error, the dispatch machinery maps it to a
`call.error` event. The mapping:
1. If the handler returns a structured error with a `code` that matches a
declared `ErrorDefinition.code`, the `call.error` carries that code and
the error's detail payload (validated against the definition's `schema`).
2. If the handler returns a structured error with a `code` that doesn't
match any declared `ErrorDefinition`, the `call.error` carries
`INTERNAL` with the original code in `details`. This is an undeclared
error — the handler returned a typed error but didn't declare it.
3. If the handler returns an unstructured error (a string, a generic
`Error`, a panic), the `call.error` carries `INTERNAL` with
`retryable: false`. This is the current behavior for all handler
errors.
The TypeScript `mapError()` function (error.ts L25–51) implements cases 1
and 3 by matching error messages against declared codes. The Rust
implementation can use a typed error return from the handler (`Result<Value,
CallError>` where `CallError` carries a `code`), which is cleaner than
message-string matching — the handler returns a typed error, the registry
checks whether the code is declared, and the `call.error` is constructed
accordingly.
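The three mapping cases could be sketched roughly as follows. This is an illustration under stated assumptions, not the registry's implementation: `HandlerFailure` and `map_handler_error` are hypothetical names, `Value` is a plain-text stand-in for `serde_json::Value` so the block needs no external crates, the `original_code` key in `details` is invented for the example, `retryable: false` in case 2 is assumed, and schema validation of `details` is omitted.

```rust
/// Stand-in for `serde_json::Value`: payloads are kept as raw JSON text here.
type Value = String;

/// Only the field needed for matching; the other fields are omitted in this sketch.
pub struct ErrorDefinition {
    pub code: String,
}

#[derive(Debug, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
    pub details: Option<Value>,
}

/// Hypothetical shape for what a handler hands back on failure.
pub enum HandlerFailure {
    Structured(CallError),
    Unstructured(String),
}

pub fn map_handler_error(declared: &[ErrorDefinition], failure: HandlerFailure) -> CallError {
    match failure {
        // Case 1: the code is declared, so the error passes through unchanged.
        HandlerFailure::Structured(err) if declared.iter().any(|d| d.code == err.code) => err,
        // Case 2: undeclared code becomes INTERNAL; the original code goes into `details`.
        HandlerFailure::Structured(err) => CallError {
            code: "INTERNAL".to_string(),
            details: Some(format!(r#"{{"original_code":"{}"}}"#, err.code)),
            message: err.message,
            retryable: false,
        },
        // Case 3: unstructured failure becomes INTERNAL with no details.
        HandlerFailure::Unstructured(message) => CallError {
            code: "INTERNAL".to_string(),
            message,
            retryable: false,
            details: None,
        },
    }
}
```

A real implementation would presumably build `details` with a JSON library rather than string formatting, and would validate case 1 payloads against the declared schema.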
### 5. `from_openapi` and `to_openapi` error fidelity
`from_openapi` maps OpenAPI response status codes to `ErrorDefinition`s:
```rust
// OpenAPI: 404: { schema: NotFoundError }
// → ErrorDefinition { code: "NOT_FOUND", http_status: Some(404), schema: NotFoundError }
```
The adapter maps the OpenAPI error schema to alknet's JSON Schema format
(same conversion as input/output schemas). The `http_status` field records
the original status code so `to_openapi` can project it back.
`to_openapi` projects `error_schemas` back to OpenAPI response definitions:
```yaml
responses:
'200': { schema: <output_schema> }
'404': { schema: <error_schemas[0].schema> } # where http_status = 404
'429': { schema: <error_schemas[1].schema> } # where http_status = 429
```
This makes the adapter contract from ADR-017 faithful on the error axis —
no silent dropping of error contracts.
`from_mcp` and `to_mcp` follow the same pattern: MCP tool definitions carry
error descriptions, and the adapters map them to/from `ErrorDefinition`s.
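As a rough sketch of the round trip on the error axis, the two functions below import non-2xx responses as `ErrorDefinition`s and project them back by `http_status`. Everything here beyond the `ErrorDefinition` fields is hypothetical: the function names, the `BTreeMap<u16, Value>` representation of an OpenAPI `responses` object, the plain-text `Value` stand-in, and especially the `HTTP_<status>` code naming, since the ADR does not specify how adapters choose codes.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value`: a schema is just a name or raw text here.
type Value = String;

#[derive(Debug, Clone, PartialEq)]
pub struct ErrorDefinition {
    pub code: String,
    pub description: String,
    pub schema: Value,
    pub http_status: Option<u16>,
}

/// Import direction: every non-2xx response becomes an error definition.
/// The `HTTP_<status>` code is an assumption of this sketch.
pub fn errors_from_responses(responses: &BTreeMap<u16, Value>) -> Vec<ErrorDefinition> {
    responses
        .iter()
        .filter(|(status, _)| !(200..300).contains(*status))
        .map(|(status, schema)| ErrorDefinition {
            code: format!("HTTP_{status}"),
            description: format!("HTTP {status} response"),
            schema: schema.clone(),
            http_status: Some(*status),
        })
        .collect()
}

/// Export direction: definitions without an `http_status` are skipped,
/// because there is no status code to project them onto.
pub fn responses_from_errors(errors: &[ErrorDefinition]) -> BTreeMap<u16, Value> {
    errors
        .iter()
        .filter_map(|e| e.http_status.map(|status| (status, e.schema.clone())))
        .collect()
}
```

Under these assumptions, importing the `200`/`401`/`429` example from the Context section yields two error definitions, and exporting them reproduces the `401` and `429` response entries.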
### 6. `services/schema` exposes error schemas
`services/schema` returns the full `OperationSpec` including `error_schemas`.
A client querying `/services/schema` for `/fs/readFile` gets:
```json
{
"name": "fs/readFile",
"namespace": "fs",
"op_type": "query",
"input_schema": { ... },
"output_schema": { ... },
"error_schemas": [
{ "code": "FILE_NOT_FOUND", "description": "The file does not exist",
"schema": { "type": "object", "properties": { "path": { "type": "string" } } },
"http_status": null },
{ "code": "PERMISSION_DENIED", "description": "OS-level read permission denied",
"schema": { "type": "object", "properties": { "path": { "type": "string" }, "errno": { "type": "integer" } } },
"http_status": null }
]
}
```
This enables client code generation: a TypeScript or Rust client generator
reading the schema can produce a typed `FsReadFileError` enum and return
`Result<Output, FsReadFileError>` instead of a generic `Result<Output, string>`.
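To make the code-generation point concrete, here is a sketch of what a generated Rust client might contain for the two errors declared in the example schema. The enum variants, the `Details` struct, and `decode_fs_read_file_error` are all hypothetical; the sketch assumes `details` has already been decoded into optional fields, and it folds protocol-level and unknown codes into a catch-all variant.

```rust
/// Typed error a generator might emit for `fs/readFile`.
#[derive(Debug, PartialEq)]
pub enum FsReadFileError {
    FileNotFound { path: String },
    PermissionDenied { path: String, errno: i64 },
    /// Protocol-level codes, unknown codes, or details that do not fit the schema.
    Other { code: String, message: String },
}

/// Already-decoded `details` payload (an assumption: real code would parse JSON).
pub struct Details {
    pub path: Option<String>,
    pub errno: Option<i64>,
}

/// Turn the wire-level `code`, `message`, and `details` into the typed enum.
pub fn decode_fs_read_file_error(
    code: &str,
    message: &str,
    details: Option<Details>,
) -> FsReadFileError {
    match (code, details) {
        ("FILE_NOT_FOUND", Some(Details { path: Some(path), .. })) => {
            FsReadFileError::FileNotFound { path }
        }
        ("PERMISSION_DENIED", Some(Details { path: Some(path), errno: Some(errno) })) => {
            FsReadFileError::PermissionDenied { path, errno }
        }
        // Anything else keeps the raw code so the caller can still switch on it.
        _ => FsReadFileError::Other {
            code: code.to_string(),
            message: message.to_string(),
        },
    }
}
```

Callers can then `match` on the variants instead of parsing `message`.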
## Consequences
**Positive:**
- Operations declare their failure modes. Clients get typed errors instead
of string-matched messages. This is the same type-safety property that
`input_schema` and `output_schema` provide, extended to the error axis.
- `from_openapi` and `to_openapi` are faithful on the error axis. An
OpenAPI operation's error contract is no longer silently dropped on
import or absent on export. The adapter contract from ADR-017 is now
complete.
- Client code generation can produce typed error enums. A client calling
`/fs/readFile` can match on `FILE_NOT_FOUND` vs `PERMISSION_DENIED`
instead of parsing `message` strings.
- The protocol-level vs operation-level distinction is explicit. Protocol
codes (`NOT_FOUND`, `FORBIDDEN`, etc.) mean the same thing regardless of
operation. Operation codes (`FILE_NOT_FOUND`, `RATE_LIMITED`) mean what
the operation declares. No conflation.
- The `details` field carries structured error context that conforms to a
schema — the error payload is typed, not a bare string. This enables
programmatic error handling (retry logic, user-facing error messages,
logging) without string parsing.
- The principle generalizes: making failure a typed, declared thing is the
same pattern as the SDD process's Safe Exit protocol (typed agent
failure) and the same pattern complex biological systems use (apoptosis
as a declared cell failure mode). The more components declare their
failure modes, the more robust the system.
**Negative:**
- `OperationSpec` gains a field. Operations that don't declare errors
(empty `error_schemas` vec) still work — the field is additive. But
operations that *should* declare errors and don't will produce `INTERNAL`
with `retryable: false`, same as today. The gap is visible but not
enforced — an operation can ship without error schemas and clients get
untyped errors for it. This is a documentation/guidance issue, not a
type-system issue.
- The `call.error` payload gains a `details` field. This is a wire-format
addition. Existing clients that only read `code` and `message` are
unaffected (they ignore `details`). New clients can read `details` for
structured error context. This is backward-compatible — `details` is
optional and absent for protocol-level errors.
- Handler error mapping adds a step to the dispatch path: the registry
checks whether the handler's error code matches a declared
`ErrorDefinition`. This is a lookup by code over a short list (a scan of
the `error_schemas` vec, or a `HashMap` built at registration), so the
cost is negligible.
- The `http_status` field on `ErrorDefinition` is HTTP-specific. Operations
that aren't HTTP-backed (local, session, from_mcp) leave it as `None`.
This is a pragmatic choice: `from_openapi`/`to_openapi` need it, and it's
optional for everything else. A future non-HTTP adapter that needs a
different error projection field would add it — but `http_status` covers
the immediate use case.
- The TypeScript `mapError()` maps thrown errors to codes by matching on
message strings. The Rust implementation can do better by having handlers
return a typed `CallError`, which makes the `Handler` return type
`Result<Value, CallError>` rather than `Result<Value, Box<dyn Error>>`.
This is a cleaner API but a slight constraint on handler authors: they
return typed errors, not generic ones. Mitigation: `CallError::internal()`
is available for errors that don't fit a declared code.
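
The dispatch-side lookup and the typed handler return described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the registry's actual code: the field layout of `CallError`, the `map_handler_error` helper, and the choice to collapse undeclared codes to `INTERNAL` while keeping the message are all assumptions, and `details` is a plain string here only to keep the example free of external crates.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the typed error a handler returns.
#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
    /// Typed payload; a JSON value in practice, a string in this sketch.
    pub details: Option<String>,
}

impl CallError {
    /// Escape hatch for errors that do not fit a declared code.
    pub fn internal(message: &str) -> Self {
        CallError {
            code: "INTERNAL".to_string(),
            message: message.to_string(),
            retryable: false,
            details: None,
        }
    }
}

/// Reduced `ErrorDefinition`; the `schema` field is omitted for brevity.
pub struct ErrorDefinition {
    pub code: String,
    pub description: String,
    pub http_status: Option<u16>,
}

/// Hypothetical dispatch step: one `HashMap` lookup by code. Declared
/// codes pass through unchanged; anything else becomes `INTERNAL`
/// (assumed behavior for undeclared codes).
pub fn map_handler_error(
    declared: &HashMap<String, ErrorDefinition>,
    err: CallError,
) -> CallError {
    if declared.contains_key(&err.code) {
        err
    } else {
        CallError::internal(&err.message)
    }
}
```

Keying the map by `code` keeps the added dispatch cost to a single hash lookup per failed call, which matches the "negligible cost" claim above.
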
## Assumptions
1. **Operations can enumerate their meaningful failure modes at
registration time.** If an operation has failure modes that are only
discoverable at runtime (e.g., a dynamic API that returns novel error
codes), those would be `INTERNAL` with `details` carrying the upstream
error. The assumption is that most operations have a knowable set of
domain errors.
2. **Error codes are stable per operation.** Once an operation declares
`FILE_NOT_FOUND`, clients depend on that code. Changing it (renaming to
`NOT_FOUND_FILE`) is a breaking change for clients that match on it.
This is the same stability property as `input_schema` and
`output_schema` — the operation's interface is its contract. Adding new
error codes is additive (clients that don't know the new code treat it
as `INTERNAL`); removing or renaming codes is breaking.
3. **Protocol-level codes are distinct from operation-level codes.** If an
operation declares a code that collides with a protocol code (e.g., an
operation declares `NOT_FOUND` as a domain error), the protocol code
takes precedence in the dispatch machinery (the registry's `NOT_FOUND`
for "operation not registered" is emitted before the handler runs). The
assumption is that operations use domain-specific codes (`FILE_NOT_FOUND`)
rather than reusing protocol codes (`NOT_FOUND`). This is a naming
convention, not a type-system enforcement.
4. **`details` is optional and backward-compatible.** Existing clients that
ignore `details` continue to work. New clients read `details` for
structured context. The wire format addition is additive.
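
Because assumption 3 is a naming convention rather than a type-system guarantee, an assembly layer could choose to lint for collisions at registration time. The sketch below shows one way to do that; `colliding_codes` and `PROTOCOL_CODES` are hypothetical names, and nothing in this ADR requires such a check.

```rust
/// Protocol-level codes emitted by the dispatch machinery.
const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

/// Hypothetical registration-time lint: return every declared
/// operation-level code that reuses a protocol-level code.
pub fn colliding_codes<'a>(declared: &[&'a str]) -> Vec<&'a str> {
    declared
        .iter()
        .copied()
        .filter(|code| PROTOCOL_CODES.iter().any(|p| p == code))
        .collect()
}
```

An empty result means the operation follows the convention; a non-empty one could be surfaced as a warning, since the protocol code would take precedence in dispatch anyway.
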
## References
- ADR-017: Call protocol client and adapter contract (adapter fidelity —
this ADR makes `from_openapi`/`to_openapi` faithful on the error axis)
- ADR-014: Secret material flow (the `details` field must not carry secret
material — same constraint as `metadata`)
- ADR-015: Privilege model (the `FORBIDDEN` protocol code covers ACL
denial; operation-level `PERMISSION_DENIED` is a distinct domain error
for OS-level permission issues)
- docs/reviews/001-pre-implementation-architecture-sanity-check.md
(finding C5, which this ADR resolves)
- docs/sdd_process.md L19, L423 (Safe Exit protocol — the general principle
of making failure typed and declared)
- TypeScript reference: `/workspace/@alkdev/operations/src/types.ts`
L38–47 (`ErrorDefinitionSchema`), L94, L112 (`errorSchemas` on
`OperationSpec`), `error.ts` L25–51 (`mapError`)


@@ -300,4 +300,13 @@ These questions are acknowledged but not active. They will be promoted to open w
- **Door type**: One-way (security model), two-way (bundle shape)
- **Priority**: high
- **Resolution**: ADR-015 said handler identity was "set at registration by the assembly layer" but the registration API (`register(spec, handler)`) had no place for it — meaning every internal call would check ACL against `None`, reproducing the escalation gap ADR-015 was written to close. ADR-022 resolves this with a registration bundle (`HandlerRegistration`) carrying `provenance`, `composition_authority` (replacing `handler_identity: Identity` — it's a declared authority bundle, not a peer identity), `scoped_env`, and `capabilities`. The dispatch path (`build_root_context` and `OperationEnv::invoke()`) reads from the bundle. Provenance determines which ops can compose: only `Local` and `Session` get composition authority; leaves (`FromOpenAPI`, `FromMCP`, `FromCall`) get `None` — they don't compose, so they don't need it. Capabilities are per-request on `OperationContext`, populated from the bundle (resolving the closure-capture vs context ambiguity). The kernel/user analogy: user's authority checked once at the External gate; handler's composition authority used for all composition inside; scoped env bounds reachability. No intersection — the user's authority does not limit internal calls. See ADR-022.
- **Cross-references**: ADR-014, ADR-015, ADR-022, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C1–C4), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md)
### OQ-24: Operation Error Schemas
- **Origin**: [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md), ADR-017
- **Status**: resolved
- **Door type**: One-way (wire format), two-way (mapping mechanism)
- **Priority**: high
- **Resolution**: `OperationSpec` gains `error_schemas: Vec<ErrorDefinition>` where each `ErrorDefinition` carries a `code`, `description`, `schema` (JSON Schema for the error detail payload), and optional `http_status` (for adapter projection). The `call.error` payload gains an optional `details` field carrying the typed error payload. Protocol-level codes (`NOT_FOUND`, `FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`) are distinct from operation-level domain codes (`FILE_NOT_FOUND`, `RATE_LIMITED`, etc.) — protocol codes are emitted by the dispatch machinery, operation codes by handlers. `from_openapi`/`to_openapi` map OpenAPI response status codes to/from `ErrorDefinition`s, making the adapter contract from ADR-017 faithful on the error axis. `services/schema` exposes `error_schemas` for client code generation. See ADR-023.
- **Cross-references**: ADR-017, ADR-023, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C5), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md)
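
For orientation, the `ErrorDefinition` shape from the OQ-24 resolution and its use in adapter projection might look roughly like this. It is a sketch under assumptions: `schema` is held as a string to avoid external crates, `responses_by_status` is a hypothetical helper standing in for part of `to_openapi`, and skipping definitions without an `http_status` is an assumed policy, not something the resolution specifies.

```rust
use std::collections::BTreeMap;

/// The four fields named in the OQ-24 resolution.
pub struct ErrorDefinition {
    pub code: String,
    pub description: String,
    /// JSON Schema for the `details` payload (a string in this sketch).
    pub schema: String,
    /// Set for HTTP-backed operations; `None` otherwise.
    pub http_status: Option<u16>,
}

/// Hypothetical export step: group declared codes by HTTP status so each
/// status can become one OpenAPI response entry. Definitions without a
/// status are skipped (assumed policy).
pub fn responses_by_status(defs: &[ErrorDefinition]) -> BTreeMap<u16, Vec<&str>> {
    let mut out: BTreeMap<u16, Vec<&str>> = BTreeMap::new();
    for def in defs {
        if let Some(status) = def.http_status {
            out.entry(status).or_default().push(def.code.as_str());
        }
    }
    out
}
```

The import direction would run the other way, turning each non-success response status in an OpenAPI document into an `ErrorDefinition` with that status recorded in `http_status`.
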


@@ -213,6 +213,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/).
| [020](decisions/020-hd-derivation-for-encryption-keys.md) | HD Derivation for Encryption Keys | SLIP-0010 derivation from seed, not PBKDF2; salt field unused in v2 |
| [021](decisions/021-key-rotation-via-version-indexed-paths.md) | Key Rotation via Version-Indexed Paths | Version-indexed derivation paths; `rotate` re-encrypts between versions |
| [022](decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle |
| [023](decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` |
## Open Questions