docs(architecture): add ADR-023, resolve OQ-24 — operation error schemas

ADR-023 adds error_schemas to OperationSpec so operations can declare
their domain-level failure modes (FILE_NOT_FOUND, RATE_LIMITED, etc.)
distinct from protocol-level codes (NOT_FOUND, FORBIDDEN, etc.). The
call.error payload gains an optional 'details' field carrying the typed
error payload conforming to the declared schema. from_openapi/to_openapi
map OpenAPI response status codes to/from ErrorDefinitions, making the
adapter contract from ADR-017 faithful on the error axis.

Also fixes W2 (KeyVersionMismatch stale comment in encryption.md —
ADR-021 implements rotation without this variant) and W4
(derive_encryption_key_for_version missing from service.md method list).

Spec updates: operation-registry.md (OperationSpec, ErrorDefinition,
Handler error mapping, services/schema), call-protocol.md (call.error
payload, CallError, ResponseEnvelope), README.md, overview.md,
open-questions.md (OQ-24), call/README.md, encryption.md, service.md.
2026-06-21 10:26:18 +00:00
parent 1cedc4eeba
commit 3e238a471b
9 changed files with 478 additions and 26 deletions


@@ -33,6 +33,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions,
| [016](../../decisions/016-abort-cascade-for-nested-calls.md) | Abort Cascade for Nested Calls | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
| [017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | Call Protocol Client and Adapter Contract | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction |
| [022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities |
| [023](../../decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity |
## Relevant Open Questions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-21
last_updated: 2026-06-22
---
# Call Protocol
@@ -127,19 +127,28 @@ The `payload` of a `call.requested` event has this shape:
```json
{
"code": "NOT_FOUND",
"message": "operation not found: /fs/readFile",
"retryable": false
"code": "FILE_NOT_FOUND",
"message": "file not found: /etc/nonexistent",
"retryable": false,
"details": { "path": "/etc/nonexistent", "errno": 2 }
}
```
Error codes use an extensible string enum. The protocol defines the following codes:
- `NOT_FOUND` — operation not in registry
Error codes use an extensible string enum. The protocol defines the following **protocol-level codes** (emitted by the dispatch machinery, not by handlers):
- `NOT_FOUND` — operation not in registry (or Internal op called from wire)
- `FORBIDDEN` — access denied (insufficient scopes or unauthenticated)
- `INVALID_INPUT` — input doesn't match the operation's JSON Schema
- `INTERNAL` — handler error
- `INTERNAL` — handler error, panic, connection failure
- `TIMEOUT` — request timed out (retryable: true)
Operations may also declare **operation-level domain codes** in their `error_schemas` (ADR-023) — e.g., `FILE_NOT_FOUND`, `RATE_LIMITED`, `INSUFFICIENT_CREDITS`. These are emitted by handlers and carry a `details` payload conforming to the declared `ErrorDefinition.schema`. Protocol-level errors omit `details` or carry protocol-specific context (e.g., the operation name for `NOT_FOUND`).
Fields:
- `code` — the error code (protocol-level or operation-level)
- `message` — human-readable error message. For logging and debugging, not for programmatic handling. Clients should switch on `code`, not parse `message`.
- `retryable` — whether the caller should retry. `true` for transient failures, `false` for permanent ones.
- `details` — optional. When the code matches a declared `ErrorDefinition`, `details` conforms to that definition's schema. This is the typed error payload — it makes errors structured instead of string-matched. See ADR-023.
New error codes may be added in future versions. Clients should treat unknown error codes as `INTERNAL` with `retryable: false`.
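The unknown-code rule above can be sketched on the client side as follows. This is a minimal illustration, not part of the spec: the `normalize` function and the `PROTOCOL_CODES` constant are hypothetical names, and the `CallError` here is a simplified stand-in that leaves out `details`.

```rust
/// Simplified stand-in for the spec's `CallError` (the `details` field is
/// left out because this sketch only inspects `code` and `retryable`).
#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

/// Protocol-level codes listed in this section.
pub const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

/// Hypothetical client helper: keep errors whose code the client knows
/// (a protocol-level code, or a domain code declared by the operation) and
/// downgrade everything else to `INTERNAL` with `retryable: false`.
pub fn normalize(err: CallError, declared_codes: &[&str]) -> CallError {
    let known = PROTOCOL_CODES.contains(&err.code.as_str())
        || declared_codes.contains(&err.code.as_str());
    if known {
        err
    } else {
        CallError {
            code: "INTERNAL".to_string(),
            retryable: false,
            ..err // the original message is kept for logging
        }
    }
}
```

A client would call `normalize` once per received `call.error` and then switch on the resulting `code`, never on `message`.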
### Protocol Operations
@@ -304,13 +313,14 @@ pub struct ResponseEnvelope {
}
pub struct CallError {
pub code: String,
pub message: String,
pub code: String, // protocol-level (NOT_FOUND, FORBIDDEN, ...) or operation-level (ADR-023)
pub message: String, // human-readable, for logging — not for programmatic handling
pub retryable: bool,
pub details: Option<Value>, // typed error payload, conforms to ErrorDefinition.schema (ADR-023)
}
```
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire.
Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire. When a handler returns a `CallError` whose `code` matches a declared `ErrorDefinition`, the `details` field carries the typed error payload. See ADR-023.
### Connection and Stream Lifecycle
@@ -356,6 +366,7 @@ Handlers clean up resources when their call is cancelled (in Rust, the future is
| Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in |
| Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction |
| Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle |
| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details` |
## Open Questions


@@ -37,6 +37,7 @@ pub struct OperationSpec {
pub visibility: Visibility, // External (wire-callable) or Internal (composition-only)
pub input_schema: Value, // JSON Schema for input
pub output_schema: Value, // JSON Schema for output
pub error_schemas: Vec<ErrorDefinition>, // Declared domain errors (ADR-023)
pub access_control: AccessControl,
}
@@ -50,6 +51,14 @@ pub enum Visibility {
External, // Callable from the wire (call.requested from a client)
Internal, // Composition-only (env.invoke from a handler)
}
/// A declared operation-level error. See ADR-023.
pub struct ErrorDefinition {
pub code: String, // e.g., "FILE_NOT_FOUND", "RATE_LIMITED"
pub description: String, // Human-readable description
pub schema: Value, // JSON Schema for the error detail payload
pub http_status: Option<u16>, // HTTP status for adapter projection (from_openapi/to_openapi)
}
```
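One way an adapter might use `http_status` when projecting declared errors onto HTTP responses is sketched below. The `responses_by_status` helper is hypothetical, the struct leaves out `schema` to stay dependency-free, and skipping definitions without an `http_status` is an assumption rather than behavior stated in ADR-023.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `ErrorDefinition` (the `schema` field is omitted).
pub struct ErrorDefinition {
    pub code: String,
    pub description: String,
    pub http_status: Option<u16>,
}

/// Hypothetical helper: group declared error codes by HTTP status so an
/// adapter could emit one response entry per status. Definitions with no
/// `http_status` are skipped here (an assumption, not specified behavior).
pub fn responses_by_status(defs: &[ErrorDefinition]) -> BTreeMap<u16, Vec<&str>> {
    let mut out: BTreeMap<u16, Vec<&str>> = BTreeMap::new();
    for def in defs {
        if let Some(status) = def.http_status {
            out.entry(status).or_default().push(def.code.as_str());
        }
    }
    out
}
```

Grouping by status keeps the projection stable when several domain codes share one HTTP status.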
Operation names use slash-based paths without a leading slash, aligned with URL path conventions: `fs/readFile`, `agent/chat`, `services/list`. The leading slash is added when needed for display (`spec.path()` returns `/fs/readFile`) and for wire format (the `call.requested` payload uses `/fs/readFile`). See OQ-13 for the path format decision (single-node `service/op` vs head/worker `node/service/op`).
@@ -94,6 +103,8 @@ A handler receives:
And returns a `ResponseEnvelope` containing the result or an error. `ResponseEnvelope` is defined in [call-protocol.md](call-protocol.md#responseenvelope) — it carries the request ID and a `Result<Value, CallError>`. Local dispatch produces it with no serialization overhead; the `CallAdapter` converts it to `EventEnvelope` for the wire.
When a handler returns an error, the `CallError.code` is matched against the operation's declared `error_schemas` (ADR-023). If the code matches a declared `ErrorDefinition`, the `call.error` event carries that code and the error's detail payload. If it doesn't match, the `call.error` carries `INTERNAL`. This is how handler failures become typed errors on the wire instead of string-matched messages.
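The matching rule above could look roughly like this in the dispatch path. It is a sketch under stated assumptions: `to_wire_error` is a hypothetical name, `details` is modeled as `Option<String>` in place of the spec's `Option<Value>`, and dropping `details` and forcing `retryable: false` on the fallback path are guesses that the text does not confirm. Validating `details` against `ErrorDefinition.schema` is out of scope here.

```rust
/// Simplified stand-in for `CallError`; `details` is a `String` here instead
/// of a JSON `Value` so the sketch needs no external crates.
#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
    pub retryable: bool,
    pub details: Option<String>,
}

/// Simplified stand-in for `ErrorDefinition` (only the fields used below).
pub struct ErrorDefinition {
    pub code: String,
    pub description: String,
}

/// Hypothetical dispatch step: pass declared domain errors through
/// unchanged, and replace anything undeclared with `INTERNAL`.
pub fn to_wire_error(err: CallError, error_schemas: &[ErrorDefinition]) -> CallError {
    if error_schemas.iter().any(|def| def.code == err.code) {
        err
    } else {
        CallError {
            code: "INTERNAL".to_string(),
            retryable: false, // assumption: undeclared failures are not retried
            details: None,    // assumption: untyped details are not forwarded
            ..err             // the handler's message is kept for logging
        }
    }
}
```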
### OperationContext
```rust
@@ -272,7 +283,7 @@ These are read-only — no admin operations are exposed through the call protoco
}
```
`services/schema` accepts `{ "name": "fs/readFile" }` and returns the full `OperationSpec` including input/output JSON Schemas.
`services/schema` accepts `{ "name": "fs/readFile" }` and returns the full `OperationSpec` including input/output JSON Schemas and declared `error_schemas` (ADR-023). This enables client code generation: a client reading the schema can produce typed error enums instead of generic error handling.
### irpc Integration
@@ -392,6 +403,7 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe
| Secret material flow and capability injection | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Capabilities carry outbound credentials; call protocol carries no secret material |
| Privilege model and authority context | [ADR-015](../../decisions/015-privilege-model-and-authority-context.md) | `internal` = authority switch not ACL skip; External/Internal visibility; composition authority + scoped env |
| Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle |
| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` |
## Open Questions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-19
last_updated: 2026-06-20
---
# Encryption
@@ -194,7 +194,7 @@ pub enum EncryptionError {
Encryption(String), // encryption failed
Decryption(String), // decryption failed (wrong key, tampered data, bad UTF-8)
Decoding(String), // base64 decoding failed
KeyVersionMismatch { expected: u32, actual: u32 }, // reserved for future rotation (OQ-22)
KeyVersionMismatch { expected: u32, actual: u32 }, // unused — see note below
}
```
@@ -202,12 +202,17 @@ Decryption failures are intentionally generic — they don't distinguish
"wrong key" from "tampered data" from "corrupted storage" to avoid
leaking information to an attacker.
`KeyVersionMismatch` is **defined but unused in v2** — neither `encrypt()`
nor `decrypt()` returns it. It is reserved for future key rotation
enforcement (OQ-22), where the vault may enforce version matching before
decrypting. In v2, the `key_version` is stamped onto `EncryptedData` and
`EncryptionKey` for forward compatibility but does not gate decryption. An
implementer should not expect this variant to fire in v2.
`KeyVersionMismatch` is **defined but unused.** ADR-021 implements key
rotation via version-indexed derivation paths — `decrypt` derives the key
at the path indicated by `encrypted.key_version`, so there is no
version mismatch to detect at the error level (every blob carries its own
version, and every version has a derivable key). This variant predates
ADR-021's rotation mechanism and is retained in the enum for source
compatibility but is not emitted by any code path in v2. An implementer
should not wire it up or expect it to fire. If a future use case requires
enforcing version constraints (e.g., "refuse to decrypt blobs older than
v3"), this variant could be repurposed — but that would be a new decision,
not part of ADR-021's rotation scheme.
## Design Decisions


@@ -1,6 +1,6 @@
---
status: draft
last_updated: 2026-06-19
last_updated: 2026-06-20
---
# Service
@@ -126,6 +126,23 @@ Derive an AES-256-GCM encryption key at the given path. Same cache
behavior as `derive_ed25519`. Returns a `DerivedKey` with
`KeyType::Aes256Gcm`.
### derive_encryption_key_for_version(version) → EncryptionKey
```rust
pub fn derive_encryption_key_for_version(&self, version: u32) -> Result<EncryptionKey, VaultServiceError>;
```
Derive the encryption key for a specific key version. Maps the version to
its derivation path via `encryption_path_for_version(version)` (ADR-021):
v2 → `m/74'/2'/0'/0'`, v3 → `m/74'/2'/0'/1'`, etc. Cached by path. This is
the version-aware method that `decrypt` uses to select the correct key for
each blob — see [encryption.md](encryption.md) and ADR-021.
`derive_encryption_key(path)` (above) remains as the path-based API for
deriving at arbitrary paths. `derive_encryption_key_for_version(version)`
is the version-aware API used by `encrypt` and `decrypt`. The two share
the same cache (keyed by derivation path).
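The version-to-path mapping described above can be sketched as a pure function. The spec names `encryption_path_for_version` but does not show its signature, so the `Option<String>` return type and the handling of versions below 2 are assumptions made for illustration.

```rust
/// Sketch of the version-to-path mapping: v2 is the first version and maps
/// to index 0, v3 to index 1, and so on (`m/74'/2'/0'/{version-2}'`).
/// Returning `None` for versions below 2 is an assumption; the spec does
/// not say how such versions are treated.
pub fn encryption_path_for_version(version: u32) -> Option<String> {
    let index = version.checked_sub(2)?;
    Some(format!("m/74'/2'/0'/{}'", index))
}
```

Because the cache is keyed by derivation path, two calls with the same version resolve to the same cache entry.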
### derive_ethereum_key(path) → DerivedKey (feature-gated)
```rust
@@ -173,10 +190,10 @@ pub fn decrypt(&self, encrypted: &EncryptedData) -> Result<String, VaultServiceE
```
Decrypt an `EncryptedData` blob. Derives (and caches) the encryption key
at the version-indexed path indicated by `encrypted.key_version` (ADR-021).
Each version maps to a distinct path (`m/74'/2'/0'/{version-2}'`), so old
and new keys can coexist during partial rotation. See
[encryption.md](encryption.md).
at the version-indexed path indicated by `encrypted.key_version` via
`derive_encryption_key_for_version` (ADR-021). Each version maps to a
distinct path (`m/74'/2'/0'/{version-2}'`), so old and new keys can
coexist during partial rotation. See [encryption.md](encryption.md).
### rotate(encrypted, to_version) → EncryptedData