diff --git a/docs/architecture/README.md b/docs/architecture/README.md index f902f3d..9bb1ba3 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -7,7 +7,7 @@ last_updated: 2026-06-20 ## Current State -**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation exists) and research/reference material. Foundational ADRs (001–022) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), abort cascade for nested calls (ADR-016), call protocol client and adapter contract (ADR-017), vault standalone crate (ADR-018), vault assembly-layer-only access (ADR-019), HD derivation for encryption keys (ADR-020), key rotation via version-indexed paths (ADR-021), and handler registration, provenance, and composition authority (ADR-022). The alknet-core, alknet-call, and alknet-vault crate specs are in draft. +**Pre-implementation.** The project has completed a pivot from a three-layer model to an ALPN-as-service model. The greenfield workspace contains only `alknet-vault` (stable — implementation exists) and research/reference material. 
Foundational ADRs (001–023) are in place, including the BiStream type definition (ADR-007), vault integration (ADR-008), ALPN router/endpoint (ADR-010), AuthContext structure (ADR-011), call protocol stream model (ADR-012), Rust as canonical implementation language (ADR-013), secret material flow with capability injection (ADR-014), privilege model with authority context (ADR-015), abort cascade for nested calls (ADR-016), call protocol client and adapter contract (ADR-017), vault standalone crate (ADR-018), vault assembly-layer-only access (ADR-019), HD derivation for encryption keys (ADR-020), key rotation via version-indexed paths (ADR-021), handler registration, provenance, and composition authority (ADR-022), and operation error schemas (ADR-023). The alknet-core, alknet-call, and alknet-vault crate specs are in draft. **Next step**: Review the vault spec documents, then begin implementation. All open questions for the core and call crates are resolved; the vault crate has one deferred OQ (OQ-21, remote vault administration) that does not block implementation. @@ -57,6 +57,7 @@ last_updated: 2026-06-20 | [020](decisions/020-hd-derivation-for-encryption-keys.md) | HD Derivation for Encryption Keys | Accepted | | [021](decisions/021-key-rotation-via-version-indexed-paths.md) | Key Rotation via Version-Indexed Paths | Accepted | | [022](decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Proposed | +| [023](decisions/023-operation-error-schemas.md) | Operation Error Schemas | Proposed | ## Open Questions @@ -85,6 +86,7 @@ See [open-questions.md](open-questions.md) for the full tracker. 
- **OQ-20**: Encryption key derivation — HD derivation from BIP39 seed, not PBKDF2; salt field unused in v2 (wire-format compat) (ADR-020) - **OQ-22**: Key rotation — version-indexed derivation paths; `rotate` method re-encrypts (ADR-021) - **OQ-23**: Handler identity registration path — registration bundle with provenance, composition authority, scoped env, capabilities (ADR-022) +- **OQ-24**: Operation error schemas — declared domain errors with typed `details` payload; adapter fidelity for `from_openapi`/`to_openapi` (ADR-023) **Deferred (not active):** - **OQ-09**: WASM target boundaries — design constraint, not deliverable diff --git a/docs/architecture/crates/call/README.md b/docs/architecture/crates/call/README.md index 963caa2..8e599c3 100644 --- a/docs/architecture/crates/call/README.md +++ b/docs/architecture/crates/call/README.md @@ -33,6 +33,7 @@ Structured RPC over QUIC: operations, request/response, streaming subscriptions, | [016](../../decisions/016-abort-cascade-for-nested-calls.md) | Abort Cascade for Nested Calls | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in | | [017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | Call Protocol Client and Adapter Contract | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction | | [022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities | +| [023](../../decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity | ## Relevant Open Questions diff --git a/docs/architecture/crates/call/call-protocol.md b/docs/architecture/crates/call/call-protocol.md index d2d7f83..729cc08 100644 --- 
a/docs/architecture/crates/call/call-protocol.md +++ b/docs/architecture/crates/call/call-protocol.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-21 +last_updated: 2026-06-22 --- # Call Protocol @@ -127,19 +127,28 @@ The `payload` of a `call.requested` event has this shape: ```json { - "code": "NOT_FOUND", - "message": "operation not found: /fs/readFile", - "retryable": false + "code": "FILE_NOT_FOUND", + "message": "file not found: /etc/nonexistent", + "retryable": false, + "details": { "path": "/etc/nonexistent", "errno": 2 } } ``` -Error codes use an extensible string enum. The protocol defines the following codes: -- `NOT_FOUND` — operation not in registry +Error codes use an extensible string enum. The protocol defines the following **protocol-level codes** (emitted by the dispatch machinery, not by handlers): +- `NOT_FOUND` — operation not in registry (or Internal op called from wire) - `FORBIDDEN` — access denied (insufficient scopes or unauthenticated) - `INVALID_INPUT` — input doesn't match the operation's JSON Schema -- `INTERNAL` — handler error +- `INTERNAL` — handler error, panic, connection failure - `TIMEOUT` — request timed out (retryable: true) +Operations may also declare **operation-level domain codes** in their `error_schemas` (ADR-023) — e.g., `FILE_NOT_FOUND`, `RATE_LIMITED`, `INSUFFICIENT_CREDITS`. These are emitted by handlers and carry a `details` payload conforming to the declared `ErrorDefinition.schema`. Protocol-level errors omit `details` or carry protocol-specific context (e.g., the operation name for `NOT_FOUND`). + +Fields: +- `code` — the error code (protocol-level or operation-level) +- `message` — human-readable error message. For logging and debugging, not for programmatic handling. Clients should switch on `code`, not parse `message`. +- `retryable` — whether the caller should retry. `true` for transient failures, `false` for permanent ones. +- `details` — optional. 
When the code matches a declared `ErrorDefinition`, `details` conforms to that definition's schema. This is the typed error payload — it makes errors structured instead of string-matched. See ADR-023. + New error codes may be added in future versions. Clients should treat unknown error codes as `INTERNAL` with `retryable: false`. ### Protocol Operations @@ -304,13 +313,14 @@ pub struct ResponseEnvelope { } pub struct CallError { - pub code: String, - pub message: String, + pub code: String, // protocol-level (NOT_FOUND, FORBIDDEN, ...) or operation-level (ADR-023) + pub message: String, // human-readable, for logging — not for programmatic handling pub retryable: bool, + pub details: Option<Value>, // typed error payload, conforms to ErrorDefinition.schema (ADR-023) } ``` -Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire. +Local dispatch produces `ResponseEnvelope` with no serialization overhead. The `CallAdapter` converts `ResponseEnvelope` to `EventEnvelope` for the wire. When a handler returns a `CallError` whose `code` matches a declared `ErrorDefinition`, the `details` field carries the typed error payload. See ADR-023. 
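The client-side guidance in this file (switch on `code`, not `message`; treat unknown codes as `INTERNAL` with `retryable: false`) can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's API: `ErrorClass`, `classify`, `should_retry`, and `PROTOCOL_CODES` are hypothetical names, and `declared` stands in for the codes a client has read from an operation's `error_schemas`.

```rust
/// The five protocol-level codes listed in this document.
const PROTOCOL_CODES: [&str; 5] =
    ["NOT_FOUND", "FORBIDDEN", "INVALID_INPUT", "INTERNAL", "TIMEOUT"];

/// Hypothetical client-side view of a `call.error` code (not part of the spec).
#[derive(Debug, PartialEq)]
enum ErrorClass {
    /// Emitted by the dispatch machinery; same meaning for every operation.
    Protocol(String),
    /// Declared in the operation's `error_schemas`; `details` is typed.
    Operation(String),
    /// Neither a known protocol code nor a declared one: handle as INTERNAL.
    UnknownAsInternal,
}

/// Classify a wire code against the protocol codes and the declared codes.
fn classify(code: &str, declared: &[&str]) -> ErrorClass {
    if PROTOCOL_CODES.contains(&code) {
        ErrorClass::Protocol(code.to_string())
    } else if declared.contains(&code) {
        ErrorClass::Operation(code.to_string())
    } else {
        ErrorClass::UnknownAsInternal
    }
}

/// Retry decision: trust the wire `retryable` flag, except for unknown codes,
/// which are treated as INTERNAL with `retryable: false`.
fn should_retry(class: &ErrorClass, wire_retryable: bool) -> bool {
    match class {
        ErrorClass::UnknownAsInternal => false,
        _ => wire_retryable,
    }
}

fn main() {
    // Assumed to come from a `services/schema` response for `fs/readFile`.
    let declared = ["FILE_NOT_FOUND", "PERMISSION_DENIED"];
    println!("{:?}", classify("FILE_NOT_FOUND", &declared));
    println!("{}", should_retry(&classify("TIMEOUT", &declared), true));
    println!("{}", should_retry(&classify("QUOTA_EXCEEDED", &declared), true));
}
```

Checking protocol codes first mirrors the precedence described for code collisions: a protocol code keeps its protocol meaning even if an operation happens to declare the same string.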
 ### Connection and Stream Lifecycle @@ -356,6 +366,7 @@ Handlers clean up resources when their call is cancelled (in Rust, the future is | Abort cascade for nested calls | [ADR-016](../../decisions/016-abort-cascade-for-nested-calls.md) | `call.aborted` cascades to descendants; default `abort-dependents`, `continue-running` opt-in | | Call protocol client and adapter contract | [ADR-017](../../decisions/017-call-protocol-client-and-adapter-contract.md) | `CallClient` opens connections; `from_call` imports remote ops; connection direction independent of call direction | | Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle | +| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details` | ## Open Questions diff --git a/docs/architecture/crates/call/operation-registry.md b/docs/architecture/crates/call/operation-registry.md index 47db924..be7ad68 100644 --- a/docs/architecture/crates/call/operation-registry.md +++ b/docs/architecture/crates/call/operation-registry.md @@ -37,6 +37,7 @@ pub struct OperationSpec { pub visibility: Visibility, // External (wire-callable) or Internal (composition-only) pub input_schema: Value, // JSON Schema for input pub output_schema: Value, // JSON Schema for output + pub error_schemas: Vec<ErrorDefinition>, // Declared domain errors (ADR-023) pub access_control: AccessControl, } @@ -50,6 +51,14 @@ pub enum Visibility { External, // Callable from the wire (call.requested from a client) Internal, // Composition-only (env.invoke from a handler) } + +/// A declared operation-level error. See ADR-023. 
+pub struct ErrorDefinition { + pub code: String, // e.g., "FILE_NOT_FOUND", "RATE_LIMITED" + pub description: String, // Human-readable description + pub schema: Value, // JSON Schema for the error detail payload + pub http_status: Option, // HTTP status for adapter projection (from_openapi/to_openapi) +} ``` Operation names use slash-based paths without a leading slash, aligned with URL path conventions: `fs/readFile`, `agent/chat`, `services/list`. The leading slash is added when needed for display (`spec.path()` returns `/fs/readFile`) and for wire format (the `call.requested` payload uses `/fs/readFile`). See OQ-13 for the path format decision (single-node `service/op` vs head/worker `node/service/op`). @@ -94,6 +103,8 @@ A handler receives: And returns a `ResponseEnvelope` containing the result or an error. `ResponseEnvelope` is defined in [call-protocol.md](call-protocol.md#responseenvelope) — it carries the request ID and a `Result`. Local dispatch produces it with no serialization overhead; the `CallAdapter` converts it to `EventEnvelope` for the wire. +When a handler returns an error, the `CallError.code` is matched against the operation's declared `error_schemas` (ADR-023). If the code matches a declared `ErrorDefinition`, the `call.error` event carries that code and the error's detail payload. If it doesn't match, the `call.error` carries `INTERNAL`. This is how handler failures become typed errors on the wire instead of string-matched messages. + ### OperationContext ```rust @@ -272,7 +283,7 @@ These are read-only — no admin operations are exposed through the call protoco } ``` -`services/schema` accepts `{ "name": "fs/readFile" }` and returns the full `OperationSpec` including input/output JSON Schemas. +`services/schema` accepts `{ "name": "fs/readFile" }` and returns the full `OperationSpec` including input/output JSON Schemas and declared `error_schemas` (ADR-023). 
This enables client code generation: a client reading the schema can produce typed error enums instead of generic error handling. ### irpc Integration @@ -392,6 +403,7 @@ The `Capabilities` type holds non-serializable, zeroized secret material. It doe | Secret material flow and capability injection | [ADR-014](../../decisions/014-secret-material-flow-and-capability-injection.md) | Capabilities carry outbound credentials; call protocol carries no secret material | | Privilege model and authority context | [ADR-015](../../decisions/015-privilege-model-and-authority-context.md) | `internal` = authority switch not ACL skip; External/Internal visibility; composition authority + scoped env | | Handler registration, provenance, and composition authority | [ADR-022](../../decisions/022-handler-registration-provenance-and-composition-authority.md) | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle | +| Operation error schemas | [ADR-023](../../decisions/023-operation-error-schemas.md) | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` | ## Open Questions diff --git a/docs/architecture/crates/vault/encryption.md b/docs/architecture/crates/vault/encryption.md index a13a386..61a11f2 100644 --- a/docs/architecture/crates/vault/encryption.md +++ b/docs/architecture/crates/vault/encryption.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-19 +last_updated: 2026-06-20 --- # Encryption @@ -194,7 +194,7 @@ pub enum EncryptionError { Encryption(String), // encryption failed Decryption(String), // decryption failed (wrong key, tampered data, bad UTF-8) Decoding(String), // base64 decoding failed - KeyVersionMismatch { expected: u32, actual: u32 }, // reserved for future rotation (OQ-22) + KeyVersionMismatch { expected: u32, actual: u32 }, // unused — see note below } ``` @@ -202,12 +202,17 @@ Decryption failures are intentionally 
generic — they don't distinguish "wrong key" from "tampered data" from "corrupted storage" to avoid leaking information to an attacker. -`KeyVersionMismatch` is **defined but unused in v2** — neither `encrypt()` -nor `decrypt()` returns it. It is reserved for future key rotation -enforcement (OQ-22), where the vault may enforce version matching before -decrypting. In v2, the `key_version` is stamped onto `EncryptedData` and -`EncryptionKey` for forward compatibility but does not gate decryption. An -implementer should not expect this variant to fire in v2. +`KeyVersionMismatch` is **defined but unused.** ADR-021 implements key +rotation via version-indexed derivation paths — `decrypt` derives the key +at the path indicated by `encrypted.key_version`, so there is no +version-mismatch to detect at the error level (every blob carries its own +version, and every version has a derivable key). This variant predates +ADR-021's rotation mechanism and is retained in the enum for source +compatibility but is not emitted by any code path in v2. An implementer +should not wire it up or expect it to fire. If a future use case requires +enforcing version constraints (e.g., "refuse to decrypt blobs older than +v3"), this variant could be repurposed — but that would be a new decision, +not part of ADR-021's rotation scheme. ## Design Decisions diff --git a/docs/architecture/crates/vault/service.md b/docs/architecture/crates/vault/service.md index ed67996..aa294b9 100644 --- a/docs/architecture/crates/vault/service.md +++ b/docs/architecture/crates/vault/service.md @@ -1,6 +1,6 @@ --- status: draft -last_updated: 2026-06-19 +last_updated: 2026-06-20 --- # Service @@ -126,6 +126,23 @@ Derive an AES-256-GCM encryption key at the given path. Same cache behavior as `derive_ed25519`. Returns a `DerivedKey` with `KeyType::Aes256Gcm`. 
+### derive_encryption_key_for_version(version) → EncryptionKey + +```rust +pub fn derive_encryption_key_for_version(&self, version: u32) -> Result; +``` + +Derive the encryption key for a specific key version. Maps the version to +its derivation path via `encryption_path_for_version(version)` (ADR-021): +v2 → `m/74'/2'/0'/0'`, v3 → `m/74'/2'/0'/1'`, etc. Cached by path. This is +the version-aware method that `decrypt` uses to select the correct key for +each blob — see [encryption.md](encryption.md) and ADR-021. + +`derive_encryption_key(path)` (above) remains as the path-based API for +deriving at arbitrary paths. `derive_encryption_key_for_version(version)` +is the version-aware API used by `encrypt` and `decrypt`. The two share +the same cache (keyed by derivation path). + ### derive_ethereum_key(path) → DerivedKey (feature-gated) ```rust @@ -173,10 +190,10 @@ pub fn decrypt(&self, encrypted: &EncryptedData) -> Result, // NEW — empty vec = no declared errors +} + +pub struct ErrorDefinition { + /// Machine-readable error code. e.g., "FILE_NOT_FOUND", "RATE_LIMITED", + /// "INSUFFICIENT_CREDITS". Distinct from the protocol-level codes + /// (NOT_FOUND, FORBIDDEN, etc.) — these are operation-level domain codes. + pub code: String, + + /// Human-readable description of when this error occurs. + pub description: String, + + /// JSON Schema for the error detail payload. The `call.error` event's + /// `details` field conforms to this schema when this error code is + /// returned. `Value` (serde_json::Value) carrying a JSON Schema, same + /// as input_schema/output_schema. + pub schema: Value, + + /// HTTP status code for adapter projection. `from_openapi` maps OpenAPI + /// response status codes to error definitions; `to_openapi` projects + /// error definitions back to response status codes. Optional — not all + /// error sources are HTTP-backed. 
+ pub http_status: Option, +} +``` + +`error_schemas` is a `Vec<ErrorDefinition>`, not `Option<Vec<ErrorDefinition>>`. 
An +empty vec means "this operation declares no specific domain errors" (it may +still fail with protocol-level codes like `INTERNAL`). This avoids the +`None` vs `Some([])` ambiguity and matches the TypeScript reference's +optional-array convention. + +### 2. The `call.error` payload gains an optional `details` field + +```json +{ + "code": "FILE_NOT_FOUND", + "message": "file not found: /etc/nonexistent", + "retryable": false, + "details": { "path": "/etc/nonexistent", "errno": 2 } +} +``` + +- `code` — the error code. Either a protocol-level code (`NOT_FOUND`, + `FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`) or an + operation-level domain code from `error_schemas` (e.g., + `FILE_NOT_FOUND`, `RATE_LIMITED`). +- `message` — human-readable error message. Unstructured — for logging and + debugging, not for programmatic handling. Clients should switch on + `code`, not parse `message`. +- `retryable` — whether the caller should retry. `true` for transient + failures (`TIMEOUT`, `RATE_LIMITED`), `false` for permanent ones + (`NOT_FOUND`, `FORBIDDEN`, `FILE_NOT_FOUND`). +- `details` — optional. When the error code matches a declared + `ErrorDefinition`, `details` conforms to that definition's `schema`. When + the error is protocol-level (`NOT_FOUND`, `FORBIDDEN`, etc.), `details` + is absent or carries protocol-specific context (e.g., the operation name + for `NOT_FOUND`). This field is the typed error payload — it's what + makes errors structured instead of string-matched. + +### 3. 
Protocol-level vs operation-level error codes + +The five existing codes are **protocol-level** — emitted by the dispatch +machinery, not by handlers: + +| Code | Emitted by | Meaning | +|------|-----------|---------| +| `NOT_FOUND` | Registry | Operation not registered (or Internal op called from wire) | +| `FORBIDDEN` | Registry / ACL | Caller lacks required scopes, or unauthenticated | +| `INVALID_INPUT` | Registry | Input doesn't match `input_schema` | +| `INTERNAL` | Registry / Adapter | Handler panic, unhandled error, connection failure | +| `TIMEOUT` | Adapter | Request timed out | + +Operation-level domain codes are emitted by **handlers** — the operation's +own logic determines what went wrong. They are declared in `error_schemas` +and appear in the `code` field of `call.error`. Examples: `FILE_NOT_FOUND`, +`PERMISSION_DENIED`, `RATE_LIMITED`, `INSUFFICIENT_CREDITS`, +`CONTEXT_OVERFLOW`. + +The two namespaces are distinct but share the `code` field. Clients +should handle protocol-level codes uniformly (they mean the same thing +regardless of operation) and operation-level codes per-operation (they +mean what the operation's `error_schemas` says they mean). Unknown codes +— whether a future protocol code or an undeclared operation code — should +be treated as `INTERNAL` with `retryable: false` (same as the current +guidance in call-protocol.md L143). + +### 4. Handler error mapping + +When a handler returns an error, the dispatch machinery maps it to a +`call.error` event. The mapping: + +1. If the handler returns a structured error with a `code` that matches a + declared `ErrorDefinition.code`, the `call.error` carries that code and + the error's detail payload (validated against the definition's `schema`). +2. If the handler returns a structured error with a `code` that doesn't + match any declared `ErrorDefinition`, the `call.error` carries + `INTERNAL` with the original code in `details`. 
This is an undeclared + error — the handler returned a typed error but didn't declare it. +3. If the handler returns an unstructured error (a string, a generic + `Error`, a panic), the `call.error` carries `INTERNAL` with + `retryable: false`. This is the current behavior for all handler + errors. + +The TypeScript `mapError()` function (error.ts L25–51) implements case 2 +and 3 by matching error messages against declared codes. The Rust +implementation can use a typed error return from the handler (`Result` where `CallError` carries a `code`), which is cleaner than +message-string matching — the handler returns a typed error, the registry +checks whether the code is declared, and the `call.error` is constructed +accordingly. + +### 5. `from_openapi` and `to_openapi` error fidelity + +`from_openapi` maps OpenAPI response status codes to `ErrorDefinition`s: + +```rust +// OpenAPI: 404: { schema: NotFoundError } +// → ErrorDefinition { code: "NOT_FOUND", http_status: Some(404), schema: NotFoundError } +``` + +The adapter maps the OpenAPI error schema to alknet's JSON Schema format +(same conversion as input/output schemas). The `http_status` field records +the original status code so `to_openapi` can project it back. + +`to_openapi` projects `error_schemas` back to OpenAPI response definitions: + +```yaml +responses: + '200': { schema: } + '404': { schema: } # where http_status = 404 + '429': { schema: } # where http_status = 429 +``` + +This makes the adapter contract from ADR-017 faithful on the error axis — +no silent dropping of error contracts. + +`from_mcp` and `to_mcp` follow the same pattern: MCP tool definitions carry +error descriptions, and the adapters map them to/from `ErrorDefinition`s. + +### 6. `services/schema` exposes error schemas + +`services/schema` returns the full `OperationSpec` including `error_schemas`. 
+A client querying `/services/schema` for `/fs/readFile` gets: + +```json +{ + "name": "fs/readFile", + "namespace": "fs", + "op_type": "query", + "input_schema": { ... }, + "output_schema": { ... }, + "error_schemas": [ + { "code": "FILE_NOT_FOUND", "description": "The file does not exist", + "schema": { "type": "object", "properties": { "path": { "type": "string" } } }, + "http_status": null }, + { "code": "PERMISSION_DENIED", "description": "OS-level read permission denied", + "schema": { "type": "object", "properties": { "path": { "type": "string" }, "errno": { "type": "integer" } } }, + "http_status": null } + ] +} +``` + +This enables client code generation: a TypeScript or Rust client generator +reading the schema can produce a typed `Result` +enum instead of a generic `Result`. + +## Consequences + +**Positive:** + +- Operations declare their failure modes. Clients get typed errors instead + of string-matched messages. This is the same type-safety property that + `input_schema` and `output_schema` provide, extended to the error axis. +- `from_openapi` and `to_openapi` are faithful on the error axis. An + OpenAPI operation's error contract is no longer silently dropped on + import or absent on export. The adapter contract from ADR-017 is now + complete. +- Client code generation can produce typed error enums. A client calling + `/fs/readFile` can match on `FILE_NOT_FOUND` vs `PERMISSION_DENIED` + instead of parsing `message` strings. +- The protocol-level vs operation-level distinction is explicit. Protocol + codes (`NOT_FOUND`, `FORBIDDEN`, etc.) mean the same thing regardless of + operation. Operation codes (`FILE_NOT_FOUND`, `RATE_LIMITED`) mean what + the operation declares. No conflation. +- The `details` field carries structured error context that conforms to a + schema — the error payload is typed, not a bare string. This enables + programmatic error handling (retry logic, user-facing error messages, + logging) without string parsing. 
+- The principle generalizes: making failure a typed, declared thing is the + same pattern as the SDD process's Safe Exit protocol (typed agent + failure) and the same pattern complex biological systems use (apoptosis + as a declared cell failure mode). The more components declare their + failure modes, the more robust the system. + +**Negative:** + +- `OperationSpec` gains a field. Operations that don't declare errors + (empty `error_schemas` vec) still work — the field is additive. But + operations that *should* declare errors and don't will produce `INTERNAL` + with `retryable: false`, same as today. The gap is visible but not + enforced — an operation can ship without error schemas and clients get + untyped errors for it. This is a documentation/guidance issue, not a + type-system issue. +- The `call.error` payload gains a `details` field. This is a wire-format + addition. Existing clients that only read `code` and `message` are + unaffected (they ignore `details`). New clients can read `details` for + structured error context. This is backward-compatible — `details` is + optional and absent for protocol-level errors. +- Handler error mapping adds a step to the dispatch path: the registry + checks whether the handler's error code matches a declared + `ErrorDefinition`. This is a `HashMap` lookup by code — negligible cost. +- The `http_status` field on `ErrorDefinition` is HTTP-specific. Operations + that aren't HTTP-backed (local, session, from_mcp) leave it as `None`. + This is a pragmatic choice: `from_openapi`/`to_openapi` need it, and it's + optional for everything else. A future non-HTTP adapter that needs a + different error projection field would add it — but `http_status` covers + the immediate use case. +- The TypeScript `mapError()` uses message-string matching to map thrown + errors to codes. 
The Rust implementation can do better (typed `CallError` + return from handlers), but this means the `Handler` type's return is + `Result` rather than `Result>`. + This is a cleaner API but a slight constraint on handler authors — they + return typed errors, not generic ones. Mitigated: `CallError::internal()` + is available for errors that don't fit a declared code. + +## Assumptions + +1. **Operations can enumerate their meaningful failure modes at + registration time.** If an operation has failure modes that are only + discoverable at runtime (e.g., a dynamic API that returns novel error + codes), those would be `INTERNAL` with `details` carrying the upstream + error. The assumption is that most operations have a knowable set of + domain errors. + +2. **Error codes are stable per operation.** Once an operation declares + `FILE_NOT_FOUND`, clients depend on that code. Changing it (renaming to + `NOT_FOUND_FILE`) is a breaking change for clients that match on it. + This is the same stability property as `input_schema` and + `output_schema` — the operation's interface is its contract. Adding new + error codes is additive (clients that don't know the new code treat it + as `INTERNAL`); removing or renaming codes is breaking. + +3. **Protocol-level codes are distinct from operation-level codes.** If an + operation declares a code that collides with a protocol code (e.g., an + operation declares `NOT_FOUND` as a domain error), the protocol code + takes precedence in the dispatch machinery (the registry's `NOT_FOUND` + for "operation not registered" is emitted before the handler runs). The + assumption is that operations use domain-specific codes (`FILE_NOT_FOUND`) + rather than reusing protocol codes (`NOT_FOUND`). This is a naming + convention, not a type-system enforcement. + +4. **`details` is optional and backward-compatible.** Existing clients that + ignore `details` continue to work. New clients read `details` for + structured context. 
The wire format addition is additive. + +## References + +- ADR-017: Call protocol client and adapter contract (adapter fidelity — + this ADR makes `from_openapi`/`to_openapi` faithful on the error axis) +- ADR-014: Secret material flow (the `details` field must not carry secret + material — same constraint as `metadata`) +- ADR-015: Privilege model (the `FORBIDDEN` protocol code covers ACL + denial; operation-level `PERMISSION_DENIED` is a distinct domain error + for OS-level permission issues) +- docs/reviews/001-pre-implementation-architecture-sanity-check.md + (finding C5, which this ADR resolves) +- docs/sdd_process.md L19, L423 (Safe Exit protocol — the general principle + of making failure typed and declared) +- TypeScript reference: `/workspace/@alkdev/operations/src/types.ts` + L38–47 (`ErrorDefinitionSchema`), L94, L112 (`errorSchemas` on + `OperationSpec`), `error.ts` L25–51 (`mapError`) \ No newline at end of file diff --git a/docs/architecture/open-questions.md b/docs/architecture/open-questions.md index 0e47c16..ddf420e 100644 --- a/docs/architecture/open-questions.md +++ b/docs/architecture/open-questions.md @@ -300,4 +300,13 @@ These questions are acknowledged but not active. They will be promoted to open w - **Door type**: One-way (security model), two-way (bundle shape) - **Priority**: high - **Resolution**: ADR-015 said handler identity was "set at registration by the assembly layer" but the registration API (`register(spec, handler)`) had no place for it — meaning every internal call would check ACL against `None`, reproducing the escalation gap ADR-015 was written to close. ADR-022 resolves this with a registration bundle (`HandlerRegistration`) carrying `provenance`, `composition_authority` (replacing `handler_identity: Identity` — it's a declared authority bundle, not a peer identity), `scoped_env`, and `capabilities`. The dispatch path (`build_root_context` and `OperationEnv::invoke()`) reads from the bundle. 
Provenance determines which ops can compose: only `Local` and `Session` get composition authority; leaves (`FromOpenAPI`, `FromMCP`, `FromCall`) get `None` — they don't compose, so they don't need it. Capabilities are per-request on `OperationContext`, populated from the bundle (resolving the closure-capture vs context ambiguity). The kernel/user analogy: user's authority checked once at the External gate; handler's composition authority used for all composition inside; scoped env bounds reachability. No intersection — the user's authority does not limit internal calls. See ADR-022. -- **Cross-references**: ADR-014, ADR-015, ADR-022, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C1–C4), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md) \ No newline at end of file +- **Cross-references**: ADR-014, ADR-015, ADR-022, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C1–C4), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md) + +### OQ-24: Operation Error Schemas + +- **Origin**: [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md), ADR-017 +- **Status**: resolved +- **Door type**: One-way (wire format), two-way (mapping mechanism) +- **Priority**: high +- **Resolution**: `OperationSpec` gains `error_schemas: Vec<ErrorDefinition>` where each `ErrorDefinition` carries a `code`, `description`, `schema` (JSON Schema for the error detail payload), and optional `http_status` (for adapter projection). The `call.error` payload gains an optional `details` field carrying the typed error payload. Protocol-level codes (`NOT_FOUND`, `FORBIDDEN`, `INVALID_INPUT`, `INTERNAL`, `TIMEOUT`) are distinct from operation-level domain codes (`FILE_NOT_FOUND`, `RATE_LIMITED`, etc.) — protocol codes are emitted by the dispatch machinery, operation codes by handlers. 
`from_openapi`/`to_openapi` map OpenAPI response status codes to/from `ErrorDefinition`s, making the adapter contract from ADR-017 faithful on the error axis. `services/schema` exposes `error_schemas` for client code generation. See ADR-023. +- **Cross-references**: ADR-017, ADR-023, docs/reviews/001-pre-implementation-architecture-sanity-check.md (C5), [operation-registry.md](crates/call/operation-registry.md), [call-protocol.md](crates/call/call-protocol.md) \ No newline at end of file diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index 232854c..809a23a 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -213,6 +213,7 @@ All design decisions are documented as ADRs in [decisions/](decisions/). | [020](decisions/020-hd-derivation-for-encryption-keys.md) | HD Derivation for Encryption Keys | SLIP-0010 derivation from seed, not PBKDF2; salt field unused in v2 | | [021](decisions/021-key-rotation-via-version-indexed-paths.md) | Key Rotation via Version-Indexed Paths | Version-indexed derivation paths; `rotate` re-encrypts between versions | | [022](decisions/022-handler-registration-provenance-and-composition-authority.md) | Handler Registration, Provenance, and Composition Authority | Registration bundle carries provenance, composition authority, scoped env, capabilities; dispatch path reads from bundle | +| [023](decisions/023-operation-error-schemas.md) | Operation Error Schemas | Operations declare domain errors; `call.error` carries typed `details`; adapter fidelity for `from_openapi`/`to_openapi` | ## Open Questions