tasks: decompose vault, core, call crates into 28 atomic implementation tasks

Break down the three initial crates (alknet-vault, alknet-core, alknet-call)
into dependency-ordered task files for implementation agents.

Structure:
- tasks/vault/ (10 tasks) — drift fixes from ADR-025/026 refactor, review,
  spec sync. Vault is independent and can run fully in parallel with core/call.
- tasks/core/ (6 tasks) — crate init, core types, config, auth, endpoint,
  review. Core is foundational; call depends on it.
- tasks/call/ (12 tasks) — split into registry/ and protocol/ topic subdirs
  reflecting the two subsystems. CallAdapter is the merge point.

Key decisions:
- Drifts 3+9+10 grouped as one task (key-versioning-rotation) — the complete
  ADR-021 rotation feature that doesn't compile in pieces
- Reviews injected at end of each crate phase (vault, core, call)
- Vault spec-sync task removes the drift table and bumps doc status to stable
- ACME deferred in core/endpoint (noted as TODO; X509 manual certs for now)
- OperationEnv kept as a trait (load-bearing for ADR-024 layering)

Validated: 28 tasks, no cycles, 11 generations of parallel work.
Critical path runs through call (11 tasks). Vault completes by generation 4.
6 high-risk tasks identified (21%): irpc-removal, endpoint, operation-context,
operation-env, call-adapter, abort-cascade.
2026-06-23 12:41:47 +00:00
parent 2e34590522
commit 098fd8b9b9
28 changed files with 4271 additions and 0 deletions

@@ -0,0 +1,86 @@
---
id: vault/poisoned-lock-recovery
name: Replace unwrap() on RwLock acquisition with poisoned-lock recovery via unwrap_or_else
status: pending
depends_on: [vault/irpc-removal]
scope: narrow
risk: low
impact: component
level: implementation
---
## Description
Fix drift item #2: `VaultServiceHandle` methods call `unwrap()` on every
`RwLock` acquisition (read and write). If a thread panics while holding the
write guard, the lock is poisoned, and every later `.read().unwrap()` or
`.write().unwrap()` panics in turn, bricking the vault for all subsequent
operations. Replace these calls with `unwrap_or_else(|e| e.into_inner())`, which
takes the guard out of the `PoisonError` so the operation can proceed, or with
explicit error propagation where appropriate.
### Current state
`service.rs` uses `.unwrap()` on `RwLock` read and write acquisitions at
approximately lines 142, 161, 182, 191, 196, 227, 264, 307, 340, 367 (line
numbers may shift after the irpc removal task — match by pattern: every
`.read().unwrap()` and `.write().unwrap()` call in `VaultServiceHandle` method
bodies).
### Target state
For read locks:
```rust
let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
```
For write locks:
```rust
let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
```
The rationale: a poisoned lock means a panic occurred while the write guard was
held. The data may be in an inconsistent state, but bricking the vault
(panicking on every subsequent call) is worse than attempting to continue. The
vault's operations are idempotent reads (derive) and state transitions
(lock/unlock), so recovering the guard and continuing is the pragmatic choice.
If the data is truly corrupted, the next operation is expected to fail with a
normal error, not a panic. Note that `into_inner()` does not clear the poison
flag: the lock keeps reporting itself as poisoned, so the recovery must be
applied at every acquisition site.
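As a minimal sketch of the mechanics described above, the following uses a plain `RwLock<Vec<u8>>` rather than the vault's real state. The helper names (`poison`, `len_recovering`, `push_recovering`, `demo`) are hypothetical and exist only for illustration.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Poisons `lock` by panicking on another thread while the write guard is held.
fn poison(lock: &Arc<RwLock<Vec<u8>>>) {
    let shared = Arc::clone(lock);
    let outcome = thread::spawn(move || {
        let _guard = shared.write().unwrap_or_else(|e| e.into_inner());
        panic!("simulated panic while holding the write guard");
    })
    .join();
    // The spawned thread panicked, so `join` reports an error.
    assert!(outcome.is_err());
}

/// Reads the length without panicking, whether or not the lock is poisoned.
fn len_recovering(lock: &RwLock<Vec<u8>>) -> usize {
    let guard = lock.read().unwrap_or_else(|e| e.into_inner());
    guard.len()
}

/// Appends a byte without panicking, whether or not the lock is poisoned.
fn push_recovering(lock: &RwLock<Vec<u8>>, byte: u8) {
    let mut guard = lock.write().unwrap_or_else(|e| e.into_inner());
    guard.push(byte);
}

/// Returns whether the lock was poisoned and the length after recovery.
fn demo() -> (bool, usize) {
    let lock = Arc::new(RwLock::new(vec![1u8, 2, 3]));
    poison(&lock);
    let poisoned = lock.is_poisoned();
    push_recovering(&lock, 4);
    (poisoned, len_recovering(&lock))
}
```

The lock still reports itself as poisoned after the recovered write, which is why every acquisition site needs the same treatment.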
### No unwrap() or expect() outside tests
This is a general constraint for the vault: no `unwrap()` or `expect()` outside
test code. After fixing the RwLock acquisitions, audit the rest of `service.rs`
for any remaining `unwrap()`/`expect()` calls and replace them with proper error
handling: the `?` operator or explicit `Result` returns for fallible operations,
and `unwrap_or_else(|e| e.into_inner())` for lock recovery.
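Where explicit error propagation is preferred over recovery, the replacement might look like the sketch below. `VaultError`, `Handle`, and `key_version` are hypothetical names chosen for illustration; the real error type and method signatures in `service.rs` may differ.

```rust
use std::collections::HashMap;
use std::fmt;
use std::sync::RwLock;

/// Hypothetical error type; the real vault error enum may differ.
#[derive(Debug, PartialEq)]
enum VaultError {
    LockPoisoned,
    UnknownKey(String),
}

impl fmt::Display for VaultError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VaultError::LockPoisoned => write!(f, "vault lock poisoned"),
            VaultError::UnknownKey(name) => write!(f, "unknown key: {name}"),
        }
    }
}

impl std::error::Error for VaultError {}

/// Hypothetical stand-in for the state behind `VaultServiceHandle`.
struct Handle {
    inner: RwLock<HashMap<String, u32>>,
}

impl Handle {
    /// Looks up a key version, returning errors instead of panicking.
    fn key_version(&self, name: &str) -> Result<u32, VaultError> {
        // Explicit propagation: surface poisoning as an error value.
        let inner = self.inner.read().map_err(|_| VaultError::LockPoisoned)?;
        // `ok_or_else` plus `?` replaces a `.get(name).unwrap()` call.
        let version = inner
            .get(name)
            .ok_or_else(|| VaultError::UnknownKey(name.to_string()))?;
        Ok(*version)
    }
}
```

Mapping the `PoisonError` to a plain error value drops the guard it carries, so this variant refuses to touch possibly inconsistent data, whereas `into_inner()` continues with it.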
### Scope
This task touches `service.rs` only. It depends on the irpc removal task (drift
#4) because that task restructures `service.rs` — doing this first would cause
merge conflicts.
## Acceptance Criteria
- [ ] All `.read().unwrap()` calls in `VaultServiceHandle` methods replaced with `.read().unwrap_or_else(|e| e.into_inner())`
- [ ] All `.write().unwrap()` calls in `VaultServiceHandle` methods replaced with `.write().unwrap_or_else(|e| e.into_inner())`
- [ ] No `unwrap()` or `expect()` calls remain in `service.rs` outside of test code
- [ ] Unit test: vault remains usable after a simulated panic (poison the lock, verify next call recovers)
- [ ] `cargo test` succeeds
- [ ] `cargo clippy` succeeds with no warnings
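The poisoned-lock unit test criterion could be approached roughly as follows. This is a sketch under stated assumptions: `Handle` and `Inner` are hypothetical stand-ins for `VaultServiceHandle` and its state, and `panic_while_locked` is an illustrative helper, not an existing vault method.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the state guarded by the vault's lock.
struct Inner {
    unlocked: bool,
}

/// Hypothetical stand-in for `VaultServiceHandle`.
struct Handle {
    inner: RwLock<Inner>,
}

impl Handle {
    fn new() -> Self {
        Handle { inner: RwLock::new(Inner { unlocked: false }) }
    }

    fn unlock(&self) {
        let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
        inner.unlocked = true;
    }

    fn is_unlocked(&self) -> bool {
        let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
        inner.unlocked
    }

    /// Simulates an operation that panics while holding the write guard.
    fn panic_while_locked(&self) {
        let _guard = self.inner.write().unwrap_or_else(|e| e.into_inner());
        panic!("simulated failure inside a vault operation");
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::panic::{catch_unwind, AssertUnwindSafe};

    #[test]
    fn vault_remains_usable_after_poisoned_lock() {
        let handle = Handle::new();
        // The panic unwinds through the write guard, which poisons the lock.
        let outcome = catch_unwind(AssertUnwindSafe(|| handle.panic_while_locked()));
        assert!(outcome.is_err());
        assert!(handle.inner.is_poisoned());
        // Subsequent calls recover the guard instead of panicking.
        handle.unlock();
        assert!(handle.is_unlocked());
    }
}
```

Using `catch_unwind` keeps the simulated panic on the test thread; spawning a thread and joining it would poison the lock equally well.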
## References
- docs/architecture/crates/vault/README.md — Known Source Drift table item #2
- docs/architecture/crates/vault/service.md — Security Constraints: No unwrap() outside tests
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025
## Notes
> A panic in one vault operation must not brick the vault for all other
> operations. Poisoned-lock recovery via `unwrap_or_else(|e| e.into_inner())`
> is a common Rust idiom for this. This task depends on the irpc removal
> task because both modify `service.rs` heavily.
## Summary
> To be filled on completion