tasks: decompose vault, core, call crates into 28 atomic implementation tasks
Break down the three initial crates (alknet-vault, alknet-core, alknet-call) into dependency-ordered task files for implementation agents. Structure: - tasks/vault/ (10 tasks) — drift fixes from ADR-025/026 refactor, review, spec sync. Vault is independent and can run fully in parallel with core/call. - tasks/core/ (6 tasks) — crate init, core types, config, auth, endpoint, review. Core is foundational; call depends on it. - tasks/call/ (12 tasks) — split into registry/ and protocol/ topic subdirs reflecting the two subsystems. CallAdapter is the merge point. Key decisions: - Drifts 3+9+10 grouped as one task (key-versioning-rotation) — the complete ADR-021 rotation feature that doesn't compile in pieces - Reviews injected at end of each crate phase (vault, core, call) - Vault spec-sync task removes the drift table and bumps doc status to stable - ACME deferred in core/endpoint (noted as TODO; X509 manual certs for now) - OperationEnv kept as a trait (load-bearing for ADR-024 layering) Validated: 28 tasks, no cycles, 11 generations of parallel work. Critical path runs through call (11 tasks). Vault completes by generation 4. 6 high-risk tasks identified (21%): irpc-removal, endpoint, operation-context, operation-env, call-adapter, abort-cascade.
This commit is contained in:
86
tasks/vault/poisoned-lock-recovery.md
Normal file
86
tasks/vault/poisoned-lock-recovery.md
Normal file
@@ -0,0 +1,86 @@
|
||||
---
|
||||
id: vault/poisoned-lock-recovery
|
||||
name: Replace unwrap() on RwLock acquisition with poisoned-lock recovery via unwrap_or_else
|
||||
status: pending
|
||||
depends_on: [vault/irpc-removal]
|
||||
scope: narrow
|
||||
risk: low
|
||||
impact: component
|
||||
level: implementation
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Fix drift item #2: `VaultServiceHandle` methods use `unwrap()` on every
|
||||
`RwLock` acquisition (read and write locks). A poisoned lock (caused by a panic
|
||||
while the lock was held) would brick the vault for all subsequent operations.
|
||||
Replace with `unwrap_or_else(|e| e.into_inner())` to recover the inner data from
|
||||
a poisoned lock, or explicit error propagation where appropriate.
|
||||
|
||||
### Current state
|
||||
|
||||
`service.rs` uses `.unwrap()` on `RwLock` read and write acquisitions at
|
||||
approximately lines 142, 161, 182, 191, 196, 227, 264, 307, 340, 367 (line
|
||||
numbers may shift after the irpc removal task — match by pattern: every
|
||||
`.read().unwrap()` and `.write().unwrap()` call in `VaultServiceHandle` method
|
||||
bodies).
|
||||
|
||||
### Target state
|
||||
|
||||
For read locks:
|
||||
```rust
|
||||
let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
|
||||
```
|
||||
|
||||
For write locks:
|
||||
```rust
|
||||
let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
|
||||
```
|
||||
|
||||
The rationale: a poisoned lock means a panic occurred while the lock was held.
|
||||
The data may be in an inconsistent state, but bricking the vault (panicking on
|
||||
every subsequent call) is worse than attempting to continue. The vault's
|
||||
operations are idempotent reads (derive) and state transitions (lock/unlock) —
|
||||
recovering the inner data and continuing is the pragmatic choice. If the data
|
||||
is truly corrupted, the next operation will fail with a normal error, not a
|
||||
panic.
|
||||
|
||||
### No unwrap() or expect() outside tests
|
||||
|
||||
This is a general constraint for the vault: no `unwrap()` or `expect()` outside
|
||||
test code. After fixing the RwLock acquisitions, audit the rest of `service.rs`
|
||||
for any remaining `unwrap()`/`expect()` calls and replace them with proper error
|
||||
propagation (`?` operator, explicit `Result` returns, or
|
||||
`unwrap_or_else(|e| e.into_inner())` for lock recovery).
|
||||
|
||||
### Scope
|
||||
|
||||
This task touches `service.rs` only. It depends on the irpc removal task (drift
|
||||
#4) because that task restructures `service.rs` — doing this first would cause
|
||||
merge conflicts.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [ ] All `.read().unwrap()` calls in `VaultServiceHandle` methods replaced with `.read().unwrap_or_else(|e| e.into_inner())`
|
||||
- [ ] All `.write().unwrap()` calls in `VaultServiceHandle` methods replaced with `.write().unwrap_or_else(|e| e.into_inner())`
|
||||
- [ ] No `unwrap()` or `expect()` calls remain in `service.rs` outside of test code
|
||||
- [ ] Unit test: vault remains usable after a simulated panic (poison the lock, verify next call recovers)
|
||||
- [ ] `cargo test` succeeds
|
||||
- [ ] `cargo clippy` succeeds with no warnings
|
||||
|
||||
## References
|
||||
|
||||
- docs/architecture/crates/vault/README.md — Known Source Drift table item #2
|
||||
- docs/architecture/crates/vault/service.md — Security Constraints: No unwrap() outside tests
|
||||
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025
|
||||
|
||||
## Notes
|
||||
|
||||
> A panic in one vault operation must not brick the vault for all other
|
||||
> operations. The poisoned-lock recovery via `unwrap_or_else(|e| e.into_inner())`
|
||||
> is the standard Rust pattern for this. This task depends on the irpc removal
|
||||
> task because both modify `service.rs` heavily.
|
||||
|
||||
## Summary
|
||||
|
||||
> To be filled on completion
|
||||
Reference in New Issue
Block a user