Break down the three initial crates (alknet-vault, alknet-core, alknet-call) into dependency-ordered task files for implementation agents. Structure: - tasks/vault/ (10 tasks) — drift fixes from ADR-025/026 refactor, review, spec sync. Vault is independent and can run fully in parallel with core/call. - tasks/core/ (6 tasks) — crate init, core types, config, auth, endpoint, review. Core is foundational; call depends on it. - tasks/call/ (12 tasks) — split into registry/ and protocol/ topic subdirs reflecting the two subsystems. CallAdapter is the merge point. Key decisions: - Drifts 3+9+10 grouped as one task (key-versioning-rotation) — the complete ADR-021 rotation feature that doesn't compile in pieces - Reviews injected at end of each crate phase (vault, core, call) - Vault spec-sync task removes the drift table and bumps doc status to stable - ACME deferred in core/endpoint (noted as TODO; X509 manual certs for now) - OperationEnv kept as a trait (load-bearing for ADR-024 layering) Validated: 28 tasks, no cycles, 11 generations of parallel work. Critical path runs through call (11 tasks). Vault completes by generation 4. 6 high-risk tasks identified (21%): irpc-removal, endpoint, operation-context, operation-env, call-adapter, abort-cascade.
86 lines
3.3 KiB
Markdown
86 lines
3.3 KiB
Markdown
---
|
|
id: vault/poisoned-lock-recovery
|
|
name: Replace unwrap() on RwLock acquisition with poisoned-lock recovery via unwrap_or_else
|
|
status: pending
|
|
depends_on: [vault/irpc-removal]
|
|
scope: narrow
|
|
risk: low
|
|
impact: component
|
|
level: implementation
|
|
---
|
|
|
|
## Description
|
|
|
|
Fix drift item #2: `VaultServiceHandle` methods use `unwrap()` on every
|
|
`RwLock` acquisition (read and write locks). A poisoned lock (caused by a panic
|
|
while the lock was held) would brick the vault for all subsequent operations.
|
|
Replace with `unwrap_or_else(|e| e.into_inner())` to recover the inner data from
|
|
a poisoned lock, or explicit error propagation where appropriate.
|
|
|
|
### Current state
|
|
|
|
`service.rs` uses `.unwrap()` on `RwLock` read and write acquisitions at
|
|
approximately lines 142, 161, 182, 191, 196, 227, 264, 307, 340, 367 (line
|
|
numbers may shift after the irpc removal task — match by pattern: every
|
|
`.read().unwrap()` and `.write().unwrap()` call in `VaultServiceHandle` method
|
|
bodies).
|
|
|
|
### Target state
|
|
|
|
For read locks:
|
|
```rust
|
|
let inner = self.inner.read().unwrap_or_else(|e| e.into_inner());
|
|
```
|
|
|
|
For write locks:
|
|
```rust
|
|
let mut inner = self.inner.write().unwrap_or_else(|e| e.into_inner());
|
|
```
|
|
|
|
The rationale: a poisoned lock means a panic occurred while the lock was held.
|
|
The data may be in an inconsistent state, but bricking the vault (panicking on
|
|
every subsequent call) is worse than attempting to continue. The vault's
|
|
operations are idempotent reads (derive) and state transitions (lock/unlock) —
|
|
recovering the inner data and continuing is the pragmatic choice. If the data
|
|
is truly corrupted, the next operation will fail with a normal error, not a
|
|
panic.
|
|
|
|
### No unwrap() or expect() outside tests
|
|
|
|
This is a general constraint for the vault: no `unwrap()` or `expect()` outside
|
|
test code. After fixing the RwLock acquisitions, audit the rest of `service.rs`
|
|
for any remaining `unwrap()`/`expect()` calls and replace them with proper error
|
|
propagation (`?` operator, explicit `Result` returns, or
|
|
`unwrap_or_else(|e| e.into_inner())` for lock recovery).
|
|
|
|
### Scope
|
|
|
|
This task touches `service.rs` only. It depends on the irpc removal task (drift
|
|
#4) because that task restructures `service.rs` — doing this first would cause
|
|
merge conflicts.
|
|
|
|
## Acceptance Criteria
|
|
|
|
- [ ] All `.read().unwrap()` calls in `VaultServiceHandle` methods replaced with `.read().unwrap_or_else(|e| e.into_inner())`
|
|
- [ ] All `.write().unwrap()` calls in `VaultServiceHandle` methods replaced with `.write().unwrap_or_else(|e| e.into_inner())`
|
|
- [ ] No `unwrap()` or `expect()` calls remain in `service.rs` outside of test code
|
|
- [ ] Unit test: vault remains usable after a simulated panic (poison the lock, verify next call recovers)
|
|
- [ ] `cargo test` succeeds
|
|
- [ ] `cargo clippy` succeeds with no warnings
|
|
|
|
## References
|
|
|
|
- docs/architecture/crates/vault/README.md — Known Source Drift table item #2
|
|
- docs/architecture/crates/vault/service.md — Security Constraints: No unwrap() outside tests
|
|
- docs/architecture/decisions/025-vault-local-only-dispatch.md — ADR-025
|
|
|
|
## Notes
|
|
|
|
> A panic in one vault operation must not brick the vault for all other
|
|
> operations. The poisoned-lock recovery via `unwrap_or_else(|e| e.into_inner())`
|
|
> is the standard Rust pattern for this. This task depends on the irpc removal
|
|
> task because both modify `service.rs` heavily.
|
|
|
|
## Summary
|
|
|
|
> To be filled on completion |