
glm-5.2 ec315e9499 docs(research): extend alknet-filesystem POC — distributed sync via automerge CRDT

Third POC iteration (alknet-fs-sync-poc, 9/9 tests) proves multi-node
path-tree sync:

- Path tree modeled as automerge CRDT document, synced via automerge's
  sync protocol over iroh QUIC connections
- Each node has a local replica; writes are local + immediate (no
  network latency); sync is async, gossip-style, eventually consistent
- Concurrent writes to different paths converge cleanly; concurrent
  writes to same path resolve via LWW (NFS-equivalent semantics)
- Content (blobs) and metadata (path tree) sync separately — automerge
  for path edges, iroh-blobs for file bytes
- Branch inheritance works through automerge sync

Key finding: automerge concurrent put_object on same key creates a
conflict, not a merge. Root structures must be created by one node and
synced before other nodes write. This is a design constraint for the
spec.

24 total tests pass across both POC crates. All remaining unknowns are
implementation-scope, not feasibility blockers.

2026-06-20 17:36:39 +00:00


alknet-filesystem: POC Research Summary

Status: Research complete on the three highest-leverage unknowns (path-tree layer, write path, distributed sync); the approach is viable enough to spec. Remaining unknowns are implementation-scope, not feasibility.

Date: 2026-06-20

Scope: Captures what the three POC iterations proved, what unknowns they closed, what remains open, and the architectural direction they establish. Source material for the eventual alknet-filesystem crate spec.

Executive Summary

A POC (alknet-filesystem-poc, /workspace/alknet-filesystem-poc), extended by a sync POC (alknet-fs-sync-poc, /workspace/alknet-fs-sync-poc), resolves the three largest sources of feasibility uncertainty (path-tree layer, write path, distributed sync) around building a content-addressed, branch-aware, mountable filesystem from three orthogonal layers: SQLite (path tree + application file format), iroh-blobs (content-addressed blob store), and honker (durable pub/sub + queue + locks inside the same SQLite file).

The POC was built in three iterations:

1. Path-tree layer (Tier 1): proved that a SQLite-backed path tree over an iroh-blobs MemStore gives Fossil-style branching with free content dedup, honker notify-on-commit inside the same transaction as path-tree mutations, and free multi-tenant isolation via a bucket_id column. 8 tests.
2. Write path: proved that "branch on write, merge on close" reconciles the fundamental mismatch between content-addressed storage (BLAKE3 must hash the complete file) and filesystem write semantics (chunks arrive incrementally, possibly out of order). A concurrent reader sees the old version until close() commits atomically; crash/abort leaves the old version intact. 7 tests.
3. Distributed sync: proved that modeling the path tree as an automerge CRDT document, synced via automerge's sync protocol over iroh QUIC connections, gives multi-node convergence with local-latency writes. Concurrent writes to different paths converge cleanly; concurrent writes to the same path resolve via LWW (NFS-equivalent semantics). Content (blobs) and metadata (path tree) sync separately. 9 tests.

15 + 9 = 24 tests pass across two POC crates. All deps are published crates.io versions (no workspace path deps), so the POCs are portable.

The three layers compose cleanly for both the read path and the write path. The remaining unknowns (FsStore/redb vs SQLite, actual SFTP wiring, continuous sync and content fetch on read, GC/tag management) are implementation details rather than architectural risks.

Background: The Insight

The POC originated from a correction on X: SQLite is not just a database, it's a legitimate application file format. The key observations from the appfileformat page behind that correction:

BLOBs < ~100KB are faster inline in SQLite than as filesystem files. This inverts the usual "databases are slow for big binaries" assumption at exactly the granularity iroh-blobs already cares about (16 KiB chunk groups).
Atomic transactions over path-tree metadata, independent of content blobs.
The schema is the documentation. An SQL schema defines the file format more concisely than a custom format spec.

iroh-blobs had already discovered this independently: its FsStore uses a hybrid approach (DESIGN.md, /workspace/iroh-blobs/DESIGN.md) with redb (an embedded KV DB, SQLite-shaped philosophy) for small blobs and filesystem files for large blobs, with four metadata tables (blobs, tags, inline_data, inline_outboard — see src/store/fs/meta/tables.rs). The path-tree layer is the missing piece iroh-blobs deliberately doesn't provide: iroh answers "given a BLAKE3 hash, where are the bytes?" but not "given a path, which hash does it resolve to?"

The architectural insight: SQLite for path edges, iroh for content, honker for coordination. Each layer does what it's best at, and the boundaries are clean.

The Three Layers

Layer 1: iroh-blobs — content-addressed blob storage

Crate: iroh-blobs 0.103 from crates.io, MemStore backend (no redb, no fsync rabbit hole — deliberate POC scope choice). Source reviewed: /workspace/iroh-blobs (v0.100 local checkout) + published 0.103 source in cargo cache.

iroh-blobs provides content-addressed blob storage with BLAKE3 verified streaming. A blob is a sequence of bytes of arbitrary size, identified by its 32-byte BLAKE3 hash. Content is dedup'd by definition: same hash = same bytes. This is the layer that gives us the "many agents / many forks share most content" property — content is shared across branches because it lives under its hash, not under its path.

Key API surface used by the POC:

MemStore::new() — in-memory store, no persistence
store.blobs().add_slice(bytes).with_tag().await → TagInfo (returns the hash + creates a persistent named tag protecting the blob from GC)
store.blobs().reader(hash) → BlobReader (implements AsyncRead + AsyncSeek)
store.blobs().has(hash) → bool

One sharp edge worth recording: Hash::new(buf) computes a BLAKE3 hash of the input; Hash::from_bytes(bytes) wraps 32 raw bytes as a Hash. Round-tripping a hash through hex requires from_bytes, not new. This cost ~20 minutes of debugging and is worth documenting for anyone touching the iroh API.
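The distinction is easy to reproduce in miniature. The following is a std-only sketch, not the iroh-blobs API: the `Hash` type here is a toy stand-in, its `new` uses a placeholder digest rather than BLAKE3, and `parse_hex` and `hash_from_hex` are hypothetical helpers. It only illustrates that a hex round trip must wrap the decoded bytes instead of hashing them again.

```rust
/// Toy stand-in for a 32-byte content hash. The digest in `new` is NOT BLAKE3;
/// it only exists so the sketch runs without external crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Hash([u8; 32]);

impl Hash {
    /// Plays the role of `Hash::new`: derives a digest from arbitrary input.
    fn new(data: &[u8]) -> Self {
        let mut out = [0u8; 32];
        let mut state: u64 = 0xcbf2_9ce4_8422_2325;
        for (i, byte) in data.iter().enumerate() {
            state = (state ^ u64::from(*byte)).wrapping_mul(0x0000_0100_0000_01b3);
            out[i % 32] ^= (state >> 24) as u8;
        }
        Hash(out)
    }

    /// Plays the role of `Hash::from_bytes`: wraps raw bytes without hashing.
    fn from_bytes(bytes: [u8; 32]) -> Self {
        Hash(bytes)
    }

    /// Lowercase hex encoding of the 32 raw bytes (64 characters).
    fn to_hex(&self) -> String {
        self.0.iter().map(|b| format!("{b:02x}")).collect()
    }
}

/// Decodes a 64-character hex string into 32 raw bytes.
fn parse_hex(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 || !hex.is_ascii() {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Correct round trip: decode the hex, then wrap the bytes (do not re-hash).
fn hash_from_hex(hex: &str) -> Option<Hash> {
    parse_hex(hex).map(Hash::from_bytes)
}
```

Calling `Hash::new` on the decoded bytes instead would compute a digest of the digest, which in general no longer identifies the original content.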

Layer 2: SQLite — path tree + application file format

Crate: rusqlite 0.39 (bundled), pinned to match honker-core's rusqlite version. Source: src/schema.rs, src/fs.rs

SQLite stores the path tree: a mapping from (bucket, branch, path) to an iroh Link (64-hex BLAKE3 hash). The schema has four core tables:

buckets — top-level isolation unit (multi-tenant). One row per tenant.
branches — Fossil-style named snapshots with parent pointers. A branch starts empty and inherits everything from its parent.
paths — one row per (bucket, branch, path) entry that has been overridden or created on this branch (only the delta from the parent). link is the BLAKE3 hash; size is cached so fstat is a single indexed lookup.
tombstones — deletion markers. A path may exist on a parent branch but be deleted on a child; the tombstone stops the chain walk from inheriting.

Plus two write-path tables (see Write Path below):

write_sessions — active write sessions (one per open file handle)
write_chunks — chunks for active write sessions (one row per write() call, stored inline as BLOBs)

The core read operation is a recursive CTE that walks the branch parent chain:

WITH RECURSIVE chain(branch_id, parent_id, depth) AS (
    SELECT b.id, b.parent_id, 0 FROM branches b
      JOIN buckets bk ON bk.id = b.bucket_id
     WHERE bk.name = ?1 AND b.name = ?2
    UNION ALL
    SELECT b.id, b.parent_id, c.depth + 1 FROM chain c
      JOIN branches b ON b.id = c.parent_id
)
SELECT p.kind, p.link, p.symlink_to, p.size
  FROM chain c JOIN paths p ON p.branch_id = c.branch_id
  LEFT JOIN tombstones t ON t.branch_id IN (SELECT branch_id FROM chain WHERE depth <= c.depth)
                                    AND t.path = p.path
 WHERE p.path = ?3 AND t.id IS NULL
 ORDER BY c.depth ASC LIMIT 1

This returns the first matching path row that isn't tombstoned on a closer branch. The ORDER BY c.depth ASC LIMIT 1 correctly picks the closest branch's override. Even on the POC's naive schema, resolves are sub-millisecond in-memory.

Layer 3: honker — durable pub/sub + queue + locks inside the SQLite file

Crate: honker-core 0.2.4 (not honker — see below), bundled-sqlite feature. Source: src/schema.rs (bootstrap), src/watch.rs (listener)

honker provides pub/sub, durable queues, named locks, rate limits, and a scheduler — all as SQL functions registered on your own rusqlite connection. The key integration point:

honker_core::apply_default_pragmas(conn)?;      // WAL, synchronous=NORMAL, ...
honker_core::attach_notify(conn)?;              // _honker_notifications table + notify() SQL function
honker_core::attach_honker_functions(conn)?;    // enqueue, claim, lock_acquire, stream_publish, cron, ...
honker_core::bootstrap_honker_schema(conn)?;   // queue/stream/scheduler tables

This registers notify(channel, payload) as a SQL scalar function on the same connection that owns the path-tree tables. The critical property: calling SELECT notify(...) inside the same transaction as a path-tree mutation means the event is atomic with the data change. A watcher wakes on commit, not on poll. A rolled-back mutation produces no event. This is the transactional-outbox pattern, built in.
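The commit/rollback behaviour described above can be modelled without SQLite. This is a minimal in-memory sketch of the transactional-outbox idea, not honker's implementation: `Db`, `Txn`, and the `changed:{path}` event format are all hypothetical names chosen for illustration.

```rust
/// Toy database: committed path rows plus the events watchers have received.
#[derive(Default)]
struct Db {
    paths: Vec<(String, String)>, // committed (path, link) rows
    delivered: Vec<String>,       // events visible to watchers
}

/// A transaction stages both the row change and its event together.
struct Txn<'a> {
    db: &'a mut Db,
    staged_paths: Vec<(String, String)>,
    staged_events: Vec<String>,
}

impl Db {
    fn begin(&mut self) -> Txn<'_> {
        Txn { db: self, staged_paths: Vec::new(), staged_events: Vec::new() }
    }
}

impl<'a> Txn<'a> {
    /// Stages a path mutation and, in the same transaction, its notification.
    /// This mirrors calling `SELECT notify(...)` next to the path-tree write.
    fn upsert_path(&mut self, path: &str, link: &str) {
        self.staged_paths.push((path.to_string(), link.to_string()));
        self.staged_events.push(format!("changed:{path}"));
    }

    /// Commit publishes the rows and the events at the same moment.
    fn commit(self) {
        let Txn { db, staged_paths, staged_events } = self;
        db.paths.extend(staged_paths);
        db.delivered.extend(staged_events);
    }

    /// Rollback drops everything staged, so no event is ever delivered.
    fn rollback(self) {}
}
```

Because the event is staged alongside the row, there is no window in which a watcher can observe one without the other.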

Why honker-core not honker: The honker crate opens its own SQLite connection and manages its own database handle. To get the SQL functions on your connection — the whole point of the transactional-outbox property — you need honker-core, which exposes attach_honker_functions(conn) for any rusqlite connection. The honker crate is the ergonomic Rust wrapper for the "honker manages everything" use case; honker-core is the foundation for the "I own the connection" use case. The POC is the latter.

honker is single-machine, file-backed (explicitly: "two servers writing the same .db over NFS is not a Honker deployment strategy"). That's fine for the local-VFS layer; the distributed part is iroh's job at the blob layer. The split is clean: honker coordinates local state + local workers; iroh coordinates cross-node content. They don't compete.

POC Iteration 1: Path-Tree Layer (Tier 1)

Scope: SQLite path-tree + iroh-blobs MemStore + honker notify, no SFTP, no FsStore, no network. The goal was to validate the SQLite+iroh seam and the branching model cheaply.

What it proved

1. SQLite is a workable path-tree layer over content-addressed blobs. Path rows store BLAKE3 hashes; bytes live in iroh. put/get roundtrips cleanly. Atomic path-tree mutations (rename, mkdir, unlink) compose with iroh blob add/delete. rename is O(1) on path edges regardless of file size — content stays in iroh under its hash, not under its path. This is the property git also gets right. (Test: put_get_roundtrip, rename_is_o1_on_path_edges)

2. Fossil-style branching gives free content sharing across forks. A child branch inherits parent files via the recursive-CTE chain walk. Same content hashes to the same link (free dedup via content addressing). Writes on a child are invisible on the parent. Tombstones hide parent files on child branches. (Tests: branch_inherits_parent_content, branch_modifications_do_not_leak_to_parent, tombstone_hides_parent_file_on_child, content_is_deduped_across_branches)

3. honker notify-on-commit works inside the same transaction. notify() is called inside the path-tree mutation; watcher receives the event on commit. (Test: watch_fires_on_commit)

4. Buckets (multi-tenancy) are free. A bucket_id column on every row = isolation. Alpha files don't leak into Beta. Auth is an adapter problem (which connection sees which buckets). (Test: multi_bucket_isolation)

The `share-check` demo

Running cargo run -- share-check demonstrates the content-sharing property end-to-end:

== content sharing across branches ==
main  shared.txt = "hello from main\n"
agent-a shared.txt = "hello from main\n"    ← inherited from parent, same hash
agent-a agent-a.txt = "agent a working here\n"
agent-a main-only.txt resolved = false     ← tombstoned on agent-a

== dir listing on agent-a (parent chain walk) ==
  agent-a.txt      kind=File link=f7a5aa575a40…
  shared.txt       kind=File link=b9d5a428d102…

shared.txt is byte-identical across main and agent-a (content shared by hash). main-only.txt is hidden on agent-a by the tombstone. agent-a.txt is only on the branch.

POC Iteration 2: Write Path — "Branch on Write, Merge on Close"

Scope: Chunked writes (SFTP-style open → write at offset × N → close), crash/abort semantics, concurrent-reader isolation. The deal-breaker question: can content-addressed storage serve as the backend for a mountable filesystem's write path, or is the BLAKE3-must-hash-the-whole-thing constraint a fundamental mismatch?

The problem

BLAKE3 must hash the complete file to produce the content address. But filesystem writes arrive as chunks — SFTP writes in ~32KB chunks, possibly pipelined and out of order. You can't hash until you have all the bytes. Where do partial writes live? What does a concurrent reader see? What happens on crash?

The solution: "branch on write, merge on close"

A write session is a short-lived branch:

open(path, WRITE) creates a temp child branch of the target branch. Inserts a write_sessions row.
write(offset, chunk) inserts a row in write_chunks, one transaction per chunk, so each chunk is crash-safe. Chunks may arrive out of order (SFTP pipelines writes). Chunk-sized BLOBs (~32KB) are SQLite's sweet spot per the appfileformat page's "BLOBs < 100KB faster inline" finding. The offset is the unique key: two writes to the same offset overwrite (last-write-wins within a session, matching POSIX overlapping-write semantics).
Reads on the target branch during the session see the old version via the parent chain walk. This is POSIX "concurrent readers see old version until close commits", for free, from the branching model — no separate snapshot mechanism needed.
close() assembles chunks in offset order → BLAKE3 hash → iroh add_bytes → updates the path row on the target branch → notify() → marks session closed. All the SQLite parts in one transaction. The new version becomes visible atomically.
abort() or crash: the session row stays open/aborted, the target branch is untouched, the old version is still readable. Orphaned sessions can be found and cleaned up later.
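The session lifecycle above can be sketched in memory. This is an illustrative model only, under the assumption that plain byte vectors stand in for both the SQLite write_chunks rows and the iroh blob store; BLAKE3 hashing and notify() are omitted, and the `Tree` and `WriteSession` shapes here are hypothetical, not the POC's actual types.

```rust
use std::collections::{BTreeMap, HashMap};

/// Committed state: path -> file bytes (stand-in for path row + blob).
#[derive(Default)]
struct Tree {
    files: HashMap<String, Vec<u8>>,
}

/// One open write handle. Chunks are keyed by byte offset, so arrival
/// order does not matter and a rewrite of the same offset replaces it.
struct WriteSession {
    path: String,
    chunks: BTreeMap<u64, Vec<u8>>,
}

impl Tree {
    /// Readers only ever see committed bytes.
    fn read(&self, path: &str) -> Option<&[u8]> {
        self.files.get(path).map(|v| v.as_slice())
    }

    /// Opening a session does not touch the committed entry.
    fn open_write(&self, path: &str) -> WriteSession {
        WriteSession { path: path.to_string(), chunks: BTreeMap::new() }
    }

    /// Assembles chunks in offset order, then swaps the new version in.
    fn close(&mut self, session: WriteSession) {
        let mut buf: Vec<u8> = Vec::new();
        for (offset, chunk) in session.chunks {
            let start = offset as usize;
            let end = start + chunk.len();
            if buf.len() < end {
                buf.resize(end, 0); // zero-fill any gap, like a sparse write
            }
            buf[start..end].copy_from_slice(&chunk);
        }
        self.files.insert(session.path, buf);
    }
}

impl WriteSession {
    fn write_chunk(&mut self, offset: u64, data: &[u8]) {
        self.chunks.insert(offset, data.to_vec());
    }

    /// Dropping the session discards its chunks; the tree is untouched.
    fn abort(self) {}
}
```

A reader calling `read` between `open_write` and `close` gets the old bytes, and `abort` leaves them in place, which is the isolation property the session model is meant to provide.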

What it proved

Test	What it proves
`chunked_write_close_produces_correct_hash`	Assembling chunks + BLAKE3 hashing produces the same hash as a whole-file write. Content addressing works through the chunk boundary.
`chunked_write_out_of_order`	SFTP-style pipelined writes (offset 6 before offset 0) assemble correctly. The offset is the key, not arrival order.
`concurrent_reader_sees_old_version_during_write`	The key POSIX property. During a write session, reads return the old version. After `close()`, reads return the new version. For free, from the branching model.
`abort_leaves_old_version_untouched`	Simulated crash/abort: old version survives, chunks discarded.
`abort_on_new_file_leaves_no_trace`	Aborting a write to a new path doesn't create a phantom entry.
`large_chunked_file_writes_correctly`	1MB file in 32KB chunks assembles, hashes, and reads back correctly. Matches whole-file hash.
`multiple_concurrent_write_sessions_same_path`	Two sessions on the same path don't corrupt each other's chunks (session-scoped). Last close wins.

The bug that was found and fixed

The initial chunk_idx = offset / 32768 computation was wrong — two writes at offsets 0 and 6 (both < 32768) collided on chunk_idx = 0, causing the INSERT OR REPLACE to overwrite the first chunk with the second. The out-of-order test caught this immediately. Fix: use the offset itself as the unique key. This is the kind of thing the POC exists to catch — a spec written without the POC would have shipped this bug.
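The collision is easy to reproduce in isolation. The sketch below is a hypothetical reconstruction, not the POC's code: a `BTreeMap` insert stands in for `INSERT OR REPLACE`, and the two functions differ only in which value they use as the key.

```rust
use std::collections::BTreeMap;

const CHUNK_SIZE: u64 = 32 * 1024;

/// Buggy keying: every offset below 32768 maps to index 0, so a second
/// small write replaces the first.
fn store_by_chunk_idx(writes: &[(u64, &[u8])]) -> BTreeMap<u64, Vec<u8>> {
    let mut rows = BTreeMap::new();
    for &(offset, data) in writes {
        rows.insert(offset / CHUNK_SIZE, data.to_vec());
    }
    rows
}

/// Fixed keying: the byte offset itself is the unique key.
fn store_by_offset(writes: &[(u64, &[u8])]) -> BTreeMap<u64, Vec<u8>> {
    let mut rows = BTreeMap::new();
    for &(offset, data) in writes {
        rows.insert(offset, data.to_vec());
    }
    rows
}
```

With writes at offsets 0 and 6, the first function retains one row and the second retains both.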

Why SQLite + honker wins for the write path specifically

The write path is where the SQLite-vs-redb-vs-filesystem decision matters most, and SQLite wins for three reasons that all show up here:

Chunk-sized BLOBs are SQLite's sweet spot. SFTP writes in ~32KB chunks. The appfileformat page's "BLOBs < 100KB are faster inline in SQLite than as filesystem files" finding is exactly this case. redb could do it too, but now you have two databases for one transaction.
The chunks, the path tree, and the honker notification are one transaction. No dual-write between "where chunks live" and "where the path tree lives." No fsync ordering problem (the thing iroh's DESIGN.md spent the most words on — "files are hard," the bitfield/data/outboard write-ordering problem). One WAL, one commit boundary.
honker coordinates the session. Named locks on the path prevent concurrent writers from stomping each other (not yet wired in the POC — see Open Unknowns). The write session can be a honker job — enqueue on open, track progress, close completes the job, notify fires on merge. Crash leaves an orphaned job you can see and clean up.

POC Iteration 3: Distributed Sync — Automerge CRDT over iroh QUIC

POC: alknet-fs-sync-poc (/workspace/alknet-fs-sync-poc)

Crates: automerge 0.10, iroh 1.0, iroh-blobs 0.103

Tests: 9/9 pass (4 local + 5 cross-node)

The question

The first two POC iterations proved the local architecture. The open question was: how do multiple nodes see each other's path-tree changes? This is the cache-invalidation/staleness problem: Node B has a local copy of hash X (correct, content-addressed), Node A updates the path to point to hash Y, and B's cache is now correct but stale.

The solution: path tree as automerge CRDT

Model the path tree as an automerge document. Each node has a local replica. Writes are local + immediate (no network latency). Sync happens via automerge's sync protocol over iroh's QUIC connections — gossip-style, eventually consistent. Conflicts on the same path merge via last-write-wins (LWW) on scalar values, which is what NFS does too.

The architecture splits cleanly:

Layer	What	How
Content (blobs)	File bytes, content-addressed	iroh-blobs — BLAKE3 hash → blob. Network-transferable via iroh's blob protocol.
Path tree (metadata)	Path → hash mapping, branches, tombstones	Automerge document, synced via iroh QUIC. Local replica per node.
Local write path	Chunked writes, crash safety	SQLite write_chunks + honker (from first POC).
Cross-node sync	"Node A changed path X to hash Y"	Automerge sync messages over iroh QUIC. Gossip-style, eventually consistent.

Content and metadata sync are separate: automerge syncs the path tree (small — path edges only, not file bytes); iroh-blobs syncs the content (large — actual file bytes, content-addressed). A node that learns a new path→hash mapping via automerge sync can fetch the blob from a peer that has it via iroh-blobs' network protocol. This keeps the automerge doc small and the sync fast.

Document structure

{
  "branches": {                    // branch_name → { parent: "main" | null }
    "main":     { "parent": null },
    "agent-a":  { "parent": "main" }
  },
  "paths": {                       // "{branch}:{path}" → { kind, link, size }
    "main:hello.txt":     { "kind": "file", "link": "abc123...", "size": 42 },
    "agent-a:agent.txt":  { "kind": "file", "link": "def456...", "size": 20 }
  },
  "tombstones": {                  // "{branch}:{path}" → true
    "agent-a:old.txt": true
  }
}

The branch chain walk (resolve) is done in Rust by reading the branches map and following parent pointers — same recursive logic as the SQLite CTE, but operating on the CRDT's local replica.
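A minimal sketch of that walk, assuming the document has already been read into plain Rust maps. The `Doc` struct and its field types are hypothetical and simplified (links only, no kind or size). Checking the tombstone before the path entry on each branch is an assumption chosen to mirror the SQL CTE, where a tombstone at the same depth also hides the row.

```rust
use std::collections::{HashMap, HashSet};

/// Plain-Rust snapshot of the CRDT document's three root maps.
#[derive(Default)]
struct Doc {
    branches: HashMap<String, Option<String>>, // branch -> parent
    paths: HashMap<String, String>,            // "{branch}:{path}" -> link
    tombstones: HashSet<String>,               // "{branch}:{path}"
}

impl Doc {
    /// Walks from `branch` up the parent chain and returns the first link
    /// found, unless a tombstone on that or a closer branch hides the path.
    fn resolve(&self, branch: &str, path: &str) -> Option<&str> {
        let mut current = Some(branch);
        let mut hops = 0usize;
        while let Some(name) = current {
            let key = format!("{name}:{path}");
            if self.tombstones.contains(&key) {
                return None; // deleted here: stop inheriting from parents
            }
            if let Some(link) = self.paths.get(&key) {
                return Some(link.as_str());
            }
            // Unknown branch ends the walk; otherwise move to the parent.
            current = self.branches.get(name)?.as_deref();
            hops += 1;
            if hops > self.branches.len() {
                return None; // guard against an accidental parent cycle
            }
        }
        None
    }
}
```

The cycle guard is not part of the source design; it is included because a merged CRDT document could in principle contain a parent loop that a SQL foreign key would otherwise rule out.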

Sync protocol

Same pattern as the iroh-automerge example (/workspace/iroh-examples/iroh-automerge): length-prefixed automerge::sync::Message exchanged over a QUIC bidi stream. The PathTreeSync struct implements iroh::protocol::ProtocolHandler, accepting incoming sync connections and running the sync loop until both sides converge.

For full convergence with concurrent writes on both sides, 2-3 sync rounds are needed (A→B, B→A, A→B). In a production system, sync would be continuous (like the iroh-automerge-repo example with samod), not one-shot.
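Length-prefixed framing of the kind described above can be sketched over any byte stream. This is a std-only illustration, not the POC's or the iroh-automerge example's code: the 8-byte big-endian prefix and the `MAX_FRAME` cap are assumptions, and `Read`/`Write` stand in for the QUIC send and receive halves.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound so a corrupt prefix cannot force a huge allocation.
const MAX_FRAME: u64 = 16 * 1024 * 1024;

/// Writes one frame: 8-byte big-endian length, then the payload bytes.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u64).to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame, failing on a truncated stream or an oversized length.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let len = u64::from_be_bytes(len_buf);
    if len > MAX_FRAME {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

In a sync loop, each encoded sync message would be one frame, and both sides would keep exchanging frames until neither has anything further to send.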

What it proved

Test	What it proves
`sync_node_a_write_visible_on_node_b`	The core test. Node A writes a file, syncs to Node B, Node B resolves the same path to the same hash. CRDT sync works.
`sync_concurrent_writes_different_paths_converge`	Both nodes write different files concurrently, sync, both see both files. No conflict on different paths.
`sync_concurrent_writes_same_path_lww`	Both nodes write the SAME path with different values, sync, both converge to the same value (LWW). NFS-equivalent semantics.
`sync_branch_inheritance_across_nodes`	Node A creates a branch with a parent and a file. After sync, Node B resolves a file on the child branch by walking the parent chain — even though the branch was created on Node A.
`content_separate_from_metadata_sync`	Node A writes a file (path tree + blob), syncs path tree to Node B. B can resolve path→hash (metadata synced), but B's blob store is empty (content not synced). This validates the metadata/content separation — B knows the file exists but needs to fetch the content separately.

The bug that was found and fixed: concurrent root-map initialization

The most important finding from this POC iteration: automerge concurrent put_object on the same key creates a conflict, not a merge. If two nodes independently call put_object(ROOT, "paths", Map), automerge sees two concurrent writes to the same key and keeps both as conflicting values. When merged, one map wins and the other's contents are invisible.

This affected two places:

Root maps (branches, paths, tombstones): if both nodes lazily create them on first write, the maps conflict.
Branch creation: if both nodes independently ensure_branch("main"), the branch objects conflict.

The fix: one node initializes the root maps and branches, syncs to the other node, then both write independently. The ensure_branch method creates all root maps eagerly (idempotent — only creates if absent). This mirrors the real use case: a node joins the network, syncs the current state, then writes locally. A spec must call this out as a design constraint: root structures must be created by one node and synced before other nodes write.
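The failure mode and the fix can be illustrated with a toy model. This is not automerge: `Replica`, `ObjId`, and the "highest id wins" tie-break are simplified stand-ins chosen only to show why two independently created maps under one key hide each other's contents, while one shared map does not.

```rust
use std::collections::{BTreeMap, HashMap};

/// Identity of a created map object: (counter, actor). Toy stand-in.
type ObjId = (u64, &'static str);

#[derive(Default, Clone)]
struct Replica {
    root: HashMap<String, Vec<ObjId>>,                 // key -> conflicting objects
    objects: HashMap<ObjId, BTreeMap<String, String>>, // object -> entries
}

impl Replica {
    /// Creates a fresh map object under `key`, replacing what this replica knew.
    fn put_object(&mut self, actor: &'static str, counter: u64, key: &str) -> ObjId {
        let id = (counter, actor);
        self.root.insert(key.to_string(), vec![id]);
        self.objects.insert(id, BTreeMap::new());
        id
    }

    fn put(&mut self, obj: ObjId, k: &str, v: &str) {
        self.objects.entry(obj).or_default().insert(k.to_string(), v.to_string());
    }

    /// Union of both replicas. Same-key scalar conflicts are out of scope here.
    fn merge(&mut self, other: &Replica) {
        for (key, ids) in &other.root {
            let mine = self.root.entry(key.clone()).or_default();
            for id in ids {
                if !mine.contains(id) {
                    mine.push(*id);
                }
            }
        }
        for (id, entries) in &other.objects {
            let mine = self.objects.entry(*id).or_default();
            for (k, v) in entries {
                mine.insert(k.clone(), v.clone());
            }
        }
    }

    /// Entry names readable through `key`: only the winning object's entries.
    fn visible(&self, key: &str) -> Vec<String> {
        let winner = self.root.get(key).and_then(|ids| ids.iter().max());
        match winner.and_then(|id| self.objects.get(id)) {
            Some(map) => map.keys().cloned().collect(),
            None => Vec::new(),
        }
    }
}
```

If two replicas each call `put_object` for "paths" and then merge, both converge but only one replica's entries are visible. If one replica creates the map and the other starts from a clone of it, both writes land in the same object and both remain visible after merging.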

What this means for the spec

The three-layer architecture (SQLite path-tree, iroh content store, honker coordination) extends cleanly to the distributed case by making the path tree an automerge CRDT. The remaining unknowns are now all implementation-scope:

SFTP wiring — mechanical, the Handler trait maps 1:1 to the PathTree + WriteSession API.
FsStore/redb — needed for production persistence, coexists with automerge for the path tree.
Continuous sync — the POC does one-shot sync; production needs continuous sync (like samod's Repo with sync_with).
Content fetch on read — when a node learns a new path→hash but doesn't have the blob, it needs to fetch via iroh-blobs' network protocol. The content_separate_from_metadata_sync test validates this is the correct behavior.
GC and tag management — same as before, plus cross-node GC coordination.

The stack

┌─────────────────────────────────────────────────────────────────┐
│  SFTP / SSH (russh-sftp Handler trait) — not yet wired           │
│  open / read / write / close / readdir / rename / unlink / ...   │
│  maps 1:1 to PathTree + WriteSession API                         │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│  PathTree (SQLite)                                               │
│  buckets, branches, paths, tombstones                            │
│  recursive-CTE chain walk for branch-aware resolve               │
│  WriteSession: branch-on-write, merge-on-close                   │
│  honker notify() inside every mutation txn                       │
└────────────────────────────┬────────────────────────────────────┘
                             │
              ┌──────────────┴──────────────┐
              ▼                             ▼
┌─────────────────────────────┐  ┌────────────────────────────────┐
│  iroh-blobs (content store)  │  │  honker (in the SQLite file)   │
│  BLAKE3 content addressing   │  │  notify/listen (watch)        │
│  MemStore (POC) /            │  │  durable queues (sync/replicate)│
│  FsStore (production, redb)  │  │  named locks (writer coord)   │
│  Tags + GC (mark-sweep)      │  │  scheduler (maintenance cron)  │
└─────────────────────────────┘  └────────────────────────────────┘

Layer separation

Concern	Layer	Why
Path → content hash mapping	SQLite path tree	Hierarchical, indexed, transactional, schema-as-doc
Content bytes	iroh-blobs	Content-addressed, dedup, verified streaming, network transfer
Path-tree mutations	SQLite txn	Atomic, crash-safe, single WAL
Filesystem events (watch/inotify)	honker notify()	Atomic with the mutation, wake-on-commit not poll
Background sync/replication kicks	honker queues	Transactional outbox: enqueue in same txn as mutation
Writer coordination	honker named locks	Prevent concurrent writers on same path
Branching / forking	branches table + chain walk	Fossil-style, content shared by hash, only path edges diverge
Multi-tenancy	bucket_id column	Free — just a where-clause, auth is an adapter problem
Chunked writes	write_sessions + write_chunks	Branch-on-write, merge-on-close, crash-safe per-chunk

SFTP mapping (conceptual, not yet wired)

The russh-sftp Handler trait (/workspace/russh-sftp/src/server/handler.rs) is a near 1:1 mirror of POSIX FS syscalls translated to SFTP packets. Each maps onto SQL operations against the path-tree tables:

SFTP op	PathTree operation
`open(path, O_RDONLY)`	`resolve(bucket, branch, path)` → `Link` → iroh `BlobReader`
`open(path, O_WRONLY)`	`WriteSession::open(bucket, branch, path)`
`write(handle, offset, data)`	`WriteSession::write_chunk(offset, data)`
`close(handle)`	`WriteSession::close()` → hash → merge → notify
`readdir(path)`	`list_dir(bucket, branch, path)`
`stat`/`lstat`/`fstat`	`resolve(...)` → entry size/kind (indexed lookup, cheaper than real FS fstat)
`rename(from, to)`	`PathTree::rename(...)` — O(1) on edges, content stays in iroh
`remove(path)`	`PathTree::unlink(...)` — tombstone, content GC'd later
`mkdir`/`rmdir`	`PathTree::mkdir`/`unlink` with `kind=dir`
`symlink`/`readlink`	`PathTree::symlink`/`resolve` with `kind=symlink`
`extended`	SFTP escape hatch — could expose iroh-specific ops (get ticket, resolve to hash)

The russh-sftp client's File already implements AsyncRead + AsyncSeek + AsyncWrite with pipelined writes (write_nowait + ack window), and the iroh BlobReader already supports range reads. The two trait surfaces line up. SeekFrom::End (round-trip-costly in real SFTP — calls fstat) becomes a single indexed SQLite lookup, so it's actually cheaper than a real FS fstat.

Multi-tenancy / buckets

The "bucket" concept maps to S3's bucket format (from the rudolfs reference, /workspace/@alkdev/alknet/docs/research/references/gitlfs/rudolfs-reference.md). In rudolfs, StorageKey = (Namespace, Oid) where Namespace = (org, project) — tenant isolation by URL path. In the POC, bucket_id on every row achieves the same isolation with a single where-clause. Auth from the SSH/SFTP connection determines which buckets are visible — it's an adapter problem, not a storage problem.

The rudolfs caching layer (LRU + disk cache → permanent storage, with fanout() to stream to both client and cache simultaneously) is a useful pattern for the production version: a local iroh-blobs FsStore as cache, a remote iroh node as permanent storage. The decorator composition pattern (Verify ↔ Encrypted ↔ Cached ↔ Retrying(Disk → S3)) translates directly: Verify becomes BLAKE3 verification (built into iroh), Cached becomes the local iroh store, S3 becomes the remote iroh node.

Open Unknowns (For Future POCs)

These are the unknowns that remain after the POC. None are feasibility blockers (the basic mechanics work); they are scope/work-quantity questions that affect spec sizing.

1. FsStore (redb + filesystem) vs SQLite — the two-database question (scoping, not feasibility)

The POC used MemStore deliberately — no redb, no fsync rabbit hole, no partial-file lifecycle. The production version needs FsStore for persistence of large blobs. The open question: is having both redb (iroh's metadata) and SQLite (our path tree) in one process a problem?

Two embedded databases means two WAL files, two fsync paths, two crash-recovery stories. The likely answer is "fine, they serve different purposes" — redb stores blob metadata + inline data, SQLite stores path trees + write chunks + honker tables. But it needs validation.

The alternative — forking iroh-blobs to use SQLite instead of redb — is a big maintenance commitment. The POC's write_chunks table proves SQLite can handle chunk-sized inline BLOBs at iroh's granularity, so the swap is mechanically possible. But it should only be done if the two-database coexistence proves problematic, not for aesthetics. A scoping probe would run FsStore alongside the SQLite path tree and measure: double-fsync overhead, WAL contention, operational confusion.

2. Incomplete blobs in a distributed context (partially solved)

The "many agents" scenario's incomplete-blob problem is now partially solved. The content_separate_from_metadata_sync test in the sync POC proves that a node can learn a new path→hash mapping via automerge sync without having the blob locally. The remaining design question: what does the read return when the blob isn't local? Does it block? Does it trigger a fetch? Does it return EIO? A design probe would wire iroh-blobs' network fetch into the read path: resolve → miss → async fetch from a peer that has it → block or return EIO temporarily. The honker queue is the coordination mechanism for background fetching.

3. SFTP wiring (mechanical, not design)

The Handler trait maps 1:1 to the PathTree + WriteSession API (see SFTP mapping table above). Wiring it is straightforward but non-trivial: handle management (open files, open directories), error code mapping (SQLite errors → SFTP status codes), fsync@openssh.com extension negotiation, and the extended channel for iroh-specific ops. A POC would implement the Handler trait and test with an actual sshfs mount.

4. honker named locks for writer coordination (mechanical)

The POC's multiple_concurrent_write_sessions_same_path test shows that two sessions on the same path don't corrupt each other's chunks (session-scoped), and last-close wins. But a real filesystem needs explicit locking: honker_lock_acquire('path:<bucket>:<branch>:<path>', writer_id, timeout) to prevent concurrent writers from stomping each other. The lock is a SQL function already registered on the connection — it just needs to be called in WriteSession::open. Not wired in the POC; straightforward to add.
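A rough in-memory model of what such a lock would need to do is sketched below. It is not honker's API or behaviour: `LockTable` is hypothetical, the timeout is treated as a lease that expires (an assumption about its meaning), and only the key format is taken from the text above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Builds the lock name in the `path:<bucket>:<branch>:<path>` format.
fn lock_key(bucket: &str, branch: &str, path: &str) -> String {
    format!("path:{bucket}:{branch}:{path}")
}

/// Toy lock table: key -> (owner, lease expiry).
#[derive(Default)]
struct LockTable {
    held: HashMap<String, (String, Instant)>,
}

impl LockTable {
    /// Grants the lock if it is free, expired, or already held by `owner`.
    fn try_acquire(&mut self, key: &str, owner: &str, ttl: Duration, now: Instant) -> bool {
        if let Some((holder, expires)) = self.held.get(key) {
            if holder.as_str() != owner && *expires > now {
                return false; // someone else holds a live lease
            }
        }
        self.held.insert(key.to_string(), (owner.to_string(), now + ttl));
        true
    }

    /// Releases the lock only if `owner` is the current holder.
    fn release(&mut self, key: &str, owner: &str) {
        let is_holder = self
            .held
            .get(key)
            .map(|(holder, _)| holder.as_str() == owner)
            .unwrap_or(false);
        if is_holder {
            self.held.remove(key);
        }
    }
}
```

In this model, WriteSession::open would call `try_acquire` and refuse to start a session when it returns false; an expiring lease means a crashed writer cannot block the path forever.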

5. GC and tag management (design)

iroh-blobs uses tags + mark-sweep GC: blobs are protected from deletion by tags (persistent or temp), and GC walks all tags to find reachable hashes, then sweeps everything else. The path tree needs to manage tags: when a path row points to a Link, that blob needs a persistent tag so it survives GC. When a path is tombstoned (unlinked), the tag can be removed and the blob becomes eligible for GC. The mapping from path rows to tags is the design question. A POC would wire tags.create() / tags.delete() into upsert_path / unlink and verify GC reclaims orphaned content.
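One wrinkle in that mapping is that deduplicated content can be referenced by several paths, so removing a tag on unlink is only safe once no other path points at the same hash. The sketch below is one possible shape under that assumption, using in-memory maps rather than iroh-blobs tags; `Store` and its methods are hypothetical, and unlink is modelled as a plain delete rather than a tombstone.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct Store {
    blobs: HashMap<String, Vec<u8>>, // hash -> bytes
    tags: HashSet<String>,           // hashes protected from GC
    paths: HashMap<String, String>,  // path -> hash
}

impl Store {
    /// Points `path` at `hash`, tagging the blob so GC keeps it.
    fn upsert_path(&mut self, path: &str, hash: &str, bytes: &[u8]) {
        self.blobs.insert(hash.to_string(), bytes.to_vec());
        self.tags.insert(hash.to_string());
        if let Some(old) = self.paths.insert(path.to_string(), hash.to_string()) {
            self.release_if_unreferenced(&old);
        }
    }

    /// Removes the path; the tag goes only if nothing else uses the hash.
    fn unlink(&mut self, path: &str) {
        if let Some(old) = self.paths.remove(path) {
            self.release_if_unreferenced(&old);
        }
    }

    /// Linear scan for remaining references; fine for a sketch only.
    fn release_if_unreferenced(&mut self, hash: &str) {
        if !self.paths.values().any(|h| h.as_str() == hash) {
            self.tags.remove(hash);
        }
    }

    /// Sweep: drops every blob without a tag and returns how many were freed.
    fn gc(&mut self) -> usize {
        let before = self.blobs.len();
        let tags = &self.tags;
        self.blobs.retain(|hash, _| tags.contains(hash));
        before - self.blobs.len()
    }
}
```

A production version would likely replace the scan with an indexed reference count, and would also have to decide whether a tombstone on a child branch releases anything while the parent branch still references the blob.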

6. Branch chain depth and performance (perf, deferred)

The recursive CTE walks the full parent chain on every resolve. For shallow chains (2-3 levels, as in "main → agent-a → working-session") this is sub-millisecond. For deep chains (many nested forks, or long-lived agent workspaces with many snapshot branches), performance could degrade. A materialized view or a "resolved paths" cache table (updated on commit) would solve this if it becomes an issue. Worth a perf probe with realistic branch depths before spec.

7. Snapshot / commit semantics (design)

The POC has branches but no explicit "commit" or "snapshot" operation — a branch is just a name, and writes to it are immediate. A real filesystem (especially one backing git) needs snapshot points: "this branch was at this state at this time." The branches table has created_at but no snapshot history. The design question: is a snapshot a new branch (Fossil's model), or is it a recorded point-in-time within a branch (git's model)? Fossil's model maps more naturally to the existing recursive CTE.

Test Coverage

running 15 tests
test abort_leaves_old_version_untouched ... ok
test abort_on_new_file_leaves_no_trace ... ok
test branch_inherits_parent_content ... ok
test branch_modifications_do_not_leak_to_parent ... ok
test chunked_write_close_produces_correct_hash ... ok
test chunked_write_out_of_order ... ok
test concurrent_reader_sees_old_version_during_write ... ok
test content_is_deduped_across_branches ... ok
test large_chunked_file_writes_correctly ... ok
test multi_bucket_isolation ... ok
test multiple_concurrent_write_sessions_same_path ... ok
test put_get_roundtrip ... ok
test rename_is_o1_on_path_edges ... ok
test tombstone_hides_parent_file_on_child ... ok
test watch_fires_on_commit ... ok

POCs

alknet-filesystem-poc (local architecture)

src/
  main.rs           # CLI: mkfs, ls, put, get, rm, mv, branch, watch, share-check
  schema.rs         # SQLite schema (paths, branches, tombstones, write_sessions,
                    #   write_chunks) + honker bootstrap
  fs.rs             # PathTree: resolve, put_file, mkdir, symlink, unlink, rename,
                    #   list_dir, read_file, open_write
                    # recursive-CTE chain walk for branch-aware resolve
  blob_bridge.rs    # iroh MemStore adapter: put_bytes → Link, get_bytes → bytes
  branch.rs         # Branch listing / creation
  write_session.rs  # Chunked write: open, write_chunk, close, abort
                    # branch-on-write, merge-on-close
  watch.rs          # honker notify/listen wrapper for fs events
  lib.rs            # re-exports
tests/
  integration.rs    # 15 tests: 8 path-tree + 7 write-path

alknet-fs-sync-poc (distributed sync)

src/
  path_tree.rs      # AutomergePathTree: automerge-backed path tree with
                    #   branch chain walk, put_file, resolve, list_dir, unlink
  sync.rs           # PathTreeSync: iroh ProtocolHandler that exchanges
                    #   automerge sync::Messages over QUIC bidi streams
  blob_bridge.rs    # iroh MemStore adapter (same as first POC)
  lib.rs            # re-exports
tests/
  integration.rs    # 9 tests: 4 local + 5 cross-node sync

References

POC 1 (local): /workspace/alknet-filesystem-poc — Cargo.toml, src/, tests/integration.rs (15 tests)
POC 2 (sync): /workspace/alknet-fs-sync-poc — Cargo.toml, src/, tests/integration.rs (9 tests)
SQLite appfileformat: https://sqlite.org/appfileformat.html
iroh-blobs source (v0.100): /workspace/iroh-blobs — DESIGN.md (blob store tradeoffs, hybrid approach, files-are-hard), src/store/fs/ (FsStore, redb tables, EntryState), src/api/blobs.rs (Blobs API)
iroh-blobs published (v0.103): cargo cache — store/mem.rs (MemStore), api/blobs.rs (AddProgress, BlobReader)
honker: https://honker.dev/docs/ — honker-core API: attach_honker_functions, attach_notify, bootstrap_honker_schema, apply_default_pragmas
honker source: ~/.cargo/registry/src/*/honker-core-0.2.4/src/ — lib.rs, honker_ops.rs
russh-sftp source: /workspace/russh-sftp/src/ — server/handler.rs (Handler trait), client/fs/file.rs (pipelined AsyncWrite), client/fs/dir.rs
rudolfs reference: /workspace/@alkdev/alknet/docs/research/references/gitlfs/rudolfs-reference.md — decorator pattern, LRU cache, namespace/bucket isolation, fanout streaming
iroh-blobs research docs: /workspace/@alkdev/alknet/docs/research/references/iroh/iroh-blobs/ — overview, storage, key types, transfer protocol
russh-sftp research docs: /workspace/@alkdev/alknet/docs/research/references/ssh/russh-sftp/ — overview, client API, server API, wire protocol
iroh-automerge example: /workspace/iroh-examples/iroh-automerge/ — src/protocol.rs (SyncDoc over QUIC bidi stream pattern)
iroh-automerge-repo example: /workspace/iroh-examples/iroh-automerge-repo/ — src/lib.rs (samod Repo with continuous sync, TokioFilesystemStorage)
automerge source: ~/.cargo/registry/src/*/automerge-0.10.0/ — src/automerge.rs (AutoCommit, fork, merge), src/sync.rs (SyncDoc trait, Message), src/iter/map_range.rs (MapRangeItem)
