diff --git a/docs/research/references/iroh/iroh-blobs/01-overview-and-architecture.md b/docs/research/references/iroh/iroh-blobs/01-overview-and-architecture.md
new file mode 100644
index 0000000..7ae785f
--- /dev/null
+++ b/docs/research/references/iroh/iroh-blobs/01-overview-and-architecture.md
@@ -0,0 +1,138 @@
+# iroh-blobs: Overview and Architecture
+
+**Version**: 0.100.0
+**Repository**: https://github.com/n0-computer/iroh-blobs
+**License**: MIT OR Apache-2.0
+**Rust Edition**: 2021
+**MSRV**: 1.89
+
+## What It Is
+
+`iroh-blobs` is a Rust crate for content-addressed blob transfer over QUIC connections, built on top of [iroh](https://docs.rs/iroh). It implements a request-response protocol for streaming BLAKE3-verified data between peers, along with store implementations for persisting blobs locally.
+
+The core value proposition: transfer arbitrary-sized data with **cryptographic integrity guaranteed in-stream** — every 16 KiB chunk group can be verified against the BLAKE3 hash tree as it arrives, without waiting for the complete transfer.
+
+## Core Concepts
+
+| Concept | Description |
+|---------|-------------|
+| **Blob** | A sequence of bytes of arbitrary size, identified by its BLAKE3 hash. No metadata. |
+| **Link** | A 32-byte BLAKE3 hash of a blob — the content address. |
+| **HashSeq** | A blob whose content is a sequence of BLAKE3 hashes (each 32 bytes). Length must be a multiple of 32. |
+| **Provider** | The side serving data. Waits for incoming requests and responds. |
+| **Requester** | The side requesting data. Initiates connections and sends requests. |
+| **Tag** | A persistent named reference to a `HashAndFormat`, protecting blobs from garbage collection. |
+| **TempTag** | An ephemeral in-memory reference that protects content for as long as the tag is alive (at most the process lifetime). |
+| **Chunk** | The fundamental BLAKE3 unit: 1024 bytes. |
+| **Chunk Group** | Iroh's grouping of 16 chunks (16 KiB), the minimum granularity for verification; requested chunk ranges are served as whole chunk groups. |
+
+## Architecture Diagram
+
+```
+┌─────────────────────────────────────────────────────┐
+│                    Application                      │
+│                                                     │
+│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │
+│  │  Blobs   │  │   Tags   │  │   Downloader     │   │
+│  │   API    │  │   API    │  │       API        │   │
+│  └────┬─────┘  └────┬─────┘  └───────┬──────────┘   │
+│       │             │                │              │
+│       └─────────────┼────────────────┘              │
+│                     │                               │
+│             ┌───────┴───────┐                       │
+│             │  Store (API)  │  ← Actor-based, RPC   │
+│             │   Commands    │    message passing    │
+│             └───────┬───────┘                       │
+│                     │                               │
+│        ┌────────────┼────────────┐                  │
+│        │            │            │                  │
+│   ┌────┴────┐  ┌────┴────┐  ┌────┴────┐             │
+│   │ MemStore│  │ FsStore │  │ Readonly│             │
+│   │         │  │ (redb + │  │ MemStore│             │
+│   │         │  │ fs)     │  │         │             │
+│   └─────────┘  └─────────┘  └─────────┘             │
+└─────────────────────────────────────────────────────┘
+
+┌─────────────────────────────────────────────────────┐
+│                   Network Layer                     │
+│                                                     │
+│  ┌────────────────────┐   ┌──────────────────────┐  │
+│  │  BlobsProtocol     │   │  Remote (Client)     │  │
+│  │  (Provider side)   │   │  (Requester side)    │  │
+│  │                    │   │                      │  │
+│  │  handle_conn()     │   │  Remote::fetch()     │  │
+│  │  handle_stream()   │   │  Remote::local()     │  │
+│  └────────┬───────────┘   └──────────┬───────────┘  │
+│           │                          │              │
+│           └─────── iroh QUIC ────────┘              │
+│               ALPN: /iroh-bytes/4                   │
+└─────────────────────────────────────────────────────┘
+```
+
+## Module Structure
+
+```
+iroh-blobs/src/
+├── lib.rs                 # Crate root, re-exports
+├── hash.rs                # Hash, BlobFormat, HashAndFormat
+├── hashseq.rs             # HashSeq type
+├── format.rs              # Format module (Collection)
+│   └── collection.rs      # Collection type with metadata
+├── protocol.rs            # Wire protocol types (GetRequest, etc.)
+│   └── range_spec.rs      # ChunkRangesSeq, RangeSpec wire encoding
+├── net_protocol.rs        # BlobsProtocol (iroh ProtocolHandler)
+├── provider.rs            # Server-side request handling
+│   └── events.rs          # Event system (connect/disconnect/progress)
+├── get.rs                 # Client-side FSM for getting data
+│   ├── error.rs           # GetError, GetResult types
+│   └── request.rs         # Request execution helpers
+├── api/                   # High-level store API
+│   ├── blobs.rs           # Blob operations (add, export, read, etc.)
+│   │   └── reader.rs      # BlobReader (AsyncRead + AsyncSeek)
+│   ├── downloader.rs      # Multi-source download coordinator
+│   ├── remote.rs          # Remote peer interaction (fetch, observe)
+│   ├── tags.rs            # Tag management API
+│   ├── proto.rs           # Store command protocol (RPC messages)
+│   └── proto/             # Proto sub-modules
+│       └── bitfield.rs    # Bitfield type for chunk tracking
+├── store/                 # Storage implementations
+│   ├── mod.rs             # IROH_BLOCK_SIZE, GcConfig
+│   ├── mem.rs             # MemStore (in-memory, mutable)
+│   ├── fs.rs              # FsStore (filesystem + redb hybrid)
+│   ├── readonly_mem.rs    # Read-only memory store
+│   ├── gc.rs              # Garbage collection
+│   ├── util.rs            # Shared utilities (Tag, SparseMemFile, etc.)
+│   └── test.rs            # Test utilities
+├── ticket.rs              # BlobTicket (shareable connection info)
+├── metrics.rs             # Prometheus metrics definitions
+└── util/                  # Utilities
+    ├── channel.rs         # Channel helpers
+    ├── connection_pool.rs # Connection pooling
+    ├── stream.rs          # Stream abstractions
+    └── temp_tag.rs        # TempTag, TagCounter, TempTags scope management
+```
+
+## Key Dependencies
+
+| Dependency | Purpose |
+|------------|---------|
+| `bao-tree` | BLAKE3 verified streaming, outboard storage, BaoTree encoding/decoding |
+| `iroh` | QUIC networking, endpoint, router |
+| `irpc` | RPC framework for store commands |
+| `postcard` | Wire serialization (compact, no-schema) |
+| `redb` | Embedded key-value database (fs-store feature) |
+| `range-collections` | RangeSet2 / ChunkRanges for chunk tracking |
+| `bytes` | Efficient byte buffer handling |
+
+## Feature Flags
+
+| Feature | Default | Description |
+|---------|---------|-------------|
+| `fs-store` | ✅ | Filesystem-based store with redb + file hybrid |
+| `rpc` | ✅ | RPC support via `noq` / `irpc` |
+| `metrics` | ❌ | Prometheus metrics |
+| `hide-proto-docs` | ✅ | Hides protocol docs from rustdocs |
+
+## BLAKE3 Block Size
+
+The crate uses a fixed block size of `IROH_BLOCK_SIZE = BlockSize::from_chunk_log(4)`, which means each chunk group is 2^4 = 16 chunks = 16 × 1024 = 16,384 bytes (16 KiB). This is the minimum granularity for verification: requested chunk ranges are served and checked in whole chunk groups.
\ No newline at end of file
diff --git a/docs/research/references/iroh/iroh-blobs/02-key-types.md b/docs/research/references/iroh/iroh-blobs/02-key-types.md
new file mode 100644
index 0000000..a7655de
--- /dev/null
+++ b/docs/research/references/iroh/iroh-blobs/02-key-types.md
@@ -0,0 +1,195 @@
+# iroh-blobs: Key Types and Data Structures
+
+## Hash
+
+```rust
+// src/hash.rs
+pub struct Hash(blake3::Hash); // 32-byte BLAKE3 hash, wraps blake3::Hash
+```
+
+The fundamental content address. Created via `Hash::new(data)` or `Hash::from_bytes([u8; 32])`. Has a constant `Hash::EMPTY` for the empty blob. Supports hex display and serde (compact binary for non-human-readable formats), and is stored as a 32-byte fixed array in redb.
+
+Wire format: 32 raw bytes (postcard serialization). No framing overhead.
+
+## BlobFormat
+
+```rust
+pub enum BlobFormat {
+    Raw,     // A single blob
+    HashSeq, // A sequence of BLAKE3 hashes
+}
+```
+
+Distinguishes between a raw binary blob and a hash sequence. Wire format: single byte (0 = Raw, 1 = HashSeq).
+
+## HashAndFormat
+
+```rust
+pub struct HashAndFormat {
+    pub hash: Hash,
+    pub format: BlobFormat,
+}
+```
+
+Pairs a hash with its format. Wire format: 33 bytes (32 for the hash + 1 for the format). Display format: hex string, optionally prefixed with 's' for HashSeq.
+
+## HashSeq
+
+```rust
+// src/hashseq.rs
+pub struct HashSeq(Bytes); // Wrapper around Bytes, length must be a multiple of 32
+```
+
+A blob interpreted as a sequence of 32-byte BLAKE3 hashes. Created from `Bytes` via `HashSeq::new(bytes)` (returns `None` if the length is not a multiple of 32). Iterable; supports `get(index)` and `pop_front()`.
+
+Used extensively: collections are stored as a HashSeq where the first child is metadata and subsequent children are data blobs.
+
+## Bitfield
+
+```rust
+// src/api/proto/bitfield.rs
+pub struct Bitfield {
+    pub size: u64,           // Total size of the blob in bytes
+    pub ranges: ChunkRanges, // Which chunks are verified/present
+}
+```
+
+Tracks which chunks of a blob are present and verified. Key methods:
+- `is_complete()` — all chunks present
+- `validated_size()` — how many bytes are verified
+- `diff(&other)` — compute the delta between two bitfields
+
+Used by the observe protocol and internal state tracking.
+
+## Tag
+
+```rust
+// src/store/util.rs
+pub struct Tag(pub Bytes); // Named reference, arbitrary bytes, typically UTF-8
+```
+
+A persistent named reference to content in the store. Tags protect content from garbage collection. Auto-generated tags use the format `"auto-2026-01-15T12:34:56.789Z"`. Tags are stored in the store's database and can be listed, created, renamed, and deleted.
+
+## TempTag
+
+```rust
+// src/util/temp_tag.rs
+pub struct TempTag {
+    inner: HashAndFormat,
+    on_drop: Option<Weak<dyn TagDrop>>, // Callback when dropped
+}
+```
+
+An ephemeral, in-memory tag. While a `TempTag` exists, its referenced content is protected from garbage collection. When it is dropped, the `TagDrop` callback notifies the store to remove the protection. It can be `leak()`ed to make the protection permanent for the process lifetime.
+
+Scopes: `TempTagScope` manages groups of temp tags. `Scope::GLOBAL` is the default scope. Batches of operations can create scoped temp tags that are cleaned up together.
+
+## BlobTicket
+
+```rust
+// src/ticket.rs
+pub struct BlobTicket {
+    addr: EndpointAddr, // How to reach the provider (includes EndpointId, relay URL, direct addresses)
+    format: BlobFormat, // Raw or HashSeq
+    hash: Hash,         // What to retrieve
+}
+```
+
+A shareable token containing everything needed to retrieve a blob from a provider. Serialized via the `iroh_tickets::Ticket` trait (base32-encoded with a "blob" prefix). The wire format uses postcard with a variant discriminator.
+
+```rust
+// Creating a ticket
+let ticket = BlobTicket::new(addr, hash, BlobFormat::Raw);
+
+// From a ticket string
+let ticket: BlobTicket = ticket_str.parse()?;
+```
+
+## ChunkRanges and ChunkRangesSeq
+
+### ChunkRanges
+
+```rust
+pub type ChunkRanges = RangeSet2<ChunkNum>; // From range_collections crate
+```
+
+A set of non-overlapping chunk ranges. Supports boolean operations (union, intersection, difference). The fundamental unit is `ChunkNum` (a u64 newtype representing a 1024-byte BLAKE3 chunk).
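Because `ChunkRanges` counts in 1024-byte chunks, byte positions have to be rounded to chunk boundaries. This is a minimal std-only sketch of that rounding, of the kind a helper like `ChunkRanges::bytes` needs: the start rounds down and the end rounds up. It also shows how a chunk maps to its 16 KiB chunk group, which is the chunk index shifted right by 4. `ChunkNum` here is a local stand-in for the bao-tree newtype, and `bytes_to_chunks` / `chunk_group` are hypothetical names, not crate API.

```rust
use std::ops::Range;

/// Size of one BLAKE3 chunk in bytes.
const CHUNK_SIZE: u64 = 1024;
/// log2 of chunks per chunk group (16 chunks = 16 KiB), mirroring from_chunk_log(4).
const CHUNK_GROUP_LOG: u32 = 4;

/// Local stand-in for a chunk-index newtype (hypothetical, not bao_tree::ChunkNum).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ChunkNum(pub u64);

/// Smallest chunk range covering a byte range: start rounds down, end rounds up.
pub fn bytes_to_chunks(bytes: Range<u64>) -> Range<ChunkNum> {
    let start = bytes.start / CHUNK_SIZE;
    let end = bytes.end.div_ceil(CHUNK_SIZE);
    ChunkNum(start)..ChunkNum(end)
}

/// Index of the 16 KiB chunk group that a chunk belongs to.
pub fn chunk_group(chunk: ChunkNum) -> u64 {
    chunk.0 >> CHUNK_GROUP_LOG
}
```

For example, the byte range `1024..2049` touches chunks 1 and 2, so it becomes the chunk range `1..3`. Both chunks fall into chunk group 0.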
+
+Helper trait `ChunkRangesExt` provides:
+- `ChunkRanges::all()` — all chunks
+- `ChunkRanges::bytes(range)` — byte range rounded up to chunk boundaries
+- `ChunkRanges::chunks(range)` — chunk range from u64 bounds
+- `ChunkRanges::last_chunk()` — the very last chunk (for size verification)
+- `ChunkRanges::chunk(n)` — a single chunk
+- `ChunkRanges::offset(n)` — a single byte offset rounded to its chunk
+
+### ChunkRangesSeq
+
+```rust
+// src/protocol/range_spec.rs
+pub struct ChunkRangesSeq(SmallVec<[(u64, ChunkRanges); 2]>);
+```
+
+A sequence of `ChunkRanges`, one per blob in a HashSeq. Uses run-length encoding: it stores `(offset, ranges)` pairs, where offset is the first blob index with that range spec. Unspecified indices default to the most recent range (or empty for finite sequences).
+
+Key methods:
+- `ChunkRangesSeq::all()` — request everything (root + all children, forever)
+- `ChunkRangesSeq::root()` — request only the root blob
+- `ChunkRangesSeq::empty()` — request nothing
+- `ChunkRangesSeq::from_ranges(ranges)` — from an explicit iterator
+- `ChunkRangesSeq::from_ranges_infinite(ranges)` — last range repeats forever
+- `.iter_non_empty_infinite()` — iterate only non-empty ranges
+- `.is_blob()` — true if requesting a single blob (offset 0 with one entry)
+
+### RangeSpec (Wire Format)
+
+```rust
+pub struct RangeSpec(SmallVec<[u64; 2]>);
+```
+
+The on-wire encoding of `ChunkRanges`. Uses alternating spans: the first span is deselected, the second is selected, and so on. SmallVec avoids allocation for the common case of a single range.
+
+Examples:
+- `[]` — empty (nothing selected)
+- `[0]` — everything from chunk 0 selected (entire blob)
+- `[2, 5, 3, 1]` — chunks `2..7` and `10..11` selected (half-open ranges: skip 2, take 5, skip 3, take 1)
+- `[u64::MAX]` — only the last chunk (size proof)
+
+### ChunkRangesSeq Wire Format
+
+Serialized as `(SmallVec<[(u64, RangeSpec); 2]>)` where each element is `(delta_offset, rangespec)`. The `delta_offset` is the distance from the previous entry. Uses postcard varint encoding for compact transmission.
+
+## Store Command Protocol
+
+The store API uses an RPC-style command pattern via `irpc`. Each request is a variant of the `Request` enum below; the `rpc_requests` attribute derives the `Command` message enum from it, giving every command typed request/response channels:
+
+```rust
+// Simplified: per-variant channel attributes are omitted
+#[rpc_requests(message = Command, alias = "Msg", rpc_feature = "rpc")]
+pub enum Request {
+    ListBlobs(ListRequest),
+    Batch(BatchRequest),
+    DeleteBlobs(BlobDeleteRequest),
+    ImportBao(ImportBaoRequest),               // streaming: rx bao items, tx result
+    ExportBao(ExportBaoRequest),               // streaming: tx encoded items
+    ExportRanges(ExportRangesRequest),         // streaming: tx range data
+    Observe(ObserveRequest),                   // streaming: tx bitfield updates
+    BlobStatus(BlobStatusRequest),
+    ImportBytes(ImportBytesRequest),
+    ImportByteStream(ImportByteStreamRequest), // duplex streaming
+    ImportPath(ImportPathRequest),
+    ExportPath(ExportPathRequest),
+    ListTags(ListTagsRequest),
+    SetTag(SetTagRequest),
+    DeleteTags(DeleteTagsRequest),
+    RenameTag(RenameTagRequest),
+    CreateTag(CreateTagRequest),
+    CreateTempTag(CreateTempTagRequest),
+    ListTempTags(ListTempTagsRequest),
+    SyncDb(SyncDbRequest),
+    WaitIdle(WaitIdleRequest),
+    Shutdown(ShutdownRequest),
+    ClearProtected(ClearProtectedRequest),
+}
+```
+
+This allows both local (in-process) and remote (RPC) store access through the same API surface.
\ No newline at end of file
diff --git a/docs/research/references/iroh/iroh-blobs/03-transfer-protocol.md b/docs/research/references/iroh/iroh-blobs/03-transfer-protocol.md
new file mode 100644
index 0000000..eaed16f
--- /dev/null
+++ b/docs/research/references/iroh/iroh-blobs/03-transfer-protocol.md
@@ -0,0 +1,249 @@
+# iroh-blobs: Transfer Protocol
+
+## Overview
+
+The transfer protocol is a **request-response** protocol operating over QUIC streams (via iroh). The ALPN is `b"/iroh-bytes/4"`.
+
+The requester opens a bidirectional QUIC stream and sends a request. The provider responds with BLAKE3-verified streaming data on the same stream.
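Requests on the stream are postcard-encoded. Postcard writes unsigned integers, including enum discriminants and the `ChunkRangesSeq` offsets, as varints: 7 data bits per byte, low bits first, with the high bit set while more bytes follow. This sketch reimplements that encoding with std only, for illustration. It is not the postcard crate's code.

```rust
/// Encode a u64 as an unsigned LEB128-style varint (7 bits per byte, low bits first).
pub fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Decode a varint from the front of `input`, returning (value, bytes consumed).
/// Returns None for truncated input or input that does not fit in a u64.
pub fn decode_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in input.iter().enumerate().take(10) {
        let bits = (byte & 0x7f) as u64;
        // The 10th byte may only carry the single remaining bit of a u64.
        if i == 9 && bits > 1 {
            return None;
        }
        value |= bits << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

For example, 300 encodes as `[0xAC, 0x02]`, while any value below 128 takes a single byte. A u64 needs at most 10 bytes.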
+
+**Key properties**:
+- Data integrity is verified in-stream — every 16 KiB chunk group can be independently verified against the BLAKE3 hash tree
+- No upper limit on blob or collection size — streaming design avoids buffering entire transfers
+- No per-blob round trips for many small blobs: a single request (a HashSeq `GetRequest` or a `GetManyRequest`) fetches them all
+- Range requests supported at chunk granularity
+
+## Request Types
+
+```rust
+pub enum Request {
+    Get(GetRequest),
+    Observe(ObserveRequest),
+    Slot2, Slot3, Slot4, Slot5, Slot6, Slot7, // Reserved; keeps Push = 8, GetMany = 9
+    Push(PushRequest),
+    GetMany(GetManyRequest),
+}
+```
+
+Wire format: 1-byte discriminator (postcard-encoded `RequestType` enum), followed by the postcard-serialized request body.
+
+### GetRequest
+
+```rust
+pub struct GetRequest {
+    pub hash: Hash,             // BLAKE3 hash of the root blob
+    pub ranges: ChunkRangesSeq, // What ranges to request
+}
+```
+
+The most common request type. The `ranges` field uses `ChunkRangesSeq` to express which parts of the root blob and its children to request.
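The run-length form of `ChunkRangesSeq` can be modelled as a sorted list of `(start_index, ranges)` runs. Each run covers indices up to the start of the next run, and the last run covers everything after it. The sketch below is a simplified std-only model. It assumes half-open `Range<u64>` chunk ranges and the indexing described for the get FSM, where index 0 is the root and index `i + 1` is the i-th child. `RangesSeq` and `ranges_for` are hypothetical, not the crate's types.

```rust
use std::ops::Range;

/// Simplified stand-in for ChunkRangesSeq: sorted (start_index, chunk_ranges) runs.
pub struct RangesSeq {
    runs: Vec<(u64, Vec<Range<u64>>)>,
}

impl RangesSeq {
    /// Build from runs, which must be strictly sorted by start index.
    pub fn new(runs: Vec<(u64, Vec<Range<u64>>)>) -> Self {
        assert!(runs.windows(2).all(|w| w[0].0 < w[1].0), "runs must be sorted");
        Self { runs }
    }

    /// Chunk ranges requested for blob `index` (0 = root, i + 1 = i-th child).
    pub fn ranges_for(&self, index: u64) -> &[Range<u64>] {
        // Number of runs starting at or before `index`; the last of them applies.
        let n = self.runs.partition_point(|(start, _)| *start <= index);
        if n == 0 {
            &[] // before the first run: nothing requested
        } else {
            self.runs[n - 1].1.as_slice()
        }
    }
}
```

Consider the runs `(0, [0..u64::MAX])`, `(1, [0..1])`, `(2, [])`. The root is requested in full, the first child only up to its first chunk, and nothing after that. In this model, the trailing empty run is what makes the sequence finite.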
+
+**Common patterns**:
+
+```rust
+// Request an entire single blob
+let req = GetRequest::blob(hash);
+// -> ChunkRangesSeq with a single element: all chunks of the root
+
+// Request a HashSeq (root + all children)
+let req = GetRequest::all(hash);
+// -> ChunkRangesSeq::all() - infinite sequence of "all chunks"
+
+// Request parts of a single blob
+let req = GetRequest::builder()
+    .root(ChunkRanges::bytes(0..1000))
+    .build(hash);
+
+// Request a HashSeq with specific child ranges
+let req = GetRequest::builder()
+    .root(ChunkRanges::all())             // full root (the hash seq)
+    .child(1, ChunkRanges::bytes(0..100)) // partial child 1
+    .next(ChunkRanges::all())             // full remaining children
+    .build_open(hash);                    // build_open = last range repeats forever
+```
+
+### GetManyRequest
+
+```rust
+pub struct GetManyRequest {
+    pub hashes: Vec<Hash>,      // Sorted, deduplicated list of hashes
+    pub ranges: ChunkRangesSeq, // Ranges for each hash (no root entry)
+}
+```
+
+Like a `GetRequest` for a HashSeq, except that the requester supplies the hashes instead of the provider looking them up. This removes the need for the provider to hold a pre-existing HashSeq blob.
+
+```rust
+let req = GetManyRequest::builder()
+    .hash(hash1, ChunkRanges::all())
+    .hash(hash2, ChunkRanges::all())
+    .build();
+// Deduplicates and sorts hashes automatically
+```
+
+### PushRequest
+
+```rust
+pub struct PushRequest(GetRequest); // Wraps a GetRequest
+```
+
+The inverse of a GetRequest: the requester pushes data to the provider. The request describes what will be sent, followed by the actual data stream. Providers may reject push requests (disabled by default via `EventMask`).
+
+### ObserveRequest
+
+```rust
+pub struct ObserveRequest {
+    pub hash: Hash,
+    pub ranges: RangeSpec, // Which ranges to observe
+}
+```
+
+Subscribes to availability changes in a blob's bitfield. The provider sends `ObserveItem` updates as chunks become available.
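`ObserveRequest` carries its ranges as a `RangeSpec`, using the alternating deselected/selected span encoding from the key-types section. The following is a std-only sketch of decoding such a span list. `Selected` and `decode_spans` are hypothetical names, and an open-ended trailing selection is represented as `end: None` rather than with a sentinel value.

```rust
/// A decoded selection: chunks `start..end`, or `start..` (open-ended) when `end` is None.
#[derive(Debug, PartialEq, Eq)]
pub struct Selected {
    pub start: u64,
    pub end: Option<u64>,
}

/// Decode alternating span lengths (skip, take, skip, take, ...) into selections.
/// If the list ends on a skip (odd length), everything after it is selected.
pub fn decode_spans(spans: &[u64]) -> Vec<Selected> {
    let mut out = Vec::new();
    let mut pos: u64 = 0;
    for (i, &len) in spans.iter().enumerate() {
        let next = pos.saturating_add(len);
        if i % 2 == 1 {
            // Odd positions: a selected span of `len` chunks.
            out.push(Selected { start: pos, end: Some(next) });
        } else if i + 1 == spans.len() {
            // Trailing deselected span: the rest of the blob is selected.
            out.push(Selected { start: next, end: None });
        }
        pos = next;
    }
    out
}
```

The decoder reproduces the RangeSpec examples. `[2, 5, 3, 1]` yields chunks `2..7` and `10..11`. `[u64::MAX]` yields an open selection starting at the largest chunk index, which the protocol answers with the blob's last chunk.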
+
+## Response Format
+
+### For Get/GetMany/Push
+
+The response is BLAKE3-verified streaming data (bao-tree format). For each blob in the request:
+
+1. **8-byte size header** (little-endian u64) — the total size of the blob
+2. **BLAKE3 verified stream** — encoded data for the requested ranges, using bao-tree's mixed encoding:
+   - `BaoContentItem::Parent(node, (left_hash, right_hash))` — internal hash tree nodes (64 bytes each)
+   - `BaoContentItem::Leaf(Leaf { offset, data })` — actual data chunks
+
+The data is sent in order: ascending chunks for each blob, and blobs in HashSeq order.
+
+**Verification**: The requester validates each chunk group against the expected BLAKE3 hash tree. Invalid data is detected within at most 16 KiB of reception. Missing data (the provider doesn't have a chunk) causes the provider to close the stream at the point where data becomes unavailable.
+
+### For Observe
+
+The provider sends length-prefixed `ObserveItem` messages:
+
+```rust
+pub struct ObserveItem {
+    pub size: u64,           // Blob size
+    pub ranges: ChunkRanges, // Available chunks
+}
+```
+
+Updates are sent as deltas — only the new chunks that have become available since the last update.
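To send deltas, the provider has to remember what it last reported to each observer. The observer, in turn, unions every update into its own view. The following is a std-only sketch that assumes availability only ever grows and uses a `BTreeSet<u64>` of chunk indices as a stand-in for `ChunkRanges`. `ObserveState`, `delta`, and `apply_delta` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Provider-side memory of what has already been reported to one observer.
#[derive(Default)]
pub struct ObserveState {
    reported: BTreeSet<u64>,
}

impl ObserveState {
    /// Given the chunks available now, return only the newly available ones and
    /// remember them. The first call therefore returns the full current state.
    pub fn delta(&mut self, available: &BTreeSet<u64>) -> BTreeSet<u64> {
        let new: BTreeSet<u64> = available.difference(&self.reported).copied().collect();
        self.reported.extend(new.iter().copied());
        new
    }
}

/// Observer side: fold a received delta into the locally known set.
pub fn apply_delta(known: &mut BTreeSet<u64>, delta: &BTreeSet<u64>) {
    known.extend(delta.iter().copied());
}
```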
+
+## Error Handling
+
+Error codes for stream/connection closure:
+
+| Code | Name | Meaning |
+|------|------|---------|
+| 0 | StreamDropped | RecvStream was dropped |
+| 1 | ProviderTerminating | Provider is shutting down |
+| 2 | RequestReceived | Only one request per stream allowed |
+| 1 (application) | ERR_PERMISSION | Permission denied |
+| 2 (application) | ERR_LIMIT | Rate limited |
+| 3 (application) | ERR_INTERNAL | Internal error |
+
+## Client-Side FSM (Get)
+
+The `get::fsm` module implements the get request as a **finite state machine** for maximum control:
+
+```
+AtInitial
+  │ (open QUIC stream)
+  ▼
+AtConnected
+  │ (send request, drop writer)
+  ▼
+ConnectedNext ─┬─ StartRoot(hash, ranges)     // offset 0 = root blob
+               ├─ StartChild(offset, ranges)  // offset > 0 = child blob
+               └─ Closing                     // empty request
+  ▼
+AtStartRoot / AtStartChild
+  │ (determine hash for child)
+  ▼
+AtBlobHeader
+  │ (read 8-byte size)
+  ▼
+AtBlobContent
+  │ (stream BLAKE3-verified items)
+  ├─ More(content_item) → AtBlobContent   // loop
+  └─ Done → AtEndBlob
+  ▼
+AtEndBlob
+  │ (iterate to next blob in sequence)
+  ├─ MoreChildren(AtStartChild)
+  └─ Closing
+       │ (drain remaining bytes)
+       ▼
+     Stats (transfer statistics)
+```
+
+Each state transition is explicit, which gives the consumer full control:
+- `AtBlobContent::next()` returns `BlobContentNext::More((content, item))` or `BlobContentNext::Done(end)`
+- `AtBlobHeader::next()` reads the size header and creates a `ResponseDecoder`
+- `AtStartChild::next(hash)` requires the caller to supply the hash (from the HashSeq)
+
+### Stats Tracking
+
+```rust
+pub struct Stats {
+    pub payload_bytes_read: u64,    // Actual data bytes
+    pub other_bytes_read: u64,      // Hash pairs, headers
+    pub payload_bytes_written: u64, // For push
+    pub other_bytes_written: u64,   // For push
+    pub elapsed: Duration,
+}
+```
+
+## Provider-Side Handling
+
+```rust
+pub async fn handle_connection(connection: Connection, store: Store, events: EventSender);
+```
+
+The provider accepts QUIC streams on a connection. For each stream it:
+1. Reads the request type byte
+2. Deserializes the request
+3. Dispatches to `handle_get`, `handle_get_many`, `handle_observe`, or `handle_push`
+4. For `handle_get`: iterates over the `ChunkRangesSeq`, streaming each blob via `store.export_bao(hash, ranges)`
+5. For HashSeq requests: loads the root blob, parses it as a `HashSeq`, then streams each requested child
+
+### Event System
+
+The provider can emit events for monitoring and access control:
+
+```rust
+pub struct EventMask {
+    pub connected: ConnectMode, // None, Notify, Intercept
+    pub get: RequestMode,       // None, Notify, Intercept, NotifyLog, InterceptLog, Disabled
+    pub get_many: RequestMode,
+    pub push: RequestMode,      // Disabled by default!
+    pub observe: ObserveMode,
+    pub throttle: ThrottleMode, // None, Intercept
+}
+```
+
+- **None**: No events; requests are processed normally
+- **Notify**: Events are sent but cannot block requests
+- **Intercept**: Events are sent as RPC requests; the handler can reject with an `AbortReason`
+- **Disabled**: All requests of this type are rejected
+
+Progress events: `TransferStarted`, `TransferProgress`, `TransferCompleted`, `TransferAborted`.
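The request modes above amount to a gate that is evaluated for each request. The following is a std-only sketch of that gate. It covers only the four modes described (the `*Log` variants are omitted) and assumes a synchronous interceptor for brevity, whereas the real handler is an asynchronous RPC. `Mode`, `Decision`, and `gate` are hypothetical names.

```rust
/// Simplified modes: only the four described above (NotifyLog/InterceptLog omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    None,
    Notify,
    Intercept,
    Disabled,
}

/// Outcome of gating one request.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Reject(&'static str),
}

/// Decide whether to serve one request of a given type.
/// `notify` is only called in Notify mode and cannot block the request;
/// `intercept` is only called in Intercept mode and may veto it with a reason.
pub fn gate(
    mode: Mode,
    notify: impl FnOnce(),
    intercept: impl FnOnce() -> Result<(), &'static str>,
) -> Decision {
    match mode {
        Mode::None => Decision::Allow,
        Mode::Notify => {
            notify();
            Decision::Allow
        }
        Mode::Intercept => match intercept() {
            Ok(()) => Decision::Allow,
            Err(reason) => Decision::Reject(reason),
        },
        Mode::Disabled => Decision::Reject("disabled"),
    }
}
```

Keeping notification and interception as separate callbacks mirrors the distinction drawn above: only Intercept can turn a request away, and Disabled rejects without consulting any handler.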
+
+## Collection Format
+
+```rust
+pub struct Collection {
+    blobs: Vec<(String, Hash)>, // Named references to child blobs
+}
+```
+
+Wire format (as a HashSeq blob):
+1. First child blob: `CollectionMeta` serialized with postcard
+2. Remaining children: the actual data blobs
+
+```rust
+pub struct CollectionMeta {
+    header: [u8; 13],   // Must be b"CollectionV0."
+    names: Vec<String>, // Names for each child blob
+}
+```
+
+The header `b"CollectionV0."` is a magic number for format identification. The meta blob's hash becomes the first entry in the HashSeq, followed by the hashes of each data blob. Names correspond 1:1 with data blobs (excluding the meta entry).
\ No newline at end of file
diff --git a/docs/research/references/iroh/iroh-blobs/04-storage.md b/docs/research/references/iroh/iroh-blobs/04-storage.md
new file mode 100644
index 0000000..120a395
--- /dev/null
+++ b/docs/research/references/iroh/iroh-blobs/04-storage.md
@@ -0,0 +1,250 @@
+# iroh-blobs: Storage Architecture
+
+## Overview
+
+iroh-blobs provides three store implementations sharing a common `Store` API surface:
+
+| Store | Location | Mutable | Use Case |
+|-------|----------|---------|----------|
+| `MemStore` | In-memory | ✅ | Small data, testing, WASM |
+| `FsStore` | Filesystem + redb | ✅ | Production, large data |
+| `ReadonlyMemStore` | In-memory | ❌ | Static data serving |
+
+All stores implement the same RPC-based command protocol (`Command` enum), allowing both local in-process and remote RPC access through the same `Store` type.
+
+## Store API Surface
+
+The `Store` type (from `api::Store`) is the primary interface. It's accessed via typed sub-APIs:
+
+```rust
+let store: Store = /* ... */;
+
+// Blob operations
+store.blobs()          // → Blobs API (add, export, read, delete, observe, etc.)
+store.tags()           // → Tags API (create, list, set, delete, rename)
+
+// Direct operations
+store.add_bytes(data)  // → AddProgress
+store.add_slice(data)  // → TempTag (convenience)
+store.get_bytes(hash)  // → Result<Bytes>
+store.has(hash)        // → bool
+store.shutdown()       // Clean shutdown
+store.wait_idle()      // Wait for all tasks to complete
+store.sync_db()        // Sync database to disk (FsStore)
+```
+
+## Blobs API
+
+```rust
+let blobs = store.blobs();
+
+// Import
+blobs.add_slice(data)                           // → AddProgress (raw format)
+blobs.add_bytes(data)                           // → AddProgress (raw format)
+blobs.add_bytes_with_opts(AddBytesOptions{..})  // → AddProgress (with format)
+blobs.import_byte_stream(format)                // → streaming import
+
+// Export
+blobs.reader(hash)                  // → BlobReader (AsyncRead + AsyncSeek)
+blobs.export(hash, path)            // → export to filesystem
+blobs.export_bao(hash, ranges)      // → ExportBao (BLAKE3 verified stream)
+blobs.export_ranges(hash, ranges)   // → ExportRanges (raw data ranges)
+
+// Observe (subscribe to chunk availability)
+blobs.observe(hash)                 // → ObserveAt (bitfield stream)
+
+// Status
+blobs.status(hash)                  // → BlobStatus (NotFound/Partial/Complete)
+
+// Import BAO-encoded data
+blobs.import_bao_bytes(hash, ranges, data)      // → import verified BAO stream
+blobs.import_bao_reader(hash, ranges, reader)   // → import from async reader
+
+// Batch operations (scoped temp tags)
+blobs.batch()                       // → Batch (auto-cleanup scope)
+
+// Delete
+blobs.delete(hashes)                // → force delete (use GC normally)
+```
+
+## Tags API
+
+```rust
+let tags = store.tags();
+
+tags.set(name, value)       // Set a persistent tag
+tags.create(value)          // Auto-generate a tag name, return Tag
+tags.get(name)              // → Option<TagInfo>
+tags.list()                 // → stream of tag entries
+tags.list_hash_seq()        // → stream of tag entries (HashSeq format only)
+tags.delete(name)           // Delete a tag
+tags.delete_range(range)    // Delete tags in range
+tags.delete_prefix(prefix)  // Delete tags with prefix
+tags.rename(from, to)       // Atomically rename a tag
+tags.temp_tag(value)        // → TempTag (ephemeral protection)
+```
+
+## MemStore Architecture
+
+The in-memory store uses a simple actor pattern:
+
+```
+MemStore (ApiClient)
+  │
+  └── Actor (tokio task)
+        ├── State
+        │     ├── data: HashMap<Hash, BaoFileHandle>   // All blob data
+        │     ├── tags: BTreeMap<Tag, HashAndFormat>   // Persistent tags
+        │     └── empty_hash: BaoFileHandle            // Special entry for empty blob
+        ├── tasks: JoinSet<_>                          // Spawned import/export tasks
+        ├── temp_tags: TempTags                        // Ephemeral protection
+        ├── protected: HashSet<Hash>                   // GC-protected hashes
+        └── idle_waiters: Vec<_>                       // Wait-idle notifications
+```
+
+### BaoFileHandle / BaoFileStorage
+
+```rust
+pub enum BaoFileStorage {
+    Partial(PartialMemStorage), // Still downloading
+    Complete(CompleteStorage),  // Fully available
+}
+
+pub struct PartialMemStorage {
+    data: SparseMemFile,     // Sparse byte array for data
+    outboard: SparseMemFile, // Sparse byte array for BLAKE3 hash tree
+    size: SizeInfo,          // Known/estimated size
+    bitfield: Bitfield,      // Which chunks are verified
+}
+
+pub struct CompleteStorage {
+    data: Bytes,     // Complete data
+    outboard: Bytes, // Complete outboard (hash tree)
+}
+```
+
+The `watch::Sender` pattern allows subscribers to observe state changes (for the `observe` API).
+
+### Data Flow (Import)
+
+1. `add_bytes(data)` → compute outboard via `PreOrderMemOutboard::create()` → transition `Partial → Complete`
+2. `import_bao(hash, size, stream)` → receive `BaoContentItem` stream → write to `PartialMemStorage` → update bitfield → transition to `Complete` when all chunks are present
+
+### Data Flow (Export)
+
+1. `export_bao(hash, ranges)` → look up `BaoFileHandle` → `traverse_ranges_validated(data, outboard, &ranges, tx)`, which streams validated BAO data
+
+## FsStore Architecture (Hybrid Store)
+
+The filesystem store uses a **hybrid approach** that stores small data inline in redb and large data as files on disk.
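The hybrid policy comes down to a size-based placement decision for each complete blob. This std-only sketch assumes inclusive 16 KiB thresholds for both data and outboard. It computes the outboard size as one 64-byte hash pair per internal node of a binary tree whose leaves are 16 KiB chunk groups, that is `(groups - 1) * 64` bytes. All names are hypothetical, and this is not the FsStore code.

```rust
const CHUNK_GROUP_BYTES: u64 = 16 * 1024;
/// Assumed inline thresholds (inclusive), matching the 16 KiB defaults.
const MAX_DATA_INLINED: u64 = 16 * 1024;
const MAX_OUTBOARD_INLINED: u64 = 16 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum DataPlacement {
    Inline, // stored in the database
    File,   // stored as a file on disk
}

#[derive(Debug, PartialEq, Eq)]
pub enum OutboardPlacement {
    Inline,
    File,
    NotNeeded, // a single chunk group needs no hash tree
}

/// Outboard bytes: a tree with n leaves (chunk groups) has n - 1 parent nodes of 64 bytes.
pub fn outboard_size(size: u64) -> u64 {
    let groups = size.div_ceil(CHUNK_GROUP_BYTES).max(1);
    (groups - 1) * 64
}

/// Where a complete blob of `size` bytes would go under these thresholds.
pub fn placement(size: u64) -> (DataPlacement, OutboardPlacement) {
    let data = if size <= MAX_DATA_INLINED {
        DataPlacement::Inline
    } else {
        DataPlacement::File
    };
    let ob = outboard_size(size);
    let outboard = if ob == 0 {
        OutboardPlacement::NotNeeded
    } else if ob <= MAX_OUTBOARD_INLINED {
        OutboardPlacement::Inline
    } else {
        OutboardPlacement::File
    };
    (data, outboard)
}
```

Under these assumptions the outboard stays inline up to 257 chunk groups (4,210,688 bytes, about 4 MiB of data). Only larger blobs get a separate outboard file.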
+
+### Design Rationale (from DESIGN.md)
+
+- **Databases** are good for small blobs (low per-entry overhead, fast random access)
+- **Filesystems** are good for large blobs (OS-level caching, direct file access)
+- **Neither alone** works well for both cases
+
+### Layout
+
+```
+<root>/
+├── db/                        # redb database
+│   ├── metadata table         # Hash → EntryState
+│   ├── inline_data table      # Hash → Bytes (for small blobs)
+│   ├── inline_outboard table  # Hash → Bytes (for small outboards)
+│   └── tags table             # Tag → HashAndFormat
+├── data/<hash>.data           # Large blob data files
+├── data/<hash>.outboard       # Large outboard files
+├── data/<hash>.sizes          # Size tracking for partial files
+└── data/<hash>.bitfield       # Validated chunk tracking for partial files
+```
+
+### EntryState
+
+```rust
+// Simplified from src/store/fs/entry_state.rs
+pub enum EntryState {
+    Complete(CompleteEntryState),
+    Partial(PartialEntryState),
+}
+
+pub struct CompleteEntryState {
+    pub data: DataLocation,         // Inline, Owned (canonical path), or External (user path)
+    pub outboard: OutboardLocation, // Inline, Owned, or NotNeeded
+    pub size: u64,
+}
+
+pub enum DataLocation {
+    Inline,                 // Stored in redb inline_data table
+    Owned,                  // File at canonical path <hash>.data
+    External(Vec<PathBuf>), // User-owned file paths
+}
+
+pub enum OutboardLocation {
+    Inline,    // Stored in redb inline_outboard table
+    Owned,     // File at canonical path <hash>.outboard
+    NotNeeded, // Data ≤ 16 KiB, no outboard needed
+}
+
+pub struct PartialEntryState {
+    // Either we know the verified size, or we don't yet
+    pub verified_size: Option<u64>,
+}
+```
+
+### Thresholds
+
+- **Data inline threshold**: 16 KiB (default) — blobs smaller than this are stored entirely in redb
+- **Outboard inline threshold**: 16 KiB (default) — outboards smaller than this are stored in redb
+- Data ≤ 16 KiB has no outboard (a single chunk group is verified directly against its hash)
+
+### Blob Lifecycle
+
+**Adding a local file (known data, unknown hash)**:
+1. Compute the full BLAKE3 hash and outboard
+2. Atomically move the file into the store under the hash name
+3. Apply inlining rules: small files → redb, large files → filesystem
+
+**Syncing from remote (known hash, unknown data)**:
+1. Start with no data — keep state in memory (not in database)
+2. As chunks arrive, write incrementally to partial files
+3. Once the size is known to exceed the inline threshold, create the database entry and filesystem files
+4. On completion, transition to `Complete` state and apply inlining rules
+
+**Deletion**:
+- Tags protect content from GC
+- `TempTag` provides ephemeral (process-lifetime) protection
+- HashSeq tags protect the root blob AND all referenced child blobs
+- GC is mark-and-sweep: mark all reachable content via tags → sweep (delete) everything else
+- Explicit `force` deletion bypasses protection (emergency use only)
+
+### FsStore Actor Architecture
+
+```
+FsStore (ApiClient)
+  │
+  └── MainActor (tokio task)
+        ├── TaskContext { config, db_actor_sender }
+        ├── EntityMap: HashMap<Hash, BaoFileHandle>  // Currently active entities
+        ├── JoinSet<_>                               // Running tasks
+        ├── TempTags                                 // Ephemeral protection
+        ├── ProtectedSet                             // GC protection
+        └── idle_waiters
+```
+
+The FsStore uses an **entity manager** pattern: each hash gets a `BaoFileHandle` (as in MemStore) while it is active, and entries are cleaned up when their tasks complete.
+
+## Garbage Collection
+
+```rust
+pub struct GcConfig {
+    pub interval: Duration,
+    pub add_protected: Option<ProtectCb>, // Optional callback to add more protected hashes
+}
+```
+
+GC is a two-phase process:
+1. **Mark**: Walk all tags (persistent + temp) and collect reachable hashes. For the HashSeq format, traverse the hash sequence to find all child hashes.
+2. **Sweep**: Delete all blobs not in the reachable set, in batches of 100.
+
+GC runs automatically at a configurable interval via `run_gc(store, config)`, or manually via `gc_run_once(store, live)`.
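The two GC phases can be modelled in a few lines. In this std-only sketch, hashes are plain `u64` ids, HashSeq roots are expanded one level into the children they list, and unreferenced blobs are removed in batches of 100. `Format`, `gc_once`, and the in-memory maps are hypothetical stand-ins for the store and its tag tables.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Raw,
    HashSeq,
}

/// One GC pass over an in-memory model of the store.
/// `blobs`: every stored blob; `children`: the hashes listed by HashSeq blobs;
/// `roots`: (hash, format) of every persistent and temp tag. Returns deleted hashes.
pub fn gc_once(
    blobs: &mut HashSet<u64>,
    children: &HashMap<u64, Vec<u64>>,
    roots: &[(u64, Format)],
) -> Vec<u64> {
    // Mark: every tagged root, plus the children of roots tagged as HashSeq.
    let mut live = HashSet::new();
    for &(hash, format) in roots {
        live.insert(hash);
        if format == Format::HashSeq {
            if let Some(kids) = children.get(&hash) {
                live.extend(kids.iter().copied());
            }
        }
    }
    // Sweep: delete everything unmarked, in batches of 100.
    let mut dead: Vec<u64> = blobs.iter().copied().filter(|h| !live.contains(h)).collect();
    dead.sort_unstable();
    for batch in dead.chunks(100) {
        for h in batch {
            blobs.remove(h);
        }
    }
    dead
}
```

Note that a blob tagged as `Raw` protects only itself, even if its bytes happen to form a valid hash sequence. This is why the format stored in a tag matters for GC.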
\ No newline at end of file diff --git a/docs/research/references/iroh/iroh-blobs/05-remote-and-downloader.md b/docs/research/references/iroh/iroh-blobs/05-remote-and-downloader.md new file mode 100644 index 0000000..74bed91 --- /dev/null +++ b/docs/research/references/iroh/iroh-blobs/05-remote-and-downloader.md @@ -0,0 +1,202 @@ +# iroh-blobs: Remote API and Downloader + +## Remote API + +The `Remote` type (`api::remote::Remote`) provides the client-side interface for interacting with remote iroh-blobs providers. It's a thin wrapper around `ApiClient` that exposes fetch, observe, and push operations. + +```rust +let remote = store.remote(); // or Remote::from_sender(client) + +// Get local info about what we already have +let local = remote.local(hash_and_format).await?; + +// Compute what we need +let missing = local.missing(); + +// Execute a download +let stats = remote.execute_get(connection, request).await?; + +// Or use the simpler fetch API +let progress = remote.fetch(connection, hash, format, store); +``` + +### LocalInfo + +```rust +pub struct LocalInfo { + pub size: Option, // Total size if known + pub present: ChunkRanges, // Chunks we already have + pub missing: ChunkRanges, // Chunks we still need + pub hash_and_format: HashAndFormat, +} +``` + +`LocalInfo` is computed by querying the local store's bitfield for a given hash and comparing it against what a full download would require. + +### Fetch Process + +The `fetch` method handles the complete lifecycle: + +1. **Local check**: Query the store for what we already have +2. **Request computation**: If format is HashSeq, read the local HashSeq to compute precise missing ranges +3. **Connection**: Open a QUIC stream to the provider +4. **Transfer**: Use the get FSM to stream data into the store +5. 
**Verification**: BLAKE3 verification happens in-stream during the transfer + +For HashSeq format: +- First fetch the root blob (the HashSeq) +- Parse it to get child hashes +- For each child, check local availability and compute missing ranges +- Fetch only what's missing + +### Observe + +```rust +// Subscribe to bitfield updates from a remote provider +let mut stream = remote.observe(connection, hash).stream().await?; +while let Some(bitfield) = stream.next().await { + // Process availability updates +} +``` + +The observe protocol sends `ObserveItem` messages (size + available ranges) whenever new chunks become available on the provider. The initial message contains the full current state, subsequent messages contain deltas. + +### Push + +```rust +// Push local data to a remote provider +let progress = remote.push(connection, request, store); +``` + +Push uses the same FSM-style approach but in reverse — the local side reads from the store and writes BLAKE3-verified data to the QUIC stream. + +## Downloader API + +The `Downloader` (`api::downloader::Downloader`) coordinates downloads from multiple sources: + +```rust +let downloader = Downloader::new(store, endpoint); + +// Download from specific providers +let progress = downloader.download(DownloadRequest { + request: FiniteRequest::Get(get_request), + providers: vec![endpoint_id_1, endpoint_id_2], + strategy: SplitStrategy::Split, +}).stream(); +``` + +### SplitStrategy + +```rust +pub enum SplitStrategy { + Split, // Split the request across multiple providers + None, // Use a single provider +} +``` + +When `SplitStrategy::Split` is used, the downloader: +1. Splits the `GetRequest` into per-child requests +2. Distributes children across available providers +3. Downloads in parallel from multiple sources +4. 
Stores each completed child into the local store + +### DownloadRequest + +```rust +pub struct DownloadRequest { + pub request: FiniteRequest, // What to download + pub providers: Vec<EndpointId>, // Who to download from + pub strategy: SplitStrategy, // How to split work +} + +pub enum FiniteRequest { + Get(GetRequest), + GetMany(GetManyRequest), +} +``` + +### Download Progress + +```rust +pub enum DownloadProgressItem { + TryProvider { id: EndpointId, request: Arc<GetRequest> }, + ProviderFailed { id: EndpointId, request: Arc<GetRequest> }, + PartComplete { request: Arc<GetRequest> }, + Progress(u64), + DownloadError, +} +``` + +## Connection Pooling + +The `util::connection_pool::ConnectionPool` manages reusable QUIC connections: + +```rust +let pool = ConnectionPool::new(endpoint, ALPN, options); +let connection = pool.connect(endpoint_id).await?; +``` + +Options include connection timeout, idle timeout, and maximum connections per peer. + +## Integration with iroh + +### BlobsProtocol + +```rust +// src/net_protocol.rs +pub struct BlobsProtocol { + inner: Arc, // (Store, EventSender) +} + +impl ProtocolHandler for BlobsProtocol { + async fn accept(&self, conn: Connection) -> Result<(), AcceptError> { + crate::provider::handle_connection(conn, store, events).await; + Ok(()) + } + async fn shutdown(&self) { /* shutdown store */ } +} +``` + +Usage with iroh Router: + +```rust +let endpoint = Endpoint::bind(presets::N0).await?; +let store = MemStore::new(); // or FsStore::load(path).await?
+let blobs = BlobsProtocol::new(&store, None); +let router = Router::builder(endpoint) + .accept(iroh_blobs::ALPN, blobs) + .spawn(); +``` + +### Creating a BlobTicket + +```rust +let endpoint = Endpoint::bind(presets::N0).await?; +endpoint.online().await; +let addr = endpoint.addr(); + +let tag = store.add_slice(b"hello world").await?; +let ticket = BlobTicket::new(addr, tag.hash, tag.format); +println!("Share this: {ticket}"); +``` + +### Fetching from a Ticket + +```rust +// On the requester side +let ticket: BlobTicket = ticket_str.parse()?; +let (addr, hash, format) = ticket.into_parts(); + +let endpoint = Endpoint::bind(presets::N0).await?; +let conn = endpoint.connect(addr, iroh_blobs::ALPN).await?; + +let request = match format { + BlobFormat::Raw => GetRequest::blob(hash), + BlobFormat::HashSeq => GetRequest::all(hash), +}; + +// Use the get FSM +let fsm = get::fsm::start(conn, request, RequestCounters::default()); +let connected = fsm.next().await?; +// ... drive the FSM to completion +``` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-blobs/06-data-flow-and-examples.md b/docs/research/references/iroh/iroh-blobs/06-data-flow-and-examples.md new file mode 100644 index 0000000..f52cb97 --- /dev/null +++ b/docs/research/references/iroh/iroh-blobs/06-data-flow-and-examples.md @@ -0,0 +1,312 @@ +# iroh-blobs: Data Flow and Complete Example + +## Complete Data Flow: Provider Side + +``` + QUIC Connection Arrives + │ + ▼ + handle_connection(conn, store, events) + │ + ┌──────────┴──────────┐ + │ Accept QUIC BIDI │ + │ streams in loop │ + └──────────┬──────────┘ + │ + handle_stream(pair, store) + │ + ┌──────────┴──────────┐ + │ Read Request type │ + │ byte + deserialize │ + └──────────┬──────────┘ + │ + ┌─────────────┬───────┼───────┬──────────────┐ + │ │ │ │ │ + handle_get handle_get handle handle (reserved) + _many _observe _push + │ │ │ │ + ▼ ▼ ▼ ▼ + ┌─────────────────────────────────────────────────┐ + │ For each (offset, ranges) 
in request.ranges: │ + │ │ + │ if offset == 0: │ + │ send_blob(store, 0, hash, ranges, writer) │ + │ else: │ + │ lookup hash in HashSeq[offset-1] │ + │ send_blob(store, offset, child_hash, ranges, writer) │ + │ │ + │ send_blob: │ + │ store.export_bao(hash, ranges) │ + │ .write_with_progress(writer, ctx, &hash, idx) │ + └─────────────────────────────────────────────────┘ +``` + +## Complete Data Flow: Requester Side (Get FSM) + +``` + Create GetRequest + │ + ▼ + fsm::start(connection, request, counters) + │ + ▼ + AtInitial.next() + │ (open_bi, send request) + ▼ + AtConnected.next() + │ + ┌───────────┼───────────┐ + │ │ │ + StartRoot StartChild Closing + (offset=0) (offset>0) (empty) + │ │ │ + ▼ ▼ ▼ + AtBlobHeader AtBlobHeader AtClosing + .next() .next(hash) .next() + │ │ │ + ▼ ▼ ▼ + (size, AtBlobContent) Stats + │ + ┌────────┴────────┐ + │ │ + More(item) Done + (loop back to (AtEndBlob) + AtBlobContent) │ + ┌─────┼─────┐ + │ │ + MoreChildren Closing + (AtStartChild) (AtClosing) + │ │ + └───────────┘ +``` + +### Blob Content Items + +During `AtBlobContent`, items arrive as `BaoContentItem`: + +```rust +pub enum BaoContentItem { + Parent(ParentNode), // (node, (left_hash, right_hash)) — 64 bytes + Leaf(Leaf), // { offset: u64, data: Bytes } — actual data +} +``` + +- **Parent nodes** contain BLAKE3 hash pairs for tree verification. They're overhead (~64 bytes per internal node). +- **Leaf nodes** contain actual data chunks. Each leaf's data is at most `IROH_BLOCK_SIZE` bytes (16 KiB). + +Verification is automatic: the `ResponseDecoder` from `bao-tree` validates each chunk against the expected hash tree rooted at the request hash. + +## Blob Verification and BaoTree Encoding + +### How BLAKE3 Verified Streaming Works + +1. **The hash is the root** of a binary Merkle tree +2. **Internal nodes** store `(left_child_hash, right_child_hash)` — 64 bytes each +3. 
**Leaf nodes** store the actual data chunks (up to 1024 bytes each in standard BLAKE3, or 16 KiB in iroh's block size) +4. **Chunk groups** (16 chunks = 16 KiB) are the minimum verification unit in iroh-blobs + +For a request with specific ranges: +- The provider traverses the tree, yielding only nodes needed to verify the requested ranges +- The requester can verify each chunk group independently after receiving its parent hash pair +- Maximum undetected corruption: 16 KiB (one chunk group) + +### Outboard Storage + +The **outboard** is the BLAKE3 hash tree stored separately from the data. For the provider: +- Small blobs (≤16 KiB): outboard is empty (not needed, single chunk group) +- Large blobs: outboard stored as `PreOrderMemOutboard` (in-memory) or as a file (filesystem store) + +For the requester, the outboard is built incrementally as data arrives. + +## Import and Export Flows + +### Import Bytes (Local Data) + +``` +add_bytes(data) / add_slice(data) + │ + ▼ +ImportBytesRequest { data, format, scope } + │ + ▼ +Actor::import_bytes() + │ 1. Send AddProgressItem::Size(len) + │ 2. Send AddProgressItem::CopyDone + │ 3. Compute outboard: PreOrderMemOutboard::create(&data, IROH_BLOCK_SIZE) + │ 4. Return ImportEntry { data, outboard, scope, format, tx } + │ + ▼ +Actor::finish_import() + │ 1. Get hash from outboard.root() + │ 2. Get or create BaoFileHandle for hash + │ 3. Transition BaoFileStorage::Partial → Complete + │ 4. Create TempTag for the hash_and_format + │ 5. Send AddProgressItem::Done(temp_tag) +``` + +### Import BAO Stream (Remote Data) + +``` +import_bao_bytes(hash, ranges, data) / import_bao_reader(hash, ranges, reader) + │ + ▼ +ImportBaoRequest { hash, size } + │ + ▼ +Actor::import_bao() + │ 1. Set size on partial entry + │ 2. Create BaoTree for the size + │ 3. 
For each BaoContentItem from stream: + │ - Parent: write hash pair to outboard + │ - Leaf: write data to storage, update bitfield + │ - If bitfield becomes complete: transition Partial → Complete + │ 4. Send result +``` + +### Export BAO + +``` +export_bao(hash, ranges) → ExportBao + │ + ▼ +Actor::export_bao() + │ 1. Look up BaoFileHandle for hash + │ 2. If not found: send EncodeError::NotFound and return + │ 3. Create BaoTreeSender from data + outboard readers + │ 4. Call traverse_ranges_validated(data, outboard, &ranges, tx) + │ → streams validated BAO items to the sender +``` + +### Export Path (To Filesystem) + +``` +export(hash, target_path) → ExportPath + │ + ▼ +Actor::export_path() + │ 1. Look up BaoFileHandle for hash + │ 2. Create parent directories if needed + │ 3. Create file at target_path + │ 4. Send ExportProgressItem::Size(total_size) + │ 5. Read data from store in 64 KiB chunks + │ 6. Write to file, yielding ExportProgressItem::CopyProgress(offset) + │ 7. Send ExportProgressItem::Done +``` + +## Observe Protocol Detail + +``` +Requester Provider + │ │ + │ ObserveRequest {hash, ranges} │ + │─────────────────────────────────►│ + │ │ + │ ObserveItem {size, ranges} │ (initial state) + │◄─────────────────────────────────│ + │ │ + │ ... (time passes, more data │ + │ becomes available) │ + │ │ + │ ObserveItem {size, ranges} │ (delta update) + │◄─────────────────────────────────│ + │ │ + │ ... (continue until │ + │ requester stops │ + │ or connection closes) │ + │ │ + │ STOP_STREAM │ + │─────────────────────────────────►│ +``` + +The observe protocol uses `Bitfield::diff()` to send only the new chunks since the last update, minimizing bandwidth. 
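Here is a minimal sketch of the delta idea. It assumes a simplified bitfield that stores individual chunk indices in a `BTreeSet`; the real `Bitfield` stores `ChunkRanges`. The names `ToyBitfield`, `ToyObserveItem` and `diff` are hypothetical. The sketch only reports newly available chunks and a newly learned size. Under this model, the initial message is simply the diff against an empty bitfield.

```rust
use std::collections::BTreeSet;

/// Hypothetical simplified bitfield: optional known size plus available chunk indices.
#[derive(Clone, Debug, Default, PartialEq)]
struct ToyBitfield {
    size: Option<u64>,
    chunks: BTreeSet<u64>,
}

/// Update sent to the observer: size (only if it changed) and newly available chunks.
#[derive(Debug, PartialEq)]
struct ToyObserveItem {
    size: Option<u64>,
    added: Vec<u64>,
}

/// Compute the update to send, or None when nothing changed since the last update.
fn diff(prev: &ToyBitfield, next: &ToyBitfield) -> Option<ToyObserveItem> {
    let added: Vec<u64> = next.chunks.difference(&prev.chunks).copied().collect();
    let size = if next.size != prev.size { next.size } else { None };
    if added.is_empty() && size.is_none() {
        None
    } else {
        Some(ToyObserveItem { size, added })
    }
}
```

Sending `diff(&ToyBitfield::default(), &current)` first and then `diff(&last_sent, &current)` on each change matches the "full state, then deltas" behaviour described above.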
+ +## Full Working Example + +```rust +use iroh::{protocol::Router, Endpoint, endpoint::presets}; +use iroh_blobs::{store::mem::MemStore, BlobsProtocol, ticket::BlobTicket, BlobFormat}; + +// === Provider Side === +async fn provider() -> anyhow::Result<()> { + let endpoint = Endpoint::bind(presets::N0).await?; + let store = MemStore::new(); + + // Add some data + let tag = store.add_slice(b"Hello, iroh-blobs!").await?; + + let _ = endpoint.online().await; + let addr = endpoint.addr(); + + // Create ticket for sharing + let ticket = BlobTicket::new(addr, tag.hash, BlobFormat::Raw); + println!("Ticket: {ticket}"); + + // Start serving + let blobs = BlobsProtocol::new(&store, None); + let router = Router::builder(endpoint) + .accept(iroh_blobs::ALPN, blobs) + .spawn(); + + tokio::signal::ctrl_c().await?; + router.shutdown().await?; + Ok(()) +} + +// === Requester Side === +async fn requester(ticket: BlobTicket) -> anyhow::Result<()> { + let (addr, hash, format) = ticket.into_parts(); + + let endpoint = Endpoint::bind(presets::N0).await?; + let conn = endpoint.connect(addr, iroh_blobs::ALPN).await?; + + // Build request based on format + let request = match format { + BlobFormat::Raw => iroh_blobs::protocol::GetRequest::blob(hash), + BlobFormat::HashSeq => iroh_blobs::protocol::GetRequest::all(hash), + }; + + // Use the get FSM + let start = iroh_blobs::get::fsm::start(conn, request, Default::default()); + let connected = start.next().await?; + let connected = connected.next().await?; + + match connected { + iroh_blobs::get::fsm::ConnectedNext::StartRoot(at_root) => { + let (at_content, size) = at_root.next().next().await?; + let (at_end, data) = at_content.concatenate_into_vec().await?; + println!("Got {} bytes: {:?}", size, data); + // ... 
+ } + iroh_blobs::get::fsm::ConnectedNext::StartChild(at_child) => { + // Need to know the child hash + } + iroh_blobs::get::fsm::ConnectedNext::Closing(at_closing) => { + println!("Empty response"); + } + } + + Ok(()) +} +``` + +## Simplified Fetch (Using Store + Remote) + +```rust +// The simplest way to download data +let store = MemStore::new(); +let remote = store.remote(); + +// Fetch with automatic local availability checking +let result = remote.fetch(connection, hash, format, &store).await?; +// Result includes Stats with transfer metrics +``` + +## Key Error Types + +| Error Type | Location | Purpose | +|------------|----------|---------| +| `GetError` | `get::error` | Errors during get FSM | +| `ExportBaoError` | `api` | Errors during BAO export | +| `RequestError` | `api` | Store command errors | +| `DecodeError` | `get::fsm` | BAO stream decode errors | +| `ProgressError` | `provider::events` | Provider event errors | \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-blobs/README.md b/docs/research/references/iroh/iroh-blobs/README.md new file mode 100644 index 0000000..e839cf8 --- /dev/null +++ b/docs/research/references/iroh/iroh-blobs/README.md @@ -0,0 +1,60 @@ +# iroh-blobs Reference Documentation + +This directory contains a comprehensive reference for the `iroh-blobs` crate (v0.100.0), a Rust library for content-addressed blob transfer over QUIC connections using BLAKE3 verified streaming. + +## Documents + +1. **[Overview and Architecture](01-overview-and-architecture.md)** — Core concepts, module structure, feature flags, and architecture diagram. Start here. + +2. **[Key Types and Data Structures](02-key-types.md)** — Detailed reference for `Hash`, `BlobFormat`, `HashAndFormat`, `HashSeq`, `Bitfield`, `Tag`, `TempTag`, `BlobTicket`, `ChunkRanges`/`ChunkRangesSeq`/`RangeSpec`, and the store command protocol. + +3. 
**[Transfer Protocol](03-transfer-protocol.md)** — Wire protocol specification: request types (`GetRequest`, `GetManyRequest`, `PushRequest`, `ObserveRequest`), response format (BLAKE3 verified streaming), the client-side FSM, provider handling, event system, and the Collection format. + +4. **[Storage Architecture](04-storage.md)** — Store implementations: `MemStore` (in-memory), `FsStore` (hybrid redb + filesystem), `ReadonlyMemStore`. Covers the actor pattern, `BaoFileHandle`/`BaoFileStorage`, partial/complete states, the hybrid inline/file approach, entry states, blob lifecycle, and garbage collection. + +5. **[Remote API and Downloader](05-remote-and-downloader.md)** — `Remote` API for fetching from/observing/pushing to peers, `Downloader` for multi-source downloads, connection pooling, and iroh integration via `BlobsProtocol`. + +6. **[Data Flow and Examples](06-data-flow-and-examples.md)** — End-to-end data flow diagrams for provider and requester sides, BLAKE3 verification mechanics, import/export flows, observe protocol detail, and complete working examples. 
+ +## Quick Reference + +### Creating a Provider + +```rust +use iroh::{protocol::Router, Endpoint, endpoint::presets}; +use iroh_blobs::{store::mem::MemStore, BlobsProtocol}; + +let endpoint = Endpoint::bind(presets::N0).await?; +let store = MemStore::new(); +let tag = store.add_slice(b"data").await?; +let blobs = BlobsProtocol::new(&store, None); +let router = Router::builder(endpoint) + .accept(iroh_blobs::ALPN, blobs) + .spawn(); +``` + +### Key Constants + +| Constant | Value | Meaning | +|----------|-------|---------| +| `ALPN` | `b"/iroh-bytes/4"` | QUIC ALPN protocol identifier | +| `IROH_BLOCK_SIZE` | `BlockSize::from_chunk_log(4)` | 16 KiB chunk groups | +| `MAX_MESSAGE_SIZE` | `1 MiB` | Maximum request message size | +| `Hash::EMPTY` | BLAKE3 of `b""` | Hash of the empty blob | + +### Core Crate Exports + +```rust +pub use hash::{BlobFormat, Hash, HashAndFormat}; +pub use hashseq::HashSeq; +pub use net_protocol::BlobsProtocol; +pub use protocol::ALPN; +pub mod api; // Store API, Blobs, Tags, Downloader, Remote +pub mod format; // Collection type +pub mod get; // Client-side FSM +pub mod protocol; // Wire protocol types (GetRequest, etc.) +pub mod provider; // Server-side handling +pub mod store; // Storage implementations +pub mod ticket; // BlobTicket +pub mod util; // Connection pool, temp tags, stream helpers +``` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/01-overview-and-architecture.md b/docs/research/references/iroh/iroh-docs/01-overview-and-architecture.md new file mode 100644 index 0000000..8cfe362 --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/01-overview-and-architecture.md @@ -0,0 +1,98 @@ +# iroh-docs: Overview and Architecture + +> Reference document for the `iroh-docs` crate (v0.98.0). +> Source: `/workspace/iroh-docs` + +## What Is iroh-docs? + +`iroh-docs` is a Rust crate implementing **multi-dimensional key-value documents with an efficient synchronization protocol**. It provides: + +1. 
**A CRDT-based document model** — Replicas (documents) hold entries identified by namespace + author + key, with content-addressed values (BLAKE3 hashes). +2. **Range-based set reconciliation** — An efficient sync protocol based on [Aljoscha Meyer's paper](https://arxiv.org/abs/2212.13567) for reconciling sets between peers. +3. **Live sync via gossip** — Real-time document updates propagated through an iroh-gossip swarm. +4. **Persistent storage** — A `redb`-backed store supporting both in-memory and file-based modes. + +## High-Level Architecture + +``` +┌──────────────────────────────────────────────────────────────┐ +│ Docs (Protocol) │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Engine │ │ +│ │ ┌──────────┐ ┌──────────────┐ ┌───────────────────┐ │ │ +│ │ │ LiveActor│ │ GossipState │ │ SyncHandle/Actor │ │ │ +│ │ │ (events) │ │ (iroh-gossip)│ │ (store + sync) │ │ │ +│ │ └──────────┘ └──────────────┘ └───────────────────┘ │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ +│ │ Replica │ │ SignedEntry │ │ Author/ │ │ +│ │ (sync.rs) │ │ Entry/Record │ │ Namespace keys │ │ +│ └────────────────┘ └────────────────┘ └────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Store (redb) │ │ +│ │ Authors │ Namespaces │ Records │ RecordsByKey │ ... 
│ │ +│ └─────────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────┘ +``` + +### Module Layout + +| Module | Purpose | +|--------|---------| +| `sync.rs` | Core types: `Replica`, `Entry`, `SignedEntry`, `Record`, `RecordIdentifier`, `Capability`, events | +| `keys.rs` | Cryptographic key types: `Author`, `NamespaceSecret`, `AuthorId`, `NamespaceId` | +| `ranger.rs` | Range-based set reconciliation algorithm implementation | +| `heads.rs` | `AuthorHeads` — latest timestamps per author for efficient sync decisions | +| `store/` | Storage abstraction and `redb`-backed persistent store | +| `store/fs.rs` | File-based `Store` implementation with redb tables | +| `store/pubkeys.rs` | `PublicKeyStore` trait for caching expanded ed25519 public keys | +| `actor.rs` | `SyncHandle` / Actor — single-threaded executor for store and replica operations | +| `engine/` | Live sync coordination: `Engine`, `LiveActor`, `GossipState`, `NamespaceStates` | +| `engine/live.rs` | The `LiveActor` event loop: handles sync, gossip, content download | +| `engine/gossip.rs` | Integration with `iroh-gossip` for broadcasting document operations | +| `engine/state.rs` | `NamespaceStates` — tracks per-namespace, per-peer sync state | +| `net/` | Network protocol: ALPN `/iroh-sync/1`, connection handling | +| `net/codec.rs` | Wire codec: length-prefixed postcard-serialized `Message` frames | +| `protocol.rs` | `Docs` struct (the `ProtocolHandler`) and `Builder` | +| `api/` | irpc-based RPC API for external access | +| `ticket.rs` | `DocTicket` — shareable document capability + peer addresses | + +## Key Design Principles + +1. **Two-key identity model**: Every entry is uniquely identified by (namespace, author, key). The namespace key provides write authorization; the author key provides attribution. + +2. **Content-addressed values**: Entries store a BLAKE3 hash + length, not the actual content. 
Content blobs are handled separately by `iroh-blobs`. + +3. **Prefix deletion**: A new entry with key "foo" removes the same author's older entries whose keys start with "foo". Matching is a plain byte prefix, so this covers "foo/bar" and also "foobar"; an empty entry at "foo" therefore acts as a prefix tombstone. This enables hierarchical key structures. + +4. **Last-writer-wins with per-author timestamps**: Entries are ordered by (timestamp, hash). Newer entries dominate older ones. Different authors can have entries for the same key simultaneously (multi-dimensional). + +5. **Actor-based concurrency**: All store and replica mutations go through a single `SyncHandle` actor thread, eliminating the need for locks on the store. + +6. **Event-driven live sync**: The `LiveActor` coordinates gossip, direct sync, and content downloads through a `tokio::select!` event loop. + +## Dependencies + +Key dependencies from `Cargo.toml`: + +| Crate | Purpose | +|-------|---------| +| `iroh` | Networking: endpoints, connections, protocol routing | +| `iroh-blobs` | Content-addressed blob storage and transfer | +| `iroh-gossip` | Gossip protocol for broadcasting updates | +| `iroh-tickets` | Ticket-based sharing mechanism | +| `redb` | Embedded key-value store for persistence | +| `ed25519-dalek` | Ed25519 signatures for entries | +| `blake3` | Hashing (fingerprints + content hashes) | +| `postcard` | Serialization (wire format for sync protocol) | +| `irpc` / `noq` | RPC framework for API | + +## Feature Flags + +| Feature | Default | Description | +|---------|---------|-------------| +| `metrics` | Yes | Enables iroh-metrics instrumentation | +| `rpc` | Yes | Enables irpc-based RPC API (depends on `noq`) | +| `fs-store` | Yes | Enables persistent file-based store | \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/02-document-model.md b/docs/research/references/iroh/iroh-docs/02-document-model.md new file mode 100644 index 0000000..9182537 --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/02-document-model.md @@ -0,0 +1,201 @@ +# iroh-docs: Document
Model and CRDT Details + +## Core Data Model + +### Namespace (Document Identity) + +A **Namespace** is the identity of a document. It consists of: + +- **`NamespaceSecret`** — An Ed25519 signing key (32 bytes) that grants write capability +- **`NamespacePublicKey`** — The corresponding verifying key (32 bytes) +- **`NamespaceId`** — A `[u8; 32]` that is the byte representation of the public key; this serves as the unique identifier for a document/replica + +``` +NamespaceSecret (signing key) ──derives──▶ NamespacePublicKey (verifying key) + ──into─────▶ NamespaceId ([u8; 32]) +``` + +### Author (Writer Identity) + +An **Author** represents a writer identity within a document. Multiple authors can write to the same namespace. + +- **`Author`** — An Ed25519 signing key (32 bytes) +- **`AuthorPublicKey`** — The corresponding verifying key (32 bytes) +- **`AuthorId`** — A `[u8; 32]` byte representation of the public key + +Authors are application-defined: an application might create one author per device, per user, or per session. + +### Capability + +Access to a document is controlled through a `Capability`: + +```rust +pub enum Capability { + Write(NamespaceSecret), // Full read-write access + Read(NamespaceId), // Read-only access (can sync but not insert) +} +``` + +Capabilities can be **merged** — a `Read` capability can be upgraded to `Write` if a matching `Write` is presented: + +```rust +capability.merge(other_capability) // Read + Write → Write +``` + +The raw representation is `(u8, [u8; 32])` — a kind byte followed by 32 bytes of key material. + +### Entry (The Fundamental Record) + +An **`Entry`** is the core data unit, consisting of: + +```rust +pub struct Entry { + id: RecordIdentifier, // (namespace, author, key) + record: Record, // (hash, len, timestamp) +} +``` + +#### RecordIdentifier + +```rust +pub struct RecordIdentifier(Bytes); // namespace[0..32] || author[32..64] || key[64..] +``` + +The key is a variable-length byte sequence. 
`RecordIdentifier` implements `Ord` by comparing namespace first, then author, then key — this ordering is critical for the range-based sync algorithm. + +#### Record + +```rust +pub struct Record { + len: u64, // byte length of the content + hash: Hash, // BLAKE3 hash of the content (32 bytes) + timestamp: u64, // microseconds since Unix epoch +} +``` + +The `Record` comparison uses `(timestamp, hash)` ordering — this is the **Last-Writer-Wins** rule for same-key entries. When two records for the same key exist, the one with the higher timestamp wins; if timestamps are equal, the higher hash wins as a tiebreaker. + +### SignedEntry (Entry with Proofs) + +```rust +pub struct SignedEntry { + signature: EntrySignature, // dual Ed25519 signatures + entry: Entry, +} +``` + +#### EntrySignature + +```rust +pub struct EntrySignature { + author_signature: Signature, // 64-byte Ed25519 signature + namespace_signature: Signature, // 64-byte Ed25519 signature +} +``` + +Both signatures cover the canonical byte encoding of the `Entry` (id + record). This means: +- The **namespace signature** proves write authorization (only holders of `NamespaceSecret` can produce valid entries) +- The **author signature** proves authorship (provides attribution and non-repudiation) + +#### Verification + +```rust +fn verify<S: PublicKeyStore>(&self, store: &S) -> Result<(), SignatureError> +``` + +Verification requires both the `NamespacePublicKey` and `AuthorPublicKey`, which are derived from the entry's namespace and author IDs. The `PublicKeyStore` trait provides caching for these expanded keys. + +### Empty Entries (Tombstones / Prefix Deletion) + +An entry is **empty** when `hash == Hash::EMPTY && len == 0`.
Empty entries serve as **deletion markers**: + +- **Key deletion**: Inserting an empty entry with the exact key removes the previous entry for that key +- **Prefix deletion**: Inserting an empty entry with key "foo" removes all entries whose keys start with "foo" (prefix deletion) + +```rust +pub async fn delete_prefix(&mut self, prefix: impl AsRef<[u8]>, author: &Author) -> Result +``` + +### Insert Semantics (CRDT Rules) + +When a `SignedEntry` is inserted into a replica via `Store::put()` (the ranger store trait): + +1. **Check prefixes**: Look up all existing entries whose key is a **prefix** of the new entry's key. If any prefix entry has a value `>=` the new entry's value, the new entry is **rejected** (`InsertOutcome::NotInserted`). + +2. **Remove dominated entries**: Remove all existing entries whose key **starts with** the new entry's key (i.e., the new key is a prefix of theirs) AND whose value is `<=` the new entry's value. + +3. **Insert**: If not rejected, the new entry is stored. + +This implements a **prefix-aware last-writer-wins** CRDT: +- Newer entries for the same (namespace, author, key) tuple replace older ones +- A new entry at key "/foo" can delete all entries under "/foo/*" if it's newer +- Different authors can coexist on the same key — each author's latest entry is kept + +### Timestamp and Future Shift + +Timestamps are in **microseconds since Unix epoch**. There is a maximum allowed future shift: + +```rust +pub const MAX_TIMESTAMP_FUTURE_SHIFT: u64 = 10 * 60 * Duration::from_secs(1).as_millis() as u64; +``` + +Entries with timestamps more than 10 minutes in the future of the local clock are rejected during validation. 
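Here is a sketch of the future-shift check. It assumes that the intended tolerance is the 10 minutes stated above and that timestamps are in microseconds. Note that the quoted constant multiplies by `as_millis()`. Taken literally, it evaluates to 600,000, which is only 0.6 s when compared against microsecond timestamps. The sketch computes both values so the difference is visible; check the iroh-docs source before relying on either figure. The names `timestamp_ok`, `now_micros` and `MAX_FUTURE_SHIFT_MICROS` are hypothetical.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// The constant exactly as quoted above (a millisecond count for 10 minutes).
const QUOTED_SHIFT: u64 = 10 * 60 * Duration::from_secs(1).as_millis() as u64;

/// Assumption: the intended tolerance is 10 minutes, expressed in microseconds.
const MAX_FUTURE_SHIFT_MICROS: u64 = 10 * 60 * 1_000_000;

/// Current wall-clock time in microseconds since the Unix epoch.
fn now_micros() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_micros() as u64
}

/// Accept an entry only if its timestamp is not too far ahead of the local clock.
fn timestamp_ok(entry_ts_micros: u64, now_micros: u64) -> bool {
    entry_ts_micros <= now_micros.saturating_add(MAX_FUTURE_SHIFT_MICROS)
}
```

Using `saturating_add` avoids overflow when the local clock is close to `u64::MAX`, although this cannot happen with realistic clocks.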
+ +### Content Status + +Each entry's content has an availability status: + +```rust +pub enum ContentStatus { + Complete, // Content blob is fully available locally + Incomplete, // Partially available + Missing, // Not available +} +``` + +This status is communicated during sync to help peers decide whether to download content. + +### AuthorHeads (Efficient Sync Optimization) + +`AuthorHeads` tracks the latest timestamp for each author in a document: + +```rust +pub struct AuthorHeads { + heads: BTreeMap<AuthorId, u64>, +} +``` + +This enables a quick check: `has_news_for(other)` — comparing local and remote heads to determine whether sync would yield any new entries. If all timestamps are at least as recent locally, no sync is needed. + +`AuthorHeads` can be serialized with a size limit, dropping the oldest entries when the limit is exceeded. + +## Event System + +Replicas emit events through a subscription system: + +```rust +pub enum Event { + LocalInsert { + namespace: NamespaceId, + entry: SignedEntry, + }, + RemoteInsert { + namespace: NamespaceId, + entry: SignedEntry, + from: PeerIdBytes, + should_download: bool, // based on download policy + remote_content_status: ContentStatus, + }, +} +``` + +Subscribers use `async_channel` for non-blocking notification delivery. The `ReplicaInfo::subscribe()` method registers a sender, and events are fanned out to all subscribers. + +## Validation + +Entry validation during insertion checks: + +1. **Namespace match**: The entry's namespace must match the replica's namespace +2. **Signature verification**: For non-local entries, both namespace and author signatures are verified +3. **Timestamp check**: The entry must not be more than `MAX_TIMESTAMP_FUTURE_SHIFT` in the future +4.
**Empty entry check**: An empty entry must have `hash == EMPTY && len == 0`, and a non-empty entry must have `len != 0` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/03-sync-protocol.md b/docs/research/references/iroh/iroh-docs/03-sync-protocol.md new file mode 100644 index 0000000..9bcaf66 --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/03-sync-protocol.md @@ -0,0 +1,272 @@ +# iroh-docs: Range-Based Set Reconciliation (Ranger) + +## Overview + +The sync protocol in iroh-docs is based on **Range-Based Set Reconciliation**, implementing the algorithm described in [Aljoscha Meyer's paper (arXiv:2212.13567)](https://arxiv.org/abs/2212.13567). + +The core idea: two peers can efficiently compute the union of their entry sets by recursively partitioning the sets and comparing **fingerprints** (hashes) of partitions. When fingerprints match, no further work is needed. When they differ, the partition is subdivided until the difference can be resolved by sending the actual entries. + +## Key Abstractions + +### RangeEntry Trait + +```rust +pub trait RangeEntry: Debug + Clone { + type Key: RangeKey; + type Value: RangeValue; + + fn key(&self) -> &Self::Key; + fn value(&self) -> &Self::Value; + fn as_fingerprint(&self) -> Fingerprint; +} +``` + +`SignedEntry` implements `RangeEntry`: +- `Key` = `RecordIdentifier` (namespace || author || key bytes) +- `Value` = `Record` (timestamp, hash, len) +- Fingerprint = BLAKE3 hash of (namespace || author || key || timestamp || content_hash) + +### RangeKey Trait + +```rust +pub trait RangeKey: Sized + Debug + Ord + PartialEq + Clone + 'static { + fn is_prefix_of(&self, other: &Self) -> bool; // test-only +} +``` + +`RecordIdentifier` implements this via byte-level prefix matching: `(namespace, author, key)` where key prefix matching supports the hierarchical deletion semantics. 
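The identifier layout and its consequences for ordering and prefix matching can be illustrated with a plain byte vector. `ToyRecordId` is a hypothetical stand-in for `RecordIdentifier`. It shows two things. First, lexicographic byte order on namespace || author || key sorts by namespace, then author, then key, because the first two fields have a fixed width. Second, a byte-level prefix test only matches identifiers with the same namespace and author.

```rust
/// Hypothetical sketch of the layout: namespace (32 bytes) || author (32 bytes) || key.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ToyRecordId(Vec<u8>);

impl ToyRecordId {
    fn new(namespace: [u8; 32], author: [u8; 32], key: &[u8]) -> Self {
        let mut bytes = Vec::with_capacity(64 + key.len());
        bytes.extend_from_slice(&namespace);
        bytes.extend_from_slice(&author);
        bytes.extend_from_slice(key);
        ToyRecordId(bytes)
    }

    /// The variable-length key portion after the two fixed 32-byte fields.
    fn key(&self) -> &[u8] {
        &self.0[64..]
    }

    /// Byte-level prefix test over the whole identifier, so namespace and
    /// author must match exactly and only the key may be a true prefix.
    fn is_prefix_of(&self, other: &ToyRecordId) -> bool {
        other.0.starts_with(&self.0)
    }
}
```

Because matching is by bytes, "foo" is a prefix of "foobar" as well as "foo/bar". An application that wants strictly hierarchical deletion can end its prefixes with a separator such as "/".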
+ +### RangeValue Trait + +```rust +pub trait RangeValue: Sized + Debug + Ord + PartialEq + Clone + 'static {} +``` + +`Record` implements `RangeValue` with ordering by `(timestamp, hash)` — the Last-Writer-Wins ordering. + +### Fingerprint + +```rust +pub struct Fingerprint(pub [u8; 32]); // BLAKE3 hash +``` + +Fingerprints are computed by XOR-ing the individual entry fingerprints within a range. This means: +- The fingerprint of the empty set is `BLAKE3([])` (the hash of nothing) +- Adding/removing an entry toggles its contribution via XOR +- Equal sets produce equal fingerprints + +## Range Concept + +A `Range` represents a half-open interval `[x, y)` in the key space, with special semantics: + +```rust +pub(crate) struct Range<K> { + x: K, + y: K, +} +``` + +- `x == y`: The entire set (all elements) +- `x < y`: Standard half-open interval `[x, y)` — includes `x`, excludes `y` +- `x > y`: Wrapping range: elements `>= x` together with elements `< y` + +This wrapping range concept allows the algorithm to work with circular key spaces where the "first" element might be anywhere. + +## Protocol Messages + +```rust +pub type ProtocolMessage = crate::ranger::Message<SignedEntry>; +``` + +### Message Structure + +```rust +pub struct Message<E: RangeEntry> { + parts: Vec<MessagePart<E>>, +} + +pub enum MessagePart<E: RangeEntry> { + RangeFingerprint(RangeFingerprint<E::Key>), // "Here's a fingerprint for this range" + RangeItem(RangeItem<E>), // "Here are the entries in this range" +} + +pub struct RangeFingerprint<K> { + range: Range<K>, + fingerprint: Fingerprint, +} + +pub struct RangeItem<E: RangeEntry> { + range: Range<E::Key>, + values: Vec<(E, ContentStatus)>, + have_local: bool, // If false, asks the receiver to send back its own entries in this range +} +``` + +The `have_local` flag prevents ping-pong. A `RangeItem` with `have_local: false` asks the receiver to reply with any entries it holds in that range that the sender lacks. A `RangeItem` with `have_local: true`, which is set on such a reply, tells the receiver that nothing needs to be sent back.
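The range semantics and XOR fingerprints described in this section can be sketched together. The names `ToyRange`, `entry_fp` and `range_fp` are hypothetical. This sketch simplifies the real design in three ways:
- the per-entry fingerprint uses std's `DefaultHasher` instead of BLAKE3;
- fingerprints are `u64` instead of 32 bytes;
- the XOR fold starts from 0 rather than from the hash of the empty input.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Range over keys with the wrapping semantics described above.
struct ToyRange<K> {
    x: K,
    y: K,
}

impl<K: Ord> ToyRange<K> {
    fn contains(&self, k: &K) -> bool {
        if self.x == self.y {
            true // x == y: the whole set
        } else if self.x < self.y {
            &self.x <= k && k < &self.y // ordinary half-open [x, y)
        } else {
            k >= &self.x || k < &self.y // wrapping: >= x, or < y
        }
    }
}

/// Per-entry fingerprint (std stand-in for BLAKE3).
fn entry_fp<T: Hash>(entry: &T) -> u64 {
    let mut h = DefaultHasher::new();
    entry.hash(&mut h);
    h.finish()
}

/// Range fingerprint: XOR of the fingerprints of all entries inside the range.
fn range_fp<K: Ord + Hash>(entries: &[K], range: &ToyRange<K>) -> u64 {
    entries
        .iter()
        .filter(|e| range.contains(e))
        .fold(0, |acc, e| acc ^ entry_fp(e))
}
```

XOR is commutative and associative, so two peers holding the same set compute the same fingerprint regardless of insertion or iteration order. This property is what allows a single fingerprint comparison to skip an entire range.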
+ ### Wire Format + Messages are serialized using `postcard` (a compact serde format) and framed with a 4-byte big-endian length prefix via `SyncCodec`: + ``` +┌─────────────────┬──────────────────────────────┐ +│ u32 BE length │ postcard-encoded Message │ +└─────────────────┴──────────────────────────────┘ +``` + +Max message size: 1 GiB (`MAX_MESSAGE_SIZE = 1024 * 1024 * 1024`). + +## Sync Algorithm Walkthrough + +### 1. Initiation (Alice → Bob) + +Alice generates the initial message: + +```rust +fn init<S: Store<E>>(store: &mut S) -> Result<Self, S::Error> { + let x = store.get_first()?; // First key, or default + let range = Range::new(x.clone(), x); // "All elements" range + let fingerprint = store.get_fingerprint(&range)?; + let part = MessagePart::RangeFingerprint(RangeFingerprint { range, fingerprint }); + Ok(Message { parts: vec![part] }) +} +``` + +This sends a single fingerprint covering the entire set. + +### 2. Processing (Bob processes Alice's message) + +For each `RangeFingerprint` part in the message: + +**Case 1: the received fingerprint matches the local fingerprint** → Nothing to do; both sides hold the same entries in this range. + +**Case 2: the received fingerprint is the empty-set fingerprint, OR the local range has ≤ 1 entry** → Send all local entries in the range as a `RangeItem`. + +**Case 3: Recurse** → Split the range into `split_factor` partitions. For each partition, send a `RangeFingerprint` if it holds more than `max_set_size` local entries, or a `RangeItem` with its entries if it holds at most `max_set_size`. + +### 3. Processing RangeItem + +When a peer receives a `RangeItem`: + +1. **Validate** each incoming entry using `validate_cb` +2. **Insert** valid entries via `Store::put()` (which handles prefix deletion) +3. **Notify** via `on_insert_cb` for actually-inserted entries +4.
If `have_local` is false, compute the **diff** — entries in the local range not present in the received set — and send them back as a `RangeItem` with `have_local: true` + ### Configuration + ```rust +struct SyncConfig { + max_set_size: usize, // Default: 1; partitions with at most this many entries are sent as items, not fingerprints + split_factor: usize, // Default: 2; number of partitions per recursion step +} +``` + +With `max_set_size = 1` and `split_factor = 2`, the algorithm behaves like a binary search: each fingerprint mismatch splits the range in two, and each half is sent as a `RangeItem` if it holds at most one local entry, otherwise as a `RangeFingerprint`. + +## Store Trait + +The `Store` trait provides the interface that the reconciliation algorithm needs: + +```rust +pub trait Store<E: RangeEntry>: Sized { + type Error: Debug + Send + Sync + Into<anyhow::Error> + 'static; + type RangeIterator<'a>: Iterator<Item = Result<E, Self::Error>> where Self: 'a, E: 'a; + type ParentIterator<'a>: Iterator<Item = Result<E, Self::Error>> where Self: 'a, E: 'a; + + fn get_first(&mut self) -> Result<E::Key, Self::Error>; + fn get_fingerprint(&mut self, range: &Range<E::Key>) -> Result<Fingerprint, Self::Error>; + fn entry_put(&mut self, entry: E) -> Result<(), Self::Error>; + fn get_range(&mut self, range: Range<E::Key>) -> Result<Self::RangeIterator<'_>, Self::Error>; + fn prefixes_of(&mut self, key: &E::Key) -> Result<Self::ParentIterator<'_>, Self::Error>; + fn remove_prefix_filtered(&mut self, prefix: &E::Key, predicate: impl Fn(&E::Value) -> bool) -> Result<usize, Self::Error>; + fn initial_message(&mut self) -> Result<Message<E>, Self::Error>; + async fn process_message(...) -> Result<Option<Message<E>>, Self::Error>; + fn put(&mut self, entry: E) -> Result<InsertOutcome, Self::Error>; +} +``` + +### Insert Semantics in `Store::put()` + +The `put` method implements the CRDT insert logic: + +```rust +fn put(&mut self, entry: E) -> Result<InsertOutcome, Self::Error> { + // 1. Check prefix entries — if any parent entry has value >= new entry, reject + for prefix_entry in self.prefixes_of(entry.key())? { + let prefix_entry = prefix_entry?; // the iterator yields Result<E, Self::Error> + if entry.value() <= prefix_entry.value() { + return Ok(InsertOutcome::NotInserted); + } + } + + // 2. Remove entries whose key is prefixed by the new entry's key AND whose value is <= the new value + let removed = self.remove_prefix_filtered(entry.key(), |v| entry.value() >= v)?; + + // 3.
Insert the new entry + self.entry_put(entry)?; + Ok(InsertOutcome::Inserted { removed }) +} +``` + +### InsertOutcome + +```rust +enum InsertOutcome { + NotInserted, // An entry at this key or at a prefix of it has an equal or newer value + Inserted { removed: usize }, // Successfully inserted; reports removed entries +} +``` + +## Sync Flow at the Protocol Level + +The `Replica` type provides the sync interface: + +```rust +// Create initial message for sync +fn sync_initial_message(&mut self) -> anyhow::Result<ProtocolMessage> + +// Process an incoming message and produce optional reply +async fn sync_process_message( + &mut self, + message: ProtocolMessage, + from_peer: PeerIdBytes, + state: &mut SyncOutcome, +) -> Result<Option<ProtocolMessage>, anyhow::Error> +``` + +### SyncOutcome + +Tracks the result of a sync session: + +```rust +pub struct SyncOutcome { + pub heads_received: AuthorHeads, // Latest timestamps per author from remote + pub num_recv: usize, // Number of entries received + pub num_sent: usize, // Number of entries sent +} +``` + +## Network Protocol (Codec) + +The sync protocol operates over a QUIC bidirectional stream: + +1. **Alice** (initiator) sends `Message::Init { namespace, message }` +2. **Bob** (responder) validates the namespace and either: + - Accepts and processes the initial message + - Rejects with `Message::Abort { reason }` +3. Both peers exchange `Message::Sync(message)` rounds until one side has no reply (convergence reached) + +The `BobState` manages the responder side, tracking namespace and `SyncOutcome` progress across message rounds. + +### Abort Reasons + +```rust +pub enum AbortReason { + NotFound, // Namespace not available + AlreadySyncing, // Already syncing this namespace + InternalServerError, +} +``` + +### Concurrent Sync Prevention + +When both peers try to sync with each other simultaneously, the system uses a deterministic tiebreaker based on comparing `EndpointId` bytes — the peer with the larger ID accepts, the other connects.
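The insert rule of `Store::put()` can be exercised without a database. This sketch assumes a single namespace and author, raw byte keys, and `(timestamp, hash)` values with a `u64` standing in for the content hash; `LwwStore` is a hypothetical name. It scans the whole map for prefix matches, where a real store would use ordered range queries.

```rust
use std::collections::BTreeMap;

/// (timestamp, content hash): tuple ordering gives Last-Writer-Wins,
/// with the hash breaking timestamp ties.
type Value = (u64, u64);

#[derive(Debug, PartialEq, Eq)]
pub enum InsertOutcome {
    NotInserted,
    Inserted { removed: usize },
}

/// Hypothetical single-namespace, single-author store keyed by raw key bytes.
#[derive(Default)]
pub struct LwwStore {
    entries: BTreeMap<Vec<u8>, Value>,
}

impl LwwStore {
    pub fn put(&mut self, key: &[u8], value: Value) -> InsertOutcome {
        // 1. Reject if an entry whose key is a prefix of `key` (including
        //    `key` itself) already has an equal or newer value.
        for (k, v) in &self.entries {
            if key.starts_with(k) && value <= *v {
                return InsertOutcome::NotInserted;
            }
        }
        // 2. Remove entries that have `key` as a prefix and are not newer.
        let doomed: Vec<Vec<u8>> = self
            .entries
            .iter()
            .filter(|(k, v)| k.starts_with(key) && value >= **v)
            .map(|(k, _)| k.clone())
            .collect();
        for k in &doomed {
            self.entries.remove(k);
        }
        // 3. Insert the new entry.
        self.entries.insert(key.to_vec(), value);
        InsertOutcome::Inserted { removed: doomed.len() }
    }
}
```

For example, after writing `/a/1` at time 10 and `/a/2` at time 11, a write to `/a/` at time 12 returns `Inserted { removed: 2 }`, and a later attempt to write `/a/3` with timestamp 11 is rejected because the newer `/a/` entry shadows it.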
\ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/04-store-and-persistence.md b/docs/research/references/iroh/iroh-docs/04-store-and-persistence.md new file mode 100644 index 0000000..37b6405 --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/04-store-and-persistence.md @@ -0,0 +1,257 @@ +# iroh-docs: Store and Persistence + +## Store Architecture + +The store is implemented in `store::fs::Store` using `redb`, an embedded key-value database. It supports two modes: + +- **In-memory**: `Store::memory()` — backed by a `Vec` via `redb::backends::InMemoryBackend` +- **Persistent**: `Store::persistent(path)` — backed by a single file on disk + +Both modes use the same `redb` table structure. + +## redb Table Schema + +### Authors Table +``` +Table: "authors-1" +Key: [u8; 32] (AuthorId) +Value: [u8; 32] (Author secret key bytes) +``` + +### Namespaces Table +``` +Table: "namespaces-2" +Key: [u8; 32] (NamespaceId) +Value: (u8, [u8; 32]) (CapabilityKind, key bytes) +``` + +The `CapabilityKind` discriminates between `Write = 1` (full key stored) and `Read = 2` (only the public key / namespace ID stored). + +### Records Table (Primary) +``` +Table: "records-1" +Key: (NamespaceId, AuthorId, key_bytes) = ([u8; 32], [u8; 32], &[u8]) +Value: (timestamp, namespace_sig, author_sig, len, hash) = (u64, &[u8; 64], &[u8; 64], u64, &[u8; 32]) +``` + +This is the main table storing all document entries. The key layout `(namespace, author, key)` enables efficient range queries for the sync algorithm. + +### Latest-Per-Author Table +``` +Table: "latest-by-author-1" +Key: (NamespaceId, AuthorId) = (&[u8; 32], &[u8; 32]) +Value: (timestamp, key_bytes) = (u64, &[u8]) +``` + +Used to quickly determine the latest entry timestamp for each author, supporting `AuthorHeads` computation and `has_news_for_us()` checks. 
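The `has_news_for_us()` check reduces to comparing two author-to-timestamp maps. This is a sketch under stated assumptions: `authors_with_news` is a hypothetical helper, heads are plain `BTreeMap`s rather than `AuthorHeads`, and it returns the number of authors for which the remote side reports a newer (or previously unknown) timestamp. The real method's return type may differ.

```rust
use std::collections::BTreeMap;

/// Stand-in for a 32-byte author id.
type AuthorId = [u8; 32];

/// Latest entry timestamp per author within one namespace, i.e. what the
/// "latest-by-author-1" table holds for that namespace.
type Heads = BTreeMap<AuthorId, u64>;

/// Hypothetical helper: how many authors does `remote` have newer entries for?
pub fn authors_with_news(local: &Heads, remote: &Heads) -> usize {
    remote
        .iter()
        .filter(|(author, remote_ts)| match local.get(*author) {
            Some(local_ts) => *remote_ts > local_ts,
            // An author we have never seen is always news.
            None => true,
        })
        .count()
}
```

A non-zero count is a reason to start a sync with that peer; a zero count means every remote head is already covered locally.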
+ +### Records-By-Key Table (Index) +``` +Table: "records-by-key-1" +Key: (NamespaceId, key_bytes, AuthorId) = (&[u8; 32], &[u8], &[u8; 32]) +Value: () +``` + +An index table that enables efficient queries by key prefix, supporting `Query::key_prefix()` and `Query::key_exact()` lookups. + +### Namespace Peers Table (Multimap) +``` +MultimapTable: "sync-peers-1" +Key: &[u8; 32] (NamespaceId) +Value: (Nanos, &PeerIdBytes) (timestamp_nanos, peer_id) +``` + +Stores up to 5 (`PEERS_PER_DOC_CACHE_SIZE`) recently-useful peers per namespace. This is an LRU cache: when full, the oldest peer is evicted when a new one is registered. + +### Download Policy Table +``` +Table: "download-policy-1" +Key: &[u8; 32] (NamespaceId) +Value: &[u8] (postcard-encoded DownloadPolicy) +``` + +Per-namespace download policies controlling which content blobs to automatically download. + +## Store Operations + +### Transaction Model + +The `Store` uses a "current transaction" approach: + +```rust +enum CurrentTransaction { + None, + Read(ReadOnlyTables), + Write(TransactionAndTables), +} +``` + +- Read operations obtain a read snapshot +- Write operations batch into a write transaction +- Transactions older than `MAX_COMMIT_DELAY` (500ms) are automatically committed +- `flush()` commits any pending write transaction + +### Core Methods + +```rust +// Create/open/close replicas +fn new_replica(&mut self, namespace: NamespaceSecret) -> Result>; +fn open_replica(&mut self, namespace_id: &NamespaceId) -> Result>; +fn close_replica(&mut self, id: NamespaceId); +fn import_namespace(&mut self, capability: Capability) -> Result; + +// Author management +fn new_author(&mut self, rng: &mut R) -> Result; +fn import_author(&mut self, author: Author) -> Result<()>; +fn get_author(&mut self, author_id: &AuthorId) -> Result>; +fn delete_author(&mut self, author: AuthorId) -> Result<()>; + +// Queries +fn get_many(&mut self, namespace: NamespaceId, query: impl Into) -> Result; +fn get_exact(&mut self, 
namespace: NamespaceId, author: AuthorId, key: impl AsRef<[u8]>, include_empty: bool) -> Result>; +fn get_latest_for_each_author(&mut self, namespace: NamespaceId) -> Result>; + +// Sync support +fn has_news_for_us(&mut self, namespace: NamespaceId, heads: &AuthorHeads) -> Result>; +fn get_sync_peers(&mut self, namespace: &NamespaceId) -> Result>; +fn register_useful_peer(&mut self, namespace: NamespaceId, peer: PeerIdBytes) -> Result<()>; + +// Content +fn content_hashes(&mut self) -> Result; +``` + +### ImportNamespaceOutcome + +```rust +pub enum ImportNamespaceOutcome { + Inserted, // New namespace created + Upgraded, // Existing namespace upgraded from Read to Write + NoChange, // Namespace already existed with same or higher capability +} +``` + +## Query System + +The `Query` type supports flexible entry lookups: + +```rust +pub struct Query { + kind: QueryKind, + filter_author: AuthorFilter, + filter_key: KeyFilter, + limit: Option, + offset: u64, + include_empty: bool, + sort_direction: SortDirection, +} +``` + +### Query Kinds + +```rust +enum QueryKind { + Flat(FlatQuery), // Returns all matching entries + SingleLatestPerKey(SingleLatestPerKeyQuery), // Returns only latest entry per key +} +``` + +- **Flat**: Returns all entries matching the filters, sorted by `(namespace, author, key)` or `(namespace, key, author)` depending on `SortBy` +- **SingleLatestPerKey**: Groups by key and returns only the latest entry (by record value ordering) per key + +### Filters + +```rust +enum KeyFilter { + Any, // Match all keys + Exact(Bytes), // Exact key match + Prefix(Bytes), // Key starts with prefix +} + +enum AuthorFilter { + Any, // Match all authors + Exact(AuthorId), // Match specific author +} +``` + +### Builder Pattern + +```rust +// Get all entries +Query::all() + +// Get entries by author +Query::author(author_id) + +// Get entries by key prefix +Query::key_prefix(b"/path/") + +// Get single latest entry per key +Query::single_latest_per_key() + 
.key_prefix(b"/path/") + .author(author_id) +``` + +## Download Policy + +Controls which content blobs to automatically download after sync: + +```rust +pub enum DownloadPolicy { + NothingExcept(Vec), // Only download matching entries + EverythingExcept(Vec), // Download all except matching (default) +} + +pub enum FilterKind { + Prefix(Bytes), // Matches keys starting with bytes + Exact(Bytes), // Matches exact key +} +``` + +Default: `EverythingExcept(Vec::new())` — download everything. + +## PublicKeyStore + +The `PublicKeyStore` trait caches expanded `ed25519_dalek::VerifyingKey` objects to avoid repeated curve point decompression: + +```rust +pub trait PublicKeyStore { + fn public_key(&self, id: &[u8; 32]) -> Result; + fn namespace_key(&self, bytes: &NamespaceId) -> Result; + fn author_key(&self, bytes: &AuthorId) -> Result; +} +``` + +The `MemPublicKeyStore` implementation uses `Arc>>` for thread-safe caching. + +The `Store` itself implements `PublicKeyStore`, leveraging its redb tables for author storage and the in-memory cache for fast verification. + +## StoreInstance + +```rust +pub struct StoreInstance<'a> { + namespace: NamespaceId, + store: &'a mut Store, +} +``` + +A `StoreInstance` bundles a namespace ID with a mutable reference to the store, providing the `ranger::Store` implementation for the sync algorithm. This is what `Replica` uses internally to perform sync operations. + +## Replica + +```rust +pub struct Replica<'a, I = Box> { + store: StoreInstance<'a>, + info: I, +} +``` + +`Replica` is the primary user-facing type for document operations. 
It combines: +- A `StoreInstance` for data access +- `ReplicaInfo` for metadata (capability, subscribers, content status callback) + +Key methods: +- `insert(key, author, hash, len)` — Insert a new entry +- `delete_prefix(prefix, author)` — Delete entries by key prefix +- `insert_remote_entry(entry, from, content_status)` — Insert from sync +- `hash_and_insert(key, author, data)` — Hash data and insert +- `sync_initial_message()` / `sync_process_message()` — Sync protocol operations \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/05-engine-and-live-sync.md b/docs/research/references/iroh/iroh-docs/05-engine-and-live-sync.md new file mode 100644 index 0000000..aadb0f9 --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/05-engine-and-live-sync.md @@ -0,0 +1,343 @@ +# iroh-docs: Engine and Live Sync + +## Overview + +The `Engine` is the top-level coordinator for live document synchronization. It brings together: + +1. **SyncHandle/Actor** — Single-threaded actor for all store and replica operations +2. **LiveActor** — Async event loop coordinating sync, gossip, and content downloads +3. **GossipState** — Integration with `iroh-gossip` for broadcasting updates +4. 
**Blobs/Downloader** — Integration with `iroh-blobs` for content transfer + +## Engine + +```rust +pub struct Engine { + pub endpoint: Endpoint, + pub sync: SyncHandle, + pub default_author: DefaultAuthor, + to_live_actor: mpsc::Sender, + actor_handle: AbortOnDropHandle<()>, + content_status_cb: ContentStatusCallback, + blob_store: iroh_blobs::api::Store, + _gc_protect_task: AbortOnDropHandle<()>, +} +``` + +### Initialization + +```rust +Engine::spawn( + endpoint, // iroh Endpoint for QUIC connections + gossip, // iroh-gossip instance + replica_store, // Store for document data + bao_store, // iroh-blobs Store for content blobs + downloader, // Downloader for fetching blobs + default_author_storage, // Where to persist the default author + protect_cb, // Optional GC protection callback +) -> Result +``` + +During spawn: +1. A `ContentStatusCallback` is created that checks blob availability in `iroh-blobs` +2. A `SyncHandle` actor is spawned on a dedicated thread +3. A `LiveActor` is spawned as a tokio task +4. The default author is loaded or created +5. 
A GC protection task is started (if callback provided) + +### Key Engine Methods + +```rust +// Start syncing a document with given peers +async fn start_sync(&self, namespace: NamespaceId, peers: Vec) -> Result<()> + +// Stop syncing and leave gossip swarm +async fn leave(&self, namespace: NamespaceId, kill_subscribers: bool) -> Result<()> + +// Subscribe to document events +async fn subscribe(&self, namespace: NamespaceId) -> Result>> + +// Handle incoming QUIC connections +async fn handle_connection(&self, conn: Connection) -> Result<()> + +// Shutdown the engine +async fn shutdown(&self) -> Result<()> +``` + +### GC Protection + +The `ProtectCallbackHandler` bridges iroh-docs with iroh-blobs' garbage collection: + +```rust +let (handler, protect_cb) = ProtectCallbackHandler::new(); +// protect_cb goes into iroh-blobs GC config +// handler goes into Engine::spawn +``` + +When iroh-blobs runs GC, it calls `protect_cb` which queries the docs store for all content hashes, ensuring blobs referenced by document entries are not garbage-collected. + +## SyncHandle / Actor + +The `SyncHandle` is a handle to a single-threaded actor that processes all store and replica operations sequentially: + +```rust +pub struct SyncHandle { + tx: async_channel::Sender, + join_handle: Arc>>, + metrics: Arc, +} +``` + +### Actor Architecture + +``` +External Code ──async──▶ SyncHandle ──channel──▶ Actor Thread + │ + Store (redb) + Replica operations + Flush on timeout (500ms) +``` + +The actor runs on a **dedicated OS thread** (not a tokio task), using `tokio::runtime::Builder::new_current_thread()` internally. This ensures store operations are never concurrent. 
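The dedicated-thread actor pattern can be sketched with std only. This assumes blocking `std::sync::mpsc` channels in place of `async_channel` and a `BTreeMap` in place of the redb store; `StoreHandle` and its `Action` variants are hypothetical. Because the store is owned by a single thread and every request carries its own reply channel, operations are applied strictly one after another.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical subset of actions; each carries a reply channel.
enum Action {
    Insert { key: String, value: u64, reply: Sender<Option<u64>> },
    Get { key: String, reply: Sender<Option<u64>> },
    Shutdown { reply: Sender<()> },
}

/// Cloneable-in-spirit handle: all store access goes through the actor thread.
pub struct StoreHandle {
    tx: Sender<Action>,
    join: Option<JoinHandle<()>>,
}

impl StoreHandle {
    pub fn spawn() -> Self {
        let (tx, rx) = channel::<Action>();
        let join = thread::spawn(move || {
            // The store lives only on this thread, so accesses never overlap.
            let mut store: BTreeMap<String, u64> = BTreeMap::new();
            for action in rx {
                match action {
                    Action::Insert { key, value, reply } => {
                        let _ = reply.send(store.insert(key, value));
                    }
                    Action::Get { key, reply } => {
                        let _ = reply.send(store.get(&key).copied());
                    }
                    Action::Shutdown { reply } => {
                        let _ = reply.send(());
                        break;
                    }
                }
            }
        });
        StoreHandle { tx, join: Some(join) }
    }

    pub fn insert(&self, key: &str, value: u64) -> Option<u64> {
        let (reply, rx) = channel();
        self.tx.send(Action::Insert { key: key.to_string(), value, reply }).expect("actor alive");
        rx.recv().expect("actor replied")
    }

    pub fn get(&self, key: &str) -> Option<u64> {
        let (reply, rx) = channel();
        self.tx.send(Action::Get { key: key.to_string(), reply }).expect("actor alive");
        rx.recv().expect("actor replied")
    }

    /// Ask the actor to stop, wait for its reply, then join the thread.
    pub fn shutdown(mut self) {
        let (reply, rx) = channel();
        self.tx.send(Action::Shutdown { reply }).expect("actor alive");
        rx.recv().expect("actor replied");
        if let Some(join) = self.join.take() {
            join.join().expect("actor thread panicked");
        }
    }
}
```

The engine's callers await replies asynchronously instead of blocking, but the ordering guarantee comes from the same design: one owner thread draining one queue.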
+ +### Action Types + +```rust +enum Action { + ImportAuthor { author, reply }, + ExportAuthor { author, reply }, + DeleteAuthor { author, reply }, + ImportNamespace { capability, reply }, + ListAuthors { reply }, + ListReplicas { reply }, + ContentHashes { reply }, + FlushStore { reply }, + Replica(NamespaceId, ReplicaAction), + Shutdown { reply }, +} + +enum ReplicaAction { + Open { reply, opts }, + Close { reply }, + GetState { reply }, + SetSync { sync, reply }, + Subscribe { sender, reply }, + Unsubscribe { sender, reply }, + InsertLocal { author, key, hash, len, reply }, + DeletePrefix { author, key, reply }, + InsertRemote { entry, from, content_status, reply }, + SyncInitialMessage { reply }, + SyncProcessMessage { message, from, state, reply }, + GetSyncPeers { reply }, + RegisterUsefulPeer { peer, reply }, + GetExact { author, key, include_empty, reply }, + GetMany { query, reply }, + DropReplica { reply }, + ExportSecretKey { reply }, + HasNewsForUs { heads, reply }, + SetDownloadPolicy { policy, reply }, + GetDownloadPolicy { reply }, +} +``` + +### Replica Opening + +When a replica is opened via the actor, an `OpenReplica` struct is created: + +```rust +struct OpenReplica { + info: ReplicaInfo, // Capability, subscribers, content status callback + sync: bool, // Whether to accept sync requests + handles: usize, // Reference count for open handles +} +``` + +Multiple handles to the same replica are supported via reference counting. 
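Reference-counted opening can be sketched as a map from namespace to handle count. `OpenReplicas` is a hypothetical type that tracks only the count (the `info` and `sync` fields of `OpenReplica` are omitted); the replica is removed only when its last handle is closed.

```rust
use std::collections::HashMap;

/// Stand-in for the 32-byte `NamespaceId`.
type NamespaceId = [u8; 32];

/// Hypothetical registry of open replicas: namespace -> open handle count.
#[derive(Default)]
pub struct OpenReplicas {
    handles: HashMap<NamespaceId, usize>,
}

impl OpenReplicas {
    /// Open a replica, or add a handle if it is already open. Returns the new count.
    pub fn open(&mut self, id: NamespaceId) -> usize {
        let count = self.handles.entry(id).or_insert(0);
        *count += 1;
        *count
    }

    /// Release one handle. Returns `true` if this was the last handle and the
    /// replica was actually closed.
    pub fn close(&mut self, id: &NamespaceId) -> bool {
        let remaining = match self.handles.get_mut(id) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return false,
        };
        if remaining == 0 {
            self.handles.remove(id);
            true
        } else {
            false
        }
    }

    pub fn is_open(&self, id: &NamespaceId) -> bool {
        self.handles.contains_key(id)
    }
}
```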
+ +## LiveActor + +The `LiveActor` is the central async coordinator: + +```rust +pub struct LiveActor { + inbox: mpsc::Receiver, + sync: SyncHandle, + endpoint: Endpoint, + bao_store: Store, + downloader: Downloader, + memory_lookup: MemoryLookup, + replica_events_tx: async_channel::Sender, + replica_events_rx: async_channel::Receiver, + sync_actor_tx: mpsc::Sender, + gossip: GossipState, + running_sync_connect: JoinSet, + running_sync_accept: JoinSet, + download_tasks: JoinSet, + missing_hashes: HashSet, + queued_hashes: QueuedHashes, + hash_providers: ProviderNodes, + subscribers: SubscribersMap, + state: NamespaceStates, + metrics: Arc, +} +``` + +### Event Loop + +The `LiveActor::run_inner()` loop uses `tokio::select!` with biased polling: + +```rust +tokio::select! { + biased; + msg = self.inbox.recv() => { /* handle actor messages */ } + event = self.replica_events_rx.recv() => { /* handle replica insert events */ } + res = self.running_sync_connect.join_next() => { /* sync connect finished */ } + res = self.running_sync_accept.join_next() => { /* sync accept finished */ } + res = self.download_tasks.join_next() => { /* download completed */ } + res = self.gossip.progress() => { /* gossip task progress */ } +} +``` + +### ToLiveActor Messages + +```rust +pub enum ToLiveActor { + StartSync { namespace, peers, reply }, + Leave { namespace, kill_subscribers, reply }, + Shutdown { reply }, + Subscribe { namespace, sender, reply }, + HandleConnection { conn }, + AcceptSyncRequest { namespace, peer, reply }, + IncomingSyncReport { from, report }, + NeighborContentReady { namespace, node, hash }, + NeighborUp { namespace, peer }, + NeighborDown { namespace, peer }, +} +``` + +### Gossip Operations (Op) + +```rust +pub enum Op { + Put(SignedEntry), // New entry inserted + ContentReady(Hash), // Content blob now available + SyncReport(SyncReport), // Heads summary after sync +} +``` + +Gossip broadcasts `Op` messages to all swarm participants. 
When a `Put` is received, the entry is inserted into the local replica. When a `ContentReady` is received, peers know they can download the blob. When a `SyncReport` is received, peers check `has_news_for_us()` to decide if they should sync. + +### Content Download Flow + +1. When a `RemoteInsert` event occurs with `should_download: true`, the entry's content hash is queued for download +2. The `LiveActor` uses `iroh_blobs::downloader::Downloader` to fetch the blob +3. Known providers (peers who had `ContentStatus::Complete`) are used as download sources +4. On download completion, a `LiveEvent::ContentReady` event is emitted + +### LiveEvent (Public API) + +```rust +pub enum LiveEvent { + InsertLocal { entry: Entry }, + InsertRemote { from: PublicKey, entry: Entry, content_status: ContentStatus }, + ContentReady { hash: Hash }, + PendingContentReady, + NeighborUp(PublicKey), + NeighborDown(PublicKey), + SyncFinished(SyncEvent), +} +``` + +`SyncEvent` wraps `SyncFinished`: + +```rust +pub struct SyncFinished { + pub namespace: NamespaceId, + pub peer: PublicKey, + pub outcome: SyncOutcome, + pub timings: Timings, +} +``` + +## NamespaceStates + +```rust +pub struct NamespaceStates(BTreeMap); + +struct NamespaceState { + nodes: BTreeMap, + may_emit_ready: bool, +} +``` + +Each peer has a `PeerState` tracking sync progress: + +```rust +struct PeerState { + state: SyncState, // Idle or Running + resync_requested: bool, // Whether a resync was requested during active sync + last_sync: Option<(Instant, Result)>, +} +``` + +This state machine prevents concurrent syncs with the same peer for the same namespace and queues resync requests when needed. 
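A sketch of the per-peer state machine, assuming only the `Idle`/`Running` states and the `resync_requested` flag described above. The `last_sync` field is omitted, and `request_sync`, `finish` and `StartOutcome` are hypothetical names. A request that arrives while a sync is running is remembered rather than started concurrently, and repeated requests collapse into a single resync.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncState {
    Idle,
    Running,
}

/// What happened to a sync request.
#[derive(Debug, PartialEq, Eq)]
pub enum StartOutcome {
    /// No sync was running, so one starts now.
    Started,
    /// A sync is already running; a resync is queued instead.
    Queued,
}

#[derive(Debug)]
pub struct PeerState {
    state: SyncState,
    resync_requested: bool,
}

impl PeerState {
    pub fn new() -> Self {
        PeerState { state: SyncState::Idle, resync_requested: false }
    }

    pub fn state(&self) -> SyncState {
        self.state
    }

    pub fn request_sync(&mut self) -> StartOutcome {
        match self.state {
            SyncState::Idle => {
                self.state = SyncState::Running;
                StartOutcome::Started
            }
            SyncState::Running => {
                // Never run two syncs with the same peer; remember the request.
                self.resync_requested = true;
                StartOutcome::Queued
            }
        }
    }

    /// Called when the running sync finishes. Returns `true` if a queued
    /// resync should start now (the state stays `Running` for it).
    pub fn finish(&mut self) -> bool {
        if std::mem::take(&mut self.resync_requested) {
            self.state = SyncState::Running;
            true
        } else {
            self.state = SyncState::Idle;
            false
        }
    }
}
```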
+ +## DefaultAuthor + +```rust +pub struct DefaultAuthor { + value: RwLock, + storage: DefaultAuthorStorage, +} +``` + +- `DefaultAuthorStorage::Mem` — Ephemeral, creates a new author each time +- `DefaultAuthorStorage::Persistent(path)` — Stores the author ID as hex in a file, loads it on startup + +The default author provides a convenient "current user" identity for applications. + +## Docs Protocol Handler + +```rust +pub struct Docs { + engine: Arc, + api: DocsApi, +} +``` + +`Docs` implements `ProtocolHandler` for integration with iroh's `Router`: + +```rust +impl ProtocolHandler for Docs { + async fn accept(&self, connection: Connection) -> Result<(), AcceptError> { ... } + async fn shutdown(&self) { ... } +} +``` + +The `Builder` pattern configures storage: + +```rust +let docs = Docs::memory() + .spawn(endpoint, blobs, gossip) + .await?; +// or +let docs = Docs::persistent(path) + .protect_handler(handler) + .spawn(endpoint, blobs, gossip) + .await?; +``` + +## DocTicket + +```rust +pub struct DocTicket { + pub capability: Capability, + pub nodes: Vec, +} +``` + +A `DocTicket` encapsulates everything needed to join a document: +- A `Capability` (Read or Write) — provides the namespace key +- A list of `EndpointAddr` — bootstrap peers to connect to + +Tickets are serialized as base32-encoded postcard data with a `"doc"` prefix, using the `iroh_tickets::Ticket` trait. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/06-network-protocol.md b/docs/research/references/iroh/iroh-docs/06-network-protocol.md new file mode 100644 index 0000000..b25624b --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/06-network-protocol.md @@ -0,0 +1,189 @@ +# iroh-docs: Network Protocol and Wire Format + +## ALPN + +The docs protocol uses ALPN `/iroh-sync/1` for QUIC connection identification. 
+ +```rust +pub const ALPN: &[u8] = b"/iroh-sync/1"; +``` + +## Connection Flow + +### Outgoing Sync (Alice — Initiator) + +```rust +pub async fn connect_and_sync( + endpoint: &Endpoint, + sync: &SyncHandle, + namespace: NamespaceId, + peer: EndpointAddr, + metrics: Option<&Metrics>, +) -> Result +``` + +1. Open a QUIC connection to the peer with ALPN `/iroh-sync/1` +2. Open a bidirectional QUIC stream +3. Run the Alice (initiator) protocol via `run_alice()` +4. Close the stream and return `SyncFinished` + +### Incoming Sync (Bob — Responder) + +```rust +pub async fn handle_connection( + sync: SyncHandle, + connection: Connection, + accept_cb: F, + metrics: Option<&Metrics>, +) -> Result +``` + +1. Accept a bidirectional QUIC stream from the connection +2. Run the Bob (responder) protocol via `BobState::run()` +3. The `accept_cb` determines whether to accept or reject each namespace +4. Close the stream and return `SyncFinished` + +## Wire Format + +### Frame Codec + +All messages are length-prefixed: + +``` +┌──────────────────────┬──────────────────────────────┐ +│ u32 big-endian len │ postcard-serialized Message │ +└──────────────────────┴──────────────────────────────┘ +``` + +Maximum message size: 1 GiB. + +### Message Types + +```rust +enum Message { + Init { + namespace: NamespaceId, // Which document to sync + message: ProtocolMessage, // Initial sync message (ranger::Message) + }, + Sync(ProtocolMessage), // Subsequent sync round-trip messages + Abort { reason: AbortReason }, // Responder rejects the request +} +``` + +### Serialization + +Messages use `postcard` (a compact `serde` format optimized for embedded/no-std use). The `SyncCodec` implements `tokio_util::codec::Encoder` and `Decoder` for async stream framing. 
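The length-prefixed framing can be sketched without `tokio_util`, assuming an in-memory byte buffer instead of a QUIC stream and treating each payload as opaque bytes (the postcard encoding of `Message` is not shown). `encode_frame` and `decode_frame` are hypothetical names; the decoder returns `Ok(None)` until a whole frame has arrived, which is how a streaming decoder behaves.

```rust
/// Maximum payload size, matching the 1 GiB limit quoted above.
pub const MAX_MESSAGE_SIZE: usize = 1024 * 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    TooLarge(usize),
}

/// Append one frame (u32 big-endian length, then payload) to `out`.
pub fn encode_frame(payload: &[u8], out: &mut Vec<u8>) -> Result<(), FrameError> {
    if payload.len() > MAX_MESSAGE_SIZE {
        return Err(FrameError::TooLarge(payload.len()));
    }
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Ok(())
}

/// Split one complete frame off the front of `buf`, if there is one.
pub fn decode_frame(buf: &mut Vec<u8>) -> Result<Option<Vec<u8>>, FrameError> {
    if buf.len() < 4 {
        return Ok(None); // length prefix not complete yet
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len > MAX_MESSAGE_SIZE {
        return Err(FrameError::TooLarge(len));
    }
    if buf.len() < 4 + len {
        return Ok(None); // payload not complete yet
    }
    let payload = buf[4..4 + len].to_vec();
    buf.drain(..4 + len);
    Ok(Some(payload))
}
```

Checking the declared length against the limit before waiting for the payload lets a receiver reject an oversized frame without buffering it first.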
+ +## Protocol Sequence + +``` +Alice (Initiator) Bob (Responder) + │ │ + │──── Init { namespace, initial_msg } ───────▶│ + │ │ + │◀─── Sync(reply_msg) ────────────────────── │ (or Abort) + │ │ + │──── Sync(next_msg) ──────────────────────▶│ + │ │ + │◀─── Sync(reply_msg) ────────────────────── │ + │ │ + │──── Sync(next_msg) ──────────────────────▶│ + │ │ + │ ... until convergence ... │ + │ │ + │──── (stream closed) ─────────────────────▶│ + │ │ +``` + +The protocol terminates when one side has no more messages to send (convergence reached). Each `Sync` message carries a `ProtocolMessage` which is a `ranger::Message` containing `MessagePart`s (either `RangeFingerprint` or `RangeItem`). + +## SyncFinished Result + +```rust +pub struct SyncFinished { + pub namespace: NamespaceId, + pub peer: PublicKey, + pub outcome: SyncOutcome, // heads_received, num_recv, num_sent + pub timings: Timings, // connect duration, process duration +} +``` + +## Error Types + +### ConnectError + +```rust +pub enum ConnectError { + Connect { error: anyhow::Error }, // Connection failed + RemoteAbort(AbortReason), // Remote rejected our request + Sync { error: anyhow::Error }, // Sync protocol error + Close { error: anyhow::Error }, // Stream close error +} +``` + +### AcceptError + +```rust +pub enum AcceptError { + Connect { error: anyhow::Error }, // Connection failed + Open { peer: PublicKey, error }, // Failed to open replica + Abort { peer, namespace, reason }, // We aborted + Sync { peer, namespace, error }, // Sync protocol error + Close { peer, namespace, error }, // Stream close error +} +``` + +## Gossip Integration + +The `GossipState` manages iroh-gossip subscriptions per namespace: + +```rust +pub struct GossipState { + gossip: Gossip, + sync: SyncHandle, + to_live_actor: mpsc::Sender, + active: HashMap, + active_tasks: JoinSet<(NamespaceId, Result<()>)>, +} +``` + +When a document starts syncing: +1. The engine joins a gossip topic for that namespace +2. 
`GossipState::join()` subscribes with bootstrap peers +3. A receive loop task is spawned to process incoming gossip messages +4. `Op` messages (Put, ContentReady, SyncReport) are deserialized and forwarded to `LiveActor` + +When receiving an `Op::Put`: +```rust +// In the gossip receive loop: +let entry = SignedEntry::from_entry(...); // deserialize +sync.insert_remote(namespace, entry, from, content_status).await?; +``` + +When receiving an `Op::SyncReport`: +```rust +// Forward to LiveActor which checks has_news_for_us() +to_live_actor.send(ToLiveActor::IncomingSyncReport { from, report }).await?; +``` + +Broadcasting: +```rust +// When a local insert occurs: +gossip.broadcast(&namespace, postcard::to_stdvec(&Op::Put(entry))?).await; + +// When content becomes ready: +gossip.broadcast(&namespace, postcard::to_stdvec(&Op::ContentReady(hash))?).await; +``` + +## Sync Report Compression + +`SyncReport` encodes `AuthorHeads` with an optional size limit: + +```rust +pub struct SyncReport { + namespace: NamespaceId, + heads: Vec<u8>, // postcard-encoded AuthorHeads with size limit +} +``` + +The size limit keeps gossip messages small: when the encoded heads would exceed it, the oldest (least recently updated) author timestamps are dropped first.
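A sketch of size-limited head encoding. It assumes a fixed 40-byte entry (32-byte author id plus an 8-byte big-endian timestamp) instead of postcard's variable-length encoding, and `encode_heads` is a hypothetical name. Sorting newest first before truncating is what makes the limit drop the least recently updated authors.

```rust
/// Stand-in for a 32-byte author id.
type AuthorId = [u8; 32];

/// Size of one encoded (author, timestamp) pair in this sketch.
const ENTRY_SIZE: usize = 32 + 8;

/// Encode author heads newest first, keeping only whole entries that fit in
/// `limit` bytes (everything if `limit` is `None`).
pub fn encode_heads(heads: &[(AuthorId, u64)], limit: Option<usize>) -> Vec<u8> {
    let mut sorted: Vec<&(AuthorId, u64)> = heads.iter().collect();
    // Newest first, so truncation drops the least recently updated authors.
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    let max_entries = limit.map_or(sorted.len(), |bytes| bytes / ENTRY_SIZE);
    let mut out = Vec::with_capacity(max_entries.min(sorted.len()) * ENTRY_SIZE);
    for (author, timestamp) in sorted.into_iter().take(max_entries) {
        out.extend_from_slice(author);
        out.extend_from_slice(&timestamp.to_be_bytes());
    }
    out
}
```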
\ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/07-api-and-data-flow.md b/docs/research/references/iroh/iroh-docs/07-api-and-data-flow.md new file mode 100644 index 0000000..466c91c --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/07-api-and-data-flow.md @@ -0,0 +1,188 @@ +# iroh-docs: API and RPC + +## DocsApi + +The `DocsApi` provides an RPC-based interface to the docs engine, implemented via `irpc`: + +```rust +#[derive(Debug, Clone)] +pub struct DocsApi { + inner: Client, +} +``` + +### Methods (via irpc) + +The API exposes document operations through an RPC protocol defined in `api/protocol.rs`: + +| Method | Request | Response | Description | +|--------|---------|----------|-------------| +| `Open` | `OpenRequest { doc_id }` | `OpenResponse` | Open a document for operations | +| `Close` | `CloseRequest { doc_id }` | `CloseResponse` | Close a document | +| `Status` | `StatusRequest { doc_id }` | `StatusResponse { status: OpenState }` | Get document open state | +| `List` | `ListRequest` | Stream of `ListResponse { id, capability }` | List all documents | +| `Create` | `CreateRequest` | `CreateResponse { id }` | Create a new document | +| `Drop` | `DropRequest { doc_id }` | `DropResponse` | Remove a document | +| `Import` | `ImportRequest { capability }` | `ImportResponse { doc_id }` | Import a document by capability | +| `Set` | `SetRequest { doc_id, author_id, key, value }` | `SetResponse { entry }` | Set a key-value pair | +| `SetHash` | `SetHashRequest { doc_id, author_id, key, hash, size }` | `SetHashResponse` | Set a key with pre-hashed content | +| `GetMany` | `GetManyRequest { doc_id, query }` | Stream of entries | Query entries | +| `GetExact` | `GetExactRequest { doc_id, key, author, include_empty }` | `GetExactResponse { entry }` | Get single entry | +| `Del` | `DelRequest { doc_id, author_id, key }` | `DelResponse { removed }` | Delete by key prefix | +| `Subscribe` | `SubscribeRequest { doc_id }` | Stream 
of `LiveEvent` | Subscribe to document events | +| `Share` | `ShareRequest { doc_id, mode, peers }` | `ShareResponse { ticket }` | Create a sharing ticket | +| `StartSync` | `StartSyncRequest { doc_id, peers }` | `StartSyncResponse` | Start live sync | +| `Leave` | `LeaveRequest { doc_id }` | `LeaveResponse` | Leave gossip swarm | +| `ImportFile` | `ImportFileRequest { ... }` | Stream of `ImportProgress` | Import file content and set key | +| `ExportFile` | `ExportFileRequest { ... }` | Stream of `ExportProgress` | Export content to file | +| `AuthorList` | `AuthorListRequest` | Stream of `AuthorListResponse` | List authors | +| `AuthorCreate` | `AuthorCreateRequest` | `AuthorCreateResponse { author_id }` | Create new author | +| `AuthorImport` | `AuthorImportRequest { author }` | `AuthorImportResponse { author_id }` | Import author key | +| `AuthorExport` | `AuthorExportRequest { author_id }` | `AuthorExportResponse { author }` | Export author key | +| `AuthorDelete` | `AuthorDeleteRequest { author_id }` | `AuthorDeleteResponse` | Delete author | +| `AuthorGetDefault` | `AuthorGetDefaultRequest` | `AuthorGetDefaultResponse { author_id }` | Get default author | +| `AuthorSetDefault` | `AuthorSetDefaultRequest { author_id }` | `AuthorSetDefaultResponse` | Set default author | +| `SetDownloadPolicy` | `SetDownloadPolicyRequest { doc_id, policy }` | `SetDownloadPolicyResponse` | Set download policy | +| `GetDownloadPolicy` | `GetDownloadPolicyRequest { doc_id }` | `GetDownloadPolicyResponse { policy }` | Get download policy | +| `GetSyncPeers` | `GetSyncPeersRequest { doc_id }` | `GetSyncPeersResponse { peers }` | Get known sync peers | + +## RPC Implementation + +The RPC is implemented via `irpc` (for local/remote procedure calls) and `noq` (for remote network access): + +### Local API + +`DocsApi::spawn(engine)` creates an `RpcActor` that processes requests against the engine directly: + +```rust +impl DocsApi { + pub fn spawn(engine: Arc) -> Self { + 
RpcActor::spawn(engine) + } +} +``` + +### Remote API + +When the `rpc` feature is enabled, `DocsApi::connect(endpoint, addr)` creates a remote client that sends requests over the network via `noq`. + +### Protocol Dispatch + +```rust +irpc::rpc::Handler dispatches: +DocsProtocol::Open(msg) => local.send((msg, tx)).await +DocsProtocol::Set(msg) => local.send((msg, tx)).await +// ... etc +``` + +## RpcActor + +The `RpcActor` (in `api/actor.rs`) bridges the RPC protocol to the `Engine`: + +```rust +struct RpcActor { + engine: Arc, +} +``` + +It handles each request type by calling the corresponding `Engine`/`SyncHandle` method and returning the result through the RPC channel. + +For streaming responses (like `GetMany`, `Subscribe`, `AuthorList`), the actor sends results through an `mpsc` channel that the RPC framework streams back to the client. + +## Share Mode and Tickets + +When sharing a document: + +```rust +pub enum ShareMode { + Read, // Share with read-only capability + Write, // Share with full write capability +} +``` + +The `Share` RPC method: +1. Gets or creates the namespace capability +2. Creates a `DocTicket` with the capability and provided peer addresses +3. Starts sync with the provided peers +4. 
Returns the ticket for distribution + +## Example: Basic Setup + +```rust +use iroh::{endpoint::presets, protocol::Router, Endpoint}; +use iroh_blobs::{BlobsProtocol, store::mem::MemStore, ALPN as BLOBS_ALPN}; +use iroh_docs::{protocol::Docs, ALPN as DOCS_ALPN}; +use iroh_gossip::{net::Gossip, ALPN as GOSSIP_ALPN}; + +#[tokio::main] +async fn main() -> anyhow::Result<()> { + let endpoint = Endpoint::bind(presets::N0).await?; + let blobs = MemStore::default(); + let gossip = Gossip::builder().spawn(endpoint.clone()); + let docs = Docs::memory() + .spawn(endpoint.clone(), (*blobs).clone(), gossip.clone()) + .await?; + + let router = Router::builder(endpoint.clone()) + .accept(BLOBS_ALPN, BlobsProtocol::new(&blobs, None)) + .accept(GOSSIP_ALPN, gossip) + .accept(DOCS_ALPN, docs) + .spawn(); + + Ok(()) +} +``` + +## Data Flow Summary + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ Application / RPC │ +│ DocsApi ──irpc──▶ RpcActor ──▶ Engine / SyncHandle │ +└─────────────────────────────────────────────────────────────────┘ + +┌─────────────────────────────────────────────────────────────────┐ +│ Live Sync (per document) │ +│ │ +│ LiveActor event loop: │ +│ ┌────────────────┐ ┌─────────────────┐ ┌──────────────────┐ │ +│ │ Actor Messages │ │ Replica Events │ │ Gossip Events │ │ +│ │ (StartSync, │ │ (LocalInsert, │ │ (Put, │ │ +│ │ Subscribe, │ │ RemoteInsert) │ │ ContentReady, │ │ +│ │ Leave, ...) │ │ │ │ SyncReport) │ │ +│ └──────┬─────────┘ └───────┬────────┘ └──────┬──────────┘ │ +│ │ │ │ │ +│ ▼ ▼ ▼ │ +│ ┌──────────────────────────────────────────────────────────┐ │ +│ │ LiveActor::run_inner() │ │ +│ │ tokio::select! { ... 
} │ │ +│ │ │ │ +│ │ - Start/stop gossip subscriptions │ │ +│ │ - Initiate outgoing syncs (connect_and_sync) │ │ +│ │ - Accept incoming syncs (handle_connection) │ │ +│ │ - Queue content downloads │ │ +│ │ - Broadcast local inserts via gossip │ │ +│ │ - Emit LiveEvent to subscribers │ │ +│ └──────────────────────────────────────────────────────────┘ │ +│ │ +│ Running Tasks: │ +│ ┌───────────────────┐ ┌───────────────────┐ │ +│ │ sync_connect tasks│ │ sync_accept tasks │ │ +│ └───────────────────┘ └───────────────────┘ │ +│ ┌───────────────────┐ ┌───────────────────┐ │ +│ │ download tasks │ │ gossip receive loop│ │ +│ └───────────────────┘ └───────────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ + +┌─────────────────────────────────────────────────────────────────┐ +│ Sync Actor (dedicated thread) │ +│ │ +│ ┌────────────┐ ┌─────────────────────────────────────────┐ │ +│ │ Action │ │ Replica Operations: │ │ +│ │ Channel │──▶│ Insert, Delete, Get, Query, │ │ +│ │ (bounded) │ │ SyncInit, SyncProcess, Open, Close, ...│ │ +│ └────────────┘ └─────────────────────────────────────────┘ │ +│ │ +│ Store (redb) ──▶ All reads/writes on this thread │ +└─────────────────────────────────────────────────────────────────┘ +``` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/08-key-types-reference.md b/docs/research/references/iroh/iroh-docs/08-key-types-reference.md new file mode 100644 index 0000000..99d838d --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/08-key-types-reference.md @@ -0,0 +1,318 @@ +# iroh-docs: Key Types Reference + +## Cryptographic Keys + +### NamespaceSecret + +```rust +pub struct NamespaceSecret { + signing_key: SigningKey, // ed25519_dalek::SigningKey (32 bytes) +} +``` + +- The write capability for a document +- Can sign entries (namespace signature) +- Derives `NamespacePublicKey` and `NamespaceId` +- Serialized as 32 bytes + +### NamespacePublicKey + +```rust +pub struct 
NamespacePublicKey(VerifyingKey); // ed25519_dalek::VerifyingKey +``` + +- The verifying key corresponding to `NamespaceSecret` +- Can verify namespace signatures on entries +- Serialized as 32 bytes + +### NamespaceId + +```rust +pub struct NamespaceId([u8; 32]); +``` + +- The byte representation of `NamespacePublicKey` +- Serves as the unique identifier for a document +- Can be converted back to `NamespacePublicKey` via `PublicKeyStore` (handles invalid curve points) + +### Author + +```rust +pub struct Author { + signing_key: SigningKey, // ed25519_dalek::SigningKey (32 bytes) +} +``` + +- A writer identity within a document +- Can sign entries (author signature) +- Derives `AuthorPublicKey` and `AuthorId` +- Created randomly with `Author::new(&mut rng)` +- Stored persistently in the redb authors table + +### AuthorPublicKey + +```rust +pub struct AuthorPublicKey(VerifyingKey); +``` + +- The verifying key corresponding to an `Author` +- Can verify author signatures on entries +- Serialized as 32 bytes + +### AuthorId + +```rust +pub struct AuthorId([u8; 32]); +``` + +- Byte representation of `AuthorPublicKey` +- Used as a component of `RecordIdentifier` +- Has `fmt_short()` for human-readable display (first 10 hex chars) + +## Entry Types + +### RecordIdentifier + +```rust +pub struct RecordIdentifier(Bytes); +// Layout: [NamespaceId(32) | AuthorId(32) | Key(variable)] +``` + +- The composite key for an entry +- Byte layout: 32 bytes namespace + 32 bytes author + variable-length key +- Ordering: namespace → author → key (lexicographic) +- This ordering is critical for the range-based sync algorithm + +### Record + +```rust +pub struct Record { + len: u64, // Byte length of content + hash: Hash, // BLAKE3 hash of content (32 bytes) + timestamp: u64, // Microseconds since Unix epoch +} +``` + +- The value portion of an entry +- Ordering: timestamp first, then hash (Last-Writer-Wins) +- `Record::empty(timestamp)` creates a tombstone (hash=EMPTY, len=0) +- 
`Record::new_current(hash, len)` uses current system time + +### Entry + +```rust +pub struct Entry { + id: RecordIdentifier, + record: Record, +} +``` + +- Combines key and value +- `Entry::new(id, record)` constructor +- `Entry::new_empty(id)` creates a tombstone with current timestamp +- `entry.sign(namespace, author)` produces a `SignedEntry` + +### SignedEntry + +```rust +pub struct SignedEntry { + signature: EntrySignature, // Dual Ed25519 signatures + entry: Entry, +} +``` + +- An entry with cryptographic proof of authorization and authorship +- `SignedEntry::from_entry(entry, namespace, author)` — create from entry +- `signed_entry.verify(store)` — verify both signatures using a `PublicKeyStore` +- Implements `RangeEntry` for the sync algorithm + +### EntrySignature + +```rust +pub struct EntrySignature { + author_signature: Signature, // 64-byte Ed25519 signature + namespace_signature: Signature, // 64-byte Ed25519 signature +} +``` + +- Created by signing the canonical byte encoding of the `Entry` +- Both signatures cover the same message bytes +- Verification requires both `NamespacePublicKey` and `AuthorPublicKey` + +## Sync Types + +### SyncOutcome + +```rust +pub struct SyncOutcome { + pub heads_received: AuthorHeads, + pub num_recv: usize, + pub num_sent: usize, +} +``` + +- Tracks the result of a sync session +- `heads_received` accumulates the latest timestamp seen from each author on the remote side + +### ProtocolMessage + +```rust +pub type ProtocolMessage = ranger::Message; +``` + +- The wire type for sync protocol messages +- Contains `Vec>` + +### ContentStatus + +```rust +pub enum ContentStatus { + Complete, // Content blob fully available + Incomplete, // Partially available + Missing, // Not available +} +``` + +- Communicated alongside entries during sync +- Helps peers decide whether to download content + +### InsertOrigin + +```rust +pub enum InsertOrigin { + Local, + Sync { + from: PeerIdBytes, // [u8; 32] — the remote peer + 
remote_content_status: ContentStatus, + }, + } + ``` + + ## Event Types + + ### Event (Internal) + + ```rust + pub enum Event { + LocalInsert { + namespace: NamespaceId, + entry: SignedEntry, + }, + RemoteInsert { + namespace: NamespaceId, + entry: SignedEntry, + from: PeerIdBytes, + should_download: bool, + remote_content_status: ContentStatus, + }, + } + ``` + + - Emitted by `Replica` via `ReplicaInfo` subscribers + - `should_download` is determined by the `DownloadPolicy` + + ### LiveEvent (Public) + + ```rust + pub enum LiveEvent { + InsertLocal { entry: Entry }, + InsertRemote { from: PublicKey, entry: Entry, content_status: ContentStatus }, + ContentReady { hash: Hash }, + PendingContentReady, + NeighborUp(PublicKey), + NeighborDown(PublicKey), + SyncFinished(SyncEvent), + } + ``` + + - Emitted by the `Engine` through `subscribe()` + - `InsertLocal` / `InsertRemote` are derived from `Event` by stripping `SignedEntry` → `Entry` + - `ContentReady` is emitted when a blob download completes + - `SyncFinished` wraps the `SyncEvent` reported by the network layer when a sync session with a peer ends + + ## Store Types + + ### Store (store::fs::Store) + + ```rust + pub struct Store { + db: Database, // redb database + transaction: CurrentTransaction, // Current read/write transaction + open_replicas: HashSet, // Track which replicas are open + pubkeys: MemPublicKeyStore, // Cache for expanded public keys + } + ``` + + ### Query + + ```rust + pub struct Query { + kind: QueryKind, // Flat or SingleLatestPerKey + filter_author: AuthorFilter, // Any or Exact + filter_key: KeyFilter, // Any, Exact, or Prefix + limit: Option, + offset: u64, + include_empty: bool, + sort_direction: SortDirection, + } + ``` + + ### Capability + + ```rust + pub enum Capability { + Write(NamespaceSecret), + Read(NamespaceId), + } + ``` + + - `Write` allows inserting entries and signing them + - `Read` allows syncing and reading but not inserting + - Can be serialized as `(u8, [u8; 32])` — kind byte + key bytes + - `merge()` can upgrade `Read` to `Write` + + ### DownloadPolicy
+ +```rust +pub enum DownloadPolicy { + NothingExcept(Vec), // Whitelist mode + EverythingExcept(Vec), // Blacklist mode (default) +} +``` + +### DocTicket + +```rust +pub struct DocTicket { + pub capability: Capability, + pub nodes: Vec, +} +``` + +- Serializable as a base32 string with "doc" prefix +- Contains everything needed to join a document +- The wire format uses a versioned enum: `TicketWireFormat::Variant0(DocTicket)` + +## OpenState + +```rust +pub struct OpenState { + pub sync: bool, // Whether sync is enabled + pub subscribers: usize, // Number of event subscribers + pub handles: usize, // Number of open handles +} +``` + +Returned by the `Status` RPC method to report the state of an open document. + +## Utility Constants + +| Constant | Value | Purpose | +|----------|-------|---------| +| `MAX_TIMESTAMP_FUTURE_SHIFT` | 10 min in μs | Max future drift for entry timestamps | +| `MAX_COMMIT_DELAY` | 500ms | Auto-commit interval for store transactions | +| `ACTION_CAP` | 1024 | Bounded channel capacity for SyncHandle actions | +| `ACTOR_CHANNEL_CAP` | 64 | Channel capacity for LiveActor messages | +| `SUBSCRIBE_CHANNEL_CAP` | 256 | Channel capacity for event subscriptions | +| `PEERS_PER_DOC_CACHE_SIZE` | 5 | LRU cache size for sync peers per document | +| `MAX_MESSAGE_SIZE` | 1 GiB | Max wire message size | \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-docs/README.md b/docs/research/references/iroh/iroh-docs/README.md new file mode 100644 index 0000000..778bd77 --- /dev/null +++ b/docs/research/references/iroh/iroh-docs/README.md @@ -0,0 +1,59 @@ +# iroh-docs Reference Documentation + +> Version: 0.98.0 +> Repository: https://github.com/n0-computer/iroh-docs +> License: MIT/Apache-2.0 +> Based on: [Range-Based Set Reconciliation (Meyer, 2022)](https://arxiv.org/abs/2212.13567) + +## Document Index + +| # | File | Topic | +|---|------|-------| +| 01 | [Overview and Architecture](01-overview-and-architecture.md) | High-level 
architecture, module layout, dependencies, feature flags | +| 02 | [Document Model](02-document-model.md) | CRDT data model: namespaces, authors, entries, signatures, prefix deletion, timestamps | +| 03 | [Sync Protocol](03-sync-protocol.md) | Range-based set reconciliation algorithm, fingerprints, message format, Store trait | +| 04 | [Store and Persistence](04-store-and-persistence.md) | redb table schema, transaction model, queries, download policies, PublicKeyStore | +| 05 | [Engine and Live Sync](05-engine-and-live-sync.md) | Engine, LiveActor, GossipState, content download, event system, DefaultAuthor | +| 06 | [Network Protocol](06-network-protocol.md) | ALPN, wire format, Alice/Bob protocol flow, error types, gossip integration | +| 07 | [API and Data Flow](07-api-and-data-flow.md) | RPC API, DocsApi, protocol messages, data flow diagrams | +| 08 | [Key Types Reference](08-key-types-reference.md) | All public types, constants, and their relationships | + +## Quick Reference + +### Core Concepts + +- **Namespace**: A document identity. Identified by `NamespaceId` (32 bytes), backed by an Ed25519 keypair (`NamespaceSecret`). +- **Author**: A writer identity. Identified by `AuthorId` (32 bytes), backed by an Ed25519 keypair (`Author`). +- **Entry**: A record identified by (namespace, author, key) with a value of (hash, len, timestamp). +- **SignedEntry**: An entry with dual Ed25519 signatures (namespace + author) proving authorization and authorship. +- **Replica**: A local instance of a document, holding entries in a store. +- **Capability**: Either `Write(NamespaceSecret)` or `Read(NamespaceId)` — controls whether entries can be inserted. +- **Store**: A `redb`-backed persistent store managing authors, namespaces, entries, and peer caches. +- **Engine**: Coordinates sync actors, gossip, and content downloads for live synchronization. + +### Key Algorithms + +1. 
**Range-based set reconciliation**: Efficiently compute the union of two entry sets over a network by comparing fingerprints of partitions, subdividing when fingerprints differ. +2. **Prefix deletion**: An entry at key "foo" acts as a tombstone for all entries whose key starts with "foo/". +3. **Last-writer-wins**: When entries conflict on the same (namespace, author, key), the one with the higher (timestamp, hash) wins. +4. **XOR fingerprints**: Fingerprint of a set is the XOR of individual entry fingerprints (BLAKE3 hashes of key data). + +### Data Flow + +``` +Application → DocsApi → Engine → LiveActor → GossipState → iroh-gossip + ↓ ↓ + SyncHandle → Actor → Store (redb) ← QUIC streams (iroh) + ↓ + iroh-blobs (content transfer) +``` + +### Dependencies + +- `iroh` — QUIC networking +- `iroh-blobs` — Content-addressed blob storage and transfer +- `iroh-gossip` — Gossip protocol for live updates +- `redb` — Embedded key-value store +- `ed25519-dalek` — Ed25519 signatures +- `blake3` — Hashing +- `postcard` — Serialization \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/01-overview-architecture.md b/docs/research/references/iroh/iroh-gossip/01-overview-architecture.md new file mode 100644 index 0000000..ba197d8 --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/01-overview-architecture.md @@ -0,0 +1,79 @@ +# iroh-gossip: Overview & Architecture + +## What Is iroh-gossip? + +`iroh-gossip` is a Rust crate that implements an **epidemic broadcast tree** protocol for disseminating messages among a swarm of peers interested in a common **topic**. 
It is based on two academic papers: + +- **HyParView** — A hybrid partial view membership protocol for reliable swarm management ([paper](https://asc.di.fct.unl.pt/~jleitao/pdf/dsn07-leitao.pdf)) +- **PlumTree** — An epidemic broadcast tree protocol for efficient message dissemination ([paper](https://asc.di.fct.unl.pt/~jleitao/pdf/srds07-leitao.pdf)) + +The crate is designed as a protocol layer for the [iroh](https://docs.rs/iroh) networking library, but the core protocol logic is **IO-free** and can be used independently. + +## High-Level Architecture + +The crate is organized into two primary modules: + +| Module | Purpose | IO-aware? | +|--------|---------|-----------| +| `proto` | Pure state-machine implementation of the gossip protocol | No — completely IO-free | +| `net` | Networking layer that runs the protocol over iroh connections | Yes — depends on `iroh` and tokio | + +The `net` module is behind the `net` feature flag (enabled by default). An optional `rpc` feature adds remote procedure call support via the `irpc`/`noq` crates. + +### Module Dependency Graph + +``` +┌──────────────┐ +│ api │ ← Public API (Gossip, GossipTopic, GossipSender, GossipReceiver) +└──────┬───────┘ + │ +┌──────▼───────┐ +│ net │ ← Networking actor, connection loops, dialer +└──────┬───────┘ + │ +┌──────▼───────┐ +│ proto │ ← Pure protocol state machines +│ ┌─────────┐ │ +│ │hyparview│ │ ← Membership layer +│ ├─────────┤ │ +│ │ plumtree│ │ ← Broadcast layer +│ ├─────────┤ │ +│ │ topic │ │ ← Per-topic coordinator +│ ├─────────┤ │ +│ │ state │ │ ← Multi-topic state manager +│ ├─────────┤ │ +│ │ util │ │ ← Shared data structures (IndexSet, TimeBoundCache, TimerMap) +│ └─────────┘ │ +└──────────────┘ +``` + +### Key Design Principles + +1. **IO-free protocol core**: The `proto` module is a pure state machine. It takes `InEvent`s, produces `OutEvent`s, and has no knowledge of sockets, async runtimes, or network IO. + +2. 
**Topic-based isolation**: Each topic (`TopicId` = 32-byte identifier) has completely independent state. Topics are separate swarms and broadcast scopes. Joining multiple topics increases connections and routing table size proportionally. + +3. **Actor model for networking**: The `net` module runs a single async `Actor` that manages all topics, connections, and timers. It bridges between the protocol state machine and real network IO. + +4. **Wire protocol**: Messages are serialized with `postcard` (a `no_std`-friendly serde format) and sent over QUIC streams via iroh connections. Each stream is prefixed with a `StreamHeader` containing the topic ID. + +## Crate Features + +| Feature | Default? | Description | +|---------|----------|-------------| +| `net` | Yes | Networking layer (requires `iroh`, `tokio`, etc.) | +| `rpc` | No | RPC support via `irpc`/`noq` for remote control | +| `metrics` | Yes | Prometheus-style metrics via `iroh-metrics` | +| `test-utils` | No | Test utilities (seeded RNG, etc.) | +| `simulator` | No | CLI simulator for testing | +| `examples` | No | Example binaries (chat, setup) | + +## Cargo Dependencies (Key Ones) + +- `iroh` / `iroh-base` — Networking primitives (Endpoint, EndpointId, PublicKey, etc.) 
+- `postcard` — Wire serialization (serde-based, `no_std` compatible) +- `blake3` — Message ID hashing +- `ed25519-dalek` — Cryptographic signatures +- `n0-future` / `n0-error` — Async utilities and error handling +- `irpc` / `noq` — RPC infrastructure (optional) +- `indexmap` — Order-preserving hash collections used in `IndexSet` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/02-hyparview-membership.md b/docs/research/references/iroh/iroh-gossip/02-hyparview-membership.md new file mode 100644 index 0000000..511774a --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/02-hyparview-membership.md @@ -0,0 +1,169 @@ +# iroh-gossip: HyParView Membership Protocol + +## Overview + +The HyParView protocol provides **swarm membership management** — it maintains which peers are currently part of the swarm for a given topic and ensures the overlay network remains connected even as nodes join, leave, or fail. + +It is implemented in `src/proto/hyparview.rs`. + +## Core Concept: Two Views + +Each peer maintains two sets of peers: + +| View | Description | Default Size | Connection? | +|------|-------------|--------------|-------------| +| **Active View** | Peers we maintain active bidirectional connections to | 5 | Yes — a QUIC connection is kept open | +| **Passive View** | An address book of peers we know about but are not connected to | 30 | No — just contact information | + +Key invariants: +- **Active connections are always bidirectional**: If peer A has peer B in its active view, peer B also has peer A in its active view. +- The passive view serves as a **failover pool**: When an active peer disconnects, a random peer from the passive view is asked (via a `Neighbor` request) to fill the slot.
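The bookkeeping behind the two views can be sketched as follows. This is a minimal, hypothetical model rather than the crate's `hyparview::State`: it assumes `u64` peer IDs, `Vec`-backed views, and caller-supplied indices in place of the protocol's random choices. It also collapses failover promotion into an immediate move, whereas the protocol first sends a `Neighbor` request to the chosen passive peer and waits for the reply.

```rust
/// Placeholder peer identifier (the crate is generic over a peer type `PI`).
type PeerId = u64;

/// Hypothetical, simplified two-view state; not the crate's `hyparview::State`.
pub struct Views {
    active: Vec<PeerId>,
    passive: Vec<PeerId>,
    active_capacity: usize,
    passive_capacity: usize,
}

impl Views {
    pub fn new(active_capacity: usize, passive_capacity: usize) -> Self {
        Self { active: Vec::new(), passive: Vec::new(), active_capacity, passive_capacity }
    }

    /// Adds `peer` to the active view. When the view is full, the peer at
    /// `evict_idx` (random in the real protocol) is demoted to the passive view.
    pub fn add_active(&mut self, peer: PeerId, evict_idx: usize) {
        if self.active_capacity == 0 || self.active.contains(&peer) {
            return;
        }
        self.passive.retain(|p| *p != peer);
        if self.active.len() >= self.active_capacity {
            let evicted = self.active.remove(evict_idx % self.active.len());
            self.add_passive(evicted);
        }
        self.active.push(peer);
    }

    /// Adds `peer` to the passive view, dropping the oldest entry when full
    /// (the real protocol drops a random one).
    pub fn add_passive(&mut self, peer: PeerId) {
        if self.passive_capacity == 0 || self.active.contains(&peer) || self.passive.contains(&peer) {
            return;
        }
        if self.passive.len() >= self.passive_capacity {
            self.passive.remove(0);
        }
        self.passive.push(peer);
    }

    /// Drops a disconnected active peer and promotes the passive peer at
    /// `pick_idx` into the freed slot. Returns the promoted peer, if any.
    pub fn on_disconnect(&mut self, peer: PeerId, pick_idx: usize) -> Option<PeerId> {
        let pos = self.active.iter().position(|p| *p == peer)?;
        self.active.remove(pos);
        if self.passive.is_empty() {
            return None;
        }
        let promoted = self.passive.remove(pick_idx % self.passive.len());
        self.active.push(promoted);
        Some(promoted)
    }

    pub fn active(&self) -> &[PeerId] {
        &self.active
    }

    pub fn passive(&self) -> &[PeerId] {
        &self.passive
    }
}
```

With an active capacity of 2, adding a third peer demotes one peer to the passive view, and a later disconnect pulls it back into the active view.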
+ +## Configuration (`hyparview::Config`) + +```rust +pub struct Config { + pub active_view_capacity: usize, // Default: 5 + pub passive_view_capacity: usize, // Default: 30 + pub active_random_walk_length: Ttl, // Default: Ttl(6) + pub passive_random_walk_length: Ttl, // Default: Ttl(3) + pub shuffle_random_walk_length: Ttl, // Default: Ttl(6) + pub shuffle_active_view_count: usize, // Default: 3 + pub shuffle_passive_view_count: usize, // Default: 4 + pub shuffle_interval: Duration, // Default: 60s + pub neighbor_request_timeout: Duration, // Default: 500ms +} +``` + +These defaults come directly from the HyParView paper (p9), except for `shuffle_interval` and `neighbor_request_timeout` which are "wild guesses" in the code. + +## State Structure + +```rust +pub struct State { + me: PI, // Our peer identity + me_data: Option, // Opaque data we share with peers + pub active_view: IndexSet, // Connected peers + pub passive_view: IndexSet, // Known but disconnected peers + config: Config, + shuffle_scheduled: bool, // Whether shuffle timer is active + rng: RG, // Random number generator + stats: Stats, + pending_neighbor_requests: HashSet, // Peers we've sent Neighbor to but no reply yet + peer_data: HashMap, // Opaque data received from other peers + alive_disconnect_peers: HashSet, // Peers disconnecting but to keep in passive view +} +``` + +## Messages (`hyparview::Message`) + +| Message | Direction | Purpose | +|---------|-----------|---------| +| `Join(Option)` | New node → Contact | Sent to a known peer to join the swarm | +| `ForwardJoin(ForwardJoin)` | Propagated | Forwarded to active view to introduce a new member | +| `Neighbor(Neighbor)` | Bidirectional | Request to add sender to active view (with priority) | +| `Disconnect(Disconnect)` | Bidirectional | Notification that a peer is leaving or being demoted | +| `Shuffle(Shuffle)` | Initiated periodically | Sent to random peer to exchange passive view contacts | +| `ShuffleReply(ShuffleReply)` | Reply to 
Shuffle | Returns a random subset of our views to the origin | + + ### Message Details + + ```rust + pub struct ForwardJoin { + peer: PeerInfo, // The new peer's identity + optional data + ttl: Ttl, // Time-to-live, decremented per hop + } + + pub struct Shuffle { + origin: PI, // Who initiated the shuffle + nodes: Vec>, // Random subset of our views + ttl: Ttl, // Time-to-live for the random walk + } + + pub struct Neighbor { + priority: Priority, // High (cannot be denied) or Low (can be denied) + data: Option, + } + + pub struct Disconnect { + alive: bool, // If true, peer is still alive (just demoting) + _respond: bool, // Obsolete, kept for wire compat + } + ``` + + ## Join Procedure (Step by Step) + + 1. A new node sends `Join(me_data)` to a known contact peer. + 2. The contact peer adds the new node to its active view (evicting a random active peer if the view is full). + 3. The contact peer forwards `ForwardJoin` to all other peers in its active view with `TTL = active_random_walk_length`. + 4. Each peer receiving `ForwardJoin`: + - If `TTL == 0` or its active view has at most one peer: adds the new node to its own active view and sends it `Neighbor(High)`, so the new node adds this peer in return. + - Otherwise: if `TTL == passive_random_walk_length`, adds the new node to its passive view; then decrements the TTL and forwards `ForwardJoin` to a random active peer other than the sender. + + 5. The `Neighbor` message establishes the bidirectional active connection. A `Priority::High` neighbor request **must** be accepted (potentially evicting a random active peer). A `Priority::Low` request is only accepted if there is room. + + ## Shuffle Mechanism + + Periodically (every `shuffle_interval`), each node: + 1. Picks a random active peer. + 2. Sends `Shuffle` containing a random subset of active + passive views plus the origin's info, with a TTL. + 3. The shuffle message does a random walk (each hop decrements TTL). + 4. When the TTL reaches 0 or the receiving peer's active view has at most one peer, that peer accepts the shuffle and replies with `ShuffleReply` containing a random subset of its own known peers. + 5.
The origin receives `ShuffleReply` and adds new peers to its passive view. + +This ensures the passive view remains fresh and provides good connectivity even in dynamic networks. + +## Failure Recovery + +When a peer in the active view disconnects (detected via `PeerDisconnected`): +1. The peer is removed from the active view. +2. A `NeighborDown` event is emitted. +3. A random peer from the passive view is selected and sent a `Neighbor(Low)` request. +4. If that peer doesn't respond within `neighbor_request_timeout`, it's removed from the passive view and another peer is tried. +5. This continues until a connection is established or the passive view is exhausted. + +If a `Disconnect(alive=true)` message is received: +- The peer is moved to the passive view (not just dropped), because it's still alive. +- The `alive_disconnect_peers` set tracks which disconnected peers should be retained in passive view when their connection eventually closes. + +## PeerData + +`PeerData` is an opaque `Bytes` type that peers exchange when joining. In the `net` module, it is used to serialize and transmit addressing information (`AddrInfo`): + +```rust +struct AddrInfo { + relay_url: Option, + direct_addresses: BTreeSet, +} +``` + +This allows the gossip protocol itself to help propagate connectivity information, enabling the `GossipAddressLookup` service to feed addresses back into iroh's endpoint discovery system. + +## Events (`hyparview::Event`) + +| Event | Meaning | +|-------|---------| +| `NeighborUp(PI)` | A peer was added to our active view | +| `NeighborDown(PI)` | A peer was removed from our active view | + +These events are forwarded up to the PlumTree layer and to the application. 
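The failure-recovery steps and the two events above can be combined into one small sketch. `recover` and its `accepts` callback are hypothetical: the callback stands in for "the candidate answered our `Neighbor(Low)` request before `neighbor_request_timeout` expired", candidates are tried in order rather than at random, and events are simply collected in a `VecDeque`.

```rust
use std::collections::VecDeque;

/// Placeholder peer identifier.
type PeerId = u64;

/// Mirrors the two `hyparview::Event` variants described above.
#[derive(Debug, PartialEq)]
pub enum Event {
    NeighborUp(PeerId),
    NeighborDown(PeerId),
}

/// Hypothetical recovery routine for a lost active peer.
pub fn recover(
    active: &mut Vec<PeerId>,
    passive: &mut Vec<PeerId>,
    lost: PeerId,
    mut accepts: impl FnMut(PeerId) -> bool,
    events: &mut VecDeque<Event>,
) {
    // Steps 1-2: remove the failed peer from the active view and report it.
    let pos = match active.iter().position(|p| *p == lost) {
        Some(pos) => pos,
        None => return,
    };
    active.remove(pos);
    events.push_back(Event::NeighborDown(lost));

    // Steps 3-5: try passive candidates until one accepts or none remain.
    while !passive.is_empty() {
        let candidate = passive.remove(0);
        if accepts(candidate) {
            active.push(candidate);
            events.push_back(Event::NeighborUp(candidate));
            return;
        }
        // No reply in time: the candidate stays out of the passive view.
    }
}
```

Candidates that time out are consumed rather than returned to the pool, which matches step 4 of the recovery procedure.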
+ +## Timers + +| Timer | Purpose | +|-------|---------| +| `DoShuffle` | Periodically trigger a shuffle operation | +| `PendingNeighborRequest(PI)` | Timeout for a pending neighbor request | + +## IO Trait Pattern + +The HyParView state machine is generic over an `IO` trait: + +```rust +pub trait IO { + fn push(&mut self, event: impl Into>); +} +``` + +This allows the protocol to emit output events without knowing about the networking layer. The upper layers supply a `VecDeque` or similar container. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/03-plumtree-broadcast.md b/docs/research/references/iroh/iroh-gossip/03-plumtree-broadcast.md new file mode 100644 index 0000000..0f5600c --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/03-plumtree-broadcast.md @@ -0,0 +1,256 @@ +# iroh-gossip: PlumTree Broadcast Protocol + +## Overview + +The PlumTree (Epidemic Broadcast Trees) protocol provides **efficient message broadcasting** across all peers in a topic's swarm. It builds on top of HyParView's membership layer, using the active view as its peer set. + +It is implemented in `src/proto/plumtree.rs`. + +## Core Concept: Eager vs Lazy Push + +Each peer maintains two subsets of its HyParView active view: + +| Set | Description | Behavior | +|-----|-------------|----------| +| **Eager push peers** | Peers to whom full messages are sent immediately | Messages are pushed eagerly (full content) | +| **Lazy push peers** | Peers to whom only message IDs (hashes) are sent | `IHave` announcements are sent, requesting content only if needed | + +When a peer broadcasts a message: +1. The **full message** is pushed to all **eager** peers. +2. The **message ID** (a blake3 hash) is pushed to all **lazy** peers (after a short delay for batching). + +This creates an **optimized broadcast tree**: eager peers form a spanning tree for low-latency delivery, while lazy peers provide redundancy through timeout-based recovery. 
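A dependency-free sketch of the eager/lazy fan-out, assuming `u64` peer IDs and a precomputed 32-byte message ID (the crate derives it with blake3). `Outgoing` and `fan_out` are hypothetical names. Unlike the real state machine, this returns `IHave` announcements immediately instead of batching them for `dispatch_timeout`.

```rust
use std::collections::BTreeSet;

/// Placeholder peer identifier.
type PeerId = u64;
/// Stand-in for the 32-byte blake3 `MessageId`.
type MessageId = [u8; 32];

#[derive(Debug, PartialEq)]
pub enum Outgoing {
    /// Full payload, sent to eager peers.
    Gossip { to: PeerId, id: MessageId, content: Vec<u8> },
    /// ID-only announcement, sent to lazy peers.
    IHave { to: PeerId, id: MessageId },
}

/// Full content to eager peers, IDs to lazy peers, skipping the peer the
/// message came from (if any).
pub fn fan_out(
    eager: &BTreeSet<PeerId>,
    lazy: &BTreeSet<PeerId>,
    from: Option<PeerId>,
    id: MessageId,
    content: &[u8],
) -> Vec<Outgoing> {
    let mut out = Vec::new();
    for &peer in eager.iter().filter(|p| Some(**p) != from) {
        out.push(Outgoing::Gossip { to: peer, id, content: content.to_vec() });
    }
    for &peer in lazy.iter().filter(|p| Some(**p) != from) {
        out.push(Outgoing::IHave { to: peer, id });
    }
    out
}
```

Only eager peers pay for the payload; lazy peers receive 32 bytes per message, which is what makes the redundant links cheap.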
+ +## Configuration (`plumtree::Config`) + +```rust +pub struct Config { + pub graft_timeout_1: Duration, // Default: 80ms + pub graft_timeout_2: Duration, // Default: 40ms + pub dispatch_timeout: Duration, // Default: 5ms + pub optimization_threshold: Round, // Default: Round(7) + pub message_cache_retention: Duration, // Default: 30s + pub message_id_retention: Duration, // Default: 90s + pub cache_evict_interval: Duration, // Default: 1s +} +``` + +### Timeout Semantics + +- **`graft_timeout_1`**: After receiving an `IHave`, wait this long for the full message from an eager peer. If it doesn't arrive, send a `Graft` to the `IHave` sender. +- **`graft_timeout_2`**: After sending a `Graft`, wait this shorter timeout for the reply. If no reply, try the next `IHave` sender. +- **`dispatch_timeout`**: Delay before batching and sending `IHave` messages. This allows multiple announcements to be aggregated into a single message. +- **`optimization_threshold`**: Number of hops difference required to trigger tree optimization (see below). + +### Cache Settings + +- **`message_cache_retention`**: How long to keep full message payloads in cache. This enables replying to `Graft` requests from peers who missed the eager push. +- **`message_id_retention`**: How long to remember that we've already seen a message ID. This prevents re-delivering duplicate messages. +- **`cache_evict_interval`**: How often to check and evict expired entries. 
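The two retention windows behave like separate time-bounded maps with periodic eviction. The sketch below is hypothetical (`RetentionCache` and its methods are not the crate's `TimeBoundCache` API) and takes the current time as a parameter so its behaviour is deterministic. With the defaults, a payload can no longer answer a `Graft` after 30 s, while its ID still suppresses duplicates until 90 s.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical time-bounded map: each entry expires `retention` after insertion.
pub struct RetentionCache<K, V> {
    retention: Duration,
    entries: HashMap<K, (Instant, V)>,
}

impl<K: Hash + Eq, V> RetentionCache<K, V> {
    pub fn new(retention: Duration) -> Self {
        Self { retention, entries: HashMap::new() }
    }

    /// Stores `value` with a deadline of `now + retention`.
    pub fn insert(&mut self, key: K, value: V, now: Instant) {
        let deadline = now + self.retention;
        self.entries.insert(key, (deadline, value));
    }

    pub fn contains(&self, key: &K) -> bool {
        self.entries.contains_key(key)
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries.get(key).map(|(_, v)| v)
    }

    /// Meant to run every `cache_evict_interval`; drops expired entries and
    /// returns how many were removed.
    pub fn expire_until(&mut self, now: Instant) -> usize {
        let before = self.entries.len();
        self.entries.retain(|_, (deadline, _)| *deadline > now);
        before - self.entries.len()
    }
}
```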
+ +## State Structure + +```rust +pub struct State { + me: PI, // Our peer identity + config: Config, // Protocol configuration + + pub eager_push_peers: BTreeSet, // Full message delivery peers + pub lazy_push_peers: BTreeSet, // Message-ID-only delivery peers + + lazy_push_queue: BTreeMap>, // Pending IHave announcements (batched) + + missing_messages: HashMap>, // IHave senders awaiting delivery + received_messages: TimeBoundCache, // Seen message IDs + cache: TimeBoundCache, // Full message payloads + + graft_timer_scheduled: HashSet, // Active graft timers + dispatch_timer_scheduled: bool, // Whether IHave dispatch is pending + + init: bool, // Whether first event was processed + stats: Stats, // Message counters + max_message_size: usize, // Maximum allowed message size +} +``` + +## Message Types (`plumtree::Message`) + +| Message | Direction | Purpose | +|---------|-----------|---------| +| `Gossip(Gossip)` | Eager push | Full message content, broadcast to eager peers | +| `Prune` | Bidirectional | Sent when moving a peer from eager to lazy set | +| `Graft(Graft)` | Lazy → Eager upgrade | Request to become an eager peer; may include a message ID to request re-delivery | +| `IHave(Vec)` | Lazy push | Announcement: "I have these messages" (batched, sent after `dispatch_timeout`) | + +### Gossip Message Structure + +```rust +pub struct Gossip { + id: MessageId, // blake3 hash of content + content: Bytes, // The actual message payload + scope: DeliveryScope, // Swarm(round) or Neighbors +} +``` + +The `DeliveryScope` tracks how many hops the message has traveled: + +```rust +pub enum DeliveryScope { + Swarm(Round), // Delivered via the swarm; Round = hop count from origin + Neighbors, // Delivered only to direct neighbors (not forwarded further) +} +``` + +Each time a `Gossip` message is forwarded, its `Round` is incremented via `next_round()`. `Neighbors`-scope messages are not forwarded at all. 
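The forwarding rule for the two scopes can be expressed as a small helper. This is a sketch under stated assumptions: `Round` is modelled as a `u16` newtype that saturates instead of overflowing, and `forwarded` is a hypothetical method; the crate's `next_round()` may differ in detail.

```rust
/// Hop counter, modelled as a `u16` newtype (an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Round(u16);

impl Round {
    /// Next hop; saturates instead of overflowing.
    pub fn next(self) -> Round {
        Round(self.0.saturating_add(1))
    }
}

/// Mirrors the `DeliveryScope` enum shown above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeliveryScope {
    Swarm(Round),
    Neighbors,
}

impl DeliveryScope {
    /// Scope to attach when re-broadcasting, or `None` if the message must
    /// not be forwarded at all.
    pub fn forwarded(self) -> Option<DeliveryScope> {
        match self {
            DeliveryScope::Swarm(round) => Some(DeliveryScope::Swarm(round.next())),
            DeliveryScope::Neighbors => None,
        }
    }
}
```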
+ +### IHave Structure + +```rust +pub struct IHave { + id: MessageId, // The blake3 hash of the message content + round: Round, // The hop count at which the sender received this message +} +``` + +### Graft Structure + +```rust +pub struct Graft { + id: Option<MessageId>, // If set, also reply with full message content + round: Round, // The round from the IHave that triggered this graft +} +``` + +### Message ID + +```rust +pub struct MessageId([u8; 32]); // blake3 hash of message content + +impl MessageId { + pub fn from_content(message: &[u8]) -> Self { + Self::from(blake3::hash(message)) + } +} +``` + +Messages are validated: when receiving a `Gossip`, the receiver checks that `MessageId::from_content(&content) == id`. Spoofed messages (where the hash doesn't match the content) are silently discarded. + +## Broadcast Flow + +### Sending a Message + +``` +1. Compute MessageId = blake3(content) +2. Create Gossip { id, content, scope: Swarm(Round(0)) or Neighbors } +3. If Swarm scope: + a. Add to received_messages and cache + b. Queue IHave for lazy peers (dispatched after dispatch_timeout) +4. Eager-push Gossip to all eager peers (except self and sender) +``` + +### Receiving a Gossip Message + +``` +1. Validate: message.id == blake3(message.content) → discard if invalid +2. If already received (in received_messages): + → Send Prune to sender (move sender to lazy set) + → Return (don't re-broadcast) +3. If Swarm scope: + a. Add to received_messages + b. Increment round (next_round) + c. Add to cache (for Graft replies) + d. Eager-push to all eager peers (except sender) + e. Lazy-push IHave to all lazy peers (except sender) + f. Check if any prior IHave senders had a shorter path → optimize tree +4.
Emit Received event to application +``` + +### Receiving an IHave + +``` +For each IHave entry: + If message ID not in received_messages: + Add (sender, round) to missing_messages[message_id] + If no graft timer scheduled for this message: + Schedule SendGraft timer (graft_timeout_1) +``` + +### Graft Timer Expiry (Two-Phase) + +**Phase 1 (`graft_timeout_1`):** +``` +If message already received → no-op (cancel) +Otherwise: + Pop first (peer, round) from missing_messages[message_id] + Move peer to eager set + Send Graft { id: Some(message_id), round } to that peer + Schedule another SendGraft timer (graft_timeout_2) for fallback +``` + +**Phase 2 (`graft_timeout_2`):** +``` +If message already received → no-op +Otherwise: + Pop next (peer, round) from missing_messages[message_id] + Move that peer to eager set + Send Graft { id: Some(message_id), round } + Schedule another SendGraft timer (graft_timeout_2) + (continues until the message is received or senders are exhausted) +``` + +### Receiving a Graft + +``` +1. Move sender to eager set +2. If Graft contains a message ID: + Look up message in cache + If found: send Gossip(message) to the requesting peer +``` + +### Receiving a Prune + +``` +Move sender from eager set to lazy set +``` + +## Tree Optimization + +The PlumTree self-optimizes based on latency. When a `Gossip` message is received, if we previously received an `IHave` for the same message from a different peer, we check whether the IHave path was significantly shorter: + +``` +if (ihave_round < gossip_round) && (gossip_round - ihave_round) >= optimization_threshold: + Graft the IHave sender (move to eager) + Prune the Gossip sender (move to lazy) +``` + +This means if a peer consistently has a shorter path to the message origin, they are promoted to eager, and the longer-path peer is demoted. The `optimization_threshold` (default: 7 hops) prevents thrashing from minor latency differences. 
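The two-phase graft bookkeeping and the optimization check above can be modeled without any networking. This hypothetical `GraftTracker` keeps a FIFO of `(peer, round)` IHave senders per message, in the spirit of `missing_messages` and `graft_timer_scheduled`. The names, the `u16` rounds, and the `[u8; 32]` message IDs are assumptions, and actually scheduling timers is left to the caller.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Stand-in for the 32-byte blake3 `MessageId`.
pub type MessageId = [u8; 32];

/// Hypothetical, simplified model of the IHave/Graft bookkeeping.
pub struct GraftTracker<P> {
    missing: HashMap<MessageId, VecDeque<(P, u16)>>,
    timer_scheduled: HashSet<MessageId>,
    received: HashSet<MessageId>,
}

impl<P> GraftTracker<P> {
    pub fn new() -> Self {
        Self { missing: HashMap::new(), timer_scheduled: HashSet::new(), received: HashSet::new() }
    }

    /// Records one IHave entry. Returns `true` when the caller should schedule a
    /// `graft_timeout_1` timer (first announcement of a still-missing message).
    pub fn on_ihave(&mut self, id: MessageId, from: P, round: u16) -> bool {
        if self.received.contains(&id) {
            return false;
        }
        self.missing.entry(id).or_default().push_back((from, round));
        // `insert` returns true only if no timer was scheduled for this id yet.
        self.timer_scheduled.insert(id)
    }

    /// The full `Gossip` arrived, so any pending graft for it becomes a no-op.
    pub fn on_gossip(&mut self, id: MessageId) {
        self.received.insert(id);
        self.missing.remove(&id);
        self.timer_scheduled.remove(&id);
    }

    /// A graft timer fired. Returns the next `(peer, round)` to graft; the caller then
    /// moves that peer to the eager set, sends `Graft`, and schedules `graft_timeout_2`.
    pub fn on_graft_timer(&mut self, id: MessageId) -> Option<(P, u16)> {
        if self.received.contains(&id) {
            return None;
        }
        let next = self.missing.get_mut(&id).and_then(|queue| queue.pop_front());
        if next.is_none() {
            // All IHave senders exhausted: stop rescheduling.
            self.timer_scheduled.remove(&id);
        }
        next
    }
}

/// Tree optimization: promote the IHave sender if its path is shorter by at least `threshold` hops.
pub fn should_optimize(ihave_round: u16, gossip_round: u16, threshold: u16) -> bool {
    // The first comparison short-circuits, so the subtraction cannot underflow.
    ihave_round < gossip_round && gossip_round - ihave_round >= threshold
}
```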
+ +## Neighbor Events + +PlumTree receives neighbor events from HyParView: + +- **`NeighborUp(peer)`**: Add peer to eager set (all new neighbors start as eager) +- **`NeighborDown(peer)`**: Remove from both eager and lazy sets; clean up any `IHave` entries from this peer in `missing_messages` + +## Neighbor-Only Broadcast + +The `Scope::Neighbors` broadcast scope sends a message only to directly connected peers (the active view), without any forwarding: + +```rust +pub enum Scope { + Swarm, // Broadcast to all peers in the swarm + Neighbors, // Broadcast only to immediate neighbors +} +``` + +Neighbor-scoped messages are useful for localized communication and are not cached or re-broadcast. + +## Cache Management + +The PlumTree maintains two time-bounded caches: + +1. **`cache`** (`TimeBoundCache`): Stores full message payloads for `message_cache_retention` (default 30s). This enables replying to `Graft` requests for recently-broadcast messages. + +2. **`received_messages`** (`TimeBoundCache`): Tracks which messages have been seen for `message_id_retention` (default 90s). This prevents duplicate delivery. + +Both caches are periodically evicted (every `cache_evict_interval`, default 1s) via the `EvictCache` timer. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/04-state-and-topic.md b/docs/research/references/iroh/iroh-gossip/04-state-and-topic.md new file mode 100644 index 0000000..8c5acaa --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/04-state-and-topic.md @@ -0,0 +1,187 @@ +# iroh-gossip: Protocol State & Topic Coordination + +## Overview + +The `state` module (`src/proto/state.rs`) provides the **top-level protocol state machine** that manages multiple topics. The `topic` module (`src/proto/topic.rs`) coordinates the HyParView and PlumTree state machines for a single topic. 
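The multiplexing role described in this overview can be sketched with plain std types: per-topic events go to exactly one topic's state, global events fan out to all topics, and a topic's state is created on `Join` and dropped on `Quit`. `Multiplexer`, `PeerId`, and the simplified `InEvent` are hypothetical stand-ins, and ignoring messages for topics that were never joined is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Stand-ins for the real types (assumptions for this sketch).
type TopicId = [u8; 32];
type PeerId = u64;

/// Per-topic state stand-in that only logs what it was asked to do.
#[derive(Debug, Default)]
struct TopicState {
    log: Vec<String>,
}

/// Simplified subset of the top-level input events.
enum InEvent {
    Join { topic: TopicId, peers: Vec<PeerId> },
    Quit { topic: TopicId },
    RecvMessage { from: PeerId, topic: TopicId, payload: Vec<u8> },
    PeerDisconnected(PeerId),
}

/// Routes events to per-topic state, keyed by `TopicId`.
#[derive(Default)]
struct Multiplexer {
    topics: HashMap<TopicId, TopicState>,
}

impl Multiplexer {
    fn handle(&mut self, event: InEvent) {
        match event {
            // Joining an unknown topic creates its state on demand.
            InEvent::Join { topic, peers } => {
                let state = self.topics.entry(topic).or_default();
                state.log.push(format!("join {}", peers.len()));
            }
            // Quit drops the topic's state (the real crate first processes the quit).
            InEvent::Quit { topic } => {
                self.topics.remove(&topic);
            }
            // Topic-specific events reach exactly one topic; unknown topics are ignored here.
            InEvent::RecvMessage { from, topic, payload } => {
                if let Some(state) = self.topics.get_mut(&topic) {
                    state.log.push(format!("recv {} bytes from {from}", payload.len()));
                }
            }
            // Global events are broadcast to every topic.
            InEvent::PeerDisconnected(peer) => {
                for state in self.topics.values_mut() {
                    state.log.push(format!("disconnected {peer}"));
                }
            }
        }
    }
}
```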
+ +## Multi-Topic State (`state::State`) + +```rust +pub struct State { + me: PI, // Our peer identity + me_data: PeerData, // Our opaque peer data + config: Config, // Protocol configuration + rng: R, // Random number generator + states: HashMap>, // Per-topic state + outbox: Outbox, // Buffered output events + peer_topics: ConnsMap, // Maps peer → set of shared topics +} +``` + +The `State` acts as a **multiplexer** — it routes events to the correct topic's state and collects output events. It also tracks which topics are shared with each peer (in `peer_topics`), which is used to determine when a peer connection can safely be closed (only when no topic still needs it). + +### TopicId + +```rust +#[derive(Clone, Copy, Eq, PartialEq, Hash, Serialize, Ord, PartialOrd, Deserialize)] +pub struct TopicId([u8; 32]); +``` + +A 32-byte identifier for a topic. Typically created as `blake3::hash(topic_name)` or from raw bytes. Each topic is an independent swarm and broadcast scope. + +### Wire Message Format + +```rust +pub struct Message { + pub topic: TopicId, + pub message: topic::Message, +} +``` + +Every wire message carries the `TopicId` prefix, allowing multiplexing of multiple topics over a single connection. + +### Event Routing + +`InEvent` is mapped to either a topic-specific event or a global event: + +| InEvent | Routing | +|---------|---------| +| `RecvMessage(from, Message{topic, message})` | → Topic-specific: `topic::InEvent::RecvMessage` | +| `Command(topic, command)` | → Topic-specific: `topic::InEvent::Command` | +| `TimerExpired(Timer{topic, timer})` | → Topic-specific: `topic::InEvent::TimerExpired` | +| `PeerDisconnected(peer)` | → Broadcast to ALL topics | +| `UpdatePeerData(data)` | → Broadcast to ALL topics | + +### Topic Lifecycle + +When a `Command::Join(peers)` is received for a topic that doesn't yet have state, a new `topic::State` is automatically created. 
When `Command::Quit` is received, the topic's state is removed after processing the quit event. + +### Connection Management + +When a `topic::OutEvent::DisconnectPeer(peer)` is emitted, the state module checks `peer_topics` to see if any other topic still needs a connection to that peer. Only when no topic needs the peer anymore is `OutEvent::DisconnectPeer(peer)` emitted at the top level. + +## Topic State (`topic::State`) + +```rust +pub struct State { + me: PI, + pub swarm: hyparview::State, // HyParView membership + pub gossip: plumtree::State, // PlumTree broadcast + outbox: VecDeque>, + stats: Stats, +} +``` + +The topic state **composes** HyParView and PlumTree, bridging them together: + +### Event Forwarding + +When `topic::State::handle()` is called: + +1. **HyParView events** are processed first (membership layer). +2. **NeighborUp/NeighborDown events** emitted by HyParView are forwarded to PlumTree: + - `NeighborUp(peer)` → `plumtree::InEvent::NeighborUp(peer)` — adds peer to eager set + - `NeighborDown(peer)` → `plumtree::InEvent::NeighborDown(peer)` — removes peer from both sets +3. All output events from both layers are collected and returned. 
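The connection-management rule described in this file (emit the top-level `DisconnectPeer` only when no topic still needs the peer) amounts to keeping a set of topics per peer. The sketch below is a hypothetical stand-in for the crate's `peer_topics: ConnsMap`, whose real API is not shown in these notes; `PeerTopics` and its methods are illustrative names.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

type TopicId = [u8; 32];

/// Hypothetical per-peer topic bookkeeping.
pub struct PeerTopics<P> {
    map: HashMap<P, HashSet<TopicId>>,
}

impl<P: Eq + Hash> PeerTopics<P> {
    pub fn new() -> Self {
        Self { map: HashMap::new() }
    }

    /// Records that `topic` currently needs a connection to `peer`.
    pub fn add(&mut self, peer: P, topic: TopicId) {
        self.map.entry(peer).or_default().insert(topic);
    }

    /// Handles a topic-level `DisconnectPeer(peer)`. Returns `true` only when no other
    /// topic still needs the peer, i.e. when the top-level disconnect may be emitted.
    pub fn topic_disconnect(&mut self, peer: &P, topic: &TopicId) -> bool {
        match self.map.get_mut(peer) {
            Some(topics) => {
                topics.remove(topic);
                if topics.is_empty() {
                    self.map.remove(peer);
                    true
                } else {
                    false
                }
            }
            // Unknown peer: nothing keeps the connection alive.
            None => true,
        }
    }
}
```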
+ +### Command Handling + +| Command | Action | +|---------|--------| +| `Join(peers)` | Sends `RequestJoin(peer)` to HyParView for each peer in the list | +| `Broadcast(data, scope)` | Sends `Broadcast(data, scope)` to PlumTree | +| `Quit` | Sends `Quit` to HyParView (which sends `Disconnect` to all active peers) | + +### Message Routing + +When a topic message is received: + +```rust +match message { + Message::Swarm(message) => hyparview.handle(RecvMessage(from, message)), + Message::Gossip(message) => plumtree.handle(RecvMessage(from, message)), +} +``` + +### Timer Routing + +```rust +match timer { + Timer::Swarm(timer) => hyparview.handle(TimerExpired(timer)), + Timer::Gossip(timer) => plumtree.handle(TimerExpired(timer)), +} +``` + +## Topic Messages (`topic::Message`) + +```rust +pub enum Message { + Swarm(hyparview::Message), // Membership messages + Gossip(plumtree::Message), // Broadcast messages +} +``` + +The message kind is used for metrics tracking: + +```rust +pub fn kind(&self) -> MessageKind { + match self { + Message::Swarm(_) => MessageKind::Control, + Message::Gossip(message) => match message { + plumtree::Message::Gossip(_) => MessageKind::Data, + _ => MessageKind::Control, + }, + } +} +``` + +## Topic Events (`topic::Event`) + +```rust +pub enum Event { + NeighborUp(PI), // From HyParView: new active neighbor + NeighborDown(PI), // From HyParView: lost active neighbor + Received(GossipEvent), // From PlumTree: received a gossip message +} +``` + +The `Received` event contains: + +```rust +pub struct GossipEvent { + pub content: Bytes, // Message payload + pub delivered_from: PI, // Peer that delivered the message to us + pub scope: DeliveryScope, // Swarm(round) or Neighbors +} +``` + +## Topic Configuration + +```rust +pub struct Config { + pub membership: hyparview::Config, // HyParView configuration + pub broadcast: plumtree::Config, // PlumTree configuration + pub max_message_size: usize, // Maximum wire message size (default: 4096) +} 
+``` + +The `max_message_size` is the total wire-level message size including headers. The actual payload capacity is computed as `max_message_size - postcard_header_size`, where the header size accounts for the topic ID and message envelope overhead. + +## Statistics + +Each topic tracks: +```rust +pub struct Stats { + pub messages_sent: usize, + pub messages_received: usize, +} +``` + +The PlumTree layer also tracks: +```rust +pub struct Stats { + pub payload_messages_received: u64, + pub control_messages_received: u64, + pub max_last_delivery_hop: u16, +} +``` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/05-net-actor.md b/docs/research/references/iroh/iroh-gossip/05-net-actor.md new file mode 100644 index 0000000..192954e --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/05-net-actor.md @@ -0,0 +1,244 @@ +# iroh-gossip: Networking Layer & Actor Model + +## Overview + +The `net` module (`src/net.rs` and submodules) provides the async runtime layer that connects the IO-free protocol state machine to real network IO via iroh QUIC connections. It is built around a **single Actor** that manages all topics and connections. + +## ALPN Protocol + +```rust +pub const GOSSIP_ALPN: &[u8] = b"/iroh-gossip/1"; +``` + +This ALPN identifier is used when establishing QUIC connections through iroh. + +## Gossip Handle (`net::Gossip`) + +```rust +#[derive(Debug, Clone)] +pub struct Gossip { + pub(crate) inner: Arc, +} +``` + +`Gossip` is the primary public handle. 
It derefs to `GossipApi`, providing the user-facing interface: + +```rust +// Subscribe to a topic +let (sender, receiver) = gossip.subscribe(topic_id, bootstrap_peers).await?.split(); + +// Subscribe and wait for at least one connection +let topic = gossip.subscribe_and_join(topic_id, bootstrap_peers).await?; + +// Broadcast a message +sender.broadcast(b"hello world".to_vec().into()).await?; + +// Broadcast to neighbors only +sender.broadcast_neighbors(b"local announcement".to_vec().into()).await?; + +// Join additional peers +sender.join_peers(vec![peer_id]).await?; +``` + +### Builder Pattern + +```rust +let gossip = Gossip::builder() + .max_message_size(8192) // Default: 4096 + .membership_config(hyparview_config) // HyParView settings + .broadcast_config(plumtree_config) // PlumTree settings + .alpn(b"/custom-alpn") // Custom ALPN (must match across network) + .spawn(endpoint); +``` + +## Architecture: The Actor + +The core of the networking layer is the `Actor` struct, which runs as a single async task: + +```rust +struct Actor { + alpn: Bytes, + state: proto::State, // Protocol state machine + endpoint: Endpoint, // iroh endpoint for connections + dialer: Dialer, // Manages outgoing connections + rpc_rx: mpsc::Receiver, // API commands + local_rx: mpsc::Receiver, // Local commands (connections, shutdown) + in_event_tx: mpsc::Sender, // Protocol input channel + in_event_rx: mpsc::Receiver, // Protocol input channel (receiver) + timers: Timers, // Scheduled timers + topics: HashMap, // Per-topic subscription state + peers: HashMap, // Per-peer connection state + command_rx: stream_group::Keyed, // Per-topic command streams + quit_queue: VecDeque, // Topics pending unsubscription + connection_tasks: JoinSet<...>, // Running connection loop tasks + metrics: Arc, + topic_event_forwarders: JoinSet, // Tasks forwarding events to subscribers + address_lookup: GossipAddressLookup, // Address discovery integration +} +``` + +### Event Loop + +The actor's `run()` 
method calls `event_loop()` in a loop. Each iteration uses `tokio::select!` to handle: + +| Source | Action | +|--------|--------| +| `local_rx` (local messages) | Handle incoming connections or shutdown | +| `rpc_rx` (RPC messages) | Process `Join` requests from the API | +| `command_rx` (per-topic commands) | Process `Broadcast`, `BroadcastNeighbors`, `JoinPeers`, or stream closure | +| `addr_updates` (endpoint addr changes) | Update our `PeerData` in the protocol state | +| `dialer` (connection establishment) | Handle successful/failed outgoing connections | +| `in_event_rx` (protocol events from connections) | Feed events to the protocol state machine | +| `timers` (scheduled timers) | Feed timer expirations to the protocol state machine | +| `connection_tasks` (connection task completions) | Handle peer disconnections | +| `topic_event_forwarders` (subscription tasks) | Handle topic cleanup when all subscribers drop | + +### Processing InEvents + +When an `InEvent` is processed, the actor calls `self.state.handle(event, now, metrics)`, which returns `Vec`. 
For each `OutEvent`: + +| OutEvent | Action | +|----------|--------| +| `SendMessage(peer, message)` | Send via peer's active connection or queue for pending connection | +| `EmitEvent(topic, event)` | Forward to topic's `broadcast::Sender` → subscribers | +| `ScheduleTimer(delay, timer)` | Schedule timer via `Timers` data structure | +| `DisconnectPeer(peer)` | Drop the peer's send channel, removing from `peers` map | +| `PeerData(endpoint_id, data)` | Decode `AddrInfo` from `PeerData`, add to `GossipAddressLookup` | + +## Connection Management + +### Peer States + +```rust +enum PeerState { + Pending { + queue: Vec, // Messages queued while connecting + }, + Active { + active_send_tx: mpsc::Sender, // Current active send channel + active_conn_id: ConnId, // Stable ID of active connection + other_conns: Vec, // Older connections still closing + }, +} +``` + +When a message needs to be sent to a peer: +- **Active**: Send immediately via `active_send_tx` +- **Pending**: Queue the message and initiate a dial + +### Dialer + +```rust +struct Dialer { + endpoint: Endpoint, + pending: JoinSet<(EndpointId, Option>)>, + pending_dials: HashMap, +} +``` + +The `Dialer` manages outgoing connections. It: +1. Checks if a dial is already pending for a peer +2. Spawns an async connection task with cancellation support +3. Returns completed connections via `next_conn()` + +### Connection Loop + +Each peer connection runs a `connection_loop` task: + +```rust +async fn connection_loop( + from: PublicKey, // Remote peer's public key + conn: Connection, // QUIC connection + origin: ConnOrigin, // Accept (incoming) or Dial (outgoing) + send_rx: mpsc::Receiver, // Messages to send + in_event_tx: mpsc::Sender, // Channel to protocol + max_message_size: usize, // Maximum message size + queue: Vec, // Queued messages to send first +) -> Result<(), ConnectionLoopError> +``` + +The connection loop: +1. First sends any queued messages +2. 
Runs a send loop and receive loop concurrently (`tokio::join!`) +3. Uses iroh QUIC unidirectional streams (one per topic) for communication + +### Wire Protocol + +Messages are serialized with `postcard` and sent as **length-prefixed frames** over QUIC unidirectional streams: + +``` +┌──────────────┐ +│ Stream Header │ ── Contains TopicId (sent once per stream) +├──────────────┤ +│ Frame (len) │ ── u32 length prefix +│ Frame (data) │ ── postcard-encoded topic::Message +├──────────────┤ +│ Frame (len) │ ── next message... +│ Frame (data) │ +└──────────────┘ +``` + +Each topic gets its own unidirectional stream. The stream header is sent once when the stream is opened. Disconnect messages close the stream after being sent. + +The `SendLoop` manages per-topic streams within a connection: + +```rust +struct SendLoop { + conn: Connection, + streams: HashMap, // One stream per topic + buffer: Vec, + max_message_size: usize, + send_rx: mpsc::Receiver, +} +``` + +When a disconnect message is sent for a topic, the stream for that topic is closed (via `finish()`). + +## Topic State (Net Layer) + +```rust +struct TopicState { + neighbors: BTreeSet, // Current active neighbors (from protocol) + event_sender: broadcast::Sender, // Broadcast channel to subscribers + command_rx_keys: HashSet, // Active command stream keys +} +``` + +A topic is considered "still needed" if it has either: +- Active command receivers (publishers), or +- Active event subscribers (subscribers) + +When neither exists, the topic is queued for quit/unsubscription. + +## Address Lookup Integration + +The `GossipAddressLookup` integrates with iroh's address discovery system: + +```rust +pub(crate) struct GossipAddressLookup { + endpoints: NodeMap, // BTreeMap + _task_handle: Arc>, // Background eviction task +} +``` + +It implements iroh's `AddressLookup` trait, allowing gossip-discovered peer addresses to feed back into iroh's connection establishment.
This means that when a peer shares its address information in `Join` or `ForwardJoin` messages, that information is used to help iroh connect to that peer. + +Entries expire after 5 minutes (configurable via `RetentionOpts`), with eviction checks every 30 seconds. + +## Metrics + +The `Metrics` struct tracks various counters: + +| Metric | Description | +|--------|-------------| +| `msgs_ctrl_sent` | Control messages sent | +| `msgs_ctrl_recv` | Control messages received | +| `msgs_data_sent` | Data messages sent | +| `msgs_data_recv` | Data messages received | +| `msgs_data_sent_size` | Total size of data messages sent | +| `msgs_data_recv_size` | Total size of data messages received | +| `msgs_ctrl_sent_size` | Total size of control messages sent | +| `msgs_ctrl_recv_size` | Total size of control messages received | +| `neighbor_up` | Neighbor connections established | +| `neighbor_down` | Neighbor connections lost | +| `actor_tick_*` | Various event loop tick counters | \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/06-api-data-flow.md b/docs/research/references/iroh/iroh-gossip/06-api-data-flow.md new file mode 100644 index 0000000..dadc9da --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/06-api-data-flow.md @@ -0,0 +1,290 @@ +# iroh-gossip: Public API & Data Flow + +## Public API Types + +### Gossip (Main Handle) + +The `Gossip` struct is the main entry point, created via a `Builder`: + +```rust +let gossip = Gossip::builder() + .max_message_size(8192) + .membership_config(HyparviewConfig { ... }) + .broadcast_config(PlumtreeConfig { ... 
}) + .alpn(b"/custom-alpn") + .spawn(endpoint); +``` + +It derefs to `GossipApi`, which provides: + +| Method | Description | +|--------|-------------| +| `subscribe(topic_id, bootstrap)` | Join a topic with default options | +| `subscribe_and_join(topic_id, bootstrap)` | Join and wait for at least one connection | +| `subscribe_with_opts(topic_id, opts)` | Join with custom `JoinOptions` | +| `handle_connection(conn)` | Handle an incoming QUIC connection | +| `shutdown()` | Gracefully leave all topics and stop | +| `max_message_size()` | Get configured max message size | +| `metrics()` | Get metrics handle | + +### GossipTopic (Subscription Handle) + +Returned by `subscribe()`, it is a `Stream<Item = Result<Event, ApiError>>`: + +```rust +let mut topic: GossipTopic = gossip.subscribe(topic_id, peers).await?; +topic.broadcast(b"hello".to_vec().into()).await?; +topic.broadcast_neighbors(b"local".to_vec().into()).await?; +topic.joined().await?; // Wait for first connection +``` + +Can be split into sender and receiver: + +```rust +let (sender, receiver) = topic.split(); +// sender: GossipSender - can broadcast and join peers +// receiver: GossipReceiver - can receive events and check neighbors +``` + +### GossipSender + +```rust +pub struct GossipSender(mpsc::Sender<Command>); + +impl GossipSender { + pub async fn broadcast(&self, message: Bytes) -> Result<(), ApiError>; + pub async fn broadcast_neighbors(&self, message: Bytes) -> Result<(), ApiError>; + pub async fn join_peers(&self, peers: Vec<EndpointId>) -> Result<(), ApiError>; +} +``` + +### GossipReceiver + +```rust +pub struct GossipReceiver { + stream: Pin<Box<dyn Stream<Item = Result<Event, ApiError>> + Send + Sync + 'static>>, + neighbors: HashSet<EndpointId>, +} + +impl GossipReceiver { + pub fn neighbors(&self) -> impl Iterator<Item = EndpointId> + '_; + pub async fn joined(&mut self) -> Result<(), ApiError>; + pub fn is_joined(&self) -> bool; +} +``` + +The `GossipReceiver` tracks the neighbor set internally by processing `NeighborUp` and `NeighborDown` events.
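A minimal sketch of that neighbor bookkeeping follows, assuming `EndpointId` can be modeled as a `u64` and that "joined" means at least one neighbor is connected (consistent with `joined()` waiting for the first connection). `NeighborTracker` is a hypothetical name, not part of the crate.

```rust
use std::collections::HashSet;

/// Stand-in for iroh's `EndpointId` (an assumption for this sketch).
type EndpointId = u64;

/// Simplified mirror of the subscriber-facing events.
pub enum Event {
    NeighborUp(EndpointId),
    NeighborDown(EndpointId),
    Received(Vec<u8>),
    Lagged,
}

/// Hypothetical neighbor bookkeeping, modeled on what `GossipReceiver` is described as doing.
pub struct NeighborTracker {
    neighbors: HashSet<EndpointId>,
}

impl NeighborTracker {
    pub fn new() -> Self {
        Self { neighbors: HashSet::new() }
    }

    /// Updates the neighbor set from one event; non-membership events are ignored.
    pub fn observe(&mut self, event: &Event) {
        match event {
            Event::NeighborUp(id) => {
                self.neighbors.insert(*id);
            }
            Event::NeighborDown(id) => {
                self.neighbors.remove(id);
            }
            Event::Received(_) | Event::Lagged => {}
        }
    }

    /// Treated here as "joined" once at least one neighbor is connected.
    pub fn is_joined(&self) -> bool {
        !self.neighbors.is_empty()
    }

    /// Current neighbors, copied out of the set.
    pub fn neighbors(&self) -> impl Iterator<Item = EndpointId> + '_ {
        self.neighbors.iter().copied()
    }
}
```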
+ +### Event Types + +```rust +pub enum Event { + NeighborUp(EndpointId), // New direct neighbor connected + NeighborDown(EndpointId), // Direct neighbor disconnected + Received(Message), // Gossip message received + Lagged, // Internal channel lagged (messages dropped) +} + +pub struct Message { + pub content: Bytes, // Message content + pub scope: DeliveryScope, // Swarm(round) or Neighbors + pub delivered_from: EndpointId, // Peer that delivered the message to us +} +``` + +### Command Types + +```rust +pub enum Command { + Broadcast(Bytes), // Broadcast to all in swarm + BroadcastNeighbors(Bytes), // Broadcast to direct neighbors only + JoinPeers(Vec<EndpointId>), // Join additional peers +} +``` + +### JoinOptions + +```rust +pub struct JoinOptions { + pub bootstrap: BTreeSet<EndpointId>, // Initial peers to connect to + pub subscription_capacity: usize, // Event channel capacity (default: 2048) +} +``` + +### DeliveryScope + +```rust +pub enum DeliveryScope { + Swarm(Round), // Message traveled `Round` hops from origin + Neighbors, // Direct neighbor message (not forwarded) +} +``` + +`DeliveryScope::Swarm(Round(0))` means the message was sent by a direct neighbor. `Round(n)` means the message traveled n hops. + +## Data Flow Diagrams + +### Joining a Topic + +``` +User Code GossipApi Actor Proto State + | | | | + |-- subscribe(topic, peers)->| | | + | |-- JoinRequest ------->| | + | | |-- Command::Join ------>| + | | | |-- RequestJoin(peers) + | | | |-- SendMessage(peer, Join) + | | | |-- ...
+ | |<-- NeighborUp events--|<-- EmitEvent(NeighborUp)| + |<-- Event::NeighborUp ------| | | +``` + +### Broadcasting a Message + +``` +User Code GossipSender Actor Proto State Network + | | | | | + |-- broadcast(msg) ->| | | | + | |-- Command:: --> | | | + | | Broadcast | | | + | | |-- Broadcast ---->| | + | | | |-- eager_push --->| + | | | | (Gossip msgs) | + | | | |-- lazy_push ----->| + | | | | (IHave msgs) | + | | | | | + | (other peer receives Gossip) | | | + | | | |<-- RecvMessage --| + | | |<-- InEvent -------| | + | | | | (validates ID) | + | | | | (forwards) | + |<-- Received(msg) -|<-- EmitEvent -| | | +``` + +### Receiving and Processing IHave/Graft + +``` +Time → + +Peer A Our Node Peer B + | | | + |-- IHave(id, round) --->| | + | | Schedule graft_timeout_1 | + | | (wait for eager push) | + | | | + | [timeout expires] | | + | |-- Graft(id, round) ----->| (Peer B sent IHave) + | | | + | |<-- Gossip(content) -------| (Peer B replies) + | | | + | |-- Prune ----------------->| (maybe, if optimization) +``` + +### HyParView Join Flow + +``` +New Node Contact Node Active Peers of Contact + | | | + |-- Join(me_data) -->| | + | |-- add_active(new) | + | |-- Neighbor(High) ----->| (to new node) + | |-- ForwardJoin ------->| (to each active peer) + | | |-- add_active or add_passive + | | |-- Neighbor(Low/High) -> (to new node) + | | |-- ForwardJoin -> (random peer) + | | | + |<-- Neighbor(High) -| | + |<-- Neighbor(Low/High) ----------------------| + | | | +``` + +### Shuffle Periodic Operation + +``` +Node A Node B Random Node + | | | + |-- Shuffle ---------->| | + | (origin=A, nodes, | | + | TTL=6) | | + | |-- Shuffle ------------>| + | | (origin=A, nodes, | + | | TTL=5) | + | | |-- ... 
+ | | |-- (TTL reaches 0) + | | | + |<-- ShuffleReply ----|<-- ShuffleReply --------| + | (random nodes) | (random nodes) | + | | | + |-- add_passive(nodes from reply) | +``` + +## RPC Support (Optional Feature) + +When the `rpc` feature is enabled, `GossipApi` can also operate remotely: + +```rust +// Server side +gossip.listen(rpc_endpoint).await; + +// Client side +let api = GossipApi::connect(rpc_endpoint, addr); +let topic = api.subscribe_and_join(topic_id, bootstrap).await?; +``` + +This uses the `irpc`/`noq` crates for bidirectional streaming RPC. The `Join` request establishes a bidirectional stream: +- Client → Server: `Command` messages (Broadcast, BroadcastNeighbors, JoinPeers) +- Server → Client: `Event` messages (NeighborUp, NeighborDown, Received, Lagged) + +## Channel Architecture + +``` + ┌─────────────────────────────────────────────────┐ + │ Actor │ + │ │ + RPC/Local ──────►│ rpc_rx ◄─────────────────────────────────────│ + Commands │ local_rx ◄── HandleConnection, Shutdown │ + │ │ + │ in_event_tx ──► in_event_rx ────────────────│──► proto::State::handle() + │ │ │ + │ ◄── OutEvent ────────────────────────────────│◄──── │ + │ │ │ + │ ├──► SendMessage ──► peer.send_tx │ + │ ├──► EmitEvent ──► topic.event_sender │ + │ ├──► ScheduleTimer ──► timers │ + │ ├──► DisconnectPeer ──► drop peer │ + │ └──► PeerData ──► address_lookup │ + │ │ + │ topic.event_sender ──► broadcast channel ────│──► GossipReceiver + │ │ + │ command_rx ◄─── per-topic command streams ──│◄── GossipSender + │ │ + └─────────────────────────────────────────────────┘ +``` + +## Configuration Defaults Summary + +| Parameter | Default | Source | +|-----------|---------|--------| +| Active view capacity | 5 | HyParView paper (p9) | +| Passive view capacity | 30 | HyParView paper (p9) | +| Active random walk length | 6 | HyParView paper (p9) | +| Passive random walk length | 3 | HyParView paper (p9) | +| Shuffle random walk length | 6 | HyParView paper (p9) | +| Shuffle active view count | 3 
| HyParView paper (p9) | +| Shuffle passive view count | 4 | HyParView paper (p9) | +| Shuffle interval | 60s | Implementation choice | +| Neighbor request timeout | 500ms | Implementation choice | +| Graft timeout 1 | 80ms | Implementation choice | +| Graft timeout 2 | 40ms | Implementation choice | +| Dispatch timeout | 5ms | Implementation choice | +| Optimization threshold | 7 hops | PlumTree paper (p12) | +| Message cache retention | 30s | Implementation choice | +| Message ID retention | 90s | Implementation choice | +| Cache evict interval | 1s | Implementation choice | +| Max message size | 4096 bytes | Implementation choice | +| Send queue capacity | 64 messages | Implementation choice | +| To-actor channel capacity | 64 messages | Implementation choice | +| In-event channel capacity | 1024 messages | Implementation choice | +| Topic event channel capacity | 256 events | Implementation choice | +| Topic events default capacity | 2048 events | Implementation choice | +| Topic commands channel capacity | 64 commands | Implementation choice | \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/07-utilities-wire-format.md b/docs/research/references/iroh/iroh-gossip/07-utilities-wire-format.md new file mode 100644 index 0000000..c6e012a --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/07-utilities-wire-format.md @@ -0,0 +1,176 @@ +# iroh-gossip: Utility Data Structures & Wire Format + +## IndexSet (`proto::util::IndexSet`) + +A wrapper around `indexmap::IndexSet` that provides random selection capabilities needed by HyParView: + +```rust +pub(crate) struct IndexSet { + inner: indexmap::IndexSet, +} +``` + +### Key Operations + +| Method | Purpose | +|--------|---------| +| `insert(value)` | Add element (returns false if already present) | +| `remove(value)` | Remove by value (swap-remove, O(1)) | +| `remove_index(index)` | Remove by index (swap-remove) | +| `remove_random(rng)` | Remove a random element | +| 
`pick_random(rng)` | Get reference to random element | +| `pick_random_without(exclude, rng)` | Random element excluding certain elements | +| `pick_random_index(rng)` | Random index | +| `shuffled(rng)` | All elements in random order | +| `shuffled_and_capped(len, rng)` | First `len` elements after shuffle | +| `shuffled_without(exclude, rng)` | Random order excluding certain elements | +| `shuffled_without_and_capped(exclude, len, rng)` | Capped shuffle excluding elements | +| `iter_without(value)` | Iterator skipping a specific element | + +These operations are critical for HyParView's random walks, shuffle exchanges, and passive view management. + +## TimerMap (`proto::util::TimerMap`) + +A priority queue of timer entries sorted by `Instant`, with stable ordering via a sequence counter: + +```rust +pub struct TimerMap { + heap: BinaryHeap>, + seq: u64, +} +``` + +Used by the protocol state machine for scheduling future events (shuffles, graft timeouts, cache eviction). The networking layer wraps this in an async-friendly `Timers` type that can `wait_next()`. 
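The stable-ordering guarantee can be sketched with `std::collections::BinaryHeap` and `std::cmp::Reverse`: entries are ordered by `(Instant, seq)`, so timers with equal deadlines pop in insertion order. This is an illustrative reimplementation, not the crate's code, and treating `pop_before` as inclusive of `limit` is an assumption.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;
use std::time::Instant;

/// Heap entry ordered by (deadline, insertion sequence); the payload is ignored.
struct Entry<T> {
    at: Instant,
    seq: u64,
    item: T,
}

impl<T> PartialEq for Entry<T> {
    fn eq(&self, other: &Self) -> bool {
        (self.at, self.seq) == (other.at, other.seq)
    }
}
impl<T> Eq for Entry<T> {}
impl<T> PartialOrd for Entry<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl<T> Ord for Entry<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        (self.at, self.seq).cmp(&(other.at, other.seq))
    }
}

/// Min-heap of timers; `seq` breaks ties so equal deadlines pop in insertion order.
pub struct TimerMap<T> {
    heap: BinaryHeap<Reverse<Entry<T>>>,
    seq: u64,
}

impl<T> TimerMap<T> {
    pub fn new() -> Self {
        Self { heap: BinaryHeap::new(), seq: 0 }
    }

    /// Schedules `item` to fire at `at`.
    pub fn insert(&mut self, at: Instant, item: T) {
        let seq = self.seq;
        self.seq += 1;
        self.heap.push(Reverse(Entry { at, seq, item }));
    }

    /// Earliest scheduled entry, if any.
    pub fn first(&self) -> Option<(&Instant, &T)> {
        self.heap.peek().map(|Reverse(e)| (&e.at, &e.item))
    }

    /// Pops the earliest entry if its deadline is at or before `limit`.
    pub fn pop_before(&mut self, limit: Instant) -> Option<(Instant, T)> {
        if self.heap.peek()?.0.at <= limit {
            self.heap.pop().map(|Reverse(e)| (e.at, e.item))
        } else {
            None
        }
    }
}
```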
+ +### Key Operations + +| Method | Purpose | +|--------|---------| +| `insert(instant, item)` | Schedule a timer | +| `pop_before(limit)` | Pop the earliest entry if it's before `limit` | +| `drain_until(from)` | Drain all entries up to a time | +| `first()` | Get reference to earliest entry | + +## TimeBoundCache (`proto::util::TimeBoundCache`) + +A `HashMap` where entries expire after a specified `Instant`: + +```rust +pub struct TimeBoundCache { + map: HashMap, + expiry: TimerMap, +} +``` + +Used by PlumTree for: +- `received_messages: TimeBoundCache<MessageId, ()>` — deduplication +- `cache: TimeBoundCache<MessageId, Gossip>` — message payload storage for Graft replies + +### Key Operations + +| Method | Purpose | +|--------|---------| +| `insert(key, value, expires)` | Insert with expiration | +| `contains_key(key)` | Check existence | +| `get(key)` | Get value | +| `expires(key)` | Get expiration time | +| `expire_until(instant)` | Remove all expired entries, returns count | +| `len()` / `is_empty()` | Size queries | + +The `expire_until` method handles re-insertions correctly: if a key is re-inserted with a later expiration time, its earlier entry in the expiry queue becomes stale. When that stale entry comes due, `expire_until` skips it, and the key stays in the map until its new expiration time. + +## Wire Format + +### Frame Encoding + +Messages are encoded using `postcard` (a `no_std`-friendly, `serde`-compatible format) and sent as length-prefixed frames: + +``` +┌──────────────┬──────────────┬─────────────────┐ +│ Length (u32) │ TopicHeader │ Message Payload │ +│ big-endian │ postcard │ postcard │ +└──────────────┴──────────────┴─────────────────┘ +``` + +### Stream Protocol + +Each QUIC unidirectional stream is dedicated to a single topic. The stream begins with a `StreamHeader`: + +```rust +pub(crate) struct StreamHeader { + pub(crate) topic_id: TopicId, +} +``` + +All subsequent frames on that stream carry messages for that topic. When a `Disconnect` message is sent, the stream is closed (via `finish()`).
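The re-insertion behavior of `expire_until` can be sketched as follows. This hypothetical version stores the deadline next to each value and uses a `BinaryHeap` of `(Instant, key)` instead of the crate's `TimerMap`. A popped heap entry removes its key only if the key's stored deadline has also passed, so stale entries left behind by re-insertion are skipped.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};
use std::hash::Hash;
use std::time::Instant;

/// Map whose entries expire; a min-heap of (deadline, key) drives eviction.
pub struct TimeBoundCache<K, V> {
    map: HashMap<K, (Instant, V)>,
    expiry: BinaryHeap<Reverse<(Instant, K)>>,
}

impl<K: Hash + Eq + Ord + Clone, V> TimeBoundCache<K, V> {
    pub fn new() -> Self {
        Self { map: HashMap::new(), expiry: BinaryHeap::new() }
    }

    /// Inserts or refreshes `key`; any older heap entry for the same key becomes stale.
    pub fn insert(&mut self, key: K, value: V, expires: Instant) {
        self.map.insert(key.clone(), (expires, value));
        self.expiry.push(Reverse((expires, key)));
    }

    pub fn contains_key(&self, key: &K) -> bool {
        self.map.contains_key(key)
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key).map(|(_, value)| value)
    }

    /// Removes every entry whose stored deadline is at or before `now`; returns the count.
    pub fn expire_until(&mut self, now: Instant) -> usize {
        let mut removed = 0;
        while let Some(Reverse((at, _))) = self.expiry.peek() {
            if *at > now {
                break;
            }
            let Reverse((_, key)) = self.expiry.pop().expect("peeked entry exists");
            // Skip stale heap entries: a re-inserted key carries a later stored deadline.
            if matches!(self.map.get(&key), Some((stored, _)) if *stored <= now) {
                self.map.remove(&key);
                removed += 1;
            }
        }
        removed
    }
}
```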
+ +### Message Types on Wire + +```rust +pub enum Message<PI> { + Swarm(hyparview::Message<PI>), // Membership messages + Gossip(plumtree::Message), // Broadcast messages +} +``` + +Where `PI` is `PublicKey` (32-byte ed25519 public key) in the networking layer. + +The `MessageKind` classification is used for metrics: + +| Kind | Message Types | +|------|--------------| +| `Data` | `Gossip` messages (actual content) | +| `Control` | All Swarm messages, plus `Prune`, `Graft`, `IHave` | + +### Message Size Limits + +- Default max message size: 4096 bytes (minimum: 512) +- The header size is computed at compile time via `postcard::experimental::serialized_size` +- Actual payload capacity = `max_message_size - header_size` + +The `SendLoop` checks message size before writing and returns `WriteError::TooLarge` if exceeded. + +## PeerData & Address Propagation + +The `PeerData` type is an opaque `Bytes` wrapper used in HyParView messages. In the `net` layer, it carries addressing information: + +```rust +struct AddrInfo { + relay_url: Option<RelayUrl>, + direct_addresses: BTreeSet<SocketAddr>, +} +``` + +This is serialized with `postcard` and passed as `PeerData` in `Join`, `ForwardJoin`, and `Neighbor` messages. When received, the `AddrInfo` is decoded and fed into `GossipAddressLookup`, which implements iroh's `AddressLookup` trait, allowing gossip-discovered addresses to be used for future connections.
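The size rules under *Message Size Limits* reduce to a little arithmetic. The following hypothetical sketch computes the payload budget and rejects oversized payloads in the way `WriteError::TooLarge` is described. Several details are assumptions: the `SizeError` type, the function names, and rejecting a configured maximum below 512 bytes at this point.

```rust
/// Documented floor and default for the configurable maximum message size.
pub const MIN_MAX_MESSAGE_SIZE: usize = 512;
pub const DEFAULT_MAX_MESSAGE_SIZE: usize = 4096;

/// Hypothetical error type; the crate's own is `WriteError::TooLarge`.
#[derive(Debug, PartialEq, Eq)]
pub enum SizeError {
    /// Configured maximum is below the 512-byte floor.
    MaxTooSmall { max: usize },
    /// Payload does not fit in the space left after the header.
    TooLarge { payload: usize, capacity: usize },
}

/// Bytes available for the message payload once the header is accounted for.
pub fn payload_capacity(max_message_size: usize, header_size: usize) -> Result<usize, SizeError> {
    if max_message_size < MIN_MAX_MESSAGE_SIZE {
        return Err(SizeError::MaxTooSmall { max: max_message_size });
    }
    Ok(max_message_size.saturating_sub(header_size))
}

/// Check performed before writing, analogous to the size check in `SendLoop`.
pub fn check_payload(
    payload_len: usize,
    max_message_size: usize,
    header_size: usize,
) -> Result<(), SizeError> {
    let capacity = payload_capacity(max_message_size, header_size)?;
    if payload_len > capacity {
        Err(SizeError::TooLarge { payload: payload_len, capacity })
    } else {
        Ok(())
    }
}
```

For example, take the default 4096-byte limit and an illustrative 40-byte header. The header figure is not taken from the crate. Under those numbers, a payload may be at most 4056 bytes.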
+ +## GossipAddressLookup + +```rust +pub(crate) struct GossipAddressLookup { + endpoints: NodeMap, // Arc>> + _task_handle: Arc>, // Background eviction task +} +``` + +Key behaviors: +- **Merging**: When adding addresses for an already-known endpoint, new addresses are merged (union of direct addresses, relay URL is overwritten) +- **Expiration**: Entries expire after 5 minutes, with eviction checks every 30 seconds +- **Integration**: Implements `iroh::address_lookup::AddressLookup`, returning data with provenance "gossip" + +## Dialer + +```rust +struct Dialer { + endpoint: Endpoint, + pending: JoinSet<(EndpointId, Option>)>, + pending_dials: HashMap, +} +``` + +The `Dialer` manages outgoing connection attempts: +- Queues a dial via `queue_dial(endpoint_id, alpn)` +- Checks for pending dials to avoid duplicate connections +- Supports cancellation of in-progress dials +- Returns completed connections via `next_conn()` + +When a dial succeeds, the connection is passed to `handle_connection()`. When a dial fails and the peer is not already active, a `PeerDisconnected` event is injected into the protocol state. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/08-testing-metrics-refs.md b/docs/research/references/iroh/iroh-gossip/08-testing-metrics-refs.md new file mode 100644 index 0000000..c0fb68d --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/08-testing-metrics-refs.md @@ -0,0 +1,169 @@ +# iroh-gossip: Testing & Simulation + +## Test Infrastructure + +The crate includes four layers of testing: + +### 1.
Unit Tests (in source files) + +Unit tests are embedded in each module file behind `#[cfg(test)]`: + +| Module | Tests | +|--------|-------| +| `proto/hyparview.rs` | Not enumerated in this analysis | +| `proto/plumtree.rs` | `optimize_tree`, `spoofed_messages_are_ignored`, `cache_is_evicted` | +| `proto.rs` | `hyparview_smoke`, `plumtree_smoke`, `quit` | +| `net.rs` | `gossip_net_smoke`, `subscription_cleanup` | +| `api.rs` | `test_rpc`, `ensure_gossip_topic_is_sync` | +| `proto/util.rs` | `indexset`, `timer_map`, `hex`, `time_bound_cache` | + +### 2. Protocol Simulator (`proto::sim`) + +The `sim` module (behind `test-utils` feature) provides a deterministic network simulator: + +```rust +// Available when feature = "test-utils" +pub mod sim; +``` + +This allows testing the protocol logic without any real networking, using seeded RNG for reproducibility. + +The simulator creates a `Network` of virtual nodes, each running its own `proto::State`. Events are processed in discrete "trips" (round-trips), allowing controlled testing of protocol behavior. + +### 3. Simulation Binary (`simulator` feature) + +The crate includes a CLI simulator (behind `simulator` feature) that can run large-scale simulations: + +``` +cargo run --bin sim --features simulator +``` + +This uses `rayon` for parallel execution and `comfy-table` for result output. + +### 4. Integration Tests (`tests/sim.rs`) + +Behind the `test-utils` feature, this file provides end-to-end protocol tests.
+ +## Key Test Patterns + +### Protocol-Level Smoke Test + +From `proto.rs`: + +```rust +#[test] +fn hyparview_smoke() { + let rng = ChaCha12Rng::seed_from_u64(0); + let mut config = Config::default(); + config.membership.active_view_capacity = 2; + let mut network = Network::new(config.into(), rng); + for i in 0..4 { network.insert(i); } + let t: TopicId = [0u8; 32].into(); + + // Join nodes + network.command(0, t, Command::Join(vec![1, 2])); + network.command(1, t, Command::Join(vec![2])); + network.command(2, t, Command::Join(vec![])); + network.run_trips(3); + + // Verify events and connections + assert_eq!(network.events_sorted(), expected); + assert_eq!(network.conns(), vec![(0, 1), (0, 2), (1, 2)]); + assert!(network.check_synchronicity()); +} +``` + +### PlumTree Optimization Test + +From `plumtree.rs`: + +```rust +#[test] +fn optimize_tree() { + // When an IHave message arrives with fewer hops than the Gossip message, + // and the difference exceeds optimization_threshold, the tree is restructured: + // - The IHave sender is promoted to eager (Graft) + // - The Gossip sender is demoted to lazy (Prune) +} +``` + +### Spoofed Message Test + +```rust +#[test] +fn spoofed_messages_are_ignored() { + // Messages where MessageId != blake3(content) are silently discarded + let message = Message::Gossip(Gossip { + content: content.clone(), + id: MessageId::from_content(b"wrong_content"), // Spoofed! 
+ scope: DeliveryScope::Swarm(Round(1)), + }); + state.handle(InEvent::RecvMessage(2, message), now, &mut io); + // No events are emitted +} +``` + +### Networking Smoke Test + +From `net.rs`: + +```rust +#[tokio::test] +async fn gossip_net_smoke() { + // Creates 3 endpoints with a relay server + // Subscribes and joins a topic + // Broadcasts messages and verifies reception + // Uses real QUIC connections via iroh +} +``` + +## Metrics + +The `Metrics` struct (in `src/metrics.rs`) uses `iroh_metrics::MetricsGroup`: + +```rust +#[derive(Debug, Default, MetricsGroup)] +#[metrics(name = "gossip")] +pub struct Metrics { + pub msgs_ctrl_sent: Counter, + pub msgs_ctrl_recv: Counter, + pub msgs_data_sent: Counter, + pub msgs_data_recv: Counter, + pub msgs_data_sent_size: Counter, + pub msgs_data_recv_size: Counter, + pub msgs_ctrl_sent_size: Counter, + pub msgs_ctrl_recv_size: Counter, + pub neighbor_up: Counter, + pub neighbor_down: Counter, + pub actor_tick_main: Counter, + pub actor_tick_rx: Counter, + pub actor_tick_endpoint: Counter, + pub actor_tick_dialer: Counter, + pub actor_tick_dialer_success: Counter, + pub actor_tick_dialer_failure: Counter, + pub actor_tick_in_event_rx: Counter, + pub actor_tick_timers: Counter, +} +``` + +These are tracked both in the protocol state machine (for message counts) and in the actor event loop (for tick-level diagnostics). When the `metrics` feature is enabled, they are exported via Prometheus-compatible endpoints. + +## References + +### Academic Papers + +- **HyParView**: Leitao, J., Pereira, J., & Rodrigues, L. (2007). "HyParView: A Membership Protocol for Reliable Gossip Multicast with Dense Coverage." [PDF](https://asc.di.fct.unl.pt/~jleitao/pdf/dsn07-leitao.pdf) +- **PlumTree**: Leitao, J., Pereira, J., & Rodrigues, L. (2007). "Epidemic Broadcast Trees." 
[PDF](https://asc.di.fct.unl.pt/~jleitao/pdf/srds07-leitao.pdf) + +### Implementation Reference + +- Bartosz Sypytkowski's example implementation: [gist](https://gist.github.com/Horusiath/84fac596101b197da0546d1697580d99) + +### Related Projects + +- [iroh](https://docs.rs/iroh) — The networking library that iroh-gossip integrates with +- [Earthstar](https://github.com/earthstar-project/earthstar) — Another PlumTree implementation referenced in code comments + +### Crate Repository + +- [github.com/n0-computer/iroh-gossip](https://github.com/n0-computer/iroh-gossip) \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-gossip/README.md b/docs/research/references/iroh/iroh-gossip/README.md new file mode 100644 index 0000000..9f1b5fc --- /dev/null +++ b/docs/research/references/iroh/iroh-gossip/README.md @@ -0,0 +1,40 @@ +# iroh-gossip Reference Documentation + +This directory contains a deep-dive reference on how the `iroh-gossip` crate works, based on source code analysis of the repository at `/workspace/iroh-gossip`. 
+ +## Documents + +| # | File | Topic | +|---|------|-------| +| 01 | [Overview & Architecture](01-overview-architecture.md) | Crate structure, module organization, design principles, features, dependencies | +| 02 | [HyParView Membership](02-hyparview-membership.md) | Swarm membership protocol: active/passive views, join procedure, shuffle mechanism, failure recovery, PeerData | +| 03 | [PlumTree Broadcast](03-plumtree-broadcast.md) | Epidemic broadcast trees: eager/lazy push, Graft/IHave/Prune, tree optimization, message deduplication, cache management | +| 04 | [State & Topic Coordination](04-state-and-topic.md) | Multi-topic state management, topic lifecycle, event routing between HyParView and PlumTree | +| 05 | [Net Actor & Networking](05-net-actor.md) | Actor model, event loop, connection management, Dialer, wire protocol, address lookup, topic state in the net layer | +| 06 | [API & Data Flow](06-api-data-flow.md) | Public API types, subscription model, event/command flow, channel architecture, configuration defaults | +| 07 | [Utilities & Wire Format](07-utilities-wire-format.md) | IndexSet, TimerMap, TimeBoundCache, serialization, PeerData/AddrInfo, Dialer internals | +| 08 | [Testing & Metrics](08-testing-metrics-refs.md) | Test infrastructure, simulation, key test patterns, metrics, references | + +## Quick Reference + +### Version +`iroh-gossip` v0.97.0 + +### ALPN +`/iroh-gossip/1` + +### Core Protocols +- **HyParView**: Hybrid partial view membership (active view = 5, passive view = 30 by default) +- **PlumTree**: Epidemic broadcast trees (eager + lazy push with Graft/IHave optimization) + +### Key Abstractions +- **TopicId**: 32-byte identifier for a topic/swarm +- **PeerIdentity**: Generic trait (instantiated as `PublicKey` in the net layer) +- **PeerData**: Opaque bytes exchanged on join (carries `AddrInfo` in net layer) +- **IO trait**: Interface for protocol output events (pure state machine, no IO) + +### Wire Format +- Postcard (serde) 
encoding over QUIC unidirectional streams +- Length-prefixed frames (u32 length + postcard payload) +- Stream header with TopicId +- Max message size: 4096 bytes (configurable, minimum 512) \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/01-overview-and-architecture.md b/docs/research/references/iroh/iroh-live/01-overview-and-architecture.md new file mode 100644 index 0000000..31aabff --- /dev/null +++ b/docs/research/references/iroh/iroh-live/01-overview-and-architecture.md @@ -0,0 +1,104 @@ +# iroh-live: Overview and Architecture + +## What It Is + +iroh-live is a real-time audio/video streaming system built on top of [iroh](https://github.com/n0-computer/iroh) (QUIC-based P2P networking) and [Media over QUIC (MoQ)](https://moq.dev/). It handles the full pipeline: camera/mic capture → encoding → transport → decoding → rendering. Connections are peer-to-peer by default, with an optional relay server for browser access via WebTransport. + +**Status:** Early tech preview. APIs are unstable. Windows support is missing. Audio-video sync is basic. + +## Workspace Crates + +| Crate | Description | +|-------|-------------| +| `iroh-live` | High-level API: `Live`, `Call`, `Room`, tickets, subscriptions | +| `iroh-moq` | MoQ transport layer over iroh/QUIC via `web-transport-iroh` | +| `iroh-live-relay` | Relay server bridging iroh P2P to browser WebTransport | +| `moq-media` | Media pipelines: capture, encode, decode, publish, subscribe, adaptive bitrate. 
No iroh dependency | +| `rusty-codecs` | Codec implementations (H264/openh264, AV1/rav1e+ rav1d, Opus), hardware accel (VAAPI, V4L2, VideoToolbox) | +| `rusty-capture` | Cross-platform capture: PipeWire, V4L2, X11, ScreenCaptureKit, AVFoundation | +| `moq-media-egui` | egui integration for video rendering | +| `moq-media-dioxus` | dioxus-native integration for video rendering | +| `moq-media-android` | Android camera, EGL rendering, JNI bridge | +| `iroh-live-cli` | CLI tool (`irl`) for publishing, playing, calls, rooms, relay | + +## Layer Architecture + +Three distinct layers, each usable independently: + +``` +┌──────────────────────────────────────────────────────────┐ +│ iroh-live │ +│ Session management, tickets, rooms, calls │ +│ Re-exports: moq-media, iroh-moq │ +├──────────────────────────────────────────────────────────┤ +│ moq-media │ +│ Media pipelines: LocalBroadcast, RemoteBroadcast, │ +│ codecs, adaptive bitrate, playout │ +│ NO iroh dependency (transport-agnostic) │ +├──────────────────────────────────────────────────────────┤ +│ iroh-moq │ +│ MoQ session management, publish/subscribe over QUIC │ +│ Uses web-transport-iroh + moq-lite │ +└──────────────────────────────────────────────────────────┘ + +Below moq-media: + rusty-codecs ─ codec implementations, hardware accel, wgpu rendering + rusty-capture ─ platform-specific screen/camera capture +``` + +## Design Principles + +1. **`&self` everywhere** — All public types use interior mutability. Safe to share across async tasks/threads without wrappers. +2. **Drop-based cleanup** — Dropping a `Call` closes it. Dropping `LocalBroadcast` tears down encoders. Dropping `VideoTrack` stops its decoder thread. +3. **Watcher for continuous state, Stream for discrete events** — Connection quality and catalog contents use `n0_watcher::Direct`. Participant joins use `impl Stream`. +4. **Declarative intent, not mechanism** — `VideoTarget::default().max_pixels(1280*720)` describes what quality you need. 
The catalog selects the best rendition. +5. **moq-media is standalone** — A recording pipeline can use `LocalBroadcast`/`RemoteBroadcast` without iroh-live. The transport boundary is the `PacketSink`/`PacketSource` trait pair. + +## Data Flow (End-to-End) + +``` +Publisher Side: + capture source (rusty-capture, VideoSource trait) + │ + ▼ + encoder pipeline (moq-media, dedicated OS thread) + │ + ▼ EncodedFrame + PacketSink (MoqPacketSink — starts new MoQ group on keyframe) + │ + ▼ MoQ transport (iroh-moq, QUIC streams) + +Subscriber Side: + PacketSource (MoqPacketSource — reads ordered frames from MoQ) + │ + ▼ MediaPacket + decoder pipeline (moq-media, dedicated OS thread) + │ + ▼ VideoFrame + FramePacer (PTS-based sleep) or Sync (shared playout clock) + │ + ▼ + renderer (wgpu texture upload or egui widget) +``` + +Encoder and decoder pipelines run on **dedicated OS threads**, not tokio tasks, so slow codec operations never block the async runtime. The `forward_packets` async task bridges the network-side `PacketSource` into an mpsc channel that the decoder thread reads synchronously. + +## Key Dependencies + +| Dependency | Purpose | +|------------|---------| +| `iroh` | QUIC endpoint, connection management, P2P connectivity | +| `iroh-gossip` | Gossip protocol for room participant discovery | +| `iroh-tickets` | Ticket serialization for `RoomTicket` | +| `iroh-smol-kv` | Distributed KV store for room state (gossip-backed) | +| `moq-lite` | Core MoQ protocol: BroadcastProducer, BroadcastConsumer, Track, Group | +| `hang` | Catalog management for broadcast metadata | +| `moq-mux` | MoQ multiplexing | +| `moq-relay` | Relay server implementation (used by iroh-live-relay) | +| `web-transport-iroh` | WebTransport over iroh QUIC connections | +| `n0-future` | Async utilities (FuturesUnordered, AbortOnDropHandle) | +| `n0-watcher` | Watchable/Direct reactive state | + +## License + +Dual-licensed: MIT OR Apache-2.0. Copyright 2025 N0, INC. 
\ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/02-core-api.md b/docs/research/references/iroh/iroh-live/02-core-api.md new file mode 100644 index 0000000..e7e36b1 --- /dev/null +++ b/docs/research/references/iroh/iroh-live/02-core-api.md @@ -0,0 +1,167 @@ +# iroh-live: Core API — Live, Call, Subscription, Ticket + +## `Live` — Entry Point + +The primary entry point for all iroh-live operations. Manages an iroh `Endpoint`, the MoQ transport (`Moq`), and optionally a `Gossip` instance for rooms. + +### Construction + +```rust +// Simple: from environment, accept incoming connections +let live = Live::from_env().await?.with_router().spawn(); + +// With gossip for rooms +let live = Live::from_env().await?.with_router().with_gossip().spawn(); + +// From an existing endpoint +let live = Live::builder(endpoint).with_router().with_gossip().spawn(); + +// Manual router mounting (when you have other protocols) +let router = live.register_protocols(Router::builder(endpoint)); +let router = router.accept(other_protocol, other_handler); +let router = router.spawn(); +``` + +### Key Methods + +| Method | Description | +|--------|-------------| +| `publish(name, &LocalBroadcast)` | Register a broadcast for all connected peers | +| `subscribe(remote, name)` | Connect to a peer and subscribe to a broadcast → `Subscription` | +| `subscribe_media(remote, name, audio, config)` | Connect, subscribe, decode → `(MoqSession, MediaTracks)` | +| `join_room(ticket)` | Join a gossip-based multi-party room → `Room` | +| `endpoint()` | Access the underlying iroh `Endpoint` | +| `transport()` | Access the `Moq` transport for advanced operations | +| `gossip()` | Access the `Gossip` instance (if enabled) | +| `shutdown()` | Close all sessions, stop router, close endpoint | + +### Builder Options + +- **`with_router()`** — Spawns an internal `Router` so the endpoint accepts incoming MoQ sessions. Without this, only outbound connections work. 
+- **`with_gossip()`** — Creates a `Gossip` instance (required for rooms). Internally mounts on the Router if `with_router` is also set. +- **`gossip(gossip)`** — Use an externally-managed `Gossip` instance. + +### Internal Architecture + +`Live` holds: +- `endpoint: Endpoint` — iroh QUIC endpoint +- `moq: Moq` — Internal actor for session/broadcast management +- `gossip: Option` — For room coordination +- `router: Option` — For accepting incoming connections + +The `from_env()` method reads `IROH_SECRET` for the secret key and generates one if not set. It uses the `N0` preset for relay and DNS discovery. + +## `LiveTicket` — Connection Sharing + +A serializable ticket that contains everything needed to connect to a publisher. + +```rust +// Create a ticket +let ticket = LiveTicket::new(endpoint.addr(), "my-stream"); + +// Serialize to URI string (fits in QR codes) +let s = ticket.to_string(); +// → "iroh-live:/my-stream" + +// Deserialize +let parsed: LiveTicket = s.parse()?; + +// With relay URLs for indirect connectivity +let ticket = LiveTicket::new(addr, "stream").with_relay_urls(vec![ + "https://relay.example.com".to_string(), +]); +``` + +**Format:** `iroh-live:/` + +Also supports legacy `name@base32` format for backward compatibility. + +The ticket string is kept short enough for QR codes (< 2000 bytes). It uses postcard (binary) serialization with base64url encoding. + +## `Call` — 1:1 Video Call + +A convenience wrapper over MoQ primitives for bidirectional calls. + +### Flow + +1. One side creates a `LocalBroadcast` with video/audio configured +2. **Dialer:** `Call::dial(live, remote_addr, local_broadcast)` — connects, publishes "call" broadcast, subscribes to remote's "call" broadcast +3. **Acceptor:** `Call::accept(session, local_broadcast)` — accepts an incoming session, publishes and subscribes + +The broadcast name is always `"call"` — this is hardcoded (`CALL_BROADCAST_NAME`). 
+ +```rust +// Dialer side +let local = LocalBroadcast::new(); +local.video().set_source(camera, VideoCodec::H264, [VideoPreset::P720])?; +let call = Call::dial(&live, remote_addr, local).await?; + +// Access remote media +let remote_broadcast = call.remote(); +let video = remote_broadcast.video()?; + +// Wait for call to end +let reason = call.closed().await; +``` + +### Key Properties + +- `call.local()` → `&LocalBroadcast` (your media) +- `call.remote()` → `&RemoteBroadcast` (peer's media) +- `call.signals()` → `watch::Receiver` (for adaptive bitrate) +- `call.close()` — closes with error code 0 and reason "call ended" +- `call.closed()` → waits for close, returns `DisconnectReason` (LocalClose, RemoteClose, TransportError) + +Auto-wires stats recording and network signal production on the connection. + +## `Subscription` — Subscribe Handle + +Created by `Live::subscribe()`. Wraps the MoQ session, remote broadcast, and network signals into a single handle. The constructor auto-wires stats recording and signal production. + +```rust +let sub = live.subscribe(remote_addr, "stream").await?; + +// Access components +sub.session() // &MoqSession +sub.broadcast() // &RemoteBroadcast +sub.signals() // &watch::Receiver + +// Convenience methods +let tracks = sub.media(&audio_backend, Default::default()).await?; +let tracks = sub.media_with_decoders::(&audio_backend, config).await?; + +// Decompose +let (session, broadcast, signals) = sub.into_parts(); +``` + +## `DisconnectReason` + +```rust +pub enum DisconnectReason { + LocalClose, + RemoteClose, + TransportError, +} +``` + +Derived from the QUIC connection's close reason. Used by `Call::closed()`. + +## `util` Module + +### `secret_key_from_env()` + +Loads the iroh secret key from the `IROH_SECRET` environment variable. Generates a new key if not set, printing the hex-encoded key for reuse. 
+ +### `spawn_signal_producer(conn, shutdown)` + +Spawns a background task that polls QUIC connection path stats every 200ms and produces `NetworkSignals` for adaptive rendition selection. Returns a `watch::Receiver`. + +Computes: +- **RTT** — from `selected_path.rtt()` +- **Loss rate** — delta-based (lost packets / (sent + lost) over the interval) +- **Available bandwidth** — estimated from congestion window: `cwnd * 8 / rtt` +- **Congestion events** — monotonically increasing counter + +### `spawn_stats_recorder(conn, net_stats, shutdown)` + +Records connection stats (RTT, loss rate, bandwidth, path type) into `NetStats` for debug overlay display. Runs every 200ms. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/03-iroh-moq-transport.md b/docs/research/references/iroh/iroh-live/03-iroh-moq-transport.md new file mode 100644 index 0000000..6a5aac1 --- /dev/null +++ b/docs/research/references/iroh/iroh-live/03-iroh-moq-transport.md @@ -0,0 +1,164 @@ +# iroh-moq: MoQ Transport Layer + +## Overview + +`iroh-moq` is the transport bridge between iroh's QUIC endpoint and the moq-lite broadcast protocol. It manages connections, session lifecycle, broadcast routing, and subscription handling. This is the only crate in the workspace that directly interacts with QUIC transport — everything above uses `Moq`/`MoqSession` as the interface. + +**ALPN:** `moq-lite-03` + +## Core Types + +### `Moq` — Transport Manager + +The top-level transport entry point. Wraps an iroh `Endpoint` and runs an internal actor (`Actor`) that handles all connection and broadcast management. + +```rust +let moq = Moq::new(endpoint); +``` + +**Internal architecture:** + +`Moq` holds an `mpsc::Sender` to communicate with a spawned actor task. 
The actor manages: +- A `HashMap` of active sessions +- A `HashMap` of locally published broadcasts +- A `JoinSet` of session tasks (each tracks session lifetime) +- A `FuturesUnordered` of pending connect tasks +- A `broadcast::Sender` for incoming session notifications + +**Key methods:** + +| Method | Description | +|--------|-------------| +| `new(endpoint)` | Creates transport and spawns the actor | +| `protocol_handler()` | Returns `MoqProtocolHandler` for Router registration | +| `publish(name, producer)` | Register a broadcast for all current and future sessions | +| `connect(remote)` | Connect to remote peer, deduplicating existing connections | +| `incoming_sessions()` | Get stream of incoming sessions | +| `published_broadcasts()` | List currently published broadcast names | +| `shutdown()` | Cancel the shutdown token, closing all sessions | + +### `MoqProtocolHandler` + +Implements iroh's `ProtocolHandler` trait. When the Router receives an incoming connection with the `moq-lite-03` ALPN: + +1. Accepts the raw QUIC `Connection` +2. Wraps it in a `web_transport_iroh::Session::raw(connection)` +3. Completes the moq-lite server handshake: `MoqSession::session_accept(wt_session)` +4. Sends the session to the actor via `ActorMessage::HandleSession` + +### `MoqSession` — Single Peer Connection + +Represents a MoQ connection with one remote peer. 
Created via: +- `Moq::connect()` (outbound, client role) +- `IncomingSession::accept()` (inbound, server role) + +```rust +// Outbound +let session = moq.connect(remote_addr).await?; + +// Inbound +let incoming = incoming_session.next().await?; +let session = incoming.accept(); // or incoming.reject() +``` + +**Internal structure:** + +```rust +pub struct MoqSession { + wt_session: web_transport_iroh::Session, + _moq_session: Arc, + publish: OriginProducer, // For announcing local broadcasts + subscribe: OriginConsumer, // For consuming remote broadcasts +} +``` + +The `OriginProducer`/`OriginConsumer` pair comes from moq-lite. The session creates them before the handshake: + +- **Client (connect):** Creates `OriginProducer` for publish and `OriginConsumer` for subscribe, then `Client::new().with_publish(...).with_consume(...).connect(session)` +- **Server (accept):** Same pattern with `Server::new().with_publish(...).with_consume(...).accept(session)` + +**Key methods:** + +| Method | Description | +|--------|-------------| +| `subscribe(name)` | Wait for remote to announce broadcast, return `BroadcastConsumer` | +| `publish(name, consumer)` | Make a broadcast available to remote peer | +| `conn()` | Reference to underlying QUIC `Connection` (for stats) | +| `remote_id()` | Remote peer's `EndpointId` | +| `close(code, reason)` | Close the session | +| `closed()` | Wait for session to close, returns `SessionError` | +| `origin_producer()` | Direct access to moq-lite publish origin | +| `origin_consumer()` | Direct access to moq-lite subscribe origin | + +### `IncomingSession` / `IncomingSessionStream` + +`IncomingSession` wraps a `MoqSession` that has completed the handshake. Provides: +- `remote_id()` — the connecting peer's identity +- `accept()` — returns the `MoqSession` +- `reject()` — closes with error code 1 + +`IncomingSessionStream` is an async stream that yields `IncomingSession` values. 
Uses a `broadcast::Receiver` internally, handling lag by skipping missed sessions. + +## Actor Internals + +The `Actor` is the core event loop for the `Moq` transport: + +``` +loop { + select! { + msg = inbox.recv() → handle_message(msg) + session_closed → remove session, log + broadcast_closed → remove from publishing map + connect_completed → handle_session or reply to caller + } +} +``` + +### Message Types + +```rust +enum ActorMessage { + HandleSession { session: Box }, + LocalBroadcast { broadcast_name: String, producer: BroadcastProducer }, + Connect { remote: EndpointAddr, reply: oneshot::Sender<...> }, + GetPublished { reply: oneshot::Sender> }, +} +``` + +### Connection Deduplication + +When `Connect` is received for a peer that already has an active session, the existing session is returned immediately. If a connection attempt is already in progress, the oneshot reply is queued and notified when the connection completes. + +### Broadcast Fan-out + +When a `LocalBroadcast` is published via `Moq::publish()`: +1. The actor stores the `BroadcastProducer` in its `publishing` map +2. It immediately announces the broadcast to all existing sessions by calling `session.publish(name, producer.consume())` on each +3. For future sessions, the actor iterates `publishing` entries and announces each one +4. A `FuturesUnordered` tracks when each broadcast closes, removing it from the map + +### Session Lifecycle + +When a session is established (either incoming or outgoing): +1. All currently published broadcasts are announced to it +2. It's stored in `sessions` by `EndpointId` +3. A session task is spawned that waits for the session to close +4. 
If there were pending connect requests for this peer, they're fulfilled + +## Error Types + +```rust +enum Error { + Connect(ConnectError), // iroh connection failure + Moq(moq_lite::Error), // MoQ protocol error + Server(web_transport_iroh::ServerError), // WebTransport server error + InternalConsistencyError(LiveActorDiedError), // Actor died + Request(WriteError), // QUIC write error +} + +enum SubscribeError { + NotAnnounced, // Track was not announced + Closed, // Track was closed + SessionClosed(SessionError), // Session closed +} +``` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/04-rooms.md b/docs/research/references/iroh/iroh-live/04-rooms.md new file mode 100644 index 0000000..35656ac --- /dev/null +++ b/docs/research/references/iroh/iroh-live/04-rooms.md @@ -0,0 +1,185 @@ +# iroh-live: Rooms — Multi-Party Coordination + +## Overview + +The `rooms` module provides multi-party room coordination over iroh-gossip. Participants discover each other via a gossip topic, automatically connect and subscribe to each other's broadcasts, and receive `RoomEvent` notifications as peers join, publish, and leave. + +## Core Types + +### `Room` + +The main room handle. Created via `Room::new(live, ticket)`. Spawns an internal actor that manages all peer coordination. 
+ +```rust +// Create a room (generates a random topic) +let ticket = RoomTicket::generate(); +let room = Room::new(&live, ticket.clone()).await?; + +// Or join an existing room +let room = Room::new(&live, existing_ticket).await?; +``` + +**Methods:** +- `recv()` — Wait for next `RoomEvent` +- `try_recv()` — Non-blocking event check +- `ticket()` — Get a ticket that includes this peer as a bootstrap node +- `split()` — Decompose into `(RoomEvents, RoomHandle)` for use in separate tasks +- `publish(name, &LocalBroadcast)` — Publish a broadcast to the room +- `set_chat_publisher(publisher)` — Register a chat publisher +- `send_chat(text)` — Send a chat message + +### `RoomHandle` + +Cloneable handle for publishing into a room. Obtained from `Room::split()`. Can be shared across tasks. + +```rust +let (events, handle) = room.split(); + +// In one task: receive events +while let Some(event) = events.recv().await { + match event { ... } +} + +// In another task: publish +handle.publish("camera", &broadcast).await?; +handle.send_chat("Hello!").await?; +handle.set_display_name("Alice").await?; +``` + +### `RoomTicket` + +```rust +pub struct RoomTicket { + pub bootstrap: Vec, // Bootstrap peer IDs for gossip + pub topic_id: TopicId, // Gossip topic identifier +} +``` + +Serialized via `iroh_tickets` (binary format). 
Can be created from: +- `RoomTicket::generate()` — Random topic, no bootstrap +- `RoomTicket::new(topic_id, bootstrap)` — Specific topic and peers +- `RoomTicket::new_from_env()` — From `IROH_LIVE_ROOM` or `IROH_LIVE_TOPIC` env vars + +### `RoomEvent` + +```rust +pub enum RoomEvent { + RemoteAnnounced { + remote: EndpointId, + broadcasts: Vec, + }, + BroadcastSubscribed { + session: Box, + broadcast: Box, + }, + PeerJoined { + remote: EndpointId, + display_name: Option, + }, + PeerLeft { + remote: EndpointId, + }, + ChatReceived { + remote: EndpointId, + message: ChatMessage, + }, +} +``` + +## Room Actor — Internal Architecture + +The room actor is a spawned task that manages the gossip KV subscription and coordinates all peer connections. + +### State + +```rust +struct Actor { + me: EndpointId, + _gossip: Gossip, + live: Live, + active_subscribe: HashSet, // (EndpointId, name) pairs + active_publish: HashSet, // Locally published broadcast names + known_peers: HashMap>, // display names + connecting: ConnectingFutures, // In-flight subscribe attempts + subscribe_closed: FuturesUnordered, // Track subscription lifetimes + publish_closed: FuturesUnordered, // Track publish lifetimes + chat_messages: FuturesUnordered, // Active chat subscribers + chat_publisher: Option, + display_name: Option, + event_tx: mpsc::Sender, + kv: iroh_smol_kv::Client, // Distributed KV for peer state + kv_writer: WriteScope, // KV write access +} +``` + +### Gossip KV for Peer Discovery + +The room uses `iroh-smol-kv` over gossip for peer state coordination. Each peer writes their `PeerState` to key `b"s"`: + +```rust +struct PeerState { + broadcasts: Vec, + display_name: Option, +} +``` + +Serialized with postcard (binary format — **no `skip_serializing_if`** allowed since postcard is positional). + +### Event Loop + +``` +loop { + select! 
{ + update = gossip_kv_stream.next() → handle_gossip_update + msg = inbox.recv() → handle_api_message + result = connecting.next() → subscribe succeeded/failed + broadcast_closed → remove from active, maybe emit PeerLeft + publish_closed → remove from active_publish, update KV + chat_message → emit ChatReceived + } +} +``` + +### Peer Discovery Flow + +1. Peer A publishes a broadcast via `handle.publish("camera", &broadcast)` +2. Actor publishes to MoQ AND updates gossip KV with `PeerState { broadcasts: ["camera"], display_name: ... }` +3. Peer B's gossip KV stream receives the update +4. Peer B's actor checks `known_peers` — if new, emits `PeerJoined` +5. Peer B's actor checks `active_subscribe` — if new broadcast, initiates `live.subscribe(remote, name)` +6. When subscription succeeds, Peer B emits `BroadcastSubscribed` +7. If the broadcast has a chat track, a chat subscriber is spawned + +### Chat + +Chat uses a dedicated MoQ track within each broadcast. Each message is a single MoQ group containing one frame of UTF-8 text. The sender identity comes from the broadcast context (peer ID), not the message payload. + +### Connection Lifecycle + +- When a broadcast closes (`subscribe_closed`), it's removed from `active_subscribe` +- If this was the last broadcast from that peer, `PeerLeft` is emitted +- When a publish closes (`publish_closed`), the KV is updated to remove that broadcast + +### `RoomPublisherSync` + +A convenience wrapper for the common pattern of publishing camera+audio and optionally screen share into a room: + +```rust +let publisher = RoomPublisherSync::new(room_handle, audio_backend); +publisher.set_state(&PublishOpts::default())?; +``` + +Automatically publishes a "camera" broadcast and manages a "screen" broadcast when screen sharing is toggled on. 
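The discovery and lifecycle rules above reduce to set bookkeeping:

- emit `PeerJoined` for an unknown peer,
- subscribe to any broadcast not yet in `active_subscribe`,
- emit `PeerLeft` when a peer's last subscription closes.

What follows is a minimal synchronous sketch of that bookkeeping, not the actor's code. `PeerTracker`, `Action`, and the `u64` stand-in for `EndpointId` are hypothetical. The real actor drives these transitions from gossip KV updates and `FuturesUnordered` completions, not from direct method calls.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for iroh's `EndpointId` (really a public key).
type PeerId = u64;

/// Effects the actor would perform in response to a state change (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    PeerJoined(PeerId),
    Subscribe(PeerId, String),
    PeerLeft(PeerId),
}

/// Simplified mirror of the actor's `known_peers` / `active_subscribe` state.
#[derive(Default)]
pub struct PeerTracker {
    known_peers: HashMap<PeerId, Option<String>>, // peer -> display name
    active_subscribe: HashSet<(PeerId, String)>,  // (peer, broadcast name)
}

impl PeerTracker {
    /// Handle a remote peer's gossiped state (broadcast names + display name).
    pub fn on_peer_state(
        &mut self,
        remote: PeerId,
        broadcasts: &[&str],
        display_name: Option<&str>,
    ) -> Vec<Action> {
        let mut actions = Vec::new();
        if !self.known_peers.contains_key(&remote) {
            actions.push(Action::PeerJoined(remote));
        }
        self.known_peers.insert(remote, display_name.map(str::to_owned));
        for name in broadcasts {
            // `insert` returns true only for broadcasts not already tracked.
            if self.active_subscribe.insert((remote, name.to_string())) {
                actions.push(Action::Subscribe(remote, name.to_string()));
            }
        }
        actions
    }

    /// Handle a closed subscription; emit `PeerLeft` if it was the peer's last one.
    pub fn on_subscribe_closed(&mut self, remote: PeerId, name: &str) -> Vec<Action> {
        if !self.active_subscribe.remove(&(remote, name.to_string())) {
            return Vec::new(); // unknown subscription: nothing to do
        }
        let still_active = self.active_subscribe.iter().any(|(peer, _)| *peer == remote);
        if still_active {
            Vec::new()
        } else {
            self.known_peers.remove(&remote);
            vec![Action::PeerLeft(remote)]
        }
    }
}
```

The sketch returns actions rather than performing them, which keeps the state transitions testable without a network. The source does not say whether the real actor is structured this way.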
+ +## API Messages + +```rust +enum ApiMessage { + Publish { name: String, producer: BroadcastProducer }, + SendChat { text: String }, + SetChatPublisher { publisher: ChatPublisher }, + SetDisplayName { name: String }, +} +``` + +These are sent from `RoomHandle` to the actor via an mpsc channel. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/05-relay.md b/docs/research/references/iroh/iroh-live/05-relay.md new file mode 100644 index 0000000..a6f1bfc --- /dev/null +++ b/docs/research/references/iroh/iroh-live/05-relay.md @@ -0,0 +1,105 @@ +# iroh-live-relay: Browser Bridging + +## Overview + +The relay server bridges iroh P2P streams to browser clients via WebTransport. Browsers cannot speak iroh's QUIC protocol directly, so the relay accepts WebTransport connections and either serves locally-published broadcasts or pulls them from remote iroh publishers on demand. + +**Architecture:** + +``` +iroh-live publish --(iroh P2P)--> iroh-live-relay <--(WebTransport)-- browser +browser --(WebTransport)--> iroh-live-relay --(iroh P2P)--> iroh-live subscribe +``` + +## Components + +### `RelayConfig` (CLI Configuration) + +```rust +pub struct RelayConfig { + pub bind: SocketAddr, // QUIC/WebTransport bind (default: [::]:4443) + pub http_bind: SocketAddr, // HTTP static files bind (default: same as bind) +} +``` + +Flattenable into a clap CLI via `#[command(flatten)]`. + +### `run(config)` — Main Server Loop + +The main entry point. Sets up: + +1. **QUIC/WebTransport server** — Uses `moq-native::ServerConfig` with: + - QUIC backend: `noq` (a custom QUIC implementation) + - iroh endpoint integration + - Self-signed TLS certificates (dev mode) for `localhost` + - Max streams: `moq_relay::DEFAULT_MAX_STREAMS` + +2. **iroh endpoint** — Binds an iroh endpoint for P2P connectivity, prints its ID + +3. **moq-relay Cluster** — The broadcast routing engine. Manages broadcast lifecycle: when all subscribers disconnect, the broadcast is removed. + +4. 
**HTTP server** — Axum router serving: + - `GET /certificate.sha256` — TLS fingerprint for dev mode + - `GET /` — Web viewer landing page + - `GET /{path}` — Static file serving with CORS + - Embedded via `include_dir!` from `web/dist/` + +5. **Pull mode** — If iroh endpoint is available, creates a `PullState` for on-demand remote broadcast fetching + +6. **Connection loop** — Accepts incoming connections, parses the URL path as a `LiveTicket`, and if valid, triggers a pull before running the connection + +### `PullState` — On-Demand Remote Fetching + +When a browser connects with a broadcast name that is a valid `LiveTicket`, the relay: + +1. Checks if the broadcast already exists in the cluster (fast path) +2. If not, connects to the remote publisher via iroh-live's `Moq::connect()` +3. Subscribes to the remote broadcast +4. Publishes the consumer into the local cluster under the ticket string as the name +5. Spawns a keepalive task that holds the session until it closes + +**Concurrency:** Duplicate concurrent pulls for the same ticket are deduplicated using a `HashMap>`. Waiters block on the `Notify` until the first connector finishes. + +```rust +pub(crate) struct PullState { + live: iroh_live::Live, + cluster: Cluster, + connecting: Arc>>>>, +} +``` + +### Web Viewer + +The relay embeds a SolidJS + TypeScript web application compiled by Vite. It uses: +- `@moq/watch` — Web component for watching streams via WebCodecs +- `@moq/publish` — Web component for publishing from browser camera/mic +- WebTransport — For QUIC connectivity from the browser + +Watch URLs: `https://relay:4443/?name=` + +### Data Directory + +The relay persists data to `$IROH_LIVE_RELAY_DATA` (or the platform default). This includes: +- iroh secret key (`iroh_secret_key`) — ensures endpoint ID stability across restarts +- TLS certificates + +### TLS and Certificates + +Currently **self-signed only**. ACME/Let's Encrypt is planned but not implemented. 
In dev mode, browsers need `--ignore-certificate-errors` or the relay's fingerprint (served at `/certificate.sha256`) for WebTransport to work. + +## Error Handling + +No authentication is implemented yet. The relay accepts all connections. MoQ supports token-based authentication which could be added. + +## CLI Binary + +```rust +// iroh-live-relay/src/main.rs +#[derive(Parser)] +struct Cli { + #[command(flatten)] + relay: RelayConfig, +} +``` + +Must call `rustls::crypto::aws_lc_rs::default_provider().install_default()` before `run()`. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/06-moq-media-pipelines.md b/docs/research/references/iroh/iroh-live/06-moq-media-pipelines.md new file mode 100644 index 0000000..da453b4 --- /dev/null +++ b/docs/research/references/iroh/iroh-live/06-moq-media-pipelines.md @@ -0,0 +1,304 @@ +# moq-media: Media Pipelines + +## Overview + +`moq-media` owns the media pipeline: broadcast management, codec orchestration, playout timing, adaptive bitrate, and audio backend. **It has no dependency on iroh** — it works with any transport that implements `PacketSource` and `PacketSink`. This makes it usable for recording pipelines, studio links, and camera dashboards without RTC. 
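moq-media reaches the network only through `PacketSource` and `PacketSink`, so any channel that can carry encoded frames can replace MoQ. The sketch below shows that boundary using simplified, synchronous versions of the two traits (the real `PacketSource::read` is async) and a `std::sync::mpsc` channel as the transport. `Frame`, `ChannelSink`, `ChannelSource`, `pipe`, and `drain` are hypothetical names, loosely modeled on the in-memory `PipeSink`/`PipeSource` pair.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for moq-media's `EncodedFrame`.
#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub payload: Vec<u8>,
    pub is_keyframe: bool,
}

/// Simplified, synchronous sink trait (assumption: the real one differs in detail).
pub trait PacketSink {
    fn write(&mut self, frame: Frame) -> Result<(), String>;
    fn finish(&mut self) -> Result<(), String>;
}

/// Simplified, synchronous source trait; `Ok(None)` means the stream ended.
pub trait PacketSource {
    fn read(&mut self) -> Result<Option<Frame>, String>;
}

/// In-memory transport endpoints (hypothetical).
pub struct ChannelSink(Option<Sender<Frame>>);
pub struct ChannelSource(Receiver<Frame>);

pub fn pipe() -> (ChannelSink, ChannelSource) {
    let (tx, rx) = channel();
    (ChannelSink(Some(tx)), ChannelSource(rx))
}

impl PacketSink for ChannelSink {
    fn write(&mut self, frame: Frame) -> Result<(), String> {
        let tx = self.0.as_ref().ok_or("sink already finished")?;
        tx.send(frame).map_err(|e| e.to_string())
    }

    fn finish(&mut self) -> Result<(), String> {
        // Dropping the only sender closes the channel, which ends the source.
        self.0.take();
        Ok(())
    }
}

impl PacketSource for ChannelSource {
    fn read(&mut self) -> Result<Option<Frame>, String> {
        // `recv` yields buffered frames first and errors only once all senders are gone.
        Ok(self.0.recv().ok())
    }
}

/// Generic consumer: depends only on the trait, not on the transport behind it.
pub fn drain<S: PacketSource>(source: &mut S) -> Result<Vec<Frame>, String> {
    let mut frames = Vec::new();
    while let Some(frame) = source.read()? {
        frames.push(frame);
    }
    Ok(frames)
}
```

`drain` never names the channel types. That independence is what lets the same decode path run over MoQ, an in-memory pipe, or a recording.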
+ +## Module Structure + +``` +moq-media/ +├── lib.rs — Re-exports and feature-gated modules +├── publish.rs — LocalBroadcast, VideoPublisher, AudioPublisher +├── subscribe.rs — RemoteBroadcast, VideoTrack, AudioTrack, MediaTracks +├── transport.rs — PacketSource/PacketSink traits, MoqPacketSource, MoqPacketSink +├── net.rs — NetworkSignals (RTT, loss rate, available bandwidth) +├── adaptive.rs — Adaptive rendition switching algorithm +├── playout.rs — PlaybackPolicy, SyncMode +├── chat.rs — ChatPublisher, ChatSubscriber (MoQ track-based) +├── frame_channel.rs — Single-frame channel (last-writer-wins for video) +├── sync.rs — Shared playout clock (Sync) for A/V sync +├── stats.rs — Metric, Label, NetStats, EncodeStats, RenderStats, etc. +├── pipeline.rs — Pipeline orchestration +├── pipeline/ — VideoEncoderPipeline, AudioEncoderPipeline, VideoDecoderPipeline, etc. +├── audio_backend.rs — AudioBackend trait and device enumeration +├── audio_backend/ — Platform-specific audio backends (cpal, etc.) +├── capture.rs — Camera/screen capture integration +├── source_spec.rs — VideoInput, PreEncodedTrack +├── test_util.rs — Test utilities (feature-gated) +└── processing/ — Scale, color conversion, etc. +``` + +## Publish Pipeline — `LocalBroadcast` + +`LocalBroadcast` manages encoder pipelines and publishes a catalog that subscribers use to discover available renditions. It owns a `BroadcastProducer` (from moq-lite) and coordinates video and audio track lifecycles. + +### Construction + +```rust +let broadcast = LocalBroadcast::new(); +broadcast.video().set_source(camera, VideoCodec::H264, [VideoPreset::P720])?; +broadcast.audio().set(mic, AudioCodec::Opus, [AudioPreset::Hq])?; + +// Or pre-encoded sources +broadcast.video().set(VideoInput::pre_encoded("video/h264-pi", config, factory))?; +``` + +### Slot Handles + +- `broadcast.video()` → `VideoPublisher` (borrows `&self`) +- `broadcast.audio()` → `AudioPublisher` (borrows `&self`) + +Both use interior mutability. 
Calling `set()` tears down any existing pipeline and installs the new one. + +### Video Input Modes + +```rust +pub enum VideoInput { + Renditions(VideoRenditions), // Raw source → multiple encoded renditions (simulcast) + PreEncoded(Vec), // Already-encoded tracks pass through +} +``` + +**`VideoRenditions`** holds a `SharedVideoSource` and a map of rendition names to encoder factories. Multiple renditions share the same source via `watch::Receiver>`. Slow encoders never cause backpressure on the source — intermediate frames are silently skipped. + +**`PreEncodedTrack`** is for hardware encoders that produce compressed output directly (e.g., rpicam-vid on Raspberry Pi). Each track carries a name, `VideoConfig`, and a factory closure that creates a fresh source per subscriber. + +### SharedVideoSource + +Runs the capture source on a dedicated OS thread. Parks when no subscribers are connected (releasing camera/screen resources) and unparks when the first subscriber arrives. Uses `AtomicU32` subscriber counting with proper memory ordering (`AcqRel`/`Acquire`). + +Frames are distributed via `watch::Sender>` — always contains the latest frame, so slow encoders never block the source. + +### Demand-Driven Track Startup + +The broadcast's run loop (`LocalBroadcast::run_dynamic`) calls `producer.requested_track().await` to wait for subscriber demand. When a subscriber requests a specific rendition: + +1. The loop looks up the rendition in the current `VideoInput` or `AudioRenditions` +2. It starts the corresponding encoder pipeline on a dedicated OS thread +3. When all subscribers disconnect (tracked via `track.unused().await`), the pipeline is stopped + +This means encoder threads only run when someone is actually consuming. + +### Catalog + +`LocalBroadcast` maintains a catalog track (hang's built-in catalog mechanism) listing all available video and audio renditions with codec configuration, dimensions, and bitrate. Updated whenever video or audio is set/cleared. 
+ +Catalog format follows the `hang::catalog::Catalog` structure with `Video` and `Audio` entries, each containing a `BTreeMap` of rendition names to configurations. + +### Encoder Pipeline Architecture + +All encoder pipelines run on **dedicated OS threads** (`spawn_thread`), not tokio tasks. Codec operations are CPU-intensive and sometimes block on hardware (VAAPI, V4L2), so running on tokio tasks would starve other async work. + +Communication with the async runtime: +- **VideoEncoderPipeline**: reads `SharedVideoSource` via `watch::Receiver`, writes encoded frames to `MoqPacketSink` +- **AudioEncoderPipeline**: reads from `AudioSource`, writes to `MoqPacketSink` +- **PreEncodedVideoPipeline**: reads from `PreEncodedVideoSource`, writes to `MoqPacketSink` + +### Chat + +```rust +let chat_publisher = broadcast.enable_chat()?; +chat_publisher.send("Hello!")?; + +// Subscriber side +if let Some(chat_sub) = remote_broadcast.chat() { + let msg = chat_sub.recv().await; +} +``` + +Each chat message is a single MoQ group with one frame of UTF-8 text. The track name is `"chat"` with priority 10. + +## Subscribe Pipeline — `RemoteBroadcast` + +`RemoteBroadcast` wraps a `BroadcastConsumer` and watches its catalog for available video and audio renditions. Created with a `BroadcastConsumer` and a `PlaybackPolicy`. + +### Construction + +```rust +let broadcast = RemoteBroadcast::new("stream-name", consumer).await?; +// Or with explicit policy +let broadcast = RemoteBroadcast::with_playback_policy("stream", consumer, policy).await?; +``` + +On construction, spawns a catalog-watching task that publishes snapshots via `Watchable`. + +### `CatalogSnapshot` + +Point-in-time view of the broadcast's catalog. Derefs to `hang::Catalog`. Carries a sequence number for change detection. 
+ +```rust +let catalog = broadcast.catalog(); +catalog.video_renditions() // Iterator of rendition names sorted by width +catalog.audio_renditions() // Iterator of audio rendition names +catalog.select_video_rendition(Quality::High)? // Best match for quality +catalog.has_video() +catalog.has_audio() +catalog.has_chat() +catalog.user() // User metadata from publisher +``` + +### Rendition Selection + +```rust +pub enum Quality { Highest, High, Mid, Low } + +pub struct VideoTarget { + pub max_pixels: Option, + pub max_bitrate_kbps: Option, + pub rendition: Option, // Pin to specific rendition +} +``` + +`Quality::High` → `max_pixels(1280*720)`, etc. If `rendition` is set, it takes priority. + +### VideoTrack + +Represents a decoded video stream from a remote broadcast. The decoder runs on a dedicated OS thread. + +**Creation flow:** + +1. Pick a rendition (via `VideoTarget` or explicit name) +2. Create `TrackConsumer` from `BroadcastConsumer`, wrap in `OrderedConsumer` with `PlaybackPolicy::max_latency` +3. Wrap in `MoqPacketSource` +4. A `forward_packets` async task reads from `MoqPacketSource` → `mpsc` channel +5. Decoder thread reads `mpsc` → decoder → output via `Sync` playout clock (or `FramePacer`) +6. 
Output channel: `FrameReceiver` (latest-frame wins, suitable for rendering) + +**Frame access:** +- `track.try_recv()` — Returns latest frame, draining older buffered frames (for game loops) +- `track.next_frame().await` — Async wait for next frame +- `track.has_frame()` — Check without consuming + +**Adaptive rendition switching:** +```rust +track.enable_adaptation(broadcast, signals, config, decode_config)?; +track.disable_adaptation(); +track.is_adaptive(); +track.selected_rendition(); +track.set_rendition_mode(RenditionMode::Fixed("video/h264-360p".into())); +track.set_rendition_mode(RenditionMode::Auto); +track.rendition_watcher(); // Direct watcher for rendition changes +``` + +### AudioTrack + +Same pattern as `VideoTrack` but sends decoded samples to an `AudioSink` (typically cpal + sonora). The audio decoder thread runs a 10ms tick loop. + +### MediaTracks + +Convenience struct combining `RemoteBroadcast` with optional `VideoTrack` and `AudioTrack`: + +```rust +pub struct MediaTracks { + pub broadcast: RemoteBroadcast, + pub video: Option, + pub audio: Option, +} +``` + +### Lifecycle + +Both `VideoTrack` and `AudioTrack` use drop-based cleanup. Dropping cancels the decoder thread (via `CancellationToken`) and the `forward_packets` task (via `AbortOnDropHandle`). The `OrderedConsumer` is dropped, signaling the transport that the track is no longer needed. + +## Transport Abstraction — `PacketSource` / `PacketSink` + +The transport boundary between moq-media and the network: + +```rust +pub trait PacketSource: Send + 'static { + fn read(&mut self) -> impl Future>> + Send; +} + +pub trait PacketSink: Send + 'static { + fn write(&mut self, packet: EncodedFrame) -> Result<()>; + fn finish(&mut self) -> Result<()>; +} +``` + +**`MoqPacketSink`** wraps an `OrderedProducer`. When it receives an `EncodedFrame` with `is_keyframe = true`, it calls `keyframe()` on the producer to start a new MoQ group. 
This keyframe-to-group mapping is how subscribers can join at any group boundary. + +**`MoqPacketSource`** wraps an `OrderedConsumer` and reads frames, converting them to `MediaPacket`. + +**`PipeSink` / `PipeSource`** — In-memory pipe for local encode→decode without network (testing, local preview). + +## Adaptive Rendition Switching + +The adaptation algorithm runs in a background task that monitors `NetworkSignals` and decides whether to switch to a different video rendition. + +### Algorithm + +Renditions are ranked by pixel count (highest first). The algorithm maintains state across ticks: + +```rust +pub enum Decision { + Hold, // Stay on current rendition + Downgrade(usize), // Switch to lower at index + Emergency, // Drop to lowest immediately + StartProbe(usize), // Try upgrading to index +} +``` + +**Emergency** (immediate): Loss rate ≥ 20% → drop to lowest rendition + +**Downgrade** (sustained 500ms): Loss rate ≥ 10% OR available bandwidth < 85% of current rendition's bitrate + +**Upgrade probe** (sustained 4s good conditions): Loss ≤ 2%, bandwidth ≥ 120% of next-higher rendition's bitrate → start 3-second probe on the higher rendition + +**Probe abort**: Loss ≥ 5% or new congestion events during probe → abort, 8s cooldown + +**Post-downgrade cooldown**: 4s after any downgrade before probes are allowed + +### Implementation + +The adaptation task (`adaptation_task_v2`) creates new `VideoDecoderPipeline`s that write to the same `FrameSender` via `with_sender()`. The frame channel stays the same while the underlying decoder pipeline gets swapped. When switching: + +1. Create a new decoder pipeline for the target rendition +2. Drop the old pipeline handle +3. 
Update `selected_rendition` Watchable + +## Playback and Sync + +### PlaybackPolicy + +```rust +pub struct PlaybackPolicy { + pub sync: SyncMode, // Synced (shared clock) or Unmanaged (PTS pacing) + pub max_latency: Duration, // Default: 150ms — how much buffering before skipping forward +} +``` + +### SyncMode + +- **`Synced`** (default): Shared playout clock (`Sync`). Video frames are gated by `Sync::wait(pts)`, which blocks until `reference + pts + latency` arrives. Audio paces itself through its ring buffer (~80ms). +- **`Unmanaged`**: No synchronization. `FramePacer` sleeps between frames based on PTS deltas, clamped to 2× frame period. + +### Sync + +The `Sync` type records arrival offsets via `received(pts)` and blocks on `wait(pts)` until `reference + pts + latency`. This keeps audio and video aligned without cross-path gating or signaling. Ported from the moq/js implementation. + +## Stats + +moq-media has a structured stats system for debug overlays: + +- **`NetStats`** — RTT, loss%, bandwidth, path type (written by iroh-live transport bridge) +- **`EncodeStats`** — FPS, encode time, bitrate, codec, encoder, resolution, capture path +- **`RenderStats`** — FPS, decode time, decoder, renderer, rendition +- **`TimingStats`** — Audio buffer level, video/audio lag, A/V delta, video buffer depth +- **`Timeline`** — Ring buffer of `FrameMeta` entries for timeline visualization + +Each `Metric` has EMA smoothing, a history ring buffer, and optional color thresholds. `Label` provides atomic string values. 
+ +## Codec Support + +Feature-gated codec support: + +| Feature | Codec | Backend | +|---------|-------|---------| +| `h264` | H.264 | openh264 (software) | +| `av1` | AV1 | rav1e encoder, rav1d decoder | +| `opus` | Opus | opus crate | +| `vaapi` | VAAPI | Linux hardware encode/decode | +| `videotoolbox` | VideoToolbox | macOS hardware | +| `v4l2` | V4L2 | Raspberry Pi hardware | +| `pcm` | Raw PCM | No encoding | \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/07-network-signals-and-adaptive-bitrate.md b/docs/research/references/iroh/iroh-live/07-network-signals-and-adaptive-bitrate.md new file mode 100644 index 0000000..4530a6a --- /dev/null +++ b/docs/research/references/iroh/iroh-live/07-network-signals-and-adaptive-bitrate.md @@ -0,0 +1,95 @@ +# iroh-live: Network Signals and Adaptive Bitrate + +## NetworkSignals + +Produced by polling iroh QUIC connection stats. Consumed by `VideoTrack::enable_adaptation()` to decide when to switch video renditions. + +```rust +pub struct NetworkSignals { + pub rtt: Duration, // Round-trip time to remote peer + pub loss_rate: f64, // Recent packet loss rate (0.0..=1.0), 200ms delta window + pub available_bps: u64, // Estimated available bandwidth (cwnd * 8 / rtt) + pub congestion_events: u64, // Monotonically increasing congestion counter +} +``` + +### Production + +`spawn_signal_producer()` in `iroh-live/src/util.rs` polls every 200ms: + +1. Gets connection paths via `conn.paths().get()` +2. Finds the selected path (`is_selected()`) +3. Reads path stats (`lost_packets`, `udp_tx.datagrams`, `cwnd`) and RTT +4. Computes delta-based loss rate: `delta_lost / (delta_sent + delta_lost)` +5. Estimates bandwidth: `cwnd * 8 * 1e9 / rtt_ns` +6. Writes to `watch::Sender` + +Also: `spawn_stats_recorder()` records into `NetStats` for the debug overlay (RTT, loss%, bandwidth in/out, path type). + +## Adaptive Rendition Algorithm + +Located in `moq-media/src/adaptive.rs`. 
The algorithm evaluates `NetworkSignals` against configured thresholds and produces `Decision` values. + +### Configuration (`AdaptiveConfig`) + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `upgrade_hold` | 4s | Sustained good conditions before upgrade probe | +| `downgrade_hold` | 500ms | Sustained bad conditions before downgrade | +| `probe_duration` | 3s | How long a probe runs before committing | +| `probe_cooldown` | 8s | Cooldown after a failed probe | +| `post_downgrade_cooldown` | 4s | Cooldown after any downgrade | +| `loss_downgrade` | 10% | Loss rate threshold for downgrade | +| `loss_emergency` | 20% | Loss rate for immediate drop to lowest | +| `loss_good` | 2% | Loss rate considered "good" | +| `loss_probe_abort` | 5% | Loss rate that aborts an active probe | +| `bw_downgrade_ratio` | 85% | Bandwidth utilization ceiling for downgrade | +| `bw_probe_headroom` | 120% | Required excess bandwidth for probe | +| `check_interval` | 200ms | How often adaptation task checks signals | + +### Decision Logic + +``` +1. Emergency: loss >= 20% AND not already lowest → Drop to lowest immediately + +2. Downgrade check: + - bandwidth_stressed (available < current_bitrate * 85%) OR loss >= 10% + - sustained for downgrade_hold (500ms) → Downgrade(next_lower) + +3. Upgrade check: + - Already at highest → Hold + - Within post_downgrade_cooldown (4s) → Hold + - Within probe_cooldown (8s) → Hold + - bandwidth_headroom (available >= next_higher_bitrate * 120%) AND loss <= 2% + - sustained for upgrade_hold (4s) → StartProbe(next_higher) + +4. Otherwise: Hold +``` + +### Probe Lifecycle + +When `StartProbe(idx)` is decided: +1. Create a new decoder pipeline for the higher rendition +2. Write frames to the same `FrameSender` (seamless switch for the consumer) +3. Monitor signals during the probe period +4. If `should_abort_probe()` (loss ≥ 5% or new congestion events) → abort, drop probe pipeline, cooldown 8s +5. 
If probe duration (3s) passes without abort → commit, replace current pipeline + +### Rendition Ranking + +```rust +pub fn rank_renditions(renditions: &BTreeMap) -> Vec +``` + +Sorts by pixel count descending (highest quality = index 0). Each `RankedRendition` carries name, pixels, bitrate_bps, width, height. + +### RenditionMode + +```rust +pub enum RenditionMode { + Auto, // Algorithm-driven switching + Fixed(String), // Pin to a specific rendition +} +``` + +Controlled via `VideoTrack::set_rendition_mode()`. In Fixed mode, the algorithm switches directly to the named rendition without probing. \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/08-p2p-and-relay.md b/docs/research/references/iroh/iroh-live/08-p2p-and-relay.md new file mode 100644 index 0000000..e9e4b5e --- /dev/null +++ b/docs/research/references/iroh/iroh-live/08-p2p-and-relay.md @@ -0,0 +1,85 @@ +# iroh-live: P2P Connectivity and Relay Architecture + +## Direct Connectivity + +iroh connects peers directly when possible: + +- **Same LAN:** Communicates over the local network without traffic leaving the subnet +- **Public IP / simple NAT:** iroh's hole-punching establishes a direct UDP path +- **Symmetric NAT / corporate firewalls / CGNAT:** Falls back to iroh relay network + +The iroh endpoint exposes path statistics via `conn.paths()`, which returns a `Watcher`. Each `PathInfo` reports RTT, whether the path is selected, and the remote address. The selected path is the one actively carrying traffic; iroh may maintain multiple candidate paths and switch between them. + +The transition between direct and relayed paths is transparent to the application. The media pipeline sees only changes in RTT and bandwidth, which adaptive rendition switching handles automatically. 
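A direct-to-relay transition reaches the media pipeline only as an RTT and bandwidth change. The sketch below works through the delta-based formulas listed for `spawn_signal_producer()` in the network-signals document. `PathSample` and `signals` are hypothetical names. The counters mirror the path stats named there (`lost_packets`, `udp_tx.datagrams`, `cwnd`), and the test inputs are illustrative numbers, not measurements.

```rust
use std::time::Duration;

/// Hypothetical snapshot of the selected path's counters at one poll.
#[derive(Clone, Copy, Debug)]
pub struct PathSample {
    pub lost_packets: u64,
    pub sent_datagrams: u64,
    pub cwnd_bytes: u64,
    pub rtt: Duration,
}

/// Returns (loss_rate, available_bps) between two successive polls.
pub fn signals(prev: PathSample, now: PathSample) -> (f64, u64) {
    // Delta-based loss over the poll window: lost / (sent + lost).
    let delta_lost = now.lost_packets.saturating_sub(prev.lost_packets);
    let delta_sent = now.sent_datagrams.saturating_sub(prev.sent_datagrams);
    let total = delta_sent + delta_lost;
    let loss_rate = if total == 0 { 0.0 } else { delta_lost as f64 / total as f64 };

    // Bandwidth estimate: cwnd * 8 * 1e9 / rtt_ns, in bits per second.
    let rtt_ns = now.rtt.as_nanos().max(1); // guard against a zero RTT
    let available_bps = (now.cwnd_bytes as u128 * 8 * 1_000_000_000 / rtt_ns) as u64;
    (loss_rate, available_bps)
}
```

Suppose the congestion window stayed at 125,000 bytes. A 50 ms direct path then estimates 20 Mbit/s, while a 150 ms relayed path estimates about 6.7 Mbit/s, one third as much. A shift of that size is what the adaptive algorithm's bandwidth-based downgrade check reacts to.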
+ +## iroh-live-relay: Architecture + +The relay serves two transport protocols simultaneously: + +``` +iroh P2P publisher ──(QUIC, moq-lite-03)──> iroh-live-relay <──(WebTransport/H3, noq)── browser +``` + +Both protocols feed into `moq-relay`'s shared `Origin`, which manages broadcast routing. A broadcast published via iroh is automatically available to WebTransport subscribers, and vice versa. + +### Pull Model + +The relay operates in **pull mode**: it connects to iroh publishers on demand when a browser client requests a broadcast. The broadcast name in the URL can be a `LiveTicket` URI. Multiple browser clients watching the same broadcast share a single upstream iroh connection. + +Pull flow: +1. Browser connects via WebTransport, requests broadcast by name (or ticket) +2. Relay checks if broadcast already exists in local cluster → fast path +3. If not, relay uses iroh-live `Moq::connect()` to connect to the remote publisher +4. Subscribes to the broadcast via `session.subscribe(broadcast_name)` +5. Publishes the consumer into the local cluster under the ticket string as the name +6. Spawns a keepalive task holding the session until it closes +7. Browser receives the stream through the relay's WebTransport frontend + +### Connection Deduplication + +`PullState` uses a `HashMap>` to prevent duplicate concurrent connections to the same remote. If a pull is already in progress for a given ticket, subsequent requests wait on the `Notify` and then check if the broadcast appeared in the cluster. + +### QUIC Backend: noq + +The relay uses `noq` as its QUIC backend (not quinn). 
This is configured via: + +```rust +server_config.backend = Some(moq_native::QuicBackend::Noq); +``` + +### iroh Endpoint Integration + +The relay also binds an iroh endpoint: + +```rust +let mut iroh_config = moq_native::IrohEndpointConfig::default(); +iroh_config.enabled = Some(true); +iroh_config.secret = Some(relay.iroh_secret_path_str()); +let iroh = iroh_config.bind().await?; +``` + +This enables the relay to participate in the iroh P2P network directly. + +## Ticket Format + +`LiveTicket` serves as the connection mechanism for both P2P and relay scenarios: + +- **P2P:** Subscriber uses the `EndpointAddr` (node ID + relay URLs) to connect directly +- **Relay:** The full ticket string becomes the broadcast name in the URL: `https://relay:4443/?name=iroh-live:...` + +The ticket format: `iroh-live:/` + +It also supports a legacy format: `@` + +## Connection Access in iroh-moq + +`MoqSession::conn()` returns a reference to the underlying iroh `Connection`. This is used by: + +1. **Signal producer** — Polls path stats for `NetworkSignals` +2. **Stats recorder** — Records into `NetStats` for debug overlays +3. **Call::closed()** — Inspects QUIC close reason to determine `DisconnectReason` + +The connection provides: +- `paths().get()` — List of active network paths with RTT, stats, relay status +- `close_reason()` — Why the connection closed (LocallyClosed, ApplicationClosed, ConnectionClosed, Reset) +- `remote_id()` — Remote peer's endpoint ID \ No newline at end of file diff --git a/docs/research/references/iroh/iroh-live/README.md b/docs/research/references/iroh/iroh-live/README.md new file mode 100644 index 0000000..6590cfc --- /dev/null +++ b/docs/research/references/iroh/iroh-live/README.md @@ -0,0 +1,42 @@ +# iroh-live Reference Documentation + +> **Status:** Early tech preview. APIs are unstable. Based on source code analysis of the iroh-live workspace. 
+ +## Files + +| File | Topic | +|------|-------| +| [01-overview-and-architecture](01-overview-and-architecture.md) | Workspace structure, crate layers, design principles, data flow, dependencies | +| [02-core-api](02-core-api.md) | `Live`, `LiveTicket`, `Call`, `Subscription`, `DisconnectReason`, `util` module | +| [03-iroh-moq-transport](03-iroh-moq-transport.md) | `Moq`, `MoqSession`, `MoqProtocolHandler`, actor internals, session lifecycle, error types | +| [04-rooms](04-rooms.md) | `Room`, `RoomHandle`, `RoomTicket`, `RoomEvent`, gossip KV coordination, actor architecture | +| [05-relay](05-relay.md) | `iroh-live-relay`: browser bridging, pull model, `RelayConfig`, `PullState`, web viewer | +| [06-moq-media-pipelines](06-moq-media-pipelines.md) | `LocalBroadcast`, `RemoteBroadcast`, `VideoTrack`, `AudioTrack`, transport abstraction, codec support | +| [07-network-signals-and-adaptive-bitrate](07-network-signals-and-adaptive-bitrate.md) | `NetworkSignals`, adaptation algorithm, `AdaptiveConfig`, `Decision`, probe lifecycle | +| [08-p2p-and-relay](08-p2p-and-relay.md) | iroh P2P connectivity, relay architecture, pull model, ticket format, connection access | + +## Quick Navigation + +### "How do I..." 
+ +- **Publish a stream?** → [02-core-api](02-core-api.md) (`Live::publish`) + [06-moq-media-pipelines](06-moq-media-pipelines.md) (`LocalBroadcast`) +- **Subscribe to a stream?** → [02-core-api](02-core-api.md) (`Live::subscribe`) + [06-moq-media-pipelines](06-moq-media-pipelines.md) (`RemoteBroadcast`) +- **Make a 1:1 call?** → [02-core-api](02-core-api.md) (`Call::dial` / `Call::accept`) +- **Create a multi-party room?** → [04-rooms](04-rooms.md) (`Room::new`, `RoomTicket`) +- **Bridge to browsers?** → [05-relay](05-relay.md) (`iroh-live-relay`) +- **Adapt quality to network conditions?** → [07-network-signals-and-adaptive-bitrate](07-network-signals-and-adaptive-bitrate.md) +- **Understand the MoQ transport?** → [03-iroh-moq-transport](03-iroh-moq-transport.md) +- **Understand the media pipeline?** → [06-moq-media-pipelines](06-moq-media-pipelines.md) + +### Key Source Files + +| Component | Path | +|-----------|------| +| iroh-live crate | `iroh-live/src/{lib, live, call, subscription, ticket, types, util, rooms}.rs` | +| iroh-moq crate | `iroh-moq/src/lib.rs` | +| iroh-live-relay | `iroh-live-relay/src/{lib, main, pull}.rs` | +| moq-media publish | `moq-media/src/publish.rs` | +| moq-media subscribe | `moq-media/src/subscribe.rs` | +| moq-media adaptive | `moq-media/src/adaptive.rs` | +| moq-media transport | `moq-media/src/transport.rs` | +| moq-media network signals | `moq-media/src/net.rs` | \ No newline at end of file diff --git a/docs/research/references/iroh/iroh/01-overview-architecture.md b/docs/research/references/iroh/iroh/01-overview-architecture.md new file mode 100644 index 0000000..86e48c2 --- /dev/null +++ b/docs/research/references/iroh/iroh/01-overview-architecture.md @@ -0,0 +1,160 @@ +# Iroh: Overview & Architecture + +**Version**: 0.98.1 +**Repository**: https://github.com/n0-computer/iroh +**License**: MIT OR Apache-2.0 +**Rust Edition**: 2024 +**MSRV**: 1.89 + +## What is Iroh? 
+ +Iroh is a Rust library for establishing **peer-to-peer QUIC connections dialed by public key**. You provide an `EndpointAddr` (which identifies a peer), and iroh finds and maintains the fastest connection route — whether direct (hole-punched) or relayed through a server. + +Core value propositions: +- **Dial by public key** — no IP addresses or hostnames needed at the application layer +- **Hole-punching** — automatically attempts direct P2P connectivity +- **Relay fallback** — encrypted relay servers ensure connectivity even behind NATs +- **Built on QUIC** — uses the `noq` QUIC implementation for multiplexed, encrypted streams +- **Address Lookup** — pluggable discovery system to resolve `EndpointId → addressing info` + +## Workspace Structure + +``` +iroh/ # Core library (p2p QUIC connections) +├── iroh-base/ # Fundamental types: SecretKey, PublicKey, EndpointId, RelayUrl, EndpointAddr +├── iroh-dns/ # DNS resolver + endpoint info serialization (pkarr) +├── iroh-dns-server/ # DNS server implementation (powers dns.iroh.link) +├── iroh-relay/ # Relay server + client implementation +└── iroh/bench/ # Benchmarks +``` + +### Dependency Graph + +``` +iroh depends on: + ├── iroh-base (key types, EndpointAddr, RelayUrl) + ├── iroh-dns (DNS resolution, EndpointInfo serialization) + ├── iroh-relay (RelayMap, RelayConfig, relay client/server, QUIC client) + ├── noq (QUIC implementation) + ├── noq-proto (QUIC protocol types) + ├── noq-udp (UDP socket abstraction) + ├── netwatch (network interface monitoring) + ├── portmapper (UPnP/PCP/NAT-PMP port mapping, optional) + ├── n0-future (async utilities) + ├── n0-watcher (watch/subscribe primitives) + └── iroh-metrics (metrics collection) +``` + +## Key Concepts + +### EndpointId / PublicKey +Every iroh endpoint has a unique Ed25519 cryptographic key pair. The public key doubles as the endpoint identifier (`EndpointId`). 
It's used for both:
+- **Identity** — unique addressing in the network
+- **Encryption** — TLS authentication (via RFC 7250 Raw Public Keys, no X.509 certificates)
+
+### EndpointAddr
+The addressing structure that combines identity with network paths:
+```rust
+pub struct EndpointAddr {
+    pub id: EndpointId,                 // Who to connect to
+    pub addrs: BTreeSet<TransportAddr>, // How to reach them
+}
+
+pub enum TransportAddr {
+    Relay(RelayUrl),        // Via relay server
+    Ip(SocketAddr),         // Direct IP address
+    Custom(CustomAddr),     // Via custom transport
+}
+```
+
+### Relay Servers
+Relay servers provide:
+1. **Reliable connectivity** — always reachable, forward encrypted traffic to the correct endpoint by `EndpointId`
+2. **Hole-punching assistance** — QUIC Address Discovery (QAD), STUN-like services
+3. **Traffic relay** — fallback when direct connections are impossible
+
+Connections to relays use HTTP/1.1 with TLS, then upgrade to a custom protocol. The relay only sees encrypted traffic.
+
+### Connection Flow
+1. Endpoint binds and connects to a "home relay"
+2. To connect to a peer: resolve `EndpointId` → `EndpointAddr` via Address Lookup
+3. Establish initial connection via relay
+4. Attempt direct connection (hole-punching if needed)
+5.
Migrate to direct connection when available (relay becomes backup) + +## Crate: `iroh` (Core Library) + +### Main Types +| Type | Module | Purpose | +|------|--------|---------| +| `Endpoint` | `endpoint` | Central API — connect, accept, manage connections | +| `Builder` | `endpoint` | Configure and construct an `Endpoint` | +| `Router` | `protocol` | Accept loop that dispatches to `ProtocolHandler`s | +| `ProtocolHandler` | `protocol` | Trait for handling incoming connections by ALPN | +| `Connection` | `endpoint::connection` | QUIC connection wrapper | +| `Incoming` | `endpoint::connection` | Pre-handshake incoming connection | +| `Accepting` | `endpoint::connection` | Post-accept, pre-handshake state | + +### Feature Flags +- `default` = `["metrics", "fast-apple-datapath", "portmapper", "tls-ring"]` +- `metrics` — Prometheus-style metrics collection +- `portmapper` — UPnP/PCP/NAT-PMP support +- `test-utils` — Testing utilities +- `platform-verifier` — Use OS TLS trust anchors +- `qlog` — QUIC event logging +- `fast-apple-datapath` — Private Apple APIs for batched sends +- `tls-ring` / `tls-aws-lc-rs` — Choose TLS crypto backend +- `unstable-custom-transports` — Custom transport API (unstable) + +### WASM Support +The crate compiles to `wasm32-unknown-unknown` for browser targets. 
Browser builds:
+- Use `PkarrResolver` (HTTPS requests to a pkarr relay) instead of `DnsAddressLookup`, since raw DNS is unavailable
+- Cannot bind UDP sockets, so there are no direct IP paths and all traffic goes through relays
+- Use `wasm-bindgen-futures` as the async runtime
+
+## Presets
+
+The `presets` module provides common configurations:
+
+| Preset | Description |
+|--------|-------------|
+| `Empty` | No defaults — you must set all required options yourself |
+| `Minimal` | Sets only the crypto provider (ring or aws-lc-rs) |
+| `N0` | Full n0 defaults: crypto provider, Pkarr publisher, DNS resolver, n0 relay servers |
+| `N0DisableRelay` | N0 defaults but with `RelayMode::Disabled` |
+
+```rust
+// Quick start with full n0 infrastructure
+let endpoint = Endpoint::bind(presets::N0).await?;
+
+// Minimal — just crypto, no relay or address lookup
+let endpoint = Endpoint::bind(presets::Minimal).await?;
+```
+
+## Encryption & Authentication
+
+Iroh uses **RFC 7250 Raw Public Keys** for TLS — no X.509 certificates. Each endpoint has:
+- `SecretKey` (Ed25519) — used for TLS authentication and signing
+- `PublicKey`/`EndpointId` — derived from `SecretKey`, used as identity
+
+The TLS server name is encoded as `<encoded-endpoint-id>.iroh.invalid`, so each remote endpoint gets its own server name and its cached 0-RTT session tickets stay separate.
+ +## 0-RTT Support + +Iroh supports QUIC 0-RTT connections: +- `Connecting::into_0rtt()` on the client side +- `Accepting::into_0rtt()` on the server side +- TLS session tickets cached per remote endpoint (default 256 tickets = ~150 KiB) +- `max_tls_tickets()` builder option to tune cache size + +## Default Infrastructure (n0) + +Production relay servers (4 regions): +| Region | Hostname | +|--------|----------| +| NA East | `use1-1.relay.n0.iroh-canary.iroh.link` | +| NA West | `usw1-1.relay.n0.iroh-canary.iroh.link` | +| EU | `euc1-1.relay.n0.iroh-canary.iroh.link` | +| AP | `aps1-1.relay.n0.iroh-canary.iroh.link` | + +DNS Address Lookup origin: `dns.iroh.link` \ No newline at end of file diff --git a/docs/research/references/iroh/iroh/02-key-types-traits.md b/docs/research/references/iroh/iroh/02-key-types-traits.md new file mode 100644 index 0000000..eb6fe9c --- /dev/null +++ b/docs/research/references/iroh/iroh/02-key-types-traits.md @@ -0,0 +1,392 @@ +# Iroh: Key Types and Traits + +## Core Identity Types (`iroh-base`) + +### `SecretKey` +Ed25519 signing key (32 bytes). Used for: +- TLS authentication (RFC 7250 Raw Public Key) +- Signing pkarr packets for address discovery +- Generating the corresponding `PublicKey`/`EndpointId` + +```rust +// Generation +let secret_key = SecretKey::generate(); + +// From bytes +let secret_key = SecretKey::from_bytes(&[0u8; 32]); + +// Access public key +let public_key: PublicKey = secret_key.public(); +``` + +### `PublicKey` / `EndpointId` +`EndpointId` is a type alias for `PublicKey`. Both are 32-byte Ed25519 compressed points. 
+
+```rust
+pub type EndpointId = PublicKey;
+
+impl PublicKey {
+    pub const LENGTH: usize = 32;
+    pub fn from_bytes(bytes: &[u8; 32]) -> Result<...>;
+    pub fn as_bytes(&self) -> &[u8; 32];
+    pub fn verify(&self, message: &[u8], signature: &Signature) -> Result<(), SignatureError>;
+    pub fn fmt_short(&self) -> impl Display; // First 5 bytes hex
+}
+```
+
+Serialization: human-readable formats → z-base-32 string; binary formats → 32 raw bytes.
+
+### `Signature`
+Ed25519 signature (64 bytes). Used in pkarr for signing endpoint discovery records.
+
+### `KeyParsingError`
+Error type for key parsing failures.
+
+## Addressing Types (`iroh-base`)
+
+### `EndpointAddr`
+The primary addressing type — combines identity with network paths:
+
+```rust
+pub struct EndpointAddr {
+    pub id: EndpointId,
+    pub addrs: BTreeSet<TransportAddr>,
+}
+
+impl EndpointAddr {
+    pub fn new(id: PublicKey) -> Self;
+    pub fn from_parts(id: PublicKey, addrs: impl IntoIterator<Item = TransportAddr>) -> Self;
+    pub fn with_relay_url(self, relay_url: RelayUrl) -> Self;
+    pub fn with_ip_addr(self, addr: SocketAddr) -> Self;
+    pub fn is_empty(&self) -> bool;
+    pub fn ip_addrs(&self) -> impl Iterator<Item = &SocketAddr>;
+    pub fn relay_urls(&self) -> impl Iterator<Item = &RelayUrl>;
+}
+```
+
+Can be constructed from just an `EndpointId` (relies on Address Lookup), or with explicit paths:
+```rust
+// From just EndpointId — needs Address Lookup
+let addr = EndpointAddr::new(endpoint_id);
+
+// With relay URL
+let addr = EndpointAddr::new(endpoint_id).with_relay_url(relay_url);
+
+// With both
+let addr = EndpointAddr::from_parts(endpoint_id, [
+    TransportAddr::Relay(relay_url),
+    TransportAddr::Ip(socket_addr),
+]);
+```
+
+### `TransportAddr`
+```rust
+pub enum TransportAddr {
+    Relay(RelayUrl),
+    Ip(SocketAddr),
+    Custom(CustomAddr),
+}
+```
+
+### `CustomAddr`
+Opaque custom transport address (for `unstable-custom-transports` feature):
+```rust
+pub struct CustomAddr {
+    id: u32,
+    addr: Vec<u8>,
+}
+```
+
+### `RelayUrl`
+Arc-wrapped `Url` identifying a relay server.
Cheap to clone. Encourages fully-qualified DNS names (trailing dot).
+
+```rust
+let url: RelayUrl = "https://use1-1.relay.n0.iroh-canary.iroh.link.".parse()?;
+```
+
+## Endpoint API (`iroh`)
+
+### `Endpoint`
+The central type — created via `Builder`, used for all connection operations:
+
+```rust
+impl Endpoint {
+    // Construction
+    pub fn builder(preset: impl Preset) -> Builder;
+    pub async fn bind(preset: impl Preset) -> Result<Endpoint, ...>;
+
+    // Connection
+    pub async fn connect(&self, addr: impl Into<EndpointAddr>, alpn: &[u8]) -> Result<Connection, ...>;
+    pub async fn connect_with_opts(&self, addr: impl Into<EndpointAddr>, alpn: &[u8], opts: ConnectOptions) -> Result<...>;
+    pub fn accept(&self) -> Accept<'_>;
+
+    // Identity
+    pub fn id(&self) -> EndpointId;
+    pub fn secret_key(&self) -> &SecretKey;
+    pub fn addr(&self) -> EndpointAddr;
+    pub fn watch_addr(&self) -> impl Watcher<Value = EndpointAddr>;
+
+    // Lifecycle
+    pub async fn close(&self);
+    pub fn is_closed(&self) -> bool;
+    pub fn closed(&self) -> EndpointClosed;
+    pub async fn online(&self);  // Wait for relay connection
+
+    // Configuration changes
+    pub fn set_alpns(&self, alpns: Vec<Vec<u8>>);
+    pub async fn insert_relay(&self, relay: RelayUrl, config: Arc<RelayConfig>) -> Option<Arc<RelayConfig>>;
+    pub async fn remove_relay(&self, relay: &RelayUrl) -> Option<Arc<RelayConfig>>;
+    pub async fn add_external_addr(&self, addr: SocketAddr);
+    pub async fn remove_external_addr(&self, addr: &SocketAddr) -> bool;
+    pub fn set_user_data_for_address_lookup(&self, user_data: Option<UserData>);
+    pub async fn network_change(&self);
+
+    // Observers
+    pub fn home_relay_status(&self) -> impl Watcher<...>;
+    pub fn net_report(&self) -> impl Watcher<...>;
+    pub fn remote_info(&self, id: EndpointId) -> Option<...>;
+    pub fn metrics(&self) -> &EndpointMetrics;
+    pub fn bound_sockets(&self) -> Vec<SocketAddr>;
+    pub fn dns_resolver(&self) -> Result<&DnsResolver, EndpointError>;
+    pub fn tls_config(&self) -> &rustls::ClientConfig;
+    pub fn address_lookup(&self) -> Result<&AddressLookupServices, EndpointError>;
+}
+```
+
+### `Builder`
+Fluent builder for `Endpoint`:
+
+```rust
+let ep = Endpoint::builder(presets::N0)
+    .secret_key(secret_key)                          // Identity
+    .alpns(vec![b"my-alpn".to_vec()])                // Accepted protocols
+    .relay_mode(RelayMode::Default)                  // Relay configuration
+    .address_lookup(PkarrPublisher::n0_dns())        // Address discovery
+    .address_lookup(DnsAddressLookup::n0_dns())      // DNS resolution
+    .addr_filter(AddrFilter::relay_only())           // Filter published addresses
+    .user_data_for_address_lookup(user_data)         // Custom discovery data
+    .transport_config(QuicTransportConfig::default()) // QUIC tuning
+    .dns_resolver(dns_resolver)                      // Custom DNS resolver
+    .proxy_url(proxy_url)                            // HTTP proxy
+    .ca_roots_config(CaRootsConfig::default())       // TLS CA roots
+    .keylog(true)                                    // SSLKEYLOGFILE debug
+    .max_tls_tickets(256)                            // 0-RTT ticket cache
+    .hooks(my_hook)                                  // Connection hooks
+    .portmapper_config(PortmapperConfig::Enabled)    // UPnP/NAT-PMP
+    .external_addr(addr)                             // Advertised external addr
+    .bind_addr("0.0.0.0:0")?                         // Bind specific socket
+    .bind()                                          // Build & bind
+    .await?;
+```
+
+### `RelayMode`
+```rust
+pub enum RelayMode {
+    Disabled,              // No relay
+    Default,               // n0 production relays
+    Staging,               // n0 staging relays
+    Custom(RelayMap),      // Custom relay configuration
+}
+```
+
+## Protocol Handler (`iroh::protocol`)
+
+### `ProtocolHandler`
+Trait for handling incoming connections by ALPN:
+
+```rust
+pub trait ProtocolHandler: Send + Sync + Debug + 'static {
+    // Optional: intercept at Accepting stage (supports 0-RTT)
+    fn on_accepting(&self, accepting: Accepting) -> impl Future<Output = Result<Connection, ...>> + Send;
+
+    // Required: handle the established connection
+    fn accept(&self, connection: Connection) -> impl Future<Output = Result<(), ...>> + Send;
+
+    // Optional: called on graceful shutdown
+    fn shutdown(&self) -> impl Future<Output = ()> + Send;
+}
+```
+
+### `Router`
+Spawns an accept loop that dispatches incoming connections to registered handlers:
+
+```rust
+let router = Router::builder(endpoint)
+    .accept(b"/my-alpn", Arc::new(MyHandler))
+    .incoming_filter(|incoming| {
+        if
!incoming.remote_addr_validated() {
+            IncomingFilterOutcome::Retry
+        } else {
+            IncomingFilterOutcome::Accept
+        }
+    })
+    .spawn();
+
+// Later...
+router.shutdown().await?;
+```
+
+### `IncomingFilterOutcome`
+```rust
+pub enum IncomingFilterOutcome {
+    Accept,   // Allow the connection
+    Retry,    // Send QUIC retry (address validation)
+    Reject,   // Refuse with CONNECTION_REFUSED
+    Ignore,   // Drop silently (remote times out)
+}
+```
+
+### `AccessLimit`
+Wrapper that limits connections to allowed `EndpointId`s:
+
+```rust
+let handler = AccessLimit::new(MyHandler, |endpoint_id| allowed_set.contains(&endpoint_id));
+```
+
+### `EndpointHooks`
+Intercept connection establishment at two points:
+
+```rust
+pub trait EndpointHooks: Debug + Send + Sync {
+    // Before outgoing connection starts
+    fn before_connect<'a>(&'a self, remote_addr: &'a EndpointAddr, alpn: &'a [u8])
+        -> BoxFuture<'a, BeforeConnectOutcome>;
+
+    // After TLS handshake completes (on both sides)
+    fn after_handshake<'a>(&'a self, info: &'a ConnectionInfo)
+        -> BoxFuture<'a, AfterHandshakeOutcome>;
+}
+```
+
+## Connection Types (`iroh::endpoint::connection`)
+
+### `Connecting`
+The state between initiating a connection and completing the handshake:
+
+```rust
+impl Connecting {
+    // Implements `Future`: `.await` it to complete the handshake and get a `Connection`
+    pub fn into_0rtt(self) -> Result<(OutgoingZeroRttConnection, Connection), Connecting>;
+    pub fn alpn(&self) -> Result<Vec<u8>, ConnectingError>;
+    pub fn remote_id(&self) -> Result<EndpointId, ...>;
+}
+```
+
+### `Connection`
+Wraps a `noq::Connection` with iroh-specific metadata:
+
+```rust
+impl Connection {
+    // Stream operations
+    pub async fn open_bi(&self) -> Result<(SendStream, RecvStream), OpenBi>;
+    pub async fn accept_bi(&self) -> Result<(SendStream, RecvStream), AcceptBi>;
+    pub async fn open_uni(&self) -> Result<SendStream, ...>;
+    pub async fn accept_uni(&self) -> Result<RecvStream, ...>;
+
+    // Datagrams
+    pub fn send_datagram(&self, data: SendDatagram) -> Result<(), SendDatagramError>;
+    pub async fn read_datagram(&self) -> Result<Bytes, ...>;
+
+    // Connection lifecycle
+    pub fn close(&self, error_code: VarInt, reason: &[u8]);
+    pub async fn closed(&self) -> ConnectionError;
+
+    // Identity
+    pub fn remote_id(&self) -> EndpointId;
+    pub fn alpn(&self) -> Vec<u8>;
+
+    // Path observation
+    pub fn paths(&self) -> PathWatcher;
+
+    // Keying material export
+    pub fn export_keying_material(&self, output: &mut [u8], label: &[u8], context: Option<&[u8]>) -> Result<(), ExportKeyingMaterialError>;
+}
+```
+
+### `Incoming`
+Pre-accept incoming connection:
+
+```rust
+impl Incoming {
+    pub fn accept(self) -> Result<Accepting, ...>;
+    pub fn accept_with(self, server_config: Arc<...>) -> Result<Accepting, ...>;
+    pub fn refuse(self);
+    pub fn retry(self) -> Result<(), RetryError>;
+    pub fn ignore(self);
+    pub fn remote_addr(&self) -> IncomingAddr;
+    pub fn local_ip(&self) -> Option<IpAddr>;
+    pub fn remote_addr_validated(&self) -> bool;
+    pub fn decrypt(&self) -> Option<...>;
+}
+```
+
+### `IncomingAddr`
+```rust
+pub enum IncomingAddr {
+    Ip(SocketAddr),
+    Relay { url: RelayUrl, endpoint_id: EndpointId },
+    Custom(CustomAddr),
+}
+```
+
+## `RelayMap` and `RelayConfig` (`iroh-relay`)
+
+### `RelayMap`
+Thread-safe map of relay servers:
+
+```rust
+let map = RelayMap::from_iter([
+    "https://relay1.example.org".parse::<RelayUrl>()?,
+    "https://relay2.example.org".parse::<RelayUrl>()?,
+]);
+```
+
+### `RelayConfig`
+```rust
+pub struct RelayConfig {
+    pub url: RelayUrl,
+    pub quic: Option<RelayQuicConfig>, // QAD support
+}
+
+pub struct RelayQuicConfig {
+    pub port: u16,  // Default: 3478
+}
+```
+
+## `EndpointData` and `EndpointInfo` (`iroh-dns`)
+
+### `EndpointData`
+The data published about an endpoint:
+
+```rust
+pub struct EndpointData {
+    addrs: Vec<TransportAddr>,
+    user_data: Option<UserData>,
+}
+```
+
+### `EndpointInfo`
+Combines `EndpointId` with `EndpointData`:
+
+```rust
+pub struct EndpointInfo {
+    pub endpoint_id: EndpointId,
+    pub data: EndpointData,
+}
+```
+
+### `UserData`
+Application-defined string data published alongside addressing info:
+
+```rust
+pub struct UserData(String);  // Max 256 bytes
+```
+
+###
`AddrFilter`
+Controls which addresses are published to address lookup services:
+
+```rust
+let filter = AddrFilter::relay_only();          // Only relay URLs
+let filter = AddrFilter::unfiltered();          // All addresses
+let filter = AddrFilter::custom(|addrs| { /* custom logic */ });
+```
\ No newline at end of file
diff --git a/docs/research/references/iroh/iroh/03-networking-protocols.md b/docs/research/references/iroh/iroh/03-networking-protocols.md
new file mode 100644
index 0000000..c5bed63
--- /dev/null
+++ b/docs/research/references/iroh/iroh/03-networking-protocols.md
@@ -0,0 +1,401 @@
+# Iroh: Networking & Protocol Details
+
+## Connection Establishment
+
+### Overview
+The connection process follows this sequence:
+
+```
+Caller                                   Callee
+  |                                         |
+  |--- connect(EndpointAddr, alpn) -------->|  (via relay first)
+  |                                         |
+  |<------ TLS Handshake (Raw Public Key) ->|
+  |                                         |
+  |<====== QUIC Connection Established ====>|
+  |                                         |
+  |   (iroh attempts direct path migration) |
+  |                                         |
+  |--- open_bi() / open_uni() ------------->|
+  |<--- accept_bi() / accept_uni() ---------|
+```
+
+### Step-by-Step
+
+1. **Resolve addressing** — `resolve_remote(EndpointAddr)` starts a `RemoteStateActor` for the peer. If no direct addresses or relay URL are provided, Address Lookup services are queried.
+
+2. **Map addresses** — `EndpointId` is mapped to a synthetic IPv6 address for the QUIC layer (`EndpointIdMappedAddr`). Relay and custom transport addresses are similarly mapped.
+
+3. **TLS connection** — Uses RFC 7250 Raw Public Keys. The server name is encoded as `<encoded-endpoint-id>.iroh.invalid`. Both sides authenticate by `EndpointId`.
+
+4. **ALPN negotiation** — The Application-Layer Protocol Negotiation determines which protocol handler receives the connection.
+
+5. **Path migration** — Once a QUIC connection is established (initially via relay), iroh continuously searches for better paths. Direct IP paths are preferred when available.
+ +## Transport Layer Architecture + +### The `Socket` — Core Connectivity Engine + +The `Socket` struct is the heart of iroh's networking. It manages: +- Multiple transport paths (IPv4, IPv6, relay, custom) +- Address discovery and NAT traversal +- Path migration between relay and direct connections + +``` + ┌──────────────┐ + │ Endpoint │ (Public API) + │ (Arc) │ + └──────┬───────┘ + │ + ┌──────▼───────┐ + │ Socket │ (Connectivity engine) + │ (Arc) │ + └──────┬───────┘ + │ + ┌────────────┼────────────┐ + │ │ │ + ┌─────▼─────┐ ┌───▼────┐ ┌──────▼──────┐ + │IpTransport│ │Relay │ │CustomTransport│ + │(IPv4/v6) │ │Transport│ │(unstable) │ + └─────┬─────┘ └───┬────┘ └──────┬──────┘ + │ │ │ + ┌─────▼─────┐ ┌───▼────┐ │ + │ UdpSocket │ │WebSocket│ │ + │ (netwatch)│ │ Actor │ │ + └────────────┘ └────────┘ │ +``` + +### Transport Configuration + +```rust +pub enum TransportConfig { + Ip { + config: IpConfig, // IPv4 or IPv6 socket config + is_user_defined: bool, + }, + Relay { + relay_map: RelayMap, // Which relay servers to use + is_user_defined: bool, + }, + #[cfg(feature = "unstable-custom-transports")] + Custom(Arc), +} + +pub enum IpConfig { + V4 { ip_net: Ipv4Net, port: u16, is_required: bool, is_default: bool }, + V6 { ip_net: Ipv6Net, scope_id: u32, port: u16, is_required: bool, is_default: bool }, +} +``` + +### Address Mapping + +Iroh maps all transport addresses to IPv6 for the QUIC layer: + +- **IPv4/IPv6 addresses** → used directly as QUIC path addresses +- **Relay addresses** → mapped to synthetic IPv6 addresses in a dedicated range +- **Custom addresses** → mapped to synthetic IPv6 addresses in another range + +The `MappedAddrs` struct maintains these mappings: +```rust +pub(crate) struct MappedAddrs { + pub(super) endpoint_addrs: AddrMap, + pub(super) relay_addrs: AddrMap<(RelayUrl, EndpointId), RelayMappedAddr>, + pub(super) custom_addrs: AddrMap, +} +``` + +### Transport Bias + +Path selection uses a configurable bias system: + +```rust +let endpoint = 
Endpoint::builder(presets::N0)
+    .transport_bias(AddrKind::Custom(42), TransportBias::primary())
+    .bind()
+    .await?;
+```
+
+Default biases:
+- IPv4 and IPv6 are **primary** (IPv6 gets a small RTT advantage)
+- Relay is **backup** (used only when no primary transport is available)
+
+## Relay Protocol
+
+### Architecture
+
+The relay system is based on a revised version of Tailscale's DERP (Designated Encrypted Relay for Packets) protocol.
+
+```
+Client A           Relay Server           Client B
+   │                    │                      │
+   │─── HTTP Upgrade ──>│                      │
+   │<── 101 Switching ──│                      │
+   │                    │<─── HTTP Upgrade ────│
+   │                    │──── 101 Switching ──>│
+   │                    │                      │
+   │── Encrypted QUIC ─>│── Encrypted QUIC ───>│
+   │<─ Encrypted QUIC ──│<── Encrypted QUIC ───│
+```
+
+### Relay Actor
+
+The `RelayActor` manages the WebSocket connection to the relay:
+- Connects to relay via HTTPS, upgrades to custom protocol
+- Sends/receives encrypted datagrams on behalf of the local endpoint
+- Manages reconnection on network changes or relay restarts
+- Reports connection status via `HomeRelayWatch`
+
+### Relay Data Flow
+1. Outgoing packet → `RelayTransport::send()` → `RelayActor` → WebSocket → Relay server → WebSocket → remote `RelayActor` → remote `RelayTransport::recv()` → QUIC
+2. The relay only sees encrypted QUIC packets — it cannot decode application data
+
+### Home Relay Selection
+
+The `net_report` module continuously probes relay servers and maintains latency statistics. The "home relay" is selected based on:
+- Lowest recent latency, with hysteresis to avoid flapping
+- A candidate replaces the current home relay only if its latency is below 2/3 of the current relay's latency (roughly a third faster)
+
+## Hole-Punching & NAT Traversal
+
+### QUIC Address Discovery (QAD)
+
+Iroh uses QUIC Address Discovery (based on [draft-ietf-quic-address-discovery](https://datatracker.ietf.org/doc/draft-ietf-quic-address-discovery/)) to discover external IP addresses. The relay servers expose QAD endpoints.
+
+The `net_report` module:
+1. Establishes QUIC connections to relay servers
+2.
Uses `observed_external_addr()` to learn external addresses
+3. Reports NAT type, mapping behavior, and preferred relay
+
+### NAT Traversal Strategy
+
+```
+          ┌──────────────────────────────────────┐
+          │  NAT Traversal                       │
+          │                                      │
+          │  1. Direct connection attempt        │
+          │     (simultaneous open)              │
+          │                                      │
+          │  2. QAD-discovered addresses         │
+          │     (relay reports observed IP)      │
+          │                                      │
+          │  3. Port mapping (UPnP/PCP/NAT-PMP)  │
+          │     (if supported by gateway)        │
+          │                                      │
+          │  4. Relay fallback                   │
+          │     (always available)               │
+          └──────────────────────────────────────┘
+```
+
+### Port Mapper
+
+```rust
+pub enum PortmapperConfig {
+    Enabled {},   // Default: tries UPnP, PCP, NAT-PMP
+    Disabled,     // No port mapping
+}
+```
+
+When enabled, the port mapper:
+- Discovers gateway devices
+- Requests port mappings
+- Provides external addresses to the endpoint
+- Updates when mappings change
+
+### Net Report
+
+`NetReport` discovers network conditions:
+- IPv4/IPv6 connectivity
+- NAT mapping behavior (whether the external mapping varies by destination)
+- Captive portal detection
+- Preferred relay selection
+- External IP addresses (via QAD)
+
+Key timing constants:
+- `NET_REPORT_TIMEOUT` = 10 seconds
+- `FULL_REPORT_INTERVAL` = 5 minutes
+- `HEARTBEAT_INTERVAL` = 5 seconds (keepalive)
+- `PATH_MAX_IDLE_TIMEOUT` = 15 seconds (direct)
+- `RELAY_PATH_MAX_IDLE_TIMEOUT` = 30 seconds (relay)
+
+## Address Lookup System
+
+### Trait Definition
+
+```rust
+pub trait AddressLookup: Debug + Send + Sync + 'static {
+    fn publish(&self, data: &EndpointData);
+    fn resolve(&self, endpoint_id: EndpointId) -> Option<...>;
+}
+```
+
+### `AddressLookupServices`
+A composite that runs multiple lookup services concurrently:
+
+```rust
+let services = AddressLookupServices::default();
+services.set_addr_filter(AddrFilter::relay_only());
+services.add(publisher);
+services.add(resolver);
+```
+
+Resolution merges results from all services; an error in one service does not block results from the others.
+
+### Built-in Implementations
+
+#### `PkarrPublisher`
+Publishes endpoint info to a pkarr relay via HTTP PUT:
+```rust
+let publisher = PkarrPublisher::builder(pkarr_url)
+    .addr_filter(AddrFilter::relay_only()) // Default: relay-only
+    .build(secret_key, tls_config);
+```
+
+#### `PkarrResolver` (browser/WASM)
+Resolves endpoint info from a pkarr relay via HTTP GET.
+
+#### `DnsAddressLookup` (non-browser)
+Resolves endpoint info via DNS TXT records:
+```rust
+// Default n0 DNS
+let lookup = DnsAddressLookup::n0_dns();
+
+// Custom DNS origin
+let lookup = DnsAddressLookup::new(dns_resolver, origin);
+```
+
+#### `MemoryLookup`
+In-memory address lookup for testing:
+```rust
+let lookup = MemoryLookup::new();
+lookup.add_endpoint(endpoint_id, endpoint_data);
+```
+
+### DNS Record Format
+```
+_iroh.<z32-endpoint-id>.<origin> TXT
+```
+Attributes:
+- `relay=<url>` — Home relay URL
+- `addr=<addr> <addr>` — Space-separated socket addresses
+- `user_data=<data>` — Application-specific data
+
+## TLS Configuration
+
+### `TlsConfig`
+Manages TLS state shared across sessions:
+```rust
+struct TlsConfig {
+    secret_key: SecretKey,
+    cert_resolver: Arc<...>,
+    server_verifier: Arc<...>,
+    client_verifier: Arc<...>,
+    session_store: Arc<...>,
+    crypto_provider: Arc<...>,
+}
+```
+
+### Raw Public Key Certificate
+Uses RFC 7250 — no X.509 certificates. The `ResolveRawPublicKeyCert` resolver creates TLS certificates on-the-fly from the Ed25519 public key.
+
+### Verification Flow
+- **Client verifies server**: The `ServerCertificateVerifier` checks that the server's `EndpointId` matches the expected `EndpointId` encoded in the TLS server name.
+- **Server verifies client**: The `ClientCertificateVerifier` ensures the client presents a valid raw public key.
+
+### Crypto Providers
+Two built-in options via feature flags:
+- `tls-ring` — uses `ring` crypto (default)
+- `tls-aws-lc-rs` — uses AWS LC-RS crypto
+
+Custom providers can be set via `Builder::crypto_provider()`.
+
+## Multipath & Path Migration
+
+Iroh supports QUIC multipath connections. Multiple paths can be active simultaneously:
+
+```rust
+// Watch path changes: create the stream once, outside the loop
+let mut paths = connection.paths().stream();
+while let Some(infos) = paths.next().await {
+    for info in infos.iter() {
+        if info.is_ip() { /* direct path */ }
+        if info.is_relay() { /* relay path */ }
+    }
+}
+```
+
+Maximum multipath paths per connection: 12 (`MAX_MULTIPATH_PATHS`).
+
+### Path Types
+```rust
+pub struct PathInfo {
+    pub addr: TransportAddr,
+    pub usage: TransportAddrUsage,
+}
+
+pub enum TransportAddrUsage {
+    DefaultRoute,
+    SubnetRoute,
+    Backup,
+}
+```
+
+## Connection Hooks
+
+```rust
+#[derive(Debug, Clone)]
+struct MyHook;
+
+impl EndpointHooks for MyHook {
+    fn before_connect<'a>(
+        &'a self,
+        remote_addr: &'a EndpointAddr,
+        alpn: &'a [u8],
+    ) -> BoxFuture<'a, BeforeConnectOutcome> {
+        Box::pin(async move {
+            if is_allowed(remote_addr.id) { // `is_allowed`: your own allow-list check
+                BeforeConnectOutcome::Accept
+            } else {
+                BeforeConnectOutcome::Reject
+            }
+        })
+    }
+
+    fn after_handshake<'a>(
+        &'a self,
+        info: &'a ConnectionInfo,
+    ) -> BoxFuture<'a, AfterHandshakeOutcome> {
+        Box::pin(async move {
+            AfterHandshakeOutcome::Accept
+        })
+    }
+}
+```
+
+## Custom Transports (Unstable)
+
+```rust
+pub trait CustomTransport: Send + Sync + Debug + 'static {
+    // Create an endpoint for this transport
+    fn create_endpoint(&self, config: CustomEndpointConfig) -> Result<..., CustomTransportError>;
+}
+
+pub trait CustomEndpoint: Send + Sync + Debug + 'static {
+    fn send(&self, item: CustomSendItem) -> Result<(), CustomTransportError>;
+    fn recv(&self) -> Result<...>;
+}
+
+// Register:
+let ep = Endpoint::builder(presets::N0)
+    .add_custom_transport(Arc::new(MyTransport))
+    .bind()
+    .await?;
+```
+
+Transport IDs (from `TRANSPORTS.md`):
+
+| ID | Transport | Address format |
+|----|-----------|---------------|
+| `0x00-0x1F` | Reserved | - |
+| `0x20` | Test | Ed25519 public key (32 bytes) |
+| `0x544F52` | Tor | Ed25519 public key (32
bytes) |
+| `0x424C45` | BLE | Bluetooth MAC address (6 bytes) |
\ No newline at end of file
diff --git a/docs/research/references/iroh/iroh/04-sub-crates.md b/docs/research/references/iroh/iroh/04-sub-crates.md
new file mode 100644
index 0000000..33e7758
--- /dev/null
+++ b/docs/research/references/iroh/iroh/04-sub-crates.md
@@ -0,0 +1,294 @@
+# Iroh: Sub-Crates
+
+## `iroh-base`
+
+**Purpose**: Fundamental types shared across all iroh crates.
+**Features**: `key` (default), `relay` (default)
+
+### Key Types
+
+| Type | Description |
+|------|-------------|
+| `SecretKey` | Ed25519 signing key (32 bytes). Generated randomly or from bytes. |
+| `PublicKey` | Ed25519 public key (32 bytes). Verifies signatures. |
+| `EndpointId` | Type alias for `PublicKey` — used as network identity. |
+| `Signature` | Ed25519 signature (64 bytes). |
+| `RelayUrl` | Arc-wrapped `Url` identifying a relay server. |
+| `EndpointAddr` | Combines `EndpointId` + `BTreeSet<TransportAddr>`. Primary addressing type. |
+| `TransportAddr` | Enum: `Relay(RelayUrl)`, `Ip(SocketAddr)`, `Custom(CustomAddr)`. |
+| `CustomAddr` | Opaque address for custom transports (id + bytes). |
+| `KeyParsingError` | Error type for key parsing. |
+| `RelayUrlParseError` | Error type for URL parsing.
|
+
+### `EndpointAddr` Methods
+
+```rust
+impl EndpointAddr {
+    pub fn new(id: PublicKey) -> Self;
+    pub fn from_parts(id: PublicKey, addrs: impl IntoIterator<Item = TransportAddr>) -> Self;
+    pub fn with_relay_url(self, relay_url: RelayUrl) -> Self;
+    pub fn with_ip_addr(self, addr: SocketAddr) -> Self;
+    pub fn with_addrs(self, addrs: impl IntoIterator<Item = TransportAddr>) -> Self;
+    pub fn is_empty(&self) -> bool;
+    pub fn ip_addrs(&self) -> impl Iterator<Item = &SocketAddr>;
+    pub fn relay_urls(&self) -> impl Iterator<Item = &RelayUrl>;
+}
+```
+
+### Serialization
+- `PublicKey`/`EndpointId`: human-readable → z-base-32 string; binary → 32 raw bytes
+- `EndpointAddr`: Serialized as `{id, addrs}` with `TransportAddr` as tagged enum
+- `RelayUrl`: Serialized as URL string
+
+---
+
+## `iroh-dns`
+
+**Purpose**: DNS resolver and endpoint info serialization for address discovery.
+**Key Features**: pkarr signed packet creation/verification, DNS TXT record parsing, configurable DNS resolver.
+
+### Modules
+
+| Module | Description |
+|--------|-------------|
+| `dns` | `DnsResolver` — configurable async DNS resolver with IPv4/IPv6 staggered lookup |
+| `endpoint_info` | `EndpointInfo`, `EndpointData`, `AddrFilter`, `UserData` — serialization/deserialization |
+| `pkarr` | Pkarr signed packet creation and verification |
+| `attrs` | Low-level TXT record attribute parsing |
+
+### `DnsResolver`
+
+```rust
+impl DnsResolver {
+    pub fn new() -> Self;
+    pub fn with_nameserver(addr: SocketAddr) -> Self;
+    pub fn with_nameservers(addrs: Vec<SocketAddr>) -> Self;
+
+    // Lookup methods
+    pub async fn lookup_ipv4(&self, host: String) -> Result<...>;
+    pub async fn lookup_ipv6(&self, host: String) -> Result<...>;
+    pub async fn lookup_ipv4_ipv6_staggered(&self, host: &str, timeout: Duration, delays: &[u64]) -> Result<...>;
+    pub async fn lookup_txt(&self, host: String) -> Result<...>;
+    pub async fn lookup_endpoint_by_id(&self, id: &EndpointId, origin: &str) -> Result<...>;
+
+    // Cache management
+    pub fn clear_cache(&self);
+    pub fn reset_resolver(&self);
+}
+```
+
+### `EndpointInfo` & `EndpointData`
+
+```rust
+pub struct EndpointInfo {
+    pub endpoint_id: EndpointId,
+    pub data: EndpointData,
+}
+
+pub struct EndpointData {
+    addrs: Vec<TransportAddr>,
+    user_data: Option<UserData>,
+}
+
+impl EndpointData {
+    pub fn new(addrs: Vec<TransportAddr>) -> Self;
+    pub fn from_iter(addrs: impl IntoIterator<Item = TransportAddr>) -> Self;
+    pub fn with_user_data(mut self, user_data: UserData) -> Self;
+    pub fn addrs(&self) -> impl Iterator<Item = &TransportAddr>;
+    pub fn user_data(&self) -> Option<&UserData>;
+    pub fn apply_filter(&self, filter: &AddrFilter) -> Cow<'_, EndpointData>;
+}
+```
+
+### `AddrFilter`
+
+Controls which addresses are published in address lookup:
+
+```rust
+pub enum AddrFilter {
+    RelayOnly,           // Only relay URLs
+    Unfiltered,          // All addresses
+    Custom(fn(&[TransportAddr]) -> Vec<TransportAddr>),
+}
+```
+
+### Pkarr Integration
+
+```rust
+// Creating signed packets
+let info = EndpointInfo::new(secret_key.public())
+    .with_relay_url(relay_url);
+let packet = info.to_pkarr_signed_packet(&secret_key, 30)?;  // 30 second TTL
+
+// Verifying and extracting
+let info = EndpointInfo::from_pkarr_signed_packet(&packet)?;
+```
+
+---
+
+## `iroh-relay`
+
+**Purpose**: Relay server and client implementation. Provides DERP-like relay protocol, QAD support, and relay server binary.
+
+### Key Exports
+
+| Type | Description |
+|------|-------------|
+| `RelayMap` | Thread-safe map of `RelayUrl → RelayConfig` |
+| `RelayConfig` | Configuration for a single relay server |
+| `RelayQuicConfig` | QUIC address discovery configuration |
+| `KeyCache` | Cache for relay server public keys |
+| `PingTracker` | Ping/pong tracking for relay connections |
+| `MAX_PACKET_SIZE` | Maximum relay packet size (64 KiB minus overhead) |
+
+### Modules
+
+| Module | Description |
+|--------|-------------|
+| `client` | HTTP client for relay server connections |
+| `http` | HTTP-related relay functionality |
+| `protos` | Protocol definitions (handshake, relay, streams) |
+| `quic` | QUIC client for QAD probing |
+| `server` | Full relay server implementation (`feature = "server"`) |
+| `tls` | TLS configuration utilities |
+
+### `RelayConfig`
+
+```rust
+pub struct RelayConfig {
+    pub url: RelayUrl,
+    pub quic: Option<RelayQuicConfig>,
+}
+
+impl RelayConfig {
+    pub fn new(url: RelayUrl, quic: Option<RelayQuicConfig>) -> Self;
+    pub fn from(url: RelayUrl) -> Self;  // No QAD
+}
+```
+
+### `RelayMap`
+
+```rust
+impl RelayMap {
+    pub fn empty() -> Self;
+    pub fn from(relay: RelayConfig) -> Self;
+    pub fn from_iter(iter: impl IntoIterator<...>) -> Self;
+    pub fn try_from_iter(iter: impl IntoIterator<...>) -> Result<...>;
+    pub fn insert(&self, url: RelayUrl, config: Arc<RelayConfig>) -> Option<Arc<RelayConfig>>;
+    pub fn remove(&self, url: &RelayUrl) -> Option<Arc<RelayConfig>>;
+    pub fn len(&self) -> usize;
+    pub fn is_empty(&self) -> bool;
+    pub fn urls<T: FromIterator<RelayUrl>>(&self) -> T;
+    pub fn relays<T: FromIterator<Arc<RelayConfig>>>(&self) -> T;
+}
+```
+
+### Relay Protocol (DERP-like)
+
+The relay protocol is based on Tailscale's DERP protocol, adapted for iroh:
+
+1. Client connects via HTTPS, upgrades to custom protocol
+2. Authentication via raw public key (Ed25519)
+3. Encrypted datagram forwarding by `EndpointId`
+4. QAD probes via QUIC for address discovery
+5.
Ping/pong keepalive mechanism + +### TLS Utilities + +```rust +pub use iroh_relay::tls::{CaRootsConfig, default_provider}; + +// Skip certificate verification (testing only) +let config = CaRootsConfig::insecure_skip_verify(); + +// Use system trust roots +let config = CaRootsConfig::platform_verifier(); + +// Use specific roots +let config = CaRootsConfig::from_pem(pem_bytes); +``` + +--- + +## `iroh-dns-server` + +**Purpose**: DNS server that resolves iroh `EndpointId`s to addressing information. Powers `dns.iroh.link`. + +### Key Features +- Serves DNS TXT records for `_iroh..` queries +- Integrates with pkarr for signed record verification +- Supports production (`dns.iroh.link`) and staging (`staging-dns.iroh.link`) origins +- Includes benchmarking support + +### Configuration Files +- `config.dev.toml` — Development configuration +- `config.prod.toml` — Production configuration + +--- + +## Internal Modules in `iroh` Crate + +### `socket` Module +The connectivity layer — manages the `Socket` struct that orchestrates: +- Multiple transport paths +- Network change detection +- Address discovery and publication +- Remote state actors (per-peer state machines) + +**Key sub-modules**: + +| Sub-module | Description | +|-----------|-------------| +| `transports/` | Transport implementations (IP, relay, custom) | +| `transports/ip.rs` | IPv4/IPv6 UDP transport | +| `transports/relay.rs` | Relay WebSocket transport | +| `transports/relay/actor.rs` | Relay connection management actor | +| `transports/custom.rs` | Unstable custom transport API | +| `remote_map.rs` | Per-peer `RemoteStateActor` management | +| `remote_map/remote_state.rs` | State machine for connecting to a peer | +| `mapped_addrs.rs` | Address mapping for QUIC layer | +| `concurrent_read_map.rs` | Lock-free concurrent map for remote actors | +| `metrics.rs` | Socket-level metrics | + +### `net_report` Module +Network condition reporter: +- Discovers external IP addresses (QAD) +- Measures relay 
latencies +- Detects NAT types +- Detects captive portals +- Selects preferred relay + +### `portmapper` Module +UPnP/PCP/NAT-PMP port mapping: +- Gateway discovery +- Port mapping procurement +- External address monitoring + +### `address_lookup` Module +Pluggable address discovery: + +| Sub-module | Description | +|-----------|-------------| +| `dns.rs` | `DnsAddressLookup` — resolves via DNS TXT records | +| `pkarr.rs` | `PkarrPublisher` — publishes via HTTP PUT to pkarr relay; `PkarrResolver` — resolves from pkarr relay | +| `memory.rs` | `MemoryLookup` — in-memory lookup for testing | + +### `runtime` Module +Tokio-based async runtime wrapper for `noq`: +- Task spawning with cancellation support +- Timer management +- Graceful and abrupt shutdown +- WASM browser support (delegates to `wasm-bindgen-futures`) + +### `defaults` Module +Default configuration values: +- Production relay servers (4 regions) +- Staging relay servers (2 regions) +- Timeout constants +- Environment variable for forcing staging (`IROH_FORCE_STAGING_RELAYS`) + +### `metrics` Module +`EndpointMetrics` collection: +- Socket metrics (datagrams sent/received, data by transport type) +- Net report metrics (reports generated, full vs incremental) +- Port mapper metrics \ No newline at end of file diff --git a/docs/research/references/iroh/iroh/05-data-flow-internals.md b/docs/research/references/iroh/iroh/05-data-flow-internals.md new file mode 100644 index 0000000..2d09c3f --- /dev/null +++ b/docs/research/references/iroh/iroh/05-data-flow-internals.md @@ -0,0 +1,261 @@ +# Iroh: Data Flow & Internal Architecture + +## Data Flow: Connecting to a Remote Endpoint + +``` +Endpoint::connect(endpoint_addr, alpn) + │ + ▼ +resolve_remote(endpoint_addr) + │ + ├─ If addr has direct IPs or relay URL → use those + │ + └─ If addr is just EndpointId → query AddressLookupServices + │ + ├─ PkarrPublisher/PkarrResolver (HTTP) + ├─ DnsAddressLookup (DNS TXT) + ├─ MemoryLookup (in-memory) + └─ ...custom 
implementations + │ + ▼ + Map EndpointId → MappedAddr for QUIC layer + │ + ▼ + noq::Endpoint::connect(client_config, dest_addr, server_name) + │ + ├─ TLS handshake with Raw Public Key authentication + │ server_name = ".iroh.invalid" + │ + └─ QUIC connection established + │ + ▼ + Connecting → Connection + │ + ├─ Connection stays on relay path initially + │ + └─ RemoteStateActor discovers direct paths + │ + ├─ QAD-discovered addresses + ├─ Addresses from Address Lookup + ├─ Port mapper external addresses + │ + └─ Path migration: relay → direct (if possible) +``` + +## Data Flow: Accepting Connections + +``` +Endpoint::accept() → Accept<'_> + │ + ▼ (incoming QUIC packet arrives on any transport) + │ + noq::Endpoint::accept() + │ + ▼ + Incoming + │ + ├─ incoming.remote_addr() → IncomingAddr (Ip/Relay/Custom) + ├─ incoming.remote_addr_validated() → bool + ├─ incoming.accept() → Accepting + ├─ incoming.refuse() → reject + ├─ incoming.retry() → QUIC retry (address validation) + └─ incoming.ignore() → drop silently + │ + Accepting + │ + ├─ accepting.alpn().await → alpn bytes + ├─ accepting.into_0rtt() → (OutgoingZeroRtt, Connection) [optional] + └─ accepting.await → Connection +``` + +## Data Flow: Router Accept Loop + +``` +Router::spawn() + │ + ├─ endpoint.set_alpns(registered_alpns) + │ + └─ Loop: + │ + ├─ endpoint.accept().await → Incoming + │ │ + │ ├─ Apply incoming_filter (optional) + │ │ ├─ Accept → continue + │ │ ├─ Retry → incoming.retry() + │ │ ├─ Reject → incoming.refuse() + │ │ └─ Ignore → incoming.ignore() + │ │ + │ ├─ incoming.accept() → Accepting + │ ├─ accepting.alpn().await → determine ALPN + │ │ + │ └─ protocols.get(alpn) → handler + │ │ + │ ├─ handler.on_accepting(accepting).await + │ └─ handler.accept(connection).await + │ + └─ On shutdown: + ├─ protocols.shutdown().await + ├─ handler_cancel_token.cancel() + └─ endpoint.close().await +``` + +## Actor Model: Per-Remote State + +Each remote peer gets a `RemoteStateActor` that manages the connection state: 
+
+```
+┌────────────────────────────────────────────────┐
+│  RemoteStateActor                              │
+│                                                │
+│  ┌─────────────┐      ┌─────────────────┐      │
+│  │   Address   │      │   Connection    │      │
+│  │   Lookup    │      │    Tracker      │      │
+│  │ Resolution  │      │                 │      │
+│  └──────┬──────┘      └────────┬────────┘      │
+│         │                      │               │
+│         ▼                      ▼               │
+│  ┌────────────────────────────────────┐        │
+│  │           Path Selection           │        │
+│  │  ┌────────┐      ┌────────┐        │        │
+│  │  │  IPv4  │      │  IPv6  │        │        │
+│  │  │primary │      │primary │        │        │
+│  │  └────────┘      └────────┘        │        │
+│  │  ┌────────┐      ┌────────┐        │        │
+│  │  │ Relay  │      │ Custom │        │        │
+│  │  │ backup │      │primary │        │        │
+│  │  └────────┘      └────────┘        │        │
+│  └────────────────────────────────────┘        │
+│                                                │
+│  ┌────────────────────────────────────┐        │
+│  │  Mapped Addresses                  │        │
+│  │  EndpointId → MappedIPv6Addr       │        │
+│  │  (RelayUrl, EndpointId) → Addr     │        │
+│  │  CustomAddr → MappedIPv6Addr       │        │
+│  └────────────────────────────────────┘        │
+│                                                │
+│  Messages:                                     │
+│  ├─ ResolveRemote(EndpointAddr, reply)         │
+│  ├─ AddConnection(EndpointId, WeakConn, reply) │
+│  └─ RemoteInfo(reply)                          │
+└────────────────────────────────────────────────┘
+```
+
+## Data Flow: Socket Actor
+
+The `Actor` in `Socket` runs as a background task handling network changes:
+
+```
+┌────────────────────────────────────────────────────────────┐
+│  Socket Actor                                              │
+│                                                            │
+│  ┌─────────────────────┐   ┌──────────────────┐            │
+│  │ Network Monitor     │   │ Direct Addr      │            │
+│  │ (netwatch)          │   │ Update State     │            │
+│  │                     │   │                  │            │
+│  │ Detects:            │   │ Manages:         │            │
+│  │ - Interface up/down │   │ - NetReport runs │            │
+│  │ - Address changes   │   │ - Port mapper    │            │
+│  │ - Route changes     │   │ - Direct addrs   │            │
+│  └──────────┬──────────┘   └────────┬─────────┘            │
+│             │                       │                      │
+│             ▼                       ▼                      │
+│  ┌──────────────────────────────────────────────┐          │
+│  │ Triggers                                     │          │
+│  │ - NetworkChange (major/minor)                │          │
+│  │ - PeriodicReStun (every 30s-5min)            │          │
+│  │ - PortmapUpdated                             │          │
+│  │ - RelayMapChange                             │          │
+│  │ - DirectAddrRefresh                          │          │
+│  │ - ResolveRemote (from connect)               │          │
+│  │ - AddConnection (from new QUIC conn)         │          │
+│  └──────────────────────────────────────────────┘          │
+│                                                            │
+│  On address change:                                        │
+│  ┌──────────────────────────────────────────────┐          │
+│  │ 1. Run net_report to discover external addrs │          │
+│  │ 2. Update direct_addrs watchable             │          │
+│  │ 3. Publish new addresses to AddressLookup    │          │
+│  │ 4. Notify noq of network changes             │          │
+│  └──────────────────────────────────────────────┘          │
+└────────────────────────────────────────────────────────────┘
+```
+
+## Shutdown Sequence
+
+```
+Endpoint::close()
+    │
+    ├─ Cancel at_close_start token
+    │    (stops net_reports, address lookups)
+    │
+    ├─ Clear address_lookup services
+    │
+    ├─ noq_endpoint.close(0, b"")
+    │    (refuses new connections, starts close for existing)
+    │
+    ├─ noq_endpoint.wait_idle().await
+    │    (waits for close frames to be acknowledged)
+    │
+    ├─ Cancel at_endpoint_closed token
+    │
+    ├─ Wait for actor task (100ms timeout, then abort)
+    │
+    └─ runtime.shutdown().await
+         (waits for all spawned tasks)
+```
+
+## WASM/Browser Differences
+
+When compiled to `wasm32-unknown-unknown`:
+
+| Feature | Native | WASM/Browser |
+|---------|--------|-------------|
+| IP transports | Yes (IPv4 + IPv6) | No (no socket access) |
+| DNS resolution | `DnsAddressLookup` (system DNS) | `PkarrResolver` (HTTP) |
+| Network monitoring | `netwatch` (interface changes) | Not available |
+| Port mapping | UPnP/PCP/NAT-PMP | Not available |
+| Net report | Full (QAD, HTTPS probes) | Limited |
+| Runtime | Tokio | `wasm-bindgen-futures` |
+| Timer | Tokio timer | `web::Timer` wrapping `sleep_until` |
+
+## Thread Safety & Concurrency
+
+- `Endpoint` is `Clone` (wraps `Arc`)
+- `Socket` is `Arc` — shared across all connections
+- `RemoteMap` uses `ConcurrentReadMap` — lock-free reads for hot path
+- `AddressLookupServices` uses `RwLock` — infrequent writes, frequent reads
+- `DirectAddrs` uses `Watchable` — publishes changes to watchers
+- `HomeRelayWatch` uses `n0_watcher::Direct` — efficient change notification
+
+## Error Handling Patterns
+
+Iroh uses the `n0_error::stack_error` macro for rich error chains:
+
+```rust
+#[stack_error(derive, add_meta, from_sources)]
+pub enum ConnectError {
+    #[error(transparent)]
+    Connect { source: ConnectWithOptsError },
+    #[error(transparent)]
+    Connecting { source: ConnectingError },
+    #[error(transparent)]
+    Connection { source: ConnectionError },
+}
+
+// Usage:
+// ConnectError::Connect { source: ConnectWithOptsError::SelfConnect }
+// ConnectError::Connecting { source: ConnectingError::AuthenticationError { .. } }
+```
+
+## Key Constants & Timeouts
+
+| Constant | Value | Purpose |
+|----------|-------|---------|
+| `HEARTBEAT_INTERVAL` | 5s | Keepalive PING interval |
+| `PATH_MAX_IDLE_TIMEOUT` | 15s | Max idle before closing direct path |
+| `RELAY_PATH_MAX_IDLE_TIMEOUT` | 30s | Max idle before closing relay path |
+| `MAX_MULTIPATH_PATHS` | 12 | Max concurrent paths per connection |
+| `DEFAULT_MAX_TLS_TICKETS` | 256 (8×32) | TLS session ticket cache size |
+| `NET_REPORT_TIMEOUT` | 10s | Max time for net report |
+| `FULL_REPORT_INTERVAL` | 5min | Time between full net reports |
+| `DEFAULT_RELAY_QUIC_PORT` | 3478 | QAD port on relay servers |
\ No newline at end of file
diff --git a/docs/research/references/iroh/irpc/01-overview-and-architecture.md b/docs/research/references/iroh/irpc/01-overview-and-architecture.md
new file mode 100644
index 0000000..b97d29c
--- /dev/null
+++ b/docs/research/references/iroh/irpc/01-overview-and-architecture.md
@@ -0,0 +1,108 @@
+# irpc: Overview and Architecture
+
+## What is irpc?
+
+`irpc` is a **streaming RPC system** built for [iroh](https://docs.rs/iroh) and [noq](https://docs.rs/noq) (QUIC-based transports). It provides a framework for defining RPC protocols in Rust that work identically whether the communication is **in-process** (via tokio channels) or **cross-process/cross-network** (via QUIC streams).
+
+**Key design goals:**
+
+1. **Zero-overhead local use** — When used in-process, irpc should be as lightweight as raw tokio channels, replacing the common pattern of a giant `enum` over an `mpsc` channel with typed backchannels.
+2. **Transparent local/remote abstraction** — The same protocol definition and client API works for both in-process and remote communication.
+3. **Streaming-first** — Full support for unary RPC, server streaming, client streaming, and bidirectional streaming interaction patterns.
+4. **QUIC-native** — Does not abstract over stream types; directly uses noq/iroh QUIC streams, enabling per-request stream tuning (priorities, etc.).
+
+**Non-goals:**
+
+- Cross-language interop (Rust-to-Rust only)
+- Versioning (users must handle this themselves)
+- Making remote calls look like local async function calls
+- Runtime agnosticism (tokio only)
+
+## Crate Structure
+
+```
+irpc/
+├── src/lib.rs      # Core library: traits, channels, Client, RPC module
+├── src/util.rs     # Varint utilities, noq endpoint setup helpers
+├── src/tests.rs    # Channel filter/map tests
+├── irpc-derive/    # Procedural macro crate (rpc_requests)
+├── irpc-iroh/      # Iroh transport integration
+├── examples/       # Working examples (storage, compute, derive, local)
+└── tests/          # Integration tests (channels, derive)
+```
+
+### Features
+
+| Feature | Default | Purpose |
+|---|---|---|
+| `rpc` | ✅ | Enables remote RPC (noq transport, postcard serialization) |
+| `derive` | ✅ | Enables the `#[rpc_requests]` macro |
+| `spans` | ✅ | Preserves tracing spans across message passing |
+| `stream` | ✅ | Enables `into_stream()` on mpsc receivers |
+| `noq_endpoint_setup` | ✅ | Utilities to create noq endpoints (testing, localhost) |
+| `varint-util` | ❌ | Varint read/write utilities without full RPC |
+
+## High-Level Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  Application                                            │
+│                                                         │
+│  ┌──────────┐     ┌───────────┐     ┌───────────┐       │
+│  │  Client  │─────│ Protocol  │─────│  Actor/   │       │
+│  │          │     │ Enum (S)  │     │  Handler  │       │
+│  └────┬─────┘     └───────────┘     └─────┬─────┘       │
+│       │                                   │             │
+│  ┌────▼───────────────────────────────────▼─────┐       │
+│  │ WithChannels                                 │       │
+│  │ ┌────────┐ ┌────────┐ ┌────────┐ ┌─────┐     │       │
+│  │ │ inner  │ │   tx   │ │   rx   │ │span │     │       │
+│  │ │  (I)   │ │(Sender)│ │(Recv)  │ │     │     │       │
+│  │ └────────┘ └────────┘ └────────┘ └─────┘     │       │
+│  └──────────────────────────────────────────────┘       │
+│                                                         │
+│  ┌────────────────────┐   ┌─────────────────────────┐   │
+│  │ Local Path         │   │ Remote Path (rpc feat)  │   │
+│  │ tokio::mpsc        │   │ noq QUIC streams        │   │
+│  │ tokio::oneshot     │   │ postcard serialization  │   │
+│  └────────────────────┘   └─────────────────────────┘   │
+└─────────────────────────────────────────────────────────┘
+```
+
+### Core Flow
+
+1. **Define a protocol** — An enum where each variant represents an RPC method, annotated with `#[rpc(tx=..., rx=...)]`.
+2. **The `rpc_requests` macro** generates:
+   - `Channels` impl for each request type
+   - A message enum wrapping each request in `WithChannels`
+   - `Service` and `RemoteService` trait implementations
+   - `From` conversions between request types, protocol enum, and message enum
+3. **Client sends messages** — `Client` either sends over a local `mpsc` channel or serializes and sends over a QUIC stream.
+4. **Actor/handler processes messages** — Matches on the message enum, extracts `WithChannels { inner, tx, rx, .. }`, and uses `tx`/`rx` to communicate back.
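The local half of this flow is easiest to see without any irpc machinery. Below is a minimal, std-only sketch of the pattern the design goals mention: one message enum sent over an mpsc channel to an actor, where every request carries its own typed reply channel (a backchannel), much like `WithChannels { inner, tx, .. }`. It assumes `std::sync::mpsc` and threads in place of tokio, and the names `StorageMsg`, `run_actor`, `get`, and `set` are hypothetical; this is not the code `#[rpc_requests]` generates.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical message enum: request data plus a typed reply channel per variant.
enum StorageMsg {
    Get { key: String, tx: mpsc::Sender<Option<String>> },
    Set { key: String, value: String, tx: mpsc::Sender<()> },
}

/// Actor loop: owns the state and answers each request on its own backchannel.
fn run_actor(rx: mpsc::Receiver<StorageMsg>) {
    let mut store: HashMap<String, String> = HashMap::new();
    // The loop ends once every client-side sender has been dropped.
    for msg in rx {
        match msg {
            StorageMsg::Get { key, tx } => {
                // Ignore send errors: the client may have stopped waiting.
                let _ = tx.send(store.get(&key).cloned());
            }
            StorageMsg::Set { key, value, tx } => {
                store.insert(key, value);
                let _ = tx.send(());
            }
        }
    }
}

/// Client helper: create a reply channel, send the request, wait for the answer.
fn get(client: &mpsc::Sender<StorageMsg>, key: &str) -> Option<String> {
    let (tx, rx) = mpsc::channel();
    client.send(StorageMsg::Get { key: key.to_string(), tx }).expect("actor alive");
    rx.recv().expect("actor replied")
}

fn set(client: &mpsc::Sender<StorageMsg>, key: &str, value: &str) {
    let (tx, rx) = mpsc::channel();
    client
        .send(StorageMsg::Set { key: key.to_string(), value: value.to_string(), tx })
        .expect("actor alive");
    rx.recv().expect("actor replied");
}

fn main() {
    let (client, rx) = mpsc::channel();
    let actor = thread::spawn(move || run_actor(rx));
    set(&client, "x", "1");
    assert_eq!(get(&client, "x"), Some("1".to_string()));
    assert_eq!(get(&client, "y"), None);
    drop(client); // closing the request channel ends the actor loop
    actor.join().unwrap();
}
```

irpc's contribution over this hand-written version is that the reply-channel types live in the protocol definition, so the same enum can also be driven over QUIC streams.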
+
+## Dependency Graph
+
+```
+irpc (core)
+  ├── serde (always)
+  ├── tokio (sync, macros)
+  ├── tokio-util
+  ├── n0-error
+  ├── n0-future
+  ├── postcard (rpc feature)
+  ├── noq (rpc feature)
+  ├── smallvec (rpc feature)
+  ├── tracing (spans feature)
+  └── irpc-derive (derive feature)
+
+irpc-iroh
+  ├── irpc
+  ├── iroh
+  ├── iroh-base
+  ├── postcard
+  └── n0-error, n0-future, tokio, tracing, serde
+```
+
+## License
+
+Dual-licensed: Apache-2.0 OR MIT
\ No newline at end of file
diff --git a/docs/research/references/iroh/irpc/02-types-and-traits.md b/docs/research/references/iroh/irpc/02-types-and-traits.md
new file mode 100644
index 0000000..97a594f
--- /dev/null
+++ b/docs/research/references/iroh/irpc/02-types-and-traits.md
@@ -0,0 +1,239 @@
+# irpc: Key Types and Traits
+
+## Core Traits
+
+### `RpcMessage`
+
+```rust
+pub trait RpcMessage: Debug + Serialize + DeserializeOwned + Send + Sync + Unpin + 'static {}
+```
+
+A blanket trait implemented for all types that satisfy the bounds. Every message sent through irpc (both local and remote) must implement this. The `Serialize + DeserializeOwned` requirement exists even without the `rpc` feature because the same protocol definition should work in both modes.
+
+### `Service`
+
+```rust
+pub trait Service: Serialize + DeserializeOwned + Send + Sync + Debug + 'static {
+    type Message: Send + Unpin + 'static;
+}
+```
+
+Implemented on the **protocol enum** (e.g., `StorageProtocol`). The `Message` associated type is the **message enum** — an enum with identical variant names but whose single field is `WithChannels`.
+
+The `Service` trait acts as a **scope** for channel type definitions, allowing the same inner request type to be used with multiple services.
+
+### `Channels`
+
+```rust
+pub trait Channels: Send + 'static {
+    type Tx: Sender;
+    type Rx: Receiver;
+}
+```
+
+Implemented on each **request type** (e.g., `Get`, `Set`). Specifies what kind of channels accompany that request when sent through service `S`. The `Tx` type is the response channel (server → client); the `Rx` type is the update channel (client → server).
+
+### `Sender` and `Receiver`
+
+```rust
+pub trait Sender: Debug + Sealed {}
+pub trait Receiver: Debug + Sealed {}
+```
+
+Sealed marker traits. Only the types in `irpc::channel` implement these: `oneshot::Sender`, `oneshot::Receiver`, `mpsc::Sender`, `mpsc::Receiver`, `NoSender`, `NoReceiver`.
+
+### `RemoteService` (rpc feature)
+
+```rust
+pub trait RemoteService: Service + Sized {
+    fn with_remote_channels(self, rx: noq::RecvStream, tx: noq::SendStream) -> Self::Message;
+
+    fn remote_handler(local_sender: LocalSender) -> Handler {
+        // Default: convert deserialized protocol enum + streams → Message, send to local sender
+    }
+}
+```
+
+Implemented on the protocol enum. Maps a deserialized protocol variant + a pair of QUIC streams into a `WithChannels` message, which is then forwarded to the local actor.
+
+### `RemoteConnection` (rpc feature)
+
+```rust
+pub trait RemoteConnection: Send + Sync + Debug + 'static {
+    fn clone_boxed(&self) -> Box;
+    fn open_bi(&self) -> BoxFuture>;
+    fn zero_rtt_accepted(&self) -> BoxFuture;
+}
+```
+
+Abstraction over how to open a bidirectional QUIC stream. Implemented for:
+- `noq::Connection` — direct noq connection
+- `NoqLazyRemoteConnection` — lazy connection that caches the underlying QUIC connection
+- `IrohRemoteConnection` — iroh connection (in `irpc-iroh`)
+- `IrohLazyRemoteConnection` — lazy iroh connection (in `irpc-iroh`)
+- `IrohZrttRemoteConnection` — 0-RTT iroh connection (in `irpc-iroh`)
+
+## Key Structs
+
+### `WithChannels`
+
+```rust
+pub struct WithChannels, S: Service> {
+    pub inner: I,
+    pub tx: >::Tx,
+    pub rx: >::Rx,
+    #[cfg(feature = "spans")]
+    pub span: tracing::Span,
+}
+```
+
+The central message wrapper. Wraps a request type `I` with its typed channels for service `S`. Implements `Deref` to `I` for convenient field access.
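What the `Deref` impl buys a handler can be shown with a simplified, std-only stand-in. `MiniWithChannels` below is hypothetical: it drops the `S: Service` parameter and the span, and uses a `std::sync::mpsc::Sender` as `tx`. The point it sketches is that request fields are read straight through the wrapper while the reply channel stays alongside them.

```rust
use std::ops::Deref;
use std::sync::mpsc;

/// Hypothetical request type, standing in for something like `Get`.
#[derive(Debug)]
struct Get {
    key: String,
}

/// Simplified stand-in for `WithChannels<I, S>`: the request plus its reply channel.
struct MiniWithChannels<I, Tx> {
    inner: I,
    tx: Tx,
}

/// Deref to the inner request so `msg.key` works instead of `msg.inner.key`.
impl<I, Tx> Deref for MiniWithChannels<I, Tx> {
    type Target = I;
    fn deref(&self) -> &I {
        &self.inner
    }
}

/// Handler that reads a request field through `Deref`, then replies on `tx`.
fn handle(msg: MiniWithChannels<Get, mpsc::Sender<usize>>) {
    let len = msg.key.len(); // auto-deref resolves this to msg.inner.key
    let _ = msg.tx.send(len);
}

fn main() {
    let (tx, rx) = mpsc::channel();
    handle(MiniWithChannels { inner: Get { key: "hello".into() }, tx });
    assert_eq!(rx.recv().unwrap(), 5);
}
```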
+
+**Construction** via tuple conversions:
+- `(inner, tx, rx)` → full channels
+- `(inner, tx)` → when `Rx = NoReceiver` (most common for RPC/server-streaming)
+- `(inner,)` → when `Tx = NoSender, Rx = NoReceiver` (notify)
+
+### `Client`
+
+```rust
+#[derive(Debug)]
+pub struct Client(ClientInner, PhantomData);
+```
+
+The primary client type. Generic over a service `S`. Can be either local or remote.
+
+**Construction:**
+- `Client::local(mpsc_sender)` — from a tokio mpsc sender
+- `Client::noq(endpoint, addr)` — from a noq endpoint + address (rpc feature)
+- `Client::boxed(remote_connection)` — from any `RemoteConnection` impl
+
+**Key methods** (all handle both local and remote transparently):
+
+| Method | Pattern | Tx Type | Rx Type |
+|---|---|---|---|
+| `rpc()` | Unary RPC | `oneshot::Sender` | `NoReceiver` |
+| `server_streaming()` | Server streaming | `mpsc::Sender` | `NoReceiver` |
+| `client_streaming()` | Client streaming | `oneshot::Sender` | `mpsc::Receiver` |
+| `bidi_streaming()` | Bidirectional | `mpsc::Sender` | `mpsc::Receiver` |
+| `notify()` | Fire-and-forget | `NoSender` | `NoReceiver` |
+| `rpc_0rtt()` | 0-RTT unary | `oneshot::Sender` | `NoReceiver` |
+| `server_streaming_0rtt()` | 0-RTT server streaming | `mpsc::Sender` | `NoReceiver` |
+| `notify_0rtt()` | 0-RTT fire-and-forget | `NoSender` | `NoReceiver` |
+
+Each method creates the appropriate channel pair, wraps the message into `WithChannels`, and sends it.
+
+### `LocalSender`
+
+```rust
+#[repr(transparent)]
+pub struct LocalSender(crate::channel::mpsc::Sender);
+```
+
+A thin wrapper around `mpsc::Sender` for sending messages to a local actor. Provides:
+
+```rust
+impl LocalSender {
+    pub fn send(&self, value: impl Into>) -> impl Future>
+    where
+        T: Channels,
+        S::Message: From>;
+
+    pub fn send_raw(&self, value: S::Message) -> impl Future>;
+}
+```
+
+### `Request`
+
+```rust
+pub enum Request {
+    Local(L),
+    Remote(R),
+}
+```
+
+A generic enum distinguishing local vs remote requests. `Client::request()` returns `Request, RemoteSender>`.
+
+### `RemoteSender` (rpc feature)
+
+```rust
+pub struct RemoteSender(noq::SendStream, noq::RecvStream, PhantomData);
+```
+
+Holds a QUIC stream pair after opening a bidirectional stream. The `write()` method serializes the protocol message with postcard + varint length prefix and sends it over the send stream.
+
+### `Handler` (rpc feature)
+
+```rust
+pub type Handler = Arc<
+    dyn Fn(R, noq::RecvStream, noq::SendStream) -> BoxFuture>
+        + Send + Sync + 'static,
+>;
+```
+
+A shared handler function that processes incoming remote requests. Typically created via `Protocol::remote_handler(local_sender)`.
+
+## Error Types
+
+### `RequestError`
+
+```rust
+pub enum RequestError {
+    Connect { source: noq::ConnectError },       // Connection establishment failed
+    Connection { source: noq::ConnectionError }, // Stream open failed
+    Other { source: AnyError },                  // Generic error for non-noq transports
+}
+```
+
+### `SendError` (in `channel` module)
+
+```rust
+pub enum SendError {
+    ReceiverClosed,           // Local: receiver dropped
+    MaxMessageSizeExceeded,   // Remote: message > 16 MiB
+    Io { source: io::Error }, // Remote: network/serialization error
+}
+```
+
+### `RecvError` (oneshot and mpsc variants)
+
+```rust
+// oneshot::RecvError
+pub enum RecvError {
+    SenderClosed,             // Local: sender dropped
+    MaxMessageSizeExceeded,   // Remote: message > 16 MiB
+    Io { source: io::Error }, // Remote: network/deserialization error
+}
+
+// mpsc::RecvError
+pub enum RecvError {
+    MaxMessageSizeExceeded,   // Remote: message > 16 MiB
+    Io { source: io::Error }, // Remote: network/deserialization error
+}
+```
+
+Note: `mpsc::RecvError` does **not** have `SenderClosed`. An mpsc receiver instead returns `Ok(None)` once all senders have been dropped.
+
+### `WriteError` (rpc feature)
+
+```rust
+pub enum WriteError {
+    Noq { source: noq::WriteError }, // QUIC stream write error
+    MaxMessageSizeExceeded,          // Message > 16 MiB
+    Io { source: io::Error },        // Serialization error
+}
+```
+
+### `Error` (top-level umbrella)
+
+```rust
+pub enum Error {
+    Request { source: RequestError },
+    Send { source: SendError },
+    MpscRecv { source: mpsc::RecvError },
+    OneshotRecv { source: oneshot::RecvError },
+    Write { source: rpc::WriteError }, // rpc feature only
+}
+```
+
+All error types implement `From` for `io::Error`, allowing integration with `?` in `io::Result` contexts.
\ No newline at end of file
diff --git a/docs/research/references/iroh/irpc/03-channel-system.md b/docs/research/references/iroh/irpc/03-channel-system.md
new file mode 100644
index 0000000..fcccf7b
--- /dev/null
+++ b/docs/research/references/iroh/irpc/03-channel-system.md
@@ -0,0 +1,168 @@
+# irpc: Channel System
+
+The channel system is the heart of irpc. It provides channel types that abstract over local (tokio) and remote (QUIC stream) communication, with the same API surface regardless of transport.
+
+## Channel Kinds
+
+irpc provides three kinds of channels, each with local and remote variants:
+
+### Oneshot Channels (`channel::oneshot`)
+
+Single-value, single-use channels for RPC responses.
+
+| Type | Local Backend | Remote Backend |
+|---|---|---|
+| `oneshot::Sender` | `tokio::sync::oneshot::Sender` | `BoxedSender` (FnOnce over QUIC write) |
+| `oneshot::Receiver` | `FusedOneshotReceiver` | `BoxedReceiver` (boxed future over QUIC read) |
+
+**Creation:** `oneshot::channel::()` returns `(Sender, Receiver)`
+
+**Sender behavior:**
+- Local: `send(value)` completes immediately (the underlying tokio oneshot send never waits) and fails only if the receiver has been dropped
+- Remote: `send(value)` is async — serializes with postcard, length-prefixes with varint, writes to QUIC stream
+
+**Receiver behavior:**
+- Implements `Future>`
+- Local: resolves to the value or `SenderClosed` error
+- Remote: reads varint length prefix, reads that many bytes, deserializes with postcard
+
+**Filtering/Mapping** (on `Sender` where `T: Send + Sync + 'static`):
+```rust
+sender.with_filter(|v| v > 0)     // Drop messages failing predicate
+sender.with_map(|v: U| v.into())  // Transform before sending
+sender.with_filter_map(|v| ...)   // Combined filter + map
+```
+
+### MPSC Channels (`channel::mpsc`)
+
+Multi-producer, single-consumer streaming channels for server-streaming, client-streaming, and bidirectional patterns.
+
+| Type | Local Backend | Remote Backend |
+|---|---|---|
+| `mpsc::Sender` | `tokio::sync::mpsc::Sender` | `Arc>` (NoqSender) |
+| `mpsc::Receiver` | `tokio::sync::mpsc::Receiver` | `Box>` (NoqReceiver) |
+
+**Creation:** `mpsc::channel::(buffer)` returns `(Sender, Receiver)`
+
+**Sender behavior:**
+- `send(value).await` — sends, yielding if full (remote: serializes + writes to stream)
+- `try_send(value).await` — non-blocking attempt; returns `Ok(false)` if the send would block
+- `closed().await` — waits until the receiver is dropped
+- `is_rpc()` — returns `true` for remote senders
+
+**Receiver behavior:**
+- `recv().await` → `Result, RecvError>` — `None` means all senders were dropped (local) or the stream was cleanly finished (remote)
+- `filter(pred)`, `map(fn)`, `filter_map(fn)` — chainable transformations
+- `into_stream()` (with `stream` feature) — converts to `Stream>`
+
+**Cloning:** `mpsc::Sender` implements `Clone`. Local senders clone the underlying tokio sender; remote senders clone the `Arc`.
+
+### None Channels (`channel::none`)
+
+Placeholder channels for when no communication is needed.
+
+```rust
+pub struct NoSender;   // Implements Sender, does nothing
+pub struct NoReceiver; // Implements Receiver, does nothing
+```
+
+Used as defaults when `#[rpc(tx=...)]` or `#[rpc(rx=...)]` are omitted.
+
+## Remote Channel Internals
+
+### NoqSender
+
+```rust
+struct NoqSender(tokio::sync::Mutex>);
+
+enum NoqSenderState {
+    Open(NoqSenderInner),
+    Closed,
+}
+
+struct NoqSenderInner {
+    send: noq::SendStream,
+    buffer: SmallVec<[u8; 128]>, // Stack-allocated buffer for small messages
+    _marker: PhantomData,
+}
+```
+
+Key behaviors:
+- **Mutex-protected state**: The inner state is `Mutex`-protected because `DynSender::send()` takes `&self`. When a send fails, the state transitions to `Closed` and all subsequent sends return `BrokenPipe`.
+- **Buffer reuse**: Uses `SmallVec<[u8; 128]>` to avoid heap allocation for messages that serialize to ≤128 bytes.
+- **Serialization**: Each message is postcard-serialized with a varint length prefix. If the serialized size exceeds `MAX_MESSAGE_SIZE` (16 MiB), the stream is reset with error code `ERROR_CODE_MAX_MESSAGE_SIZE_EXCEEDED` (1).
+- **Serialization errors**: If postcard serialization fails, the stream is reset with `ERROR_CODE_INVALID_POSTCARD` (2).
+
+### NoqReceiver
+
+```rust
+struct NoqReceiver {
+    recv: noq::RecvStream,
+    _marker: PhantomData,
+}
+```
+
+Reads a varint length prefix, allocates a buffer of that size, reads the data, and deserializes with postcard. If the length exceeds `MAX_MESSAGE_SIZE`, it stops the stream with `ERROR_CODE_MAX_MESSAGE_SIZE_EXCEEDED` (1) instead of allocating.
+
+### Oneshot Remote Sender
+
+For `oneshot::Sender` over QUIC, the sender is a `BoxedSender` — a `Box BoxFuture>>`. This captures the `noq::SendStream` and on invocation:
+1. Computes `postcard::experimental::serialized_size(&value)`
+2. Checks against `MAX_MESSAGE_SIZE`
+3. Writes length-prefixed postcard data to the stream
+
+### Oneshot Remote Receiver
+
+For `oneshot::Receiver` over QUIC, the receiver is constructed from a `noq::RecvStream`:
+1. Reads a varint length prefix
+2. Reads that many bytes
+3. Deserializes with postcard
+4. Returns the value
+
+## Channel Conversion Table
+
+When a QUIC stream pair `(SendStream, RecvStream)` is received for a request:
+
+| Channel Kind | `Tx` (SendStream →) | `Rx` (RecvStream →) |
+|---|---|---|
+| `oneshot::Sender` | Serialize + write, then finish | Read length-prefixed data |
+| `mpsc::Sender` | Repeatedly serialize + write | N/A |
+| `oneshot::Receiver` | N/A | Read single length-prefixed value |
+| `mpsc::Receiver` | N/A | Repeatedly read length-prefixed values |
+| `NoSender` | Drop the stream | N/A |
+| `NoReceiver` | N/A | Drop the stream |
+
+The `From` and `From` impls handle these conversions automatically based on the target type.
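The `NoqSender` behavior described above (state behind a mutex so that `send` can take `&self`, and a permanent switch to `Closed` after the first failure) can be modelled synchronously with the standard library. This is a sketch under stated assumptions, not irpc code: `FlakyStream` is a hypothetical stand-in for `noq::SendStream`, `std::sync::Mutex` replaces `tokio::sync::Mutex`, and messages are raw bytes rather than postcard-encoded values.

```rust
use std::io;
use std::sync::Mutex;

/// Hypothetical stand-in for a QUIC send stream: it accepts `budget` writes,
/// then fails as if the peer had reset the stream.
struct FlakyStream {
    budget: usize,
}

impl FlakyStream {
    fn write_all(&mut self, _buf: &[u8]) -> io::Result<()> {
        if self.budget == 0 {
            return Err(io::Error::new(io::ErrorKind::ConnectionReset, "stream reset"));
        }
        self.budget -= 1;
        Ok(())
    }
}

/// Same shape as `NoqSenderState`: open with a stream, or permanently closed.
enum SenderState {
    Open(FlakyStream),
    Closed,
}

/// Mirrors `NoqSender`: the mutex is what lets `send` take `&self`.
struct SketchSender(Mutex<SenderState>);

impl SketchSender {
    fn send(&self, msg: &[u8]) -> io::Result<()> {
        let mut state = self.0.lock().unwrap();
        let result = match &mut *state {
            SenderState::Closed => {
                return Err(io::Error::new(io::ErrorKind::BrokenPipe, "sender closed"))
            }
            SenderState::Open(stream) => stream.write_all(msg),
        };
        if result.is_err() {
            // The first failed write closes the sender for good.
            *state = SenderState::Closed;
        }
        result
    }
}

fn main() {
    let sender = SketchSender(Mutex::new(SenderState::Open(FlakyStream { budget: 1 })));
    assert!(sender.send(b"first").is_ok());
    // This write fails, and the sender transitions to Closed...
    assert_eq!(sender.send(b"second").unwrap_err().kind(), io::ErrorKind::ConnectionReset);
    // ...so every later send reports BrokenPipe.
    assert_eq!(sender.send(b"third").unwrap_err().kind(), io::ErrorKind::BrokenPipe);
}
```

Because clones of a remote `mpsc::Sender` share one `Arc`, they also share this state. That is why a single failed or cancelled send affects every clone.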
+
+## DynSender and DynReceiver Traits
+
+The `mpsc` module exposes traits for dynamic dispatch:
+
+```rust
+pub trait DynSender: Debug + Send + Sync + 'static {
+    fn send(&self, value: T) -> Pin> + Send + '_>>;
+    fn try_send(&self, value: T) -> Pin> + Send + '_>>;
+    fn closed(&self) -> Pin + Send + Sync + '_>>;
+    fn is_rpc(&self) -> bool;
+}
+
+pub trait DynReceiver: Debug + Send + Sync + 'static {
+    fn recv(&mut self) -> Pin, RecvError>> + Send + Sync + '_>>;
+}
+```
+
+These enable boxing of remote senders/receivers while keeping the local variants unboxed for zero overhead.
+
+## FusedOneshotReceiver
+
+A thin wrapper around `tokio::sync::oneshot::Receiver` that prevents panics when polling an already-completed receiver. It tracks completion state and returns `Poll::Pending` indefinitely after resolution, matching the `FusedFuture` pattern.
+
+## Cancellation Safety
+
+For remote `mpsc::Sender`:
+- If a `send()` future is dropped before completion, the underlying QUIC stream is closed.
+- All clones of the sender will receive `SendError::Io(BrokenPipe)` on subsequent send attempts.
+- This is documented behavior: **always poll send futures to completion if you want to reuse the sender**.
+
+For remote `oneshot::Sender`:
+- Since it's `FnOnce`, dropping the future before sending simply means the value is never sent. The receiver will get `SenderClosed`.
\ No newline at end of file
diff --git a/docs/research/references/iroh/irpc/04-protocol-and-message-flow.md b/docs/research/references/iroh/irpc/04-protocol-and-message-flow.md
new file mode 100644
index 0000000..6bfda73
--- /dev/null
+++ b/docs/research/references/iroh/irpc/04-protocol-and-message-flow.md
@@ -0,0 +1,272 @@
+# irpc: Protocol and Message Flow
+
+## Wire Protocol
+
+When the `rpc` feature is enabled, irpc uses the following wire format over QUIC streams:
+
+### Message Framing
+
+Every message on the wire is **length-prefixed using postcard varints** (LEB128 encoding):
+
+```
+┌─────────────────┬──────────────────────┐
+│ varint length   │ postcard-serialized  │
+│ (1-10 bytes)    │ message data         │
+└─────────────────┴──────────────────────┘
+```
+
+- **Length prefix**: LEB128 varint encoding of `u64` length. Each byte uses 7 bits for the value and the MSB as a continuation bit. Maximum 10 bytes for a full `u64`.
+- **Payload**: Postcard-encoded (compact, no-schema serde format) Rust message.
+
+### Maximum Message Size
+
+`MAX_MESSAGE_SIZE = 16 MiB (16 * 1024 * 1024)`
+
+Messages exceeding this limit are rejected:
+- **Send side**: The sender checks `postcard::experimental::serialized_size()` before sending. If exceeded, the stream is reset with error code `1` (`ERROR_CODE_MAX_MESSAGE_SIZE_EXCEEDED`).
+- **Receive side**: After reading the varint length, if it exceeds `MAX_MESSAGE_SIZE`, the stream is stopped with error code `1`.
+
+### Error Codes
+
+| Code | Constant | Meaning |
+|---|---|---|
+| `1` | `ERROR_CODE_MAX_MESSAGE_SIZE_EXCEEDED` | Message larger than 16 MiB |
+| `2` | `ERROR_CODE_INVALID_POSTCARD` | Postcard serialization failed |
+
+These are used as QUIC stream reset/stop error codes.
+
+### Connection Closure
+
+Error code `0` on the QUIC connection means "clean close" — the remote side intentionally shut down. This is distinguished from actual errors.
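The framing rules above (an LEB128 length prefix and a 16 MiB cap checked both before sending and right after reading the prefix) can be sketched with the standard library alone. The functions below are hypothetical helpers, not irpc's API: the payload is treated as opaque bytes where irpc would carry postcard output, and an oversize frame surfaces as an `io::Error` rather than a QUIC stream reset or stop with error code `1`.

```rust
use std::io::{self, Read, Write};

/// 16 MiB, matching the `MAX_MESSAGE_SIZE` limit described above.
const MAX_MESSAGE_SIZE: u64 = 16 * 1024 * 1024;

/// Encode `value` as unsigned LEB128: 7 value bits per byte, MSB set on all but the last byte.
fn write_varint(out: &mut impl Write, mut value: u64) -> io::Result<()> {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            return out.write_all(&[byte]);
        }
        out.write_all(&[byte | 0x80])?;
    }
}

/// Decode unsigned LEB128; a u64 never needs more than 10 bytes.
fn read_varint(input: &mut impl Read) -> io::Result<u64> {
    let mut value = 0u64;
    for i in 0..10 {
        let mut byte = [0u8; 1];
        input.read_exact(&mut byte)?;
        value |= u64::from(byte[0] & 0x7f) << (7 * i);
        if byte[0] & 0x80 == 0 {
            return Ok(value);
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint longer than 10 bytes"))
}

/// Send side: refuse oversize payloads before writing anything.
fn write_frame(out: &mut impl Write, payload: &[u8]) -> io::Result<()> {
    if payload.len() as u64 > MAX_MESSAGE_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "message too large"));
    }
    write_varint(out, payload.len() as u64)?;
    out.write_all(payload)
}

/// Receive side: check the declared length *before* allocating the buffer.
fn read_frame(input: &mut impl Read) -> io::Result<Vec<u8>> {
    let len = read_varint(input)?;
    if len > MAX_MESSAGE_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "message too large"));
    }
    let mut buf = vec![0u8; len as usize];
    input.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut wire = Vec::new();
    write_frame(&mut wire, b"hello")?;
    write_frame(&mut wire, &[0u8; 300])?;
    // A length of 5 fits in one varint byte; 300 needs two (0xAC, 0x02).
    assert_eq!(&wire[..6], &[5, b'h', b'e', b'l', b'l', b'o']);
    assert_eq!(&wire[6..8], &[0xAC, 0x02]);

    let mut reader = &wire[..];
    assert_eq!(read_frame(&mut reader)?, b"hello");
    assert_eq!(read_frame(&mut reader)?.len(), 300);
    Ok(())
}
```

Checking the length before `vec![0u8; len]` matters on the receive side: a peer cannot make the reader allocate an arbitrarily large buffer just by sending a large prefix.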
+ +## Message Flow: Local Path + +``` +Client Actor + │ │ + │ Client::rpc(Get { key: "x" }) │ + │ │ + │ 1. Create oneshot channel pair │ + │ (tx, rx) = oneshot::channel() │ + │ │ + │ 2. Wrap into WithChannels │ + │ WithChannels { │ + │ inner: Get { key: "x" }, │ + │ tx: oneshot::Sender, │ + │ rx: NoReceiver, │ + │ span: current_span, │ + │ } │ + │ │ + │ 3. Convert to Message enum │ + │ StorageMessage::Get(wc) │ + │ │ + │ 4. Send over mpsc channel ────────►│ + │ │ + │ 5. Await on oneshot receiver │ + │ rx.await ◄─────────────────────│ + │ tx.send(res)│ + │ │ + │ Result: res │ +``` + +For bidirectional streaming: +``` +Client Actor + │ │ + │ Client::bidi_streaming(Sum, 4, 4) │ + │ │ + │ 1. Create channel pairs │ + │ (update_tx, update_rx) │ + │ (res_tx, res_rx) │ + │ │ + │ 2. WithChannels { │ + │ inner: Sum, │ + │ tx: mpsc::Sender, │ + │ rx: mpsc::Receiver, │ + │ } │ + │ │ + │ 3. Send message ──────────────────►│ + │ │ + │ 4. Use update_tx.send(val) ───────►│ + │ Use res_rx.recv() ◄─────────│ + │ res_tx.send(val) + │ │ +``` + +## Message Flow: Remote Path + +``` +Client Server + │ │ + │ Client::rpc(Get { key: "x" }) │ + │ │ + │ 1. open_bi() → (SendStream, RecvStream) + │ │ + │ 2. Serialize StorageProtocol::Get(Get { key: "x" }) + │ with postcard + varint prefix │ + │ │ + │ 3. Write to SendStream ───────────►│ + │ │ + │ │ 4. Accept bi stream + │ │ 5. Read varint + deserialize + │ │ 6. RemoteService::with_remote_channels() + │ │ → WithChannels { inner, tx, rx } + │ │ 7. Forward to local actor + │ │ + │ │ Actor processes, sends response + │ │ on the SendStream (which is the + │ │ oneshot::Sender backed by QUIC) + │ │ + │ 8. Read from RecvStream ◄──────────│ + │ 9. 
Deserialize response │
 │ │
 │ Result: res │
```

For bidirectional streaming over remote:
```
Client                                  Server
  │                                        │
  │  Client::bidi_streaming(Sum, 4, 4)     │
  │                                        │
  │  open_bi() → (SendStream, RecvStream)  │
  │                                        │
  │  Client side:                          │  Server side:
  │    SendStream → mpsc::Sender (updates) │    RecvStream → mpsc::Receiver (updates)
  │    RecvStream → mpsc::Receiver         │    SendStream → mpsc::Sender
  │      (responses; a oneshot::Receiver   │      (responses; a oneshot::Sender
  │      for unary/client streaming)       │      for unary/client streaming)
  │                                        │
  │  The initial message is sent on the    │
  │  SendStream with a varint prefix.      │
  │                                        │
  │  Subsequent updates are sent on the    │
  │  same SendStream as varint-prefixed    │
  │  postcard messages.                    │
  │                                        │
  │  The response stream is read from the  │
  │  RecvStream as varint-prefixed         │
  │  postcard messages.                    │
```

## Stream Direction Convention

In irpc's QUIC stream model, seen from the client:
- The **client opens** a bidirectional stream (`open_bi()`).
- The **SendStream** (client → server) carries the initial request message plus any client-streaming updates.
- The **RecvStream** (server → client) carries the response(s) from the server.

On the server, `RemoteService::with_remote_channels()` maps the accepted stream pair onto the request's typed channels:

```rust
// Simple RPC (tx = oneshot::Sender, rx = NoReceiver), one variant shown:
fn with_remote_channels(self, rx: RecvStream, tx: SendStream) -> Self::Message {
    match self {
        // `tx` (SendStream) becomes the oneshot::Sender that carries the response;
        // `rx` (RecvStream) becomes NoReceiver, so that stream is unused.
        Protocol::Get(msg) => WithChannels::from((msg, tx, rx)).into(),
    }
}
```
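The server-side naming (`rx` for what the server reads, `tx` for what it writes) can be mimicked for the `Sum` example with `std::sync::mpsc` channels standing in for the two QUIC stream halves. This is a sketch, not irpc code: `handle_sum` and `run_sum` are hypothetical, and it assumes that `Sum` replies with a running total after each update.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Server-side handler for a hypothetical `Sum` bidi request.
/// Parameter names follow the server's perspective, as in `with_remote_channels`:
/// `rx` carries client -> server updates, `tx` carries server -> client responses.
fn handle_sum(rx: Receiver<i64>, tx: Sender<i64>) {
    let mut total = 0;
    // The loop ends when the client drops its update sender (end of updates).
    for value in rx {
        total += value;
        if tx.send(total).is_err() {
            break; // client stopped listening
        }
    }
    // `tx` is dropped here, which ends the client's response stream.
}

/// Client side: send updates, then collect the running totals.
fn run_sum(values: &[i64]) -> Vec<i64> {
    let (update_tx, update_rx) = channel::<i64>(); // client -> server
    let (res_tx, res_rx) = channel::<i64>(); // server -> client
    let server = thread::spawn(move || handle_sum(update_rx, res_tx));
    for &v in values {
        update_tx.send(v).unwrap();
    }
    drop(update_tx); // signal the end of the update stream
    let totals: Vec<i64> = res_rx.iter().collect();
    server.join().unwrap();
    totals
}
```

With this sketch, `run_sum(&[1, 2, 3])` returns `[1, 3, 6]`.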
+ 

In the `with_remote_channels` generated code:
```rust
// For rpc(tx=oneshot::Sender<T>, rx=mpsc::Receiver<U>):
WithChannels::from((msg, tx.into(), rx.into()))
// tx (SendStream) → oneshot::Sender<T>: server writes the response
// rx (RecvStream) → mpsc::Receiver<U>: server reads client updates
```

So the naming in `with_remote_channels` is from the **server's perspective**:
- `rx` parameter = `RecvStream` = what the server receives (client → server updates)
- `tx` parameter = `SendStream` = what the server sends (server → client responses)

## Connection Management

### NoqLazyRemoteConnection

```rust
struct NoqLazyRemoteConnection(Arc<NoqLazyRemoteConnectionInner>);

struct NoqLazyRemoteConnectionInner {
    endpoint: noq::Endpoint,
    addr: SocketAddr,
    connection: Mutex<Option<noq::Connection>>,
}
```

- Lazily establishes the connection on first use
- Caches the `noq::Connection` inside a `Mutex<Option<noq::Connection>>`
- On `open_bi()`: if a cached connection exists, tries to reuse it; if that fails, clears the cache and reconnects once
- Thread-safe via `Arc` + `Mutex`

### IrohLazyRemoteConnection (irpc-iroh)

Same pattern, but for iroh endpoints, with an additional `alpn` field for protocol identification.

### 0-RTT Support

irpc supports QUIC 0-RTT for reduced latency on reconnections:

- `Client::rpc_0rtt()`: sends the request immediately as 0-RTT data; if the server rejects 0-RTT, re-sends it
- `Client::server_streaming_0rtt()`: same, for server streaming
- `Client::notify_0rtt()`: same, for fire-and-forget

The 0-RTT flow:
1. The client serializes the message into a buffer (`prepare_write()`)
2. Sends the buffer over a 0-RTT connection
3. Awaits `zero_rtt_accepted()` to check whether 0-RTT was accepted
4. If it was not accepted, opens a new connection and re-sends the same buffer

`RemoteConnection::zero_rtt_accepted()` returns `true` for regular connections and for lazy connections. For `IrohZrttRemoteConnection`, it checks the actual 0-RTT status via `handshake_completed()`.
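The four-step 0-RTT flow can be sketched synchronously. The `EarlyConnection` trait, the `FakeConn` test double and `send_0rtt` are hypothetical (irpc's real path is async and goes through `RemoteConnection`). The function only receives the already-serialized `frame`, so a rejected 0-RTT attempt is replayed byte-for-byte rather than re-serialized.

```rust
/// Hypothetical stand-in for a connection that may carry 0-RTT data.
trait EarlyConnection {
    fn write(&mut self, frame: &[u8]);
    fn zero_rtt_accepted(&self) -> bool;
}

/// Steps 2-4: write optimistically, check acceptance, and on rejection
/// replay the *same* pre-serialized bytes on a freshly established connection.
fn send_0rtt<C: EarlyConnection>(mut early: C, reconnect: impl FnOnce() -> C, frame: &[u8]) -> C {
    early.write(frame); // optimistic 0-RTT write
    if early.zero_rtt_accepted() {
        return early;
    }
    let mut full = reconnect(); // full handshake
    full.write(frame); // no re-serialization needed
    full
}

/// Test double that records every frame written to it.
#[derive(Debug)]
struct FakeConn {
    accepts: bool,
    writes: Vec<Vec<u8>>,
}

impl EarlyConnection for FakeConn {
    fn write(&mut self, frame: &[u8]) {
        self.writes.push(frame.to_vec());
    }
    fn zero_rtt_accepted(&self) -> bool {
        self.accepts
    }
}
```

Step 1 (serializing into the frame) happens once, before `send_0rtt` is called.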
+ 

## Server-Side: Accepting Connections

### Using noq (direct QUIC)

```rust
irpc::rpc::listen(endpoint, handler)
```

This function:
1. Loops on `endpoint.accept()` to accept incoming connections
2. For each connection, spawns a task running `handle_connection()`
3. `handle_connection()` loops on `read_request_raw()` to read requests from bidirectional streams
4. Each request is deserialized and passed to the `Handler`

### Using iroh

```rust
IrohProtocol::with_sender(local_sender)
```

This creates a `ProtocolHandler` that can be registered with `iroh::protocol::Router`. When a connection arrives, it calls `handle_connection()` from irpc-iroh, which accepts bidirectional streams on the connection and reads requests from them.

For 0-RTT support:
```rust
Iroh0RttProtocol::with_sender(local_sender)
```

This implements `ProtocolHandler::on_accepting()` to handle 0-RTT connections.

### Handler Function

```rust
type Handler<R> = Arc<
    dyn Fn(R, noq::RecvStream, noq::SendStream) -> BoxFuture<Result<(), SendError>>
        + Send
        + Sync
        + 'static,
>;
```

The handler receives:
1. The deserialized protocol message (`R`)
2. The `RecvStream` (for client → server updates)
3. The `SendStream` (for server → client responses)

Typically created via `Protocol::remote_handler(local_sender)`, which converts streams to typed channels and forwards the `WithChannels` message to a local actor. \ No newline at end of file diff --git a/docs/research/references/iroh/irpc/05-rpc-requests-macro.md b/docs/research/references/iroh/irpc/05-rpc-requests-macro.md new file mode 100644 index 0000000..9457cd8 --- /dev/null +++ b/docs/research/references/iroh/irpc/05-rpc-requests-macro.md @@ -0,0 +1,278 @@ +# irpc: The rpc_requests Macro

The `#[rpc_requests]` attribute macro is the primary way to define an irpc protocol. It generates the boilerplate for channel typing, message wrapping, and service trait implementations.
+ 

## Basic Usage

```rust
use irpc::{channel::{mpsc, oneshot}, rpc_requests, Client, WithChannels};
use serde::{Deserialize, Serialize};

#[rpc_requests(message = ComputeMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum ComputeProtocol {
    /// Unary RPC: one request, one response
    #[rpc(tx=oneshot::Sender<i64>)]
    #[wrap(Multiply)]
    Multiply(i64, i64),

    /// Bidirectional streaming
    #[rpc(tx=mpsc::Sender<i64>, rx=mpsc::Receiver<i64>)]
    #[wrap(Sum)]
    Sum,
}
```

This single macro invocation generates:

1. **Wrapper structs** (from `#[wrap]`): `Multiply` and `Sum` struct types
2. **`Channels` impls**: For each variant's inner type, specifying `Tx` and `Rx`
3. **`Service` impl**: `impl Service for ComputeProtocol { type Message = ComputeMessage; }`
4. **`RemoteService` impl** (rpc feature): Maps protocol variants + QUIC streams to messages
5. **`ComputeMessage` enum**: Wraps each request in `WithChannels`
6. **`From` conversions**: Between inner types, `ComputeProtocol`, and `ComputeMessage`

## Macro Arguments

### Top-level (on the enum)

| Argument | Required | Description |
|---|---|---|
| `message = Name` | Recommended | Name of the generated message enum. Also generates `Service` and `RemoteService` impls. |
| `alias = "Suffix"` | Optional | Generates type aliases like `MultiplyMsg = WithChannels<Multiply, ComputeProtocol>` |
| `rpc_feature = "feat"` | Optional | Feature-gates the `RemoteService` impl with `#[cfg(feature = "feat")]` |
| `no_rpc` | Optional | Skips generating `RemoteService` impl entirely |
| `no_spans` | Optional | Skips span-related code (for use without the `spans` feature) |

### Per-variant

#### `#[rpc(tx=Type, rx=Type)]`

Specifies channel types for each request:
- `tx`: response channel type (server → client). Defaults to `NoSender`.
- `rx`: update channel type (client → server). Defaults to `NoReceiver`.
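To make the `tx`/`rx` defaults concrete, here is a simplified, std-only model of what the attribute ultimately drives: an associated-type mapping from request type to channel types. `NoChannel`, this `Channels` trait and this `WithChannels` are stand-ins (irpc's `Channels` is also generic over the service, and its channels are not `std::sync::mpsc`).

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Stand-in for `NoSender` / `NoReceiver`: no channel in this direction.
struct NoChannel;

/// Simplified take on `Channels`: each request type names its channel types.
trait Channels {
    type Tx; // response channel (server -> client)
    type Rx; // update channel (client -> server)
}

/// Hypothetical request type, as `#[wrap(Multiply)]` would produce.
struct Multiply(i64, i64);

impl Channels for Multiply {
    type Tx = SyncSender<i64>; // single response, playing the role of oneshot::Sender<i64>
    type Rx = NoChannel; // the default when `rx` is omitted
}

/// The request bundled with its typed channels, like `WithChannels`.
struct WithChannels<I: Channels> {
    inner: I,
    tx: I::Tx,
    #[allow(dead_code)]
    rx: I::Rx,
}

/// Actor side: the compiler knows `msg.tx` is a `SyncSender<i64>`.
fn handle_multiply(msg: WithChannels<Multiply>) {
    let Multiply(a, b) = msg.inner;
    msg.tx.send(a * b).ok();
}

/// Client side: create the reply channel, wrap the request, await the answer.
fn multiply(a: i64, b: i64) -> i64 {
    let (tx, rx): (SyncSender<i64>, Receiver<i64>) = sync_channel(1);
    handle_multiply(WithChannels { inner: Multiply(a, b), tx, rx: NoChannel });
    rx.recv().unwrap()
}
```

Because the channel types hang off the request type, a handler cannot reply to `Multiply` with anything other than an `i64`.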
+ 

Valid types:
- `oneshot::Sender<T>`: single response
- `mpsc::Sender<T>`: streaming response
- `oneshot::Receiver<T>`: single update (client → server); valid only as `rx`, not as `tx`
- `mpsc::Receiver<T>`: streaming updates (client → server)
- `NoSender` / `NoReceiver`: no channel in that direction

#### `#[wrap(TypeName, derive(Traits))]`

Generates a struct from the variant's fields:
- `TypeName`: name of the generated struct
- Optional visibility prefix (e.g., `pub(crate) TypeName`)
- `derive(...)`: additional derive macros beyond the default `Serialize, Deserialize, Debug`

If `#[wrap]` is not used, each variant must be a tuple variant with exactly one field whose type is the request type (e.g. `Get(GetRequest)`).

## Generated Code Walkthrough

Given this input:
```rust
#[rpc_requests(message = StoreMessage)]
#[derive(Debug, Serialize, Deserialize)]
enum StoreProtocol {
    #[rpc(tx=oneshot::Sender<String>)]
    #[wrap(GetRequest, derive(Clone))]
    Get(String),

    #[rpc(tx=oneshot::Sender<()>)]
    #[wrap(SetRequest)]
    Set { key: String, value: String },
}
```

The macro generates:

### 1. Wrapper Structs

```rust
#[derive(Debug, Serialize, Deserialize, Clone)]
pub struct GetRequest(pub String);

#[derive(Debug, Serialize, Deserialize)]
pub struct SetRequest { pub key: String, pub value: String }
```

The variants are rewritten to use these:
```rust
enum StoreProtocol {
    Get(GetRequest),
    Set(SetRequest),
}
```

### 2. Channels Implementations

```rust
impl Channels<StoreProtocol> for GetRequest {
    type Tx = oneshot::Sender<String>;
    type Rx = NoReceiver;
}

impl Channels<StoreProtocol> for SetRequest {
    type Tx = oneshot::Sender<()>;
    type Rx = NoReceiver;
}
```

### 3. Message Enum

```rust
#[doc = "Message enum for [`StoreProtocol`]"]
#[allow(missing_docs)]
#[derive(Debug)]
pub enum StoreMessage {
    Get(WithChannels<GetRequest, StoreProtocol>),
    Set(WithChannels<SetRequest, StoreProtocol>),
}
```

### 4. Service Implementation

```rust
impl Service for StoreProtocol {
    type Message = StoreMessage;
}
```

### 5.
RemoteService Implementation (rpc feature)

```rust
impl RemoteService for StoreProtocol {
    fn with_remote_channels(
        self,
        rx: noq::RecvStream,
        tx: noq::SendStream,
    ) -> Self::Message {
        match self {
            StoreProtocol::Get(msg) => {
                StoreMessage::from(WithChannels::from((msg, tx, rx)))
            }
            StoreProtocol::Set(msg) => {
                StoreMessage::from(WithChannels::from((msg, tx, rx)))
            }
        }
    }
}
```

### 6. From Conversions

```rust
// Inner type → Protocol enum
impl From<GetRequest> for StoreProtocol { ... }
impl From<SetRequest> for StoreProtocol { ... }

// WithChannels → Message enum
impl From<WithChannels<GetRequest, StoreProtocol>> for StoreMessage { ... }
impl From<WithChannels<SetRequest, StoreProtocol>> for StoreMessage { ... }
```

### 7. parent_span Method (spans feature)

```rust
impl StoreMessage {
    pub fn parent_span(&self) -> tracing::Span {
        let span = match self {
            StoreMessage::Get(inner) => inner.parent_span_opt(),
            StoreMessage::Set(inner) => inner.parent_span_opt(),
        };
        span.cloned().unwrap_or_else(|| tracing::Span::current())
    }
}
```

## Interaction Pattern Mapping

The `#[rpc]` attribute maps directly to gRPC-like patterns:

| Pattern | `tx` type | `rx` type | Example |
|---|---|---|---|
| **Unary RPC** | `oneshot::Sender<T>` | `NoReceiver` | Get by key, return value |
| **Server streaming** | `mpsc::Sender<T>` | `NoReceiver` | List all items |
| **Client streaming** | `oneshot::Sender<T>` | `mpsc::Receiver<U>` | Upload items, get count |
| **Bidirectional** | `mpsc::Sender<T>` | `mpsc::Receiver<U>` | Chat, live updates |
| **Notify (fire & forget)** | `NoSender` | `NoReceiver` | Log event |

## Client Methods by Pattern

`Client` provides one method per interaction pattern; which one applies depends on the request's channel types:

```rust
// Unary RPC: tx=oneshot::Sender<T>, rx=NoReceiver
client.rpc(Get { key: "x" }).await // → Result<T>

// Server streaming: tx=mpsc::Sender<T>, rx=NoReceiver
client.server_streaming(List, 16).await // → Result<mpsc::Receiver<T>>

// Client streaming: tx=oneshot::Sender<T>, rx=mpsc::Receiver<U>
client.client_streaming(SetMany, 4).await // →
Result<(mpsc::Sender<U>, oneshot::Receiver<T>)>

// Bidirectional: tx=mpsc::Sender<T>, rx=mpsc::Receiver<U>
client.bidi_streaming(Sum, 4, 4).await // → Result<(mpsc::Sender<U>, mpsc::Receiver<T>)>

// Notify: tx=NoSender, rx=NoReceiver
client.notify(Log { msg: "hi" }).await // → Result<()>
```

## Manual Protocol Definition (Without Macro)

You can define protocols manually instead of using the macro:

```rust
use irpc::{channel::{none::NoReceiver, oneshot}, Channels, Service, WithChannels};
use irpc::rpc::RemoteService; // requires the `rpc` feature
use serde::{Deserialize, Serialize};

// 1. Define request types
#[derive(Debug, Serialize, Deserialize)]
struct Get { key: String }

#[derive(Debug, Serialize, Deserialize)]
struct Set { key: String, value: String }

// 2. Implement Channels for each type
impl Channels<StorageProtocol> for Get {
    type Tx = oneshot::Sender<Option<String>>;
    type Rx = NoReceiver;
}

impl Channels<StorageProtocol> for Set {
    type Tx = oneshot::Sender<()>;
    type Rx = NoReceiver;
}

// 3. Define protocol enum
#[derive(derive_more::From, Serialize, Deserialize, Debug)]
enum StorageProtocol {
    Get(Get),
    Set(Set),
}

// 4. Define message enum
#[derive(derive_more::From)]
enum StorageMessage {
    Get(WithChannels<Get, StorageProtocol>),
    Set(WithChannels<Set, StorageProtocol>),
}

// 5. Implement Service
impl Service for StorageProtocol {
    type Message = StorageMessage;
}

// 6. Implement RemoteService (rpc feature)
impl RemoteService for StorageProtocol {
    fn with_remote_channels(self, rx: noq::RecvStream, tx: noq::SendStream) -> Self::Message {
        match self {
            StorageProtocol::Get(msg) => WithChannels::from((msg, tx, rx)).into(),
            StorageProtocol::Set(msg) => WithChannels::from((msg, tx, rx)).into(),
        }
    }
}
```

This manual approach gives full control but requires more boilerplate. The macro generates all of this automatically.
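On the receiving end, an actor typically owns the state and matches on the message enum, replying on each request's channel. The sketch below assumes `std::sync::mpsc` channels and a simplified `StorageMessage` that pairs each request with a `SyncSender` instead of `WithChannels`; `actor`, `rpc` and `spawn_actor` are hypothetical helpers, not irpc functions.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, sync_channel, Receiver, Sender, SyncSender};
use std::thread;

struct Get {
    key: String,
}

struct Set {
    key: String,
    value: String,
}

/// Simplified message enum: each variant carries the request plus its reply channel.
enum StorageMessage {
    Get(Get, SyncSender<Option<String>>),
    Set(Set, SyncSender<()>),
}

/// The actor owns the state and handles one message at a time.
fn actor(rx: Receiver<StorageMessage>) {
    let mut db: HashMap<String, String> = HashMap::new();
    for msg in rx {
        match msg {
            StorageMessage::Get(req, tx) => {
                tx.send(db.get(&req.key).cloned()).ok();
            }
            StorageMessage::Set(req, tx) => {
                db.insert(req.key, req.value);
                tx.send(()).ok();
            }
        }
    }
    // The loop ends once every sender to the actor has been dropped.
}

fn spawn_actor() -> (Sender<StorageMessage>, thread::JoinHandle<()>) {
    let (tx, rx) = channel();
    (tx, thread::spawn(move || actor(rx)))
}

/// Unary RPC: create a reply channel, send the wrapped request, wait for the reply.
fn rpc<T>(actor_tx: &Sender<StorageMessage>, make: impl FnOnce(SyncSender<T>) -> StorageMessage) -> T {
    let (tx, rx) = sync_channel(1);
    actor_tx.send(make(tx)).expect("actor is running");
    rx.recv().expect("actor replied")
}
```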
\ No newline at end of file diff --git a/docs/research/references/iroh/irpc/06-rpc-module-and-remote-transport.md b/docs/research/references/iroh/irpc/06-rpc-module-and-remote-transport.md new file mode 100644 index 0000000..3e74e7e --- /dev/null +++ b/docs/research/references/iroh/irpc/06-rpc-module-and-remote-transport.md @@ -0,0 +1,274 @@ +# irpc: RPC Module and Remote Transport

The `rpc` module (enabled by the `rpc` feature) contains all cross-process RPC functionality: QUIC stream handling, connection management, serialization, and server-side request processing.

## Module Structure

```rust
// Outline only: bodies, trait bounds and some parameter types are elided.
pub mod rpc {
    pub const MAX_MESSAGE_SIZE: u64 = 1024 * 1024 * 16;
    pub const ERROR_CODE_MAX_MESSAGE_SIZE_EXCEEDED: u32 = 1;
    pub const ERROR_CODE_INVALID_POSTCARD: u32 = 2;

    pub enum WriteError { Noq, MaxMessageSizeExceeded, Io }
    pub trait RemoteConnection: Send + Sync + Debug + 'static { ... }
    pub struct RemoteSender<S>(SendStream, RecvStream, PhantomData<S>);
    pub type Handler<R> = Arc<dyn Fn(R, RecvStream, SendStream) -> BoxFuture<Result<(), SendError>> + Send + Sync>;
    pub trait RemoteService: Service + Sized { ... }
    pub async fn listen(endpoint, handler);
    pub async fn handle_connection(connection, handler) -> io::Result<()>;
    pub async fn read_request<S: RemoteService>(connection) -> io::Result<Option<S::Message>>;
    pub async fn read_request_raw<R>(connection) -> io::Result<Option<(R, RecvStream, SendStream)>>;
}
```

## RemoteConnection Implementations

### NoqLazyRemoteConnection

The default remote connection for noq (QUIC addressed by socket address):

```rust
struct NoqLazyRemoteConnection(Arc<NoqLazyRemoteConnectionInner>);

struct NoqLazyRemoteConnectionInner {
    endpoint: noq::Endpoint,
    addr: SocketAddr,
    connection: Mutex<Option<noq::Connection>>,
}
```

**Behavior:**
- `open_bi()`:
  1. Locks the `Mutex<Option<noq::Connection>>`
  2. If a cached connection exists, tries `conn.open_bi()`
  3. If that fails, clears the cache and establishes a new connection
  4. If no cached connection, establishes a new one
  5.
Returns `(SendStream, RecvStream)` pair +- `zero_rtt_accepted()`: Always returns `true` (noq doesn't have 0-RTT concept in this context) +- `clone_boxed()`: Clones the `Arc`, sharing the same connection cache + +### Direct noq::Connection + +```rust +impl RemoteConnection for noq::Connection { + fn open_bi(&self) -> BoxFuture> { + // Directly opens a bidirectional stream on the connection + } + fn zero_rtt_accepted(&self) -> BoxFuture { Box::pin(async { true }) } +} +``` + +## RemoteSender + +```rust +pub struct RemoteSender(noq::SendStream, noq::RecvStream, PhantomData); +``` + +Created by `Client::request()` when the client is remote. Holds both sides of a QUIC bidirectional stream. + +### Key Methods + +```rust +impl RemoteSender { + pub fn new(send: SendStream, recv: RecvStream) -> Self; + + pub async fn write(self, msg: impl Into) -> Result<(SendStream, RecvStream), WriteError> { + let buf = prepare_write(msg)?; + self.write_raw(&buf).await + } + + // Internal: writes pre-serialized buffer + pub(crate) async fn write_raw(self, buf: &[u8]) -> Result<(SendStream, RecvStream), WriteError>; +} +``` + +The `write()` method: +1. Converts `msg` into the protocol enum `S` via `Into` +2. Checks serialized size against `MAX_MESSAGE_SIZE` +3. Length-prefixes with varint + postcard serialization +4. Writes to the `SendStream` +5. Returns the stream pair (now usable for response channels) + +The `write_raw()` method is used for 0-RTT where the message is pre-serialized to allow re-sending without re-serialization. + +### prepare_write + +```rust +fn prepare_write(msg: impl Into) -> Result, WriteError> { + let msg = msg.into(); + if postcard::experimental::serialized_size(&msg)? as u64 > MAX_MESSAGE_SIZE { + return Err(WriteError::MaxMessageSizeExceeded); + } + let mut buf = SmallVec::<[u8; 128]>::new(); + buf.write_length_prefixed(&msg)?; + Ok(buf) +} +``` + +Uses `SmallVec<[u8; 128]>` to avoid heap allocation for small messages. 
+ +## Stream-to-Channel Conversions + +When a QUIC stream pair is received on the server side, it needs to be converted into typed channels. The `From` implementations handle this: + +### SendStream → Channel Tx + +```rust +// NoSender: drop the stream +impl From for NoSender { ... } + +// Oneshot: serialize and send single value, then done +impl From for oneshot::Sender { ... } + +// MPSC: repeatedly serialize and send values +impl From for mpsc::Sender { ... } +``` + +### RecvStream → Channel Rx + +```rust +// NoReceiver: drop the stream +impl From for NoReceiver { ... } + +// Oneshot: read single length-prefixed value +impl From for oneshot::Receiver { ... } + +// MPSC: repeatedly read length-prefixed values +impl From for mpsc::Receiver { ... } +``` + +## Server-Side Request Processing + +### read_request_raw + +```rust +pub async fn read_request_raw( + connection: &noq::Connection, +) -> io::Result> +``` + +1. Calls `connection.accept_bi()` to accept an incoming bidirectional stream +2. If `ApplicationClosed(0)`, returns `Ok(None)` (clean shutdown) +3. Reads a varint length prefix from the `RecvStream` +4. Checks against `MAX_MESSAGE_SIZE` +5. Reads `length` bytes from the stream +6. Deserializes with `postcard::from_bytes::()` +7. Returns `(deserialized_message, RecvStream, SendStream)` + +### read_request (typed) + +```rust +pub async fn read_request( + connection: &noq::Connection, +) -> io::Result> +``` + +Calls `read_request_raw()` and then applies `S::with_remote_channels()` to convert the raw protocol message + stream pair into a `WithChannels`-wrapped `Message`. + +### handle_connection + +```rust +pub async fn handle_connection( + connection: noq::Connection, + handler: Handler, +) -> io::Result<()> +``` + +Loops: +1. Calls `read_request_raw()` to get the next request +2. If `None`, returns `Ok(())` (connection closed) +3. Invokes `handler(msg, rx, tx)` to process the request +4. 
Continues until the connection closes or an error occurs + +Each connection is handled in a separate task (spawned by `listen()`). + +### listen + +```rust +pub async fn listen( + endpoint: noq::Endpoint, + handler: Handler, +) +``` + +The top-level server loop: +1. Accepts incoming connections from the `noq::Endpoint` +2. Spawns a task for each connection +3. Each task calls `handle_connection()` +4. Uses a `JoinSet` to manage and clean up completed tasks + +## The Handler and Local Forwarding + +The typical handler is created by `Protocol::remote_handler(local_sender)`: + +```rust +fn remote_handler(local_sender: LocalSender) -> Handler { + Arc::new(move |msg, rx, tx| { + let msg = Self::with_remote_channels(msg, rx, tx); + Box::pin(local_sender.send_raw(msg)) + }) +} +``` + +This converts the raw (deserialized protocol message, RecvStream, SendStream) tuple into a typed `WithChannels` message and forwards it to the local actor via the mpsc channel. The local actor can then use the typed channels without knowing whether they're local or remote. + +## Full Request Lifecycle (Remote) + +``` + CLIENT SERVER + │ │ + │ 1. Client::request() │ + │ → open_bi() on connection │ + │ │ + │ 2. RemoteSender::write(protocol_msg) │ + │ → serialize + send on SendStream ────►│ + │ │ 3. accept_bi() + │ │ 4. read_request_raw() + │ │ → read varint + data + │ │ → deserialize protocol_msg + │ │ + │ │ 5. RemoteService::with_remote_channels() + │ │ → creates WithChannels + │ │ → SendStream → tx channel + │ │ → RecvStream → rx channel + │ │ + │ │ 6. handler(msg, rx, tx) + │ │ → local_sender.send_raw(message) + │ │ → message goes to actor + │ │ + │ │ 7. Actor processes: + │ │ match message { + │ │ Msg::Get(wc) => { + │ │ let res = db.get(wc.inner.key); + │ │ wc.tx.send(res).await; + │ │ // tx.send() writes to SendStream + │ │ } + │ │ } + │ │ + │ 8. RecvStream reads response ◄───────────│ + │ 9. Deserialize response │ + │ 10. 
Return to caller │ +``` + +## 0-RTT Flow + +``` + CLIENT SERVER + │ │ + │ 1. Serialize message into buffer │ + │ (prepare_write) │ + │ │ + │ 2. Open 0-RTT connection │ + │ → write buffer immediately ─────────►│ + │ │ + │ 3. Check zero_rtt_accepted() │ + │ → If true: done, read response │ + │ → If false: │ + │ 4. Open new (full) connection │ + │ 5. Re-send same buffer ────────────►│ + │ │ + │ 6. Read response ◄──────────────────────│ +``` + +The key insight: the message buffer is pre-serialized so it can be re-sent without re-serialization if 0-RTT is rejected. \ No newline at end of file diff --git a/docs/research/references/iroh/irpc/07-irpc-iroh.md b/docs/research/references/iroh/irpc/07-irpc-iroh.md new file mode 100644 index 0000000..f336254 --- /dev/null +++ b/docs/research/references/iroh/irpc/07-irpc-iroh.md @@ -0,0 +1,271 @@ +# irpc: irpc-iroh — Iroh Transport Integration + +The `irpc-iroh` crate provides transport integration for iroh, enabling irpc to work with iroh's QUIC connections that use endpoint IDs (rather than socket addresses) for routing. + +## Crate Overview + +```toml +[package] +name = "irpc-iroh" +version = "0.13.0" +description = "Iroh transport for irpc" +``` + +Dependencies: `iroh`, `irpc`, `tokio`, `tracing`, `serde`, `postcard`, `n0-error`, `n0-future` + +## Key Types + +### IrohRemoteConnection + +```rust +#[derive(Debug, Clone)] +pub struct IrohRemoteConnection(Connection); +``` + +Wraps an existing iroh `Connection`. Simplest way to use irpc with iroh — create a connection externally and wrap it. + +```rust +impl RemoteConnection for IrohRemoteConnection { + fn clone_boxed(&self) -> Box { ... } + fn open_bi(&self) -> BoxFuture> { + // Delegates to connection.open_bi() + } + fn zero_rtt_accepted(&self) -> BoxFuture { + // Always true — fully authenticated connection + } +} +``` + +**Note:** This stops working when the underlying connection is closed. For automatic reconnection, use `IrohLazyRemoteConnection`. 
+ 

### IrohZrttRemoteConnection

```rust
#[derive(Debug, Clone)]
pub struct IrohZrttRemoteConnection(OutgoingZeroRttConnection);
```

Wraps an iroh 0-RTT (zero round-trip time) connection. This allows data to be sent before the full handshake completes, which reduces latency on reconnections.

```rust
impl RemoteConnection for IrohZrttRemoteConnection {
    fn open_bi(&self) -> BoxFuture<Result<(SendStream, RecvStream), RequestError>> {
        // Delegates to the 0-RTT connection's open_bi()
    }
    fn zero_rtt_accepted(&self) -> BoxFuture<bool> {
        // Checks handshake_completed() to determine
        // whether the 0-RTT data was accepted
    }
}
```

The `zero_rtt_accepted()` method:
- Returns `true` if `ZeroRttStatus::Accepted`
- Returns `false` if `ZeroRttStatus::Rejected` or on error
- This allows the `Client` to decide whether to re-send data

### IrohLazyRemoteConnection

```rust
#[derive(Debug, Clone)]
pub struct IrohLazyRemoteConnection(Arc<IrohRemoteConnectionInner>);

struct IrohRemoteConnectionInner {
    endpoint: iroh::Endpoint,
    addr: iroh::EndpointAddr,
    connection: tokio::sync::Mutex<Option<Connection>>,
    alpn: Vec<u8>,
}
```

The lazy connection caches the underlying iroh `Connection` and reconnects automatically:

1. On first `open_bi()`, establishes a connection via `endpoint.connect(addr, alpn)`
2. Caches the connection in a `Mutex<Option<Connection>>`
3. On subsequent `open_bi()`, tries to reuse the cached connection
4. If the cached connection fails, clears the cache and reconnects once

The `alpn` field is required because iroh connections need an ALPN protocol identifier.

### `client()` Function

```rust
pub fn client<S: Service>(
    endpoint: iroh::Endpoint,
    addr: impl Into<iroh::EndpointAddr>,
    alpn: impl AsRef<[u8]>,
) -> irpc::Client<S>
```

Convenience function to create a `Client` using iroh. Creates an `IrohLazyRemoteConnection` and wraps it with `Client::boxed()`.
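The reconnect-once behaviour of the lazy connections can be sketched synchronously with a `Mutex<Option<C>>` cache. `Connection`, `LazyConnection` and the `connect` closure are hypothetical stand-ins for the async iroh/noq types. In this sketch, holding the lock while connecting also serializes concurrent reconnect attempts.

```rust
use std::sync::Mutex;

/// Hypothetical connection abstraction: `open_bi` fails once the connection is dead.
trait Connection {
    fn open_bi(&self) -> Result<u64, String>; // returns a stream id
}

/// Connects on first use, caches the connection, and on a failed `open_bi`
/// drops the cached connection and reconnects exactly once.
struct LazyConnection<C, F> {
    connect: F,
    cached: Mutex<Option<C>>,
}

impl<C: Connection, F: Fn() -> Result<C, String>> LazyConnection<C, F> {
    fn new(connect: F) -> Self {
        LazyConnection { connect, cached: Mutex::new(None) }
    }

    fn open_bi(&self) -> Result<u64, String> {
        let mut guard = self.cached.lock().unwrap();
        if let Some(conn) = &*guard {
            if let Ok(stream) = conn.open_bi() {
                return Ok(stream); // fast path: reuse the cached connection
            }
        }
        // Nothing cached, or the cached connection is dead: drop it and connect once.
        *guard = None;
        let conn = (self.connect)()?;
        let stream = conn.open_bi()?;
        *guard = Some(conn);
        Ok(stream)
    }
}
```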
+ +## Server-Side: IrohProtocol + +### IrohProtocol + +```rust +pub struct IrohProtocol { + handler: Handler, + request_id: AtomicU64, +} +``` + +Implements `iroh::protocol::ProtocolHandler`, allowing it to be registered with iroh's `Router`: + +```rust +impl ProtocolHandler for IrohProtocol { + async fn accept(&self, connection: Connection) -> Result<(), AcceptError> { + // Handle the connection using irpc's handle_connection + let handler = self.handler.clone(); + let fut = handle_connection(&connection, handler).map_err(AcceptError::from_err); + fut.instrument(span).await + } +} +``` + +**Usage:** +```rust +let protocol = IrohProtocol::with_sender(local_sender); +// or +let protocol = IrohProtocol::new(handler); + +let router = Router::builder(endpoint) + .accept(ALPN, protocol) + .spawn(); +``` + +### Iroh0RttProtocol + +```rust +pub struct Iroh0RttProtocol { ... } +``` + +Supports 0-RTT connections by implementing `ProtocolHandler::on_accepting()`: + +```rust +impl ProtocolHandler for Iroh0RttProtocol { + async fn on_accepting(&self, accepting: Accepting) -> Result { + let zrtt_conn = accepting.into_0rtt(); + // Handle 0-RTT data immediately + handle_connection(&zrtt_conn, handler).await?; + // Wait for handshake completion + let conn = zrtt_conn.handshake_completed().await?; + Ok(conn) + } + + async fn accept(&self, _connection: Connection) -> Result<(), AcceptError> { + // Noop — handled in on_accepting + Ok(()) + } +} +``` + +**Warning:** 0-RTT data is replayable. Only use for idempotent operations. See . + +### IncomingRemoteConnection Trait + +```rust +pub trait IncomingRemoteConnection { + fn accept_bi(&self) -> impl Future> + Send; + fn close(&self, error_code: VarInt, reason: &[u8]); + fn remote_id(&self) -> Result; +} +``` + +Abstraction over `Connection` and `IncomingZeroRttConnection`, enabling `handle_connection` and `read_request` to work with both regular and 0-RTT connections. 
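The point of `IncomingRemoteConnection` is that one accept loop can serve both connection kinds. Below is a synchronous sketch of that loop. `IncomingConnection`, `AcceptError` and the scripted test connection are hypothetical, and a `String` stands in for the deserialized request plus its stream pair. As with `read_request_raw`, a close with application error code `0` ends the loop cleanly; any other error is reported.

```rust
use std::collections::VecDeque;

#[derive(Debug)]
enum AcceptError {
    Closed { code: u64 },
    Other(String),
}

/// Hypothetical abstraction over regular and 0-RTT incoming connections.
trait IncomingConnection {
    fn accept_bi(&mut self) -> Result<String, AcceptError>;
}

/// One loop for every connection kind: hand each request to the handler
/// until the peer closes the connection.
fn handle_connection<C: IncomingConnection>(
    conn: &mut C,
    mut handler: impl FnMut(String),
) -> Result<(), String> {
    loop {
        match conn.accept_bi() {
            Ok(request) => handler(request),
            Err(AcceptError::Closed { code: 0 }) => return Ok(()), // clean close
            Err(AcceptError::Closed { code }) => return Err(format!("closed with code {code}")),
            Err(AcceptError::Other(e)) => return Err(e),
        }
    }
}

/// Test double: replays a script, then reports a clean close.
struct Scripted(VecDeque<Result<String, AcceptError>>);

impl IncomingConnection for Scripted {
    fn accept_bi(&mut self) -> Result<String, AcceptError> {
        self.0.pop_front().unwrap_or(Err(AcceptError::Closed { code: 0 }))
    }
}
```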
+ +Implemented for: +- `Connection` — regular iroh connection +- `IncomingZeroRttConnection` — 0-RTT connection + +## handle_connection (iroh variant) + +```rust +pub async fn handle_connection( + connection: &impl IncomingRemoteConnection, + handler: Handler, +) -> io::Result<()> +``` + +Similar to the noq version but works with iroh's `IncomingRemoteConnection` trait. Records the remote endpoint ID in the tracing span. + +## read_request and read_request_raw (iroh variants) + +Same logic as the noq versions but using `IncomingRemoteConnection` instead of `noq::Connection`: + +```rust +pub async fn read_request( + connection: &impl IncomingRemoteConnection, +) -> io::Result> + +pub async fn read_request_raw( + connection: &impl IncomingRemoteConnection, +) -> io::Result> +``` + +## listen (iroh variant) + +```rust +pub async fn listen(endpoint: iroh::Endpoint, handler: Handler) +``` + +Accepts connections from an iroh `Endpoint` and handles them with the provided handler. Uses `n0_future::task::JoinSet` for task management. + +## Example Usage + +### Server + +```rust +use irpc::{rpc_requests, channel::oneshot, Client, WithChannels}; +use irpc_iroh::IrohProtocol; +use iroh::{endpoint::presets, protocol::Router, Endpoint}; + +#[rpc_requests(message = FooMessage)] +#[derive(Debug, Serialize, Deserialize)] +enum FooProtocol { + #[rpc(tx=oneshot::Sender)] + Get(String), +} + +async fn server() -> Result<()> { + let (tx, rx) = tokio::sync::mpsc::channel(16); + tokio::task::spawn(actor(rx)); + let client = Client::::local(tx); + + let endpoint = Endpoint::bind(presets::N0).await?; + let protocol = IrohProtocol::with_sender(client.as_local().unwrap()); + let router = Router::builder(endpoint).accept(ALPN, protocol).spawn(); + // ... 
keep running +} +``` + +### Client + +```rust +async fn connect(endpoint_id: EndpointId) -> Result> { + let endpoint = Endpoint::bind(presets::N0).await?; + let client = irpc_iroh::client(endpoint, endpoint_id, ALPN); + Ok(client) +} + +// Or with direct connection: +async fn connect_direct(endpoint: Endpoint, addr: EndpointAddr) -> Result> { + let conn = endpoint.connect(addr, ALPN).await?; + Ok(Client::boxed(IrohRemoteConnection::new(conn))) +} +``` + +### 0-RTT Client + +```rust +async fn connect_0rtt(endpoint: Endpoint, addr: EndpointAddr) -> Result> { + let connecting = endpoint.connect_with_opts(addr, ALPN, Default::default()).await?; + match connecting.into_0rtt() { + Ok(conn) => Ok(Client::boxed(IrohZrttRemoteConnection::new(conn))), + Err(connecting) => { + let conn = connecting.await?; + Ok(Client::boxed(IrohRemoteConnection::new(conn))) + } + } +} +``` \ No newline at end of file diff --git a/docs/research/references/iroh/irpc/08-serialization-and-utilities.md b/docs/research/references/iroh/irpc/08-serialization-and-utilities.md new file mode 100644 index 0000000..b516532 --- /dev/null +++ b/docs/research/references/iroh/irpc/08-serialization-and-utilities.md @@ -0,0 +1,134 @@ +# irpc: Serialization and Utility Modules + +## Varint Utilities + +The `varint-util` module (available with `rpc` or `varint-util` feature) provides LEB128 varint encoding/decoding compatible with postcard's format. + +### Async Reading + +```rust +pub async fn read_varint_u64(reader: &mut R) -> io::Result> +``` + +Reads a LEB128-encoded `u64` from an async reader. Returns `Ok(None)` on `UnexpectedEof` at the first byte position (clean stream end). + +**Format:** Each byte uses 7 bits for the value, MSB as continuation bit. Values stored little-endian (least significant group first). + +### Sync Writing + +```rust +pub fn write_varint_u64_sync(writer: &mut W, value: u64) -> io::Result +``` + +Writes a `u64` as LEB128 to a synchronous writer. 
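A blocking `std::io::Read` analogue of `read_varint_u64` and `read_length_prefixed` might look like the following. It is a sketch under stated assumptions: it returns raw payload bytes rather than a postcard-decoded value, and the error messages are made up. It keeps the two properties described in this section: `Ok(None)` only when the stream ends before the first byte, and a `max_size` check before the buffer is allocated.

```rust
use std::io::{self, Read};

/// Sync analogue of `read_varint_u64`: `Ok(None)` if the stream ends cleanly
/// before the first byte, an error if it ends in the middle of a varint.
fn read_varint_u64<R: Read>(reader: &mut R) -> io::Result<Option<u64>> {
    let mut value = 0u64;
    for i in 0..10 {
        let mut byte = [0u8; 1];
        match reader.read_exact(&mut byte) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof && i == 0 => return Ok(None),
            Err(e) => return Err(e),
        }
        value |= u64::from(byte[0] & 0x7f) << (7 * i);
        if byte[0] & 0x80 == 0 {
            return Ok(Some(value));
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint longer than 10 bytes"))
}

/// Read one `[varint length][payload]` frame, refusing lengths above `max_size`
/// before allocating the buffer.
fn read_length_prefixed<R: Read>(reader: &mut R, max_size: usize) -> io::Result<Option<Vec<u8>>> {
    let Some(len) = read_varint_u64(reader)? else {
        return Ok(None); // clean end of stream
    };
    if len > max_size as u64 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds max_size"));
    }
    let mut buf = vec![0u8; len as usize];
    reader.read_exact(&mut buf)?;
    Ok(Some(buf))
}
```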
+ +### Length-Prefixed Encoding + +```rust +// Sync: +pub fn write_length_prefixed(write: impl io::Write, value: T) -> io::Result<()> +pub trait WriteVarintExt: io::Write { + fn write_varint_u64(&mut self, value: u64) -> io::Result; + fn write_length_prefixed(&mut self, value: T) -> io::Result<()>; +} + +// Async: +pub trait AsyncReadVarintExt: AsyncRead + Unpin { + fn read_varint_u64(&mut self) -> impl Future>>; + fn read_length_prefixed(&mut self, max_size: usize) -> impl Future>; +} + +pub trait AsyncWriteVarintExt: AsyncWrite + Unpin { + fn write_varint_u64(&mut self, value: u64) -> impl Future>; + fn write_length_prefixed(&mut self, value: V) -> impl Future>; +} +``` + +The length-prefix format is: +``` +[varint-encoded-length][postcard-serialized-data] +``` + +Used internally by irpc for framing all messages on QUIC streams. The `max_size` parameter in `read_length_prefixed` prevents memory exhaustion from malicious length values. + +## noq Endpoint Setup + +The `noq_endpoint_setup` feature provides helpers for creating noq endpoints with TLS configuration: + +```rust +pub fn configure_client(server_certs: &[&[u8]]) -> Result +pub fn configure_server() -> Result<(ServerConfig, Vec)> +pub fn configure_client_insecure() -> Result + +// Non-WASM only: +pub fn make_client_endpoint(bind_addr: SocketAddr, server_certs: &[&[u8]]) -> Result +pub fn make_insecure_client_endpoint(bind_addr: SocketAddr) -> Result +pub fn make_server_endpoint(bind_addr: SocketAddr) -> Result<(Endpoint, Vec)> +``` + +- `configure_server()`: Creates a self-signed certificate with rcgen and configures the server with TLS 1.3. Returns the DER-encoded certificate for clients to trust. +- `configure_client()`: Configures a client to trust specific DER certificates. +- `configure_client_insecure()`: Skips certificate verification (for testing only). +- Server endpoints set `max_concurrent_uni_streams(0)` to disable unidirectional streams (only bidirectional streams are used). 
+- Keep-alive interval is set to 1 second on client configs. + +## FusedOneshotReceiver + +```rust +pub(crate) struct FusedOneshotReceiver(pub tokio::sync::oneshot::Receiver); +``` + +A wrapper that prevents panics when polling an already-completed oneshot receiver. After the inner receiver resolves, subsequent polls return `Poll::Pending` indefinitely instead of panicking. + +This is important because irpc's `oneshot::Receiver` can be wrapped in `Receiver::Boxed` (a `BoxFuture`), and the inner future may be polled again after it has completed (for example, in some `select!` loop patterns). + +## now_or_never + +```rust +pub(crate) fn now_or_never(future: F) -> Option +``` + +Tries to complete a future without blocking: if it is ready on the first poll, its output is returned as `Some`; otherwise the future is dropped and `None` is returned. Used internally by `NoqSenderInner::try_send()` to attempt an immediate write to the QUIC stream without yielding. + +Implementation uses a no-op waker to poll the future once. + +## Spans Feature + +When the `spans` feature is enabled (default), `WithChannels` includes a `span: tracing::Span` field: + +```rust +pub struct WithChannels, S: Service> { + pub inner: I, + pub tx: >::Tx, + pub rx: >::Rx, + #[cfg(feature = "spans")] + pub span: tracing::Span, +} +``` + +The span is captured from `tracing::Span::current()` at the time of `WithChannels` construction (via `From` implementations). This preserves tracing context across async message-passing boundaries.
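The capture-and-re-enter flow described above can be illustrated without `tracing` at all: a thread-local "current context" stands in for `Span::current()`, a drop guard stands in for the guard returned by `Span::enter()`, and a `From` impl captures the context at construction time, as the `WithChannels` conversions do. Everything here (`CURRENT`, `enter`, `ContextGuard`, `Message`, `spawn_handler`) is hypothetical; it is an analogue of the mechanism, not irpc's or tracing's API.

```rust
use std::cell::RefCell;
use std::sync::mpsc;
use std::thread;

thread_local! {
    // Hypothetical stand-in for tracing's "current span": a per-thread label.
    static CURRENT: RefCell<Option<String>> = const { RefCell::new(None) };
}

/// Reads the calling thread's context (analogous to `Span::current()`).
fn current() -> Option<String> {
    CURRENT.with(|c| c.borrow().clone())
}

/// Restores the previous context when dropped (analogous to the `Span::enter()` guard).
struct ContextGuard {
    previous: Option<String>,
}

/// Installs `ctx` as the current context until the returned guard is dropped.
fn enter(ctx: Option<String>) -> ContextGuard {
    let previous = CURRENT.with(|c| c.replace(ctx));
    ContextGuard { previous }
}

impl Drop for ContextGuard {
    fn drop(&mut self) {
        let previous = self.previous.take();
        CURRENT.with(|c| *c.borrow_mut() = previous);
    }
}

/// A message that captures the sender's context when it is built.
#[derive(Debug)]
struct Message {
    payload: u32,
    ctx: Option<String>,
}

impl From<u32> for Message {
    fn from(payload: u32) -> Self {
        Message { payload, ctx: current() }
    }
}

/// Handler thread: re-enters each message's captured context while handling
/// it, and reports which context was active at that point.
fn spawn_handler(
    rx: mpsc::Receiver<Message>,
    out: mpsc::Sender<(u32, Option<String>)>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for msg in rx {
            let _guard = enter(msg.ctx);
            let _ = out.send((msg.payload, current()));
        }
    })
}
```

Because the context is read on the sending thread when the message is built, it survives the hop to the handler thread; a message built outside any context arrives with `None`.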
+ +The `rpc_requests` macro generates a `parent_span()` method on the message enum when `no_spans` is not set: + +```rust +impl ComputeMessage { + pub fn parent_span(&self) -> tracing::Span { + let span = match self { + ComputeMessage::Multiply(inner) => inner.parent_span_opt(), + ComputeMessage::Sum(inner) => inner.parent_span_opt(), + }; + span.cloned().unwrap_or_else(|| tracing::Span::current()) + } +} +``` + +This allows server-side handlers to enter the client's tracing span: + +```rust +async fn handle(msg: ComputeMessage) { + // Bind the span first: `msg.parent_span().enter()` would not compile, + // because the guard would borrow a temporary that is dropped immediately. + let span = msg.parent_span(); + let _entered = span.enter(); + // ... processing happens in the client's tracing context. + // Don't hold `_entered` across an `.await`; for async work, prefer + // `tracing::Instrument::instrument(future, span)`. +} +``` + +When `no_spans` is set in the macro, no span-related code is generated, making it compatible with builds that don't have the `spans` feature enabled. \ No newline at end of file diff --git a/docs/research/references/iroh/irpc/09-design-patterns-and-examples.md b/docs/research/references/iroh/irpc/09-design-patterns-and-examples.md new file mode 100644 index 0000000..554c5fc --- /dev/null +++ b/docs/research/references/iroh/irpc/09-design-patterns-and-examples.md @@ -0,0 +1,249 @@ +# irpc: Design Patterns and Usage Examples + +## Pattern 1: Actor Model (Most Common) + +The primary usage pattern is an actor that receives messages and processes them sequentially: + +```rust +struct StorageActor { + recv: tokio::sync::mpsc::Receiver, + state: BTreeMap, +} + +impl StorageActor { + pub fn spawn() -> StorageApi { + let (tx, rx) = tokio::sync::mpsc::channel(16); + let actor = Self { recv: rx, state: BTreeMap::new() }; + tokio::task::spawn(actor.run()); + StorageApi { inner: Client::local(tx) } + } + + async fn run(mut self) { + while let Some(msg) = self.recv.recv().await { + self.handle(msg).await; + } + } + + async fn handle(&mut self, msg: StorageMessage) { + match msg { + StorageMessage::Get(wc) => { + let WithChannels { inner, tx, .. 
} = wc; + tx.send(self.state.get(&inner.key).cloned()).await.ok(); + } + StorageMessage::Set(wc) => { + let WithChannels { inner, tx, .. } = wc; + self.state.insert(inner.key, inner.value); + tx.send(()).await.ok(); + } + } + } +} +``` + +**Key points:** +- The actor owns state and processes messages sequentially +- `Client::local(tx)` wraps the sender side of the mpsc channel +- `WithChannels` destructuring gives access to `inner` (the request data), `tx` (response channel), and `rx` (update channel) +- The `..` pattern skips fields the handler doesn't use: here `rx` (a `NoReceiver` for these requests) and, with the `spans` feature, `span` + +## Pattern 2: Concurrent Task Per Request + +For long-running or independent requests, spawn a task per message: + +```rust +async fn run(mut self) { + // tokio's `mpsc::Receiver::recv` yields `None` once all senders are dropped. + while let Some(msg) = self.recv.recv().await { + // `Self::handle` takes no `self`, so each task is independent of actor state. + tokio::task::spawn(async move { + if let Err(cause) = Self::handle(msg).await { + eprintln!("Error: {cause}"); + } + }); + } +} +``` + +This suits long-running or I/O-bound requests that shouldn't hold up the rest of the queue. CPU-bound work is better moved to `tokio::task::spawn_blocking` (or a dedicated thread pool), since a busy task stalls the runtime worker thread it runs on. + +## Pattern 3: Local-Only Usage + +irpc can be used without any RPC feature for pure in-process communication: + +```rust +// Cargo.toml: default-features = false, features = ["derive"] +#[rpc_requests(message = StorageMessage, no_rpc, no_spans)] +#[derive(Serialize, Deserialize, Debug)] +enum StorageProtocol { + #[rpc(tx=oneshot::Sender>)] + Get(Get), + #[rpc(tx=oneshot::Sender<()>)] + Set(Set), +} +``` + +The `no_rpc` flag prevents `RemoteService` from being generated, and `no_spans` removes the tracing dependency. This leaves only the local channel mechanism, with minimal dependencies (serde, tokio, tokio-util).
+ +## Pattern 4: API Type Wrapping Client + +The recommended pattern is to wrap `Client` in a higher-level API type: + +```rust +struct StorageApi { + inner: Client, +} + +impl StorageApi { + // Local + pub fn spawn() -> Self { + let (tx, rx) = tokio::sync::mpsc::channel(16); + tokio::task::spawn(StorageActor::new(rx).run()); + Self { inner: Client::local(tx) } + } + + // Remote (noq) + pub fn connect(endpoint: noq::Endpoint, addr: SocketAddr) -> Self { + Self { inner: Client::noq(endpoint, addr) } + } + + // Remote (iroh) + pub fn connect_iroh(endpoint: iroh::Endpoint, addr: EndpointAddr) -> Self { + Self { inner: irpc_iroh::client(endpoint, addr, ALPN) } + } + + // Type-safe methods that work for both local and remote + pub async fn get(&self, key: String) -> irpc::Result> { + self.inner.rpc(Get { key }).await + } + + pub async fn set(&self, key: String, value: String) -> irpc::Result<()> { + self.inner.rpc(Set { key, value }).await + } + + pub async fn list(&self) -> irpc::Result> { + self.inner.server_streaming(List, 16).await + } +} +``` + +This encapsulates the protocol details and provides a clean, type-safe API. The same `StorageApi` works identically whether connected locally or remotely. 
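Patterns 1 and 4 can be condensed into a dependency-free sketch: `std::sync::mpsc` channels and a thread stand in for tokio channels and irpc's `Client`, and each request carries its own reply channel, playing the role of `WithChannels { inner, tx, .. }`. The names (`StorageMessage`, `StorageActor`, `StorageApi`, `ActorGone`) are hypothetical, and there is no remote transport.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical request enum: each variant carries its own reply sender.
enum StorageMessage {
    Get { key: String, tx: mpsc::Sender<Option<String>> },
    Set { key: String, value: String, tx: mpsc::Sender<()> },
}

/// The actor owns its state and handles one message at a time.
struct StorageActor {
    recv: mpsc::Receiver<StorageMessage>,
    state: BTreeMap<String, String>,
}

impl StorageActor {
    fn run(mut self) {
        // Ends once every sender (every StorageApi clone) has been dropped.
        while let Ok(msg) = self.recv.recv() {
            self.handle(msg);
        }
    }

    fn handle(&mut self, msg: StorageMessage) {
        match msg {
            StorageMessage::Get { key, tx } => {
                // Ignore send errors: the caller may have stopped waiting.
                let _ = tx.send(self.state.get(&key).cloned());
            }
            StorageMessage::Set { key, value, tx } => {
                self.state.insert(key, value);
                let _ = tx.send(());
            }
        }
    }
}

/// Returned when the actor has stopped and can no longer answer.
#[derive(Debug, PartialEq)]
struct ActorGone;

/// Typed API wrapper: callers never see the channels.
#[derive(Clone)]
struct StorageApi {
    tx: mpsc::Sender<StorageMessage>,
}

impl StorageApi {
    fn spawn() -> Self {
        let (tx, recv) = mpsc::channel();
        let actor = StorageActor { recv, state: BTreeMap::new() };
        thread::spawn(move || actor.run());
        Self { tx }
    }

    fn get(&self, key: &str) -> Result<Option<String>, ActorGone> {
        let (tx, rx) = mpsc::channel();
        self.tx
            .send(StorageMessage::Get { key: key.to_string(), tx })
            .map_err(|_| ActorGone)?;
        rx.recv().map_err(|_| ActorGone)
    }

    fn set(&self, key: &str, value: &str) -> Result<(), ActorGone> {
        let (tx, rx) = mpsc::channel();
        self.tx
            .send(StorageMessage::Set { key: key.to_string(), value: value.to_string(), tx })
            .map_err(|_| ActorGone)?;
        rx.recv().map_err(|_| ActorGone)
    }
}
```

Only the actor touches `state`, so no locking is needed, and callers see only typed `get`/`set` methods; cloning `StorageApi` just clones the channel sender.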
+ +## Pattern 5: Server Setup + +### With noq + +```rust +fn serve(api: &StorageApi, endpoint: noq::Endpoint) -> Result> { + let local = api.inner.as_local().context("cannot listen on remote service")?; + let handler = StorageProtocol::remote_handler(local); + Ok(tokio::task::spawn(irpc::rpc::listen(endpoint, handler))) +} +``` + +### With iroh + +```rust +fn serve(api: &StorageApi, endpoint: iroh::Endpoint) -> Result { + let local = api.inner.as_local().context("cannot listen on remote service")?; + let protocol = IrohProtocol::with_sender(local); + Ok(Router::builder(endpoint).accept(ALPN, protocol).spawn()) +} +``` + +## Pattern 6: Low-Level Request Handling + +For more control than the `Client` methods provide, use `request()` directly: + +```rust +async fn custom_request(&self, msg: Get) -> anyhow::Result>> { + match self.inner.request().await? { + Request::Local(request) => { + let (tx, rx) = oneshot::channel(); + request.send((msg, tx)).await?; + Ok(rx) + } + Request::Remote(request) => { + let (_tx, rx) = request.write(msg).await?; + Ok(rx.into()) + } + } +} +``` + +This allows custom channel creation logic, e.g., different buffer sizes for local vs remote. + +## Pattern 7: Channel Filtering and Mapping + +irpc channels support filtering and mapping, which work for both local and remote channels: + +```rust +// Each statement below is an independent alternative, not a sequence. + +// Server-side: filter responses to only include values > 10 +let filtered_tx = wc.tx.with_filter(|v: &i64| *v > 10); + +// Server-side: transform responses +let mapped_tx = wc.tx.with_map(|v: i64| v * 2); + +// Client-side: filter received updates +let filtered_rx = rx.filter(|update: &Update| update.is_relevant()); +``` + +For both local and remote channels, these create boxed wrappers. The extra indirection is negligible for remote channels, where network latency dominates, but adds a small per-message cost for local ones.
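Why filtering and mapping produce boxed wrappers can be seen in a small, synchronous sketch: wrapping a sender in a closure changes its concrete type, so to keep returning one sender type the closure is erased behind `Box<dyn Fn>`. `BoxedSender` and `Closed` are hypothetical and do not reproduce irpc's (async) channel API. In this sketch, values rejected by the filter are silently dropped and reported as sent.

```rust
use std::sync::mpsc;

/// Returned when the receiving side is gone.
#[derive(Debug, PartialEq)]
struct Closed;

/// Hypothetical type-erased sender: any "send a T" behavior behind one type.
struct BoxedSender<T> {
    send_fn: Box<dyn Fn(T) -> Result<(), Closed> + Send>,
}

impl<T: Send + 'static> BoxedSender<T> {
    /// Wraps a plain std channel sender.
    fn from_channel(tx: mpsc::Sender<T>) -> Self {
        Self { send_fn: Box::new(move |v: T| tx.send(v).map_err(|_| Closed)) }
    }

    fn send(&self, value: T) -> Result<(), Closed> {
        (self.send_fn)(value)
    }

    /// Forwards only values for which `pred` is true; the rest are dropped.
    fn with_filter(self, pred: impl Fn(&T) -> bool + Send + 'static) -> Self {
        let inner = self.send_fn;
        Self {
            send_fn: Box::new(move |v: T| if pred(&v) { inner(v) } else { Ok(()) }),
        }
    }

    /// Accepts `U` values and converts them to `T` before forwarding.
    fn with_map<U: Send + 'static>(self, f: impl Fn(U) -> T + Send + 'static) -> BoxedSender<U> {
        let inner = self.send_fn;
        BoxedSender { send_fn: Box::new(move |u: U| inner(f(u))) }
    }
}
```

Reusing the document's `> 10` filter and `* 2` map, sending `1, 5, 6, 20` through `from_channel(tx).with_filter(..).with_map(..)` delivers `12` and `40`: the map runs first, then the filter. Each wrapper costs one allocation when built and one indirect call per message, the kind of overhead that shows up on in-process channels but is negligible next to network latency.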
+ +## Pattern 8: Using the `wrap` Attribute + +The `#[wrap]` attribute generates named structs from variant fields: + +```rust +#[rpc_requests(message = StoreMessage)] +#[derive(Debug, Serialize, Deserialize)] +enum StoreProtocol { + #[rpc(tx=oneshot::Sender>)] + #[wrap(GetRequest, derive(Clone))] + Get(String), // Generates: pub struct GetRequest(pub String); + + #[rpc(tx=oneshot::Sender<()>)] + #[wrap(SetRequest)] + Set { key: String, value: String }, // Generates: pub struct SetRequest { pub key: String, pub value: String } +} +``` + +Benefits: +- Named request types can be imported and constructed by name +- Additional derives (e.g., `Clone`) can be added +- Custom visibility can be specified: `#[wrap(pub(crate) GetRequest)]` +- The generated struct inherits the enum's visibility by default + +## Pattern 9: 0-RTT Connections + +For reduced latency on reconnections with iroh: + +```rust +// Client side +let result = client.rpc_0rtt(Get { key: "x".into() }).await?; + +// Server side (iroh) +let protocol = Iroh0RttProtocol::with_sender(local_sender); +let router = Router::builder(endpoint).accept(ALPN, protocol).spawn(); +``` + +**Important:** Only use 0-RTT for idempotent operations, as the data may be replayed by an attacker. + +## Pattern 10: Shared State in Actor + +When handlers run concurrently (for example, one spawned task per request as in Pattern 2), the state they share must be synchronized: + +```rust +struct Actor { + recv: tokio::sync::mpsc::Receiver, + state: Arc>, +} +``` + +When the actor handles every message itself, it can instead own the state and mutate it directly: + +```rust +struct Actor { + recv: tokio::sync::mpsc::Receiver, + db: HashMap, // owned state +} +``` + +Since the actor processes messages sequentially, no internal synchronization is needed.
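When handlers run concurrently, as in Pattern 2, the owned-state approach no longer applies and the state has to be shared. The std-only sketch below uses threads in place of tokio tasks; `Msg`, `handle`, and `run_concurrently` are hypothetical names. With tokio, a plain `std::sync::Mutex` is usually fine for short critical sections; `tokio::sync::Mutex` is needed only when the lock must be held across an `.await`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical request type handled by concurrently running handlers.
enum Msg {
    Incr(String),
}

/// Handles one message. Several handlers run at once, so the map sits behind
/// a `Mutex`; the lock is held only for the duration of the update.
fn handle(state: &Mutex<HashMap<String, u64>>, msg: Msg) {
    match msg {
        Msg::Incr(key) => {
            let mut map = state.lock().expect("mutex poisoned");
            *map.entry(key).or_insert(0) += 1;
        }
    }
}

/// Spawns one handler per message (threads stand in for tokio tasks),
/// waits for all of them, and returns the final state.
fn run_concurrently(msgs: Vec<Msg>) -> HashMap<String, u64> {
    let state: Arc<Mutex<HashMap<String, u64>>> = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = msgs
        .into_iter()
        .map(|msg| {
            let state = Arc::clone(&state);
            thread::spawn(move || handle(&state, msg))
        })
        .collect();
    for h in handles {
        h.join().expect("handler panicked");
    }
    // Each clone was dropped when its thread finished, so the Arc is unique again.
    Arc::try_unwrap(state)
        .expect("no other owners remain")
        .into_inner()
        .expect("mutex poisoned")
}
```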
\ No newline at end of file diff --git a/docs/research/references/iroh/irpc/10-quick-reference.md b/docs/research/references/iroh/irpc/10-quick-reference.md new file mode 100644 index 0000000..c367e43 --- /dev/null +++ b/docs/research/references/iroh/irpc/10-quick-reference.md @@ -0,0 +1,230 @@ +# irpc: Quick Reference + +## Crate Info + +- **Name:** `irpc` +- **Version:** 0.13.0 +- **License:** Apache-2.0 OR MIT +- **Repository:** https://github.com/n0-computer/irpc +- **MSRV:** 1.89 + +## Feature Flags + +| Feature | Default | Dependencies Added | +|---|---|---| +| `rpc` | ✅ | noq, postcard, smallvec, tracing, tokio/io-util | +| `derive` | ✅ | irpc-derive | +| `spans` | ✅ | tracing | +| `stream` | ✅ | futures-util | +| `noq_endpoint_setup` | ✅ | rustls, rcgen, futures-buffered | +| `varint-util` | ❌ | postcard, smallvec, tokio/io-util | + +## Type Quick Reference + +### Core Types + +``` +Service trait — implemented on protocol enum, defines Message type +Channels trait — implemented on request types, defines Tx/Rx types +RpcMessage trait — blanket impl for Debug+Serialize+DeserializeOwned+Send+Sync+Unpin+'static +Sender trait — sealed marker for sender types +Receiver trait — sealed marker for receiver types +WithChannels struct — wraps request I with tx/rx/span for service S +Client struct — client to service S (local or remote) +LocalSender struct — local sender wrapping mpsc::Sender +Request enum — Local(L) or Remote(R) request +RemoteSender struct — holds QUIC stream pair for sending initial message +``` + +### Channel Types + +``` +oneshot::Sender — Tokio or Boxed; single value; async send +oneshot::Receiver — Tokio or Boxed; single value; Future impl +mpsc::Sender — Tokio or Arc; stream; async send/try_send +mpsc::Receiver — Tokio or Box; stream; async recv +NoSender — No-op sender +NoReceiver — No-op receiver +``` + +### Remote Types (rpc feature) + +``` +RemoteConnection trait — open_bi(), zero_rtt_accepted(), clone_boxed() +NoqLazyRemoteConnection — 
lazy noq connection with cache +Handler type — Arc ...> +``` + +### irpc-iroh Types + +``` +IrohRemoteConnection — wraps iroh::Connection +IrohZrttRemoteConnection — wraps iroh::OutgoingZeroRttConnection +IrohLazyRemoteConnection — lazy iroh connection with cache +IrohProtocol — ProtocolHandler for iroh Router +Iroh0RttProtocol — ProtocolHandler with 0-RTT support +IncomingRemoteConnection trait — abstraction over Connection and ZeroRttConnection +``` + +## Interaction Patterns Cheatsheet + +```rust +// ═══════════════════════════════════════════ +// Protocol Definition +// ═══════════════════════════════════════════ + +#[rpc_requests(message = MyMessage)] +#[derive(Debug, Serialize, Deserialize)] +enum MyProtocol { + // Unary RPC + #[rpc(tx=oneshot::Sender)] + #[wrap(GetReq)] + Get(String), + + // Server streaming + #[rpc(tx=mpsc::Sender)] + #[wrap(ListReq)] + List(ListParams), + + // Client streaming + #[rpc(tx=oneshot::Sender, rx=mpsc::Receiver)] + #[wrap(UploadReq)] + Upload, + + // Bidirectional streaming + #[rpc(tx=mpsc::Sender, rx=mpsc::Receiver)] + #[wrap(ProcessReq)] + Process(ProcessConfig), + + // Fire and forget + #[rpc] + #[wrap(LogReq)] + Log(String), +} + +// ═══════════════════════════════════════════ +// Client Usage +// ═══════════════════════════════════════════ + +// Local +let (tx, rx) = tokio::sync::mpsc::channel(16); +tokio::task::spawn(actor(rx)); +let client: Client = Client::local(tx); + +// Remote (noq) +let client: Client = Client::noq(endpoint, addr); + +// Remote (iroh) +let client: Client = irpc_iroh::client(endpoint, addr, alpn); + +// ═══════════════════════════════════════════ +// Making Requests +// ═══════════════════════════════════════════ + +// Unary +let result: Response = client.rpc(GetReq("key".into())).await?; + +// Server streaming +let mut rx: mpsc::Receiver = client.server_streaming(ListReq(params), 16).await?; +while let Some(item) = rx.recv().await? { ... 
} + +// Client streaming +let (update_tx, response_rx): (mpsc::Sender, oneshot::Receiver) = + client.client_streaming(Upload, 4).await?; +update_tx.send(item).await?; +let count = response_rx.await?; + +// Bidirectional +let (update_tx, mut result_rx): (mpsc::Sender, mpsc::Receiver) = + client.bidi_streaming(ProcessReq(config), 4, 16).await?; +update_tx.send(update).await?; +while let Some(result) = result_rx.recv().await? { ... } + +// Fire and forget +client.notify(LogReq("message".into())).await?; + +// ═══════════════════════════════════════════ +// Server Setup +// ═══════════════════════════════════════════ + +// noq +let handler = MyProtocol::remote_handler(local_sender); +irpc::rpc::listen(endpoint, handler).await; + +// iroh +let protocol = IrohProtocol::with_sender(local_sender); +Router::builder(endpoint).accept(ALPN, protocol).spawn(); + +// ═══════════════════════════════════════════ +// Actor Message Handling +// ═══════════════════════════════════════════ + +async fn handle(&mut self, msg: MyMessage) { + match msg { + MyMessage::Get(wc) => { + let WithChannels { inner, tx, .. } = wc; + let result = self.db.get(&inner.0).cloned(); + tx.send(result).await.ok(); + } + MyMessage::List(wc) => { + let WithChannels { tx, .. } = wc; + for item in &self.items { + if tx.send(item.clone()).await.is_err() { break; } + } + } + MyMessage::Upload(wc) => { + let WithChannels { tx, mut rx, .. } = wc; + let mut count = 0; + while let Ok(Some(item)) = rx.recv().await { + self.process(item); + count += 1; + } + tx.send(count).await.ok(); + } + MyMessage::Process(wc) => { + let WithChannels { tx, mut rx, inner, .. } = wc; + tokio::task::spawn(async move { + while let Ok(Some(update)) = rx.recv().await { + if let Some(result) = process(update, &inner) { + if tx.send(result).await.is_err() { break; } + } + } + }); + } + MyMessage::Log(wc) => { + let WithChannels { inner, .. 
} = wc; + println!("{}", inner.0); + } + } +} +``` + +## Error Handling Quick Reference + +```rust +// Client-side errors +use irpc::{Error, RequestError, Result}; + +// Request errors (connection/stream open failures) +match client.rpc(GetReq("key".into())).await { + Ok(result) => { ... } + Err(Error::Request { source }) => { ... } // Connection failed + Err(Error::OneshotRecv { source }) => { ... } // Response channel error +} + +// Channel errors (both `RecvError`s are renamed so they can be imported together) +use irpc::channel::{SendError, mpsc::RecvError as MpscRecvError, oneshot::RecvError as OneshotRecvError}; + +// SendError: ReceiverClosed | MaxMessageSizeExceeded | Io +// OneshotRecvError: SenderClosed | MaxMessageSizeExceeded | Io +// MpscRecvError: MaxMessageSizeExceeded | Io +``` + +## Constants + +```rust +pub const MAX_MESSAGE_SIZE: u64 = 16 * 1024 * 1024; // 16 MiB +pub const ERROR_CODE_MAX_MESSAGE_SIZE_EXCEEDED: u32 = 1; +pub const ERROR_CODE_INVALID_POSTCARD: u32 = 2; +// Connection close code 0 = clean shutdown +``` \ No newline at end of file