docs(research): add iroh suite deep-dive references for iroh, irpc, iroh-blobs, iroh-gossip, iroh-live, and iroh-docs
@@ -0,0 +1,138 @@
|
||||
# iroh-blobs: Overview and Architecture
|
||||
|
||||
**Version**: 0.100.0
|
||||
**Repository**: https://github.com/n0-computer/iroh-blobs
|
||||
**License**: MIT OR Apache-2.0
|
||||
**Rust Edition**: 2021
|
||||
**MSRV**: 1.89
|
||||
|
||||
## What It Is
|
||||
|
||||
`iroh-blobs` is a Rust crate for content-addressed blob transfer over QUIC connections, built on top of [iroh](https://docs.rs/iroh). It implements a request-response protocol for streaming BLAKE3-verified data between peers, along with store implementations for persisting blobs locally.
|
||||
|
||||
The core value proposition: transfer arbitrary-sized data with **cryptographic integrity guaranteed in-stream** — every 16 KiB chunk group can be verified against the BLAKE3 hash tree as it arrives, without waiting for the complete transfer.
|
||||
|
||||
## Core Concepts
|
||||
|
||||
| Concept | Description |
|---------|-------------|
| **Blob** | A sequence of bytes of arbitrary size, identified by its BLAKE3 hash. No metadata. |
| **Link** | A 32-byte BLAKE3 hash of a blob: the content address. |
| **HashSeq** | A blob whose content is a sequence of BLAKE3 hashes (each 32 bytes). Length must be a multiple of 32. |
| **Provider** | The side serving data. Waits for incoming requests and responds. |
| **Requester** | The side requesting data. Initiates connections and sends requests. |
| **Tag** | A persistent named reference to a `HashAndFormat`, protecting blobs from garbage collection. |
| **TempTag** | An ephemeral in-memory reference that protects content while the process runs. |
| **Chunk** | The fundamental BLAKE3 unit: 1024 bytes. |
| **Chunk Group** | Iroh's grouping of 16 chunks (16 KiB), the minimum granularity for range requests and verification. |
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```
┌─────────────────────────────────────────────────────┐
│                     Application                     │
│                                                     │
│  ┌──────────┐   ┌──────────┐  ┌──────────────────┐  │
│  │  Blobs   │   │   Tags   │  │    Downloader    │  │
│  │   API    │   │   API    │  │       API        │  │
│  └────┬─────┘   └────┬─────┘  └───────┬──────────┘  │
│       │              │                │             │
│       └──────────────┴────────────────┘             │
│                      │                              │
│              ┌───────┴───────┐                      │
│              │  Store (API)  │  ← Actor-based, RPC  │
│              │   Commands    │    message passing   │
│              └───────┬───────┘                      │
│                      │                              │
│        ┌─────────────┼─────────────┐                │
│        │             │             │                │
│  ┌─────┴─────┐  ┌────┴────┐  ┌─────┴─────┐          │
│  │ MemStore  │  │ FsStore │  │ Readonly  │          │
│  │           │  │ (redb + │  │ MemStore  │          │
│  │           │  │   fs)   │  │           │          │
│  └───────────┘  └─────────┘  └───────────┘          │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│                    Network Layer                    │
│                                                     │
│  ┌──────────────────┐     ┌──────────────────────┐  │
│  │  BlobsProtocol   │     │   Remote (Client)    │  │
│  │  (Provider side) │     │   (Requester side)   │  │
│  │                  │     │                      │  │
│  │  handle_conn()   │     │   Remote::fetch()    │  │
│  │  handle_stream() │     │   Remote::local()    │  │
│  └────────┬─────────┘     └──────────┬───────────┘  │
│           │                          │              │
│           └──────── iroh QUIC ───────┘              │
│               ALPN: /iroh-bytes/4                   │
└─────────────────────────────────────────────────────┘
```
|
||||
|
||||
## Module Structure
|
||||
|
||||
```
iroh-blobs/src/
├── lib.rs                  # Crate root, re-exports
├── hash.rs                 # Hash, BlobFormat, HashAndFormat
├── hashseq.rs              # HashSeq type
├── format.rs               # Format module (Collection)
│   └── collection.rs       # Collection type with metadata
├── protocol.rs             # Wire protocol types (GetRequest, etc.)
│   └── range_spec.rs       # ChunkRangesSeq, RangeSpec wire encoding
├── net_protocol.rs         # BlobsProtocol (iroh ProtocolHandler)
├── provider.rs             # Server-side request handling
│   └── events.rs           # Event system (connect/disconnect/progress)
├── get.rs                  # Client-side FSM for getting data
│   ├── error.rs            # GetError, GetResult types
│   └── request.rs          # Request execution helpers
├── api/                    # High-level store API
│   ├── blobs.rs            # Blob operations (add, export, read, etc.)
│   │   └── reader.rs       # BlobReader (AsyncRead + AsyncSeek)
│   ├── downloader.rs       # Multi-source download coordinator
│   ├── remote.rs           # Remote peer interaction (fetch, observe)
│   ├── tags.rs             # Tag management API
│   ├── proto.rs            # Store command protocol (RPC messages)
│   └── proto/              # Proto sub-modules
│       └── bitfield.rs     # Bitfield type for chunk tracking
├── store/                  # Storage implementations
│   ├── mod.rs              # IROH_BLOCK_SIZE, GcConfig
│   ├── mem.rs              # MemStore (in-memory, mutable)
│   ├── fs.rs               # FsStore (filesystem + redb hybrid)
│   ├── readonly_mem.rs     # Read-only memory store
│   ├── gc.rs               # Garbage collection
│   ├── util.rs             # Shared utilities (Tag, SparseMemFile, etc.)
│   └── test.rs             # Test utilities
├── ticket.rs               # BlobTicket (shareable connection info)
├── metrics.rs              # Prometheus metrics definitions
└── util/                   # Utilities
    ├── channel.rs          # Channel helpers
    ├── connection_pool.rs  # Connection pooling
    ├── stream.rs           # Stream abstractions
    └── temp_tag.rs         # TempTag, TagCounter, TempTags scope management
```
|
||||
|
||||
## Key Dependencies
|
||||
|
||||
| Dependency | Purpose |
|------------|---------|
| `bao-tree` | BLAKE3 verified streaming, outboard storage, BaoTree encoding/decoding |
| `iroh` | QUIC networking, endpoint, router |
| `irpc` | RPC framework for store commands |
| `postcard` | Wire serialization (compact, no-schema) |
| `redb` | Embedded key-value database (fs-store feature) |
| `range-collections` | RangeSet2 / ChunkRanges for chunk tracking |
| `bytes` | Efficient byte buffer handling |
|
||||
|
||||
## Feature Flags
|
||||
|
||||
| Feature | Default | Description |
|---------|---------|-------------|
| `fs-store` | ✅ | Filesystem-based store with redb + file hybrid |
| `rpc` | ✅ | RPC support via `noq` / `irpc` |
| `metrics` | ❌ | Prometheus metrics |
| `hide-proto-docs` | ✅ | Hides protocol docs from rustdocs |
|
||||
|
||||
## BLAKE3 Block Size
|
||||
|
||||
The crate uses a fixed block size of `IROH_BLOCK_SIZE = BlockSize::from_chunk_log(4)`, which means each chunk group is 2^4 = 16 chunks = 16 × 1024 = 16,384 bytes (16 KiB). This is the minimum granularity for range requests and verification.
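
The arithmetic above can be checked with a small standard-library sketch. The function names here are hypothetical helpers written for illustration; they are not part of `iroh-blobs` or `bao-tree`.

```rust
/// Size of one BLAKE3 chunk in bytes.
const CHUNK_SIZE: u64 = 1024;

/// Bytes per chunk group for a given `chunk_log` (4 for `IROH_BLOCK_SIZE`).
fn chunk_group_bytes(chunk_log: u8) -> u64 {
    CHUNK_SIZE << chunk_log
}

/// Number of chunk groups needed to cover `blob_len` bytes.
/// Counting an empty blob as one group is a choice made for this sketch.
fn chunk_groups(blob_len: u64, chunk_log: u8) -> u64 {
    let group = chunk_group_bytes(chunk_log);
    ((blob_len + group - 1) / group).max(1)
}

/// Widen the half-open byte range `start..end` to chunk-group boundaries.
fn align_to_groups(start: u64, end: u64, chunk_log: u8) -> (u64, u64) {
    let group = chunk_group_bytes(chunk_log);
    (start / group * group, (end + group - 1) / group * group)
}
```

With `chunk_log = 4`, a request for bytes `1000..20000` widens to `0..32768`, i.e. the first two chunk groups.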
|
||||
docs/research/references/iroh/iroh-blobs/02-key-types.md
@@ -0,0 +1,195 @@
|
||||
# iroh-blobs: Key Types and Data Structures
|
||||
|
||||
## Hash
|
||||
|
||||
```rust
|
||||
// src/hash.rs
|
||||
pub struct Hash(blake3::Hash); // 32-byte BLAKE3 hash, wraps blake3::Hash
|
||||
```
|
||||
|
||||
The fundamental content address. Created via `Hash::new(data)` or `Hash::from_bytes([u8; 32])`, with the constant `Hash::EMPTY` for the empty blob. It supports hex display and serde (compact binary encoding for non-human-readable formats), and is stored as a 32-byte fixed array in redb.
|
||||
|
||||
Wire format: 32 raw bytes (postcard serialization). No framing overhead.
|
||||
|
||||
## BlobFormat
|
||||
|
||||
```rust
|
||||
pub enum BlobFormat {
|
||||
Raw, // A single blob
|
||||
HashSeq, // A sequence of BLAKE3 hashes
|
||||
}
|
||||
```
|
||||
|
||||
Distinguishes between a raw binary blob and a hash sequence. Wire format: single byte (0 = Raw, 1 = HashSeq).
|
||||
|
||||
## HashAndFormat
|
||||
|
||||
```rust
|
||||
pub struct HashAndFormat {
|
||||
pub hash: Hash,
|
||||
pub format: BlobFormat,
|
||||
}
|
||||
```
|
||||
|
||||
Pairs a hash with its format. Wire format: 33 bytes (32 for hash + 1 for format). Display format: hex string, optionally prefixed with 's' for HashSeq.
|
||||
|
||||
## HashSeq
|
||||
|
||||
```rust
|
||||
// src/hashseq.rs
|
||||
pub struct HashSeq(Bytes); // Wrapper around Bytes, length must be multiple of 32
|
||||
```
|
||||
|
||||
A blob interpreted as a sequence of 32-byte BLAKE3 hashes. Created from `Bytes` via `HashSeq::new(bytes)` (returns `None` if length is not a multiple of 32). Iterable, supports `get(index)`, `pop_front()`.
|
||||
|
||||
Used extensively: collections are stored as a HashSeq where the first child is metadata and subsequent children are data blobs.
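
To make the length invariant concrete, here is a minimal standard-library sketch of a hash-sequence wrapper. `HashSeqSketch` is a hypothetical stand-in that uses `Vec<u8>` instead of `Bytes`; it only mirrors the behaviour described above and is not the crate's implementation.

```rust
/// Hypothetical stand-in for a hash sequence: a byte buffer whose length
/// is a multiple of 32. Not the crate's `HashSeq`.
#[derive(Debug, Clone, PartialEq)]
struct HashSeqSketch(Vec<u8>);

impl HashSeqSketch {
    /// Returns `None` if the length is not a multiple of 32.
    fn new(bytes: Vec<u8>) -> Option<Self> {
        if bytes.len() % 32 == 0 {
            Some(Self(bytes))
        } else {
            None
        }
    }

    /// Number of 32-byte hashes in the sequence.
    fn len(&self) -> usize {
        self.0.len() / 32
    }

    /// The hash at `index`, if present.
    fn get(&self, index: usize) -> Option<[u8; 32]> {
        let start = index.checked_mul(32)?;
        let end = start.checked_add(32)?;
        let slice = self.0.get(start..end)?;
        let mut out = [0u8; 32];
        out.copy_from_slice(slice);
        Some(out)
    }

    /// Remove and return the first hash (for a collection, the metadata link).
    fn pop_front(&mut self) -> Option<[u8; 32]> {
        let first = self.get(0)?;
        self.0.drain(..32);
        Some(first)
    }
}
```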
|
||||
|
||||
## Bitfield
|
||||
|
||||
```rust
|
||||
// src/api/proto/bitfield.rs
|
||||
pub struct Bitfield {
|
||||
pub size: u64, // Total size of the blob in bytes
|
||||
pub ranges: ChunkRanges, // Which chunks are verified/present
|
||||
}
|
||||
```
|
||||
|
||||
Tracks which chunks of a blob are present and verified. Key methods:
|
||||
- `is_complete()` — all chunks present
|
||||
- `validated_size()` — how many bytes are verified
|
||||
- `diff(&other)` — compute the delta between two bitfields
|
||||
|
||||
Used by the observe protocol and internal state tracking.
|
||||
|
||||
## Tag
|
||||
|
||||
```rust
|
||||
// src/store/util.rs
|
||||
pub struct Tag(pub Bytes); // Named reference, arbitrary bytes, typically UTF-8
|
||||
```
|
||||
|
||||
A persistent named reference to content in the store. Tags protect content from garbage collection. Auto-generated tags use the format `"auto-2026-01-15T12:34:56.789Z"`. Tags are stored in the store's database and can be listed, created, renamed, and deleted.
|
||||
|
||||
## TempTag
|
||||
|
||||
```rust
|
||||
// src/util/temp_tag.rs
|
||||
pub struct TempTag {
|
||||
inner: HashAndFormat,
|
||||
on_drop: Option<Weak<dyn TagDrop>>, // Callback when dropped
|
||||
}
|
||||
```
|
||||
|
||||
An ephemeral, in-memory tag. While a `TempTag` exists, its referenced content is protected from garbage collection. When dropped, the `TagDrop` callback notifies the store to unprotect. Can be `leak()`ed to make the protection permanent for the process lifetime.
|
||||
|
||||
Scopes: `TempTagScope` manages groups of temp tags. `Scope::GLOBAL` is the default scope. Batches of operations can create scoped temp tags that are cleaned up together.
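
The drop-callback mechanism can be sketched with the standard library alone. Everything below (`Protected`, `TempTagSketch`, the per-hash counter) is a hypothetical illustration of the pattern, assuming the store counts live temp tags per hash; the real store's bookkeeping may differ.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

type HashBytes = [u8; 32];

/// Callback invoked when a temp tag goes away (plays the role of `TagDrop`).
trait TagDrop: Send + Sync {
    fn on_drop(&self, hash: &HashBytes);
}

/// Hypothetical store-side counter of live temp tags per hash.
#[derive(Default)]
struct Protected(Mutex<HashMap<HashBytes, usize>>);

impl Protected {
    fn is_protected(&self, hash: &HashBytes) -> bool {
        self.0.lock().unwrap().contains_key(hash)
    }
}

impl TagDrop for Protected {
    fn on_drop(&self, hash: &HashBytes) {
        let mut map = self.0.lock().unwrap();
        if let Some(count) = map.get_mut(hash) {
            *count -= 1;
            if *count == 0 {
                map.remove(hash);
            }
        }
    }
}

struct TempTagSketch {
    hash: HashBytes,
    on_drop: Option<Weak<dyn TagDrop>>,
}

impl TempTagSketch {
    fn new(hash: HashBytes, store: &Arc<Protected>) -> Self {
        *store.0.lock().unwrap().entry(hash).or_insert(0) += 1;
        let strong: Arc<dyn TagDrop> = store.clone();
        Self { hash, on_drop: Some(Arc::downgrade(&strong)) }
    }

    /// Keep the protection for the rest of the process by skipping `Drop`.
    fn leak(self) {
        std::mem::forget(self);
    }
}

impl Drop for TempTagSketch {
    fn drop(&mut self) {
        // A `Weak` reference means a tag never keeps the store alive.
        if let Some(store) = self.on_drop.take().and_then(|weak| weak.upgrade()) {
            store.on_drop(&self.hash);
        }
    }
}
```

Holding a `Weak` rather than an `Arc` is the notable design choice: a forgotten temp tag cannot prevent the store from shutting down.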
|
||||
|
||||
## BlobTicket
|
||||
|
||||
```rust
|
||||
// src/ticket.rs
|
||||
pub struct BlobTicket {
|
||||
addr: EndpointAddr, // How to reach the provider (includes EndpointId, relay URL, direct addresses)
|
||||
format: BlobFormat, // Raw or HashSeq
|
||||
hash: Hash, // What to retrieve
|
||||
}
|
||||
```
|
||||
|
||||
A shareable token containing everything needed to retrieve a blob from a provider. Serialized via the `iroh_tickets::Ticket` trait (base32-encoded with a "blob" prefix). The wire format uses postcard with a variant discriminator.
|
||||
|
||||
```rust
|
||||
// Creating a ticket
|
||||
let ticket = BlobTicket::new(addr, hash, BlobFormat::Raw);
|
||||
|
||||
// From a ticket string
|
||||
let ticket: BlobTicket = ticket_str.parse()?;
|
||||
```
|
||||
|
||||
## ChunkRanges and ChunkRangesSeq
|
||||
|
||||
### ChunkRanges
|
||||
|
||||
```rust
|
||||
pub type ChunkRanges = RangeSet2<ChunkNum>; // From range_collections crate
|
||||
```
|
||||
|
||||
A set of non-overlapping chunk ranges. Supports boolean operations (union, intersection, difference). The fundamental unit is `ChunkNum` (a u64 newtype representing a 1024-byte BLAKE3 chunk).
|
||||
|
||||
Helper trait `ChunkRangesExt` provides:
|
||||
- `ChunkRanges::all()` — all chunks
|
||||
- `ChunkRanges::bytes(range)` — byte range rounded up to chunk boundaries
|
||||
- `ChunkRanges::chunks(range)` — chunk range from u64 bounds
|
||||
- `ChunkRanges::last_chunk()` — the very last chunk (for size verification)
|
||||
- `ChunkRanges::chunk(n)` — a single chunk
|
||||
- `ChunkRanges::offset(n)` — a single byte offset rounded to chunk
|
||||
|
||||
### ChunkRangesSeq
|
||||
|
||||
```rust
|
||||
// src/protocol/range_spec.rs
|
||||
pub struct ChunkRangesSeq(SmallVec<[(u64, ChunkRanges); 2]>);
|
||||
```
|
||||
|
||||
A sequence of `ChunkRanges`, one per blob in a HashSeq. Uses run-length encoding: stores `(offset, ranges)` pairs, where offset is the first blob index with that range spec. Unspecified indices default to the most recent range (or empty for finite sequences).
|
||||
|
||||
Key methods:
|
||||
- `ChunkRangesSeq::all()` — request everything (root + all children, forever)
|
||||
- `ChunkRangesSeq::root()` — request only the root blob
|
||||
- `ChunkRangesSeq::empty()` — request nothing
|
||||
- `ChunkRangesSeq::from_ranges(ranges)` — from explicit iterator
|
||||
- `ChunkRangesSeq::from_ranges_infinite(ranges)` — last range repeats forever
|
||||
- `.iter_non_empty_infinite()` — iterate only non-empty ranges
|
||||
- `.is_blob()` — true if requesting a single blob (offset 0 with one entry)
|
||||
|
||||
### RangeSpec (Wire Format)
|
||||
|
||||
```rust
|
||||
pub struct RangeSpec(SmallVec<[u64; 2]>);
|
||||
```
|
||||
|
||||
The on-wire encoding of `ChunkRanges`. Uses alternating spans: first span is deselected, second is selected, etc. SmallVec avoids allocation for the common case of a single range.
|
||||
|
||||
Examples:
|
||||
- `[]` — empty (nothing selected)
|
||||
- `[0]` — everything from chunk 0 selected (entire blob)
|
||||
- `[2, 5, 3, 1]` selects the half-open chunk ranges 2..7 and 10..11 (chunks 2 through 6, plus chunk 10): skip 2, select 5, skip 3, select 1
|
||||
- `[u64::MAX]` — only the last chunk (size proof)
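
The alternating-span encoding in these examples can be decoded with a few lines of standard-library Rust. This is a sketch of the scheme as described here, with a hypothetical function name; it assumes that an odd number of spans leaves the final selection open-ended, which is what the `[0]` example implies.

```rust
/// Decode alternating span lengths into half-open chunk ranges.
/// An end of `None` means the range is open-ended (runs to the end of the blob).
fn decode_spans(spans: &[u64]) -> Vec<(u64, Option<u64>)> {
    let mut ranges = Vec::new();
    let mut pos: u64 = 0;
    let mut selected = false; // the first span is always deselected
    for &len in spans {
        let next = pos.saturating_add(len);
        if selected {
            ranges.push((pos, Some(next)));
        }
        pos = next;
        selected = !selected;
    }
    // An odd number of spans ends in the "selected" state: select the rest.
    if selected {
        ranges.push((pos, None));
    }
    ranges
}
```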
|
||||
|
||||
### ChunkRangesSeq Wire Format
|
||||
|
||||
Serialized as `(SmallVec<[(u64, RangeSpec); 2]>)` where each element is `(delta_offset, rangespec)`. The `delta_offset` is the distance from the previous entry. Uses postcard varint encoding for compact transmission.
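
A rough sketch of the run-length idea follows: looking up the range spec for a blob index, and turning absolute offsets into the delta form used on the wire. `RunLengthSeq` is a hypothetical generic container, with `T::default()` standing in for "empty ranges"; the crate's own type and edge-case handling may differ.

```rust
/// Hypothetical run-length encoded sequence: `(first_index, value)` pairs
/// sorted by `first_index`. `T::default()` plays the role of "empty ranges".
struct RunLengthSeq<T> {
    entries: Vec<(u64, T)>,
}

impl<T: Clone + Default> RunLengthSeq<T> {
    /// Value for blob `index`: the most recent entry at or before it.
    fn lookup(&self, index: u64) -> T {
        self.entries
            .iter()
            .take_while(|(first, _)| *first <= index)
            .last()
            .map(|(_, value)| value.clone())
            .unwrap_or_default()
    }

    /// Absolute offsets converted to distances from the previous entry.
    fn delta_offsets(&self) -> Vec<u64> {
        let mut prev = 0u64;
        self.entries
            .iter()
            .map(|(first, _)| {
                let delta = *first - prev;
                prev = *first;
                delta
            })
            .collect()
    }
}
```

Small deltas are what make the varint encoding pay off: most entries then fit in a single byte.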
|
||||
|
||||
## Store Command Protocol
|
||||
|
||||
The store API uses an RPC-style command pattern via `irpc`. Each command has a `Command` enum variant with typed request/response channels:
|
||||
|
||||
```rust
|
||||
#[rpc_requests(message = Command, alias = "Msg", rpc_feature = "rpc")]
|
||||
pub enum Request {
|
||||
ListBlobs(ListRequest),
|
||||
Batch(BatchRequest),
|
||||
DeleteBlobs(BlobDeleteRequest),
|
||||
ImportBao(ImportBaoRequest), // streaming: rx bao items, tx result
|
||||
ExportBao(ExportBaoRequest), // streaming: tx encoded items
|
||||
ExportRanges(ExportRangesRequest), // streaming: tx range data
|
||||
Observe(ObserveRequest), // streaming: tx bitfield updates
|
||||
BlobStatus(BlobStatusRequest),
|
||||
ImportBytes(ImportBytesRequest),
|
||||
ImportByteStream(ImportByteStreamRequest), // duplex streaming
|
||||
ImportPath(ImportPathRequest),
|
||||
ExportPath(ExportPathRequest),
|
||||
ListTags(ListTagsRequest),
|
||||
SetTag(SetTagRequest),
|
||||
DeleteTags(DeleteTagsRequest),
|
||||
RenameTag(RenameTagRequest),
|
||||
CreateTag(CreateTagRequest),
|
||||
CreateTempTag(CreateTempTagRequest),
|
||||
ListTempTags(ListTempTagsRequest),
|
||||
SyncDb(SyncDbRequest),
|
||||
WaitIdle(WaitIdleRequest),
|
||||
Shutdown(ShutdownRequest),
|
||||
ClearProtected(ClearProtectedRequest),
|
||||
}
|
||||
```
|
||||
|
||||
This allows both local (in-process) and remote (RPC) store access through the same API surface.
|
||||
docs/research/references/iroh/iroh-blobs/03-transfer-protocol.md
@@ -0,0 +1,249 @@
|
||||
# iroh-blobs: Transfer Protocol
|
||||
|
||||
## Overview
|
||||
|
||||
The transfer protocol is a **request-response** protocol operating over QUIC streams (via iroh). The ALPN is `b"/iroh-bytes/4"`.
|
||||
|
||||
The requester opens a bidirectional QUIC stream, sends a request, and the provider responds with BLAKE3-verified streaming data on the same stream.
|
||||
|
||||
**Key properties**:
|
||||
- Data integrity is verified in-stream — every 16 KiB chunk group can be independently verified against the BLAKE3 hash tree
|
||||
- No upper limit on blob or collection size — streaming design avoids buffering entire transfers
|
||||
- No extra round trips when fetching many small blobs: a single request covers them all (via HashSeq/GetManyRequest)
|
||||
- Range requests supported at chunk granularity
|
||||
|
||||
## Request Types
|
||||
|
||||
```rust
|
||||
pub enum Request {
|
||||
Get(GetRequest),
|
||||
Observe(ObserveRequest),
|
||||
Slot2, Slot3, Slot4, Slot5, Slot6, Slot7, // Reserved
|
||||
Push(PushRequest),
|
||||
GetMany(GetManyRequest),
|
||||
}
|
||||
```
|
||||
|
||||
Wire format: 1-byte discriminator (postcard-encoded `RequestType` enum), followed by postcard-serialized request body.
|
||||
|
||||
### GetRequest
|
||||
|
||||
```rust
|
||||
pub struct GetRequest {
|
||||
pub hash: Hash, // BLAKE3 hash of the root blob
|
||||
pub ranges: ChunkRangesSeq, // What ranges to request
|
||||
}
|
||||
```
|
||||
|
||||
The most common request type. The `ranges` field uses `ChunkRangesSeq` to express which parts of the root blob and its children to request.
|
||||
|
||||
**Common patterns**:
|
||||
|
||||
```rust
|
||||
// Request an entire single blob
|
||||
let req = GetRequest::blob(hash);
|
||||
// -> ChunkRangesSeq with a single element: all chunks of the root
|
||||
|
||||
// Request a HashSeq (root + all children)
|
||||
let req = GetRequest::all(hash);
|
||||
// -> ChunkRangesSeq::all() - infinite sequence of "all chunks"
|
||||
|
||||
// Request parts of a single blob
|
||||
let req = GetRequest::builder()
|
||||
.root(ChunkRanges::bytes(0..1000))
|
||||
.build(hash);
|
||||
|
||||
// Request a HashSeq with specific child ranges
|
||||
let req = GetRequest::builder()
|
||||
.root(ChunkRanges::all()) // full root (the hash seq)
|
||||
.child(1, ChunkRanges::bytes(0..100)) // partial child 1
|
||||
.next(ChunkRanges::all()) // full remaining children
|
||||
.build_open(hash); // build_open = last range repeats forever
|
||||
```
|
||||
|
||||
### GetManyRequest
|
||||
|
||||
```rust
|
||||
pub struct GetManyRequest {
|
||||
pub hashes: Vec<Hash>, // Sorted, deduplicated list of hashes
|
||||
pub ranges: ChunkRangesSeq, // Ranges for each hash (no root entry)
|
||||
}
|
||||
```
|
||||
|
||||
Like a `GetRequest` for a HashSeq, but the hashes are provided by the requester instead of looked up from the provider. This avoids the provider needing to have a pre-existing HashSeq blob.
|
||||
|
||||
```rust
|
||||
let req = GetManyRequest::builder()
|
||||
.hash(hash1, ChunkRanges::all())
|
||||
.hash(hash2, ChunkRanges::all())
|
||||
.build();
|
||||
// Deduplicates and sorts hashes automatically
|
||||
```
|
||||
|
||||
### PushRequest
|
||||
|
||||
```rust
|
||||
pub struct PushRequest(GetRequest); // Wraps a GetRequest
|
||||
```
|
||||
|
||||
The inverse of a GetRequest: the requester pushes data to the provider. The request describes what will be sent and is followed by the actual data stream. Push is disabled by default via `EventMask`, so providers reject push requests unless it is explicitly enabled.
|
||||
|
||||
### ObserveRequest
|
||||
|
||||
```rust
|
||||
pub struct ObserveRequest {
|
||||
pub hash: Hash,
|
||||
pub ranges: RangeSpec, // Which ranges to observe
|
||||
}
|
||||
```
|
||||
|
||||
Subscribes to availability changes for a blob's bitfield. The provider sends `ObserveItem` updates as chunks become available.
|
||||
|
||||
## Response Format
|
||||
|
||||
### For Get/GetMany/Push
|
||||
|
||||
The response is BLAKE3-verified streaming data (bao-tree format). For each blob in the request:
|
||||
|
||||
1. **8-byte size header** (little-endian u64) — the total size of the blob
|
||||
2. **BLAKE3 verified stream** — encoded data for the requested ranges, using bao-tree's mixed encoding:
|
||||
- `BaoContentItem::Parent(node, (left_hash, right_hash))` — internal hash tree nodes (64 bytes each)
|
||||
- `BaoContentItem::Leaf(Leaf { offset, data })` — actual data chunks
|
||||
|
||||
The data is sent in order: ascending chunks for each blob, blobs in HashSeq order.
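
Reading the size header needs nothing beyond the standard library. The helper below is a hypothetical illustration that works on an in-memory buffer; a real client reads from the QUIC stream instead.

```rust
/// Split the 8-byte little-endian size header off the front of a blob response.
/// Returns the declared blob size and the remaining (encoded) bytes.
fn split_size_header(buf: &[u8]) -> Option<(u64, &[u8])> {
    if buf.len() < 8 {
        return None; // header not fully received yet
    }
    let (head, rest) = buf.split_at(8);
    let mut raw = [0u8; 8];
    raw.copy_from_slice(head);
    Some((u64::from_le_bytes(raw), rest))
}
```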
|
||||
|
||||
**Verification**: The requester validates each chunk group against the expected BLAKE3 hash tree, so invalid data is detected after receiving at most one 16 KiB chunk group. If the provider is missing a requested chunk, it closes the stream at the point where data becomes unavailable.
|
||||
|
||||
### For Observe
|
||||
|
||||
The provider sends length-prefixed `ObserveItem` messages:
|
||||
|
||||
```rust
|
||||
pub struct ObserveItem {
|
||||
pub size: u64, // Blob size
|
||||
pub ranges: ChunkRanges, // Available chunks
|
||||
}
|
||||
```
|
||||
|
||||
Updates are sent as deltas — only the new chunks that have become available since the last update.
|
||||
|
||||
## Error Handling
|
||||
|
||||
Error codes for stream/connection closure:
|
||||
|
||||
| Code | Name | Meaning |
|------|------|---------|
| 0 | StreamDropped | RecvStream was dropped |
| 1 | ProviderTerminating | Provider is shutting down |
| 2 | RequestReceived | Only one request per stream allowed |
| 1 (application) | ERR_PERMISSION | Permission denied |
| 2 (application) | ERR_LIMIT | Rate limited |
| 3 (application) | ERR_INTERNAL | Internal error |
|
||||
|
||||
## Client-Side FSM (Get)
|
||||
|
||||
The `get::fsm` module implements the get request as a **finite state machine**, so the caller drives each step of the transfer explicitly:
|
||||
|
||||
```
AtInitial
  │ (open QUIC stream)
  ▼
AtConnected
  │ (send request, drop writer)
  ▼
ConnectedNext ─┬─ StartRoot(hash, ranges)     // offset 0 = root blob
               ├─ StartChild(offset, ranges)  // offset > 0 = child blob
               └─ Closing                     // empty request
  │
AtStartRoot / AtStartChild
  │ (determine hash for child)
  ▼
AtBlobHeader
  │ (read 8-byte size)
  ▼
AtBlobContent
  │ (stream BLAKE3-verified items)
  ├─ More(content_item) → AtBlobContent       // loop
  └─ Done → AtEndBlob
  │
AtEndBlob
  │ (iterate to next blob in sequence)
  ├─ MoreChildren(AtStartChild)
  └─ Closing
       │ (drain remaining bytes)
       ▼
     Stats (transfer statistics)
```
|
||||
|
||||
Each state transition is explicit. The FSM gives the consumer full control:
|
||||
- `AtBlobContent::next()` returns `BlobContentNext::More((content, item))` or `BlobContentNext::Done(end)`
|
||||
- `AtBlobHeader::next()` reads the size header and creates a `ResponseDecoder`
|
||||
- `AtStartChild::next(hash)` requires the caller to supply the hash (from the HashSeq)
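
The transitions can be modelled as a toy table-driven state machine. The `State` and `Event` enums below are hypothetical simplifications for illustration: the real FSM uses one type per state and performs I/O in each `next()` call, which this sketch leaves out entirely.

```rust
#[derive(Debug, Clone, PartialEq)]
enum State {
    Initial,
    Connected,
    StartRoot,
    StartChild { offset: u64 },
    BlobHeader,
    BlobContent,
    EndBlob,
    Closing,
    Finished,
}

#[derive(Debug, Clone, PartialEq)]
enum Event {
    StreamOpened,
    /// First requested offset: `None` = empty request, `Some(0)` = root.
    RequestSent { first: Option<u64> },
    HashKnown,
    SizeRead,
    Item,
    BlobDone,
    NextChild(Option<u64>),
    Drained,
}

/// Returns the next state, or `None` if the event is not valid in this state.
fn step(state: State, event: Event) -> Option<State> {
    let next = match (state, event) {
        (State::Initial, Event::StreamOpened) => State::Connected,
        (State::Connected, Event::RequestSent { first: None }) => State::Closing,
        (State::Connected, Event::RequestSent { first: Some(0) }) => State::StartRoot,
        (State::Connected, Event::RequestSent { first: Some(offset) }) => {
            State::StartChild { offset }
        }
        (State::StartRoot, Event::HashKnown) => State::BlobHeader,
        (State::StartChild { .. }, Event::HashKnown) => State::BlobHeader,
        (State::BlobHeader, Event::SizeRead) => State::BlobContent,
        (State::BlobContent, Event::Item) => State::BlobContent,
        (State::BlobContent, Event::BlobDone) => State::EndBlob,
        (State::EndBlob, Event::NextChild(Some(offset))) => State::StartChild { offset },
        (State::EndBlob, Event::NextChild(None)) => State::Closing,
        (State::Closing, Event::Drained) => State::Finished,
        _ => return None,
    };
    Some(next)
}
```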
|
||||
|
||||
### Stats Tracking
|
||||
|
||||
```rust
|
||||
pub struct Stats {
|
||||
pub payload_bytes_read: u64, // Actual data bytes
|
||||
pub other_bytes_read: u64, // Hash pairs, headers
|
||||
pub payload_bytes_written: u64, // For push
|
||||
pub other_bytes_written: u64, // For push
|
||||
pub elapsed: Duration,
|
||||
}
|
||||
```
|
||||
|
||||
## Provider-Side Handling
|
||||
|
||||
```rust
|
||||
pub async fn handle_connection(connection: Connection, store: Store, events: EventSender);
|
||||
```
|
||||
|
||||
The provider accepts QUIC streams on a connection. For each stream:
|
||||
1. Read the request type byte
|
||||
2. Deserialize the request
|
||||
3. Dispatch to `handle_get`, `handle_get_many`, `handle_observe`, or `handle_push`
|
||||
4. For `handle_get`: iterate over the `ChunkRangesSeq`, streaming each blob via `store.export_bao(hash, ranges)`
|
||||
5. For HashSeq requests: load the root blob, parse it as `HashSeq`, then stream each requested child
|
||||
|
||||
### Event System
|
||||
|
||||
The provider can emit events for monitoring and access control:
|
||||
|
||||
```rust
|
||||
pub struct EventMask {
|
||||
pub connected: ConnectMode, // None, Notify, Intercept
|
||||
pub get: RequestMode, // None, Notify, Intercept, NotifyLog, InterceptLog, Disabled
|
||||
pub get_many: RequestMode,
|
||||
pub push: RequestMode, // Disabled by default!
|
||||
pub observe: ObserveMode,
|
||||
pub throttle: ThrottleMode, // None, Intercept
|
||||
}
|
||||
```
|
||||
|
||||
- **None**: No events, requests processed normally
|
||||
- **Notify**: Events sent but cannot block requests
|
||||
- **Intercept**: Events sent as RPC requests; handler can reject with `AbortReason`
|
||||
- **Disabled**: All requests of this type rejected
|
||||
|
||||
Progress events: `TransferStarted`, `TransferProgress`, `TransferCompleted`, `TransferAborted`.
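
How the four modes differ can be sketched as a single decision function. `Mode`, `Outcome`, and `decide` are hypothetical names for this illustration, and the handler closure stands in for the RPC round trip to the event consumer; the real provider's types and error reporting are richer.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    None,
    Notify,
    Intercept,
    Disabled,
}

#[derive(Debug, PartialEq)]
struct Outcome {
    proceed: bool,
    event_sent: bool,
    reason: Option<&'static str>,
}

/// Decide whether a request may proceed. `handler` models the event consumer.
fn decide<F>(mode: Mode, handler: F) -> Outcome
where
    F: FnOnce() -> Result<(), &'static str>,
{
    match mode {
        // No event is emitted and the handler is never consulted.
        Mode::None => Outcome { proceed: true, event_sent: false, reason: None },
        // The event is delivered, but its result cannot block the request.
        Mode::Notify => {
            let _ = handler();
            Outcome { proceed: true, event_sent: true, reason: None }
        }
        // The handler's verdict is awaited and may reject the request.
        Mode::Intercept => match handler() {
            Ok(()) => Outcome { proceed: true, event_sent: true, reason: None },
            Err(reason) => Outcome { proceed: false, event_sent: true, reason: Some(reason) },
        },
        // Rejected outright, without involving the handler.
        Mode::Disabled => Outcome { proceed: false, event_sent: false, reason: Some("disabled") },
    }
}
```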
|
||||
|
||||
## Collection Format
|
||||
|
||||
```rust
|
||||
pub struct Collection {
|
||||
blobs: Vec<(String, Hash)>, // Named references to child blobs
|
||||
}
|
||||
```
|
||||
|
||||
Wire format (as a HashSeq blob):
|
||||
1. First child blob: `CollectionMeta` serialized with postcard
|
||||
2. Remaining children: the actual data blobs
|
||||
|
||||
```rust
|
||||
pub struct CollectionMeta {
|
||||
header: [u8; 13], // Must be b"CollectionV0."
|
||||
names: Vec<String>, // Names for each child blob
|
||||
}
|
||||
```
|
||||
|
||||
The header `b"CollectionV0."` is a magic number for format identification. The meta blob's hash becomes the first entry in the HashSeq, followed by the hashes of each data blob. Names correspond 1:1 with data blobs (excluding the meta entry).
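
The link ordering and the 1:1 name correspondence can be sketched as follows. `CollectionSketch` and its methods are hypothetical; hashes are passed in as plain 32-byte arrays, and the postcard encoding of the meta blob is deliberately left out.

```rust
type HashBytes = [u8; 32];

/// Magic header expected at the start of the metadata.
const HEADER: &[u8; 13] = b"CollectionV0.";

#[derive(Debug, PartialEq)]
struct CollectionSketch {
    blobs: Vec<(String, HashBytes)>,
}

impl CollectionSketch {
    /// Names that go into the meta blob, in child order.
    fn names(&self) -> Vec<String> {
        self.blobs.iter().map(|(name, _)| name.clone()).collect()
    }

    /// HashSeq contents: the meta blob's hash first, then each data blob.
    fn links(&self, meta_hash: HashBytes) -> Vec<HashBytes> {
        let mut links = vec![meta_hash];
        links.extend(self.blobs.iter().map(|(_, hash)| *hash));
        links
    }

    /// Rebuild from names plus links, where `links[0]` is the meta entry.
    /// Fails if names and data links do not correspond 1:1.
    fn from_parts(names: Vec<String>, links: &[HashBytes]) -> Option<Self> {
        let data = links.get(1..)?;
        if data.len() != names.len() {
            return None;
        }
        let blobs = names.into_iter().zip(data.iter().copied()).collect();
        Some(Self { blobs })
    }
}
```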
|
||||
docs/research/references/iroh/iroh-blobs/04-storage.md
@@ -0,0 +1,250 @@
|
||||
# iroh-blobs: Storage Architecture
|
||||
|
||||
## Overview
|
||||
|
||||
iroh-blobs provides three store implementations sharing a common `Store` API surface:
|
||||
|
||||
| Store | Location | Mutable | Use Case |
|-------|----------|---------|----------|
| `MemStore` | In-memory | ✅ | Small data, testing, WASM |
| `FsStore` | Filesystem + redb | ✅ | Production, large data |
| `ReadonlyMemStore` | In-memory | ❌ | Static data serving |
|
||||
|
||||
All stores implement the same RPC-based command protocol (`Command` enum), allowing both local in-process and remote RPC access through the same `Store` type.
|
||||
|
||||
## Store API Surface
|
||||
|
||||
The `Store` type (from `api::Store`) is the primary interface. It's accessed via typed sub-APIs:
|
||||
|
||||
```rust
|
||||
let store: Store = /* ... */;
|
||||
|
||||
// Blob operations
|
||||
store.blobs() // → Blobs API (add, export, read, delete, observe, etc.)
|
||||
store.tags() // → Tags API (create, list, set, delete, rename)
|
||||
|
||||
// Direct operations
|
||||
store.add_bytes(data) // → AddProgress
|
||||
store.add_slice(data) // → TempTag (convenience)
|
||||
store.get_bytes(hash) // → Result<Bytes>
|
||||
store.has(hash) // → bool
|
||||
store.shutdown() // Clean shutdown
|
||||
store.wait_idle() // Wait for all tasks to complete
|
||||
store.sync_db() // Sync database to disk (FsStore)
|
||||
```
|
||||
|
||||
## Blobs API
|
||||
|
||||
```rust
|
||||
let blobs = store.blobs();
|
||||
|
||||
// Import
|
||||
blobs.add_slice(data) // → AddProgress (raw format)
|
||||
blobs.add_bytes(data) // → AddProgress (raw format)
|
||||
blobs.add_bytes_with_opts(AddBytesOptions{..}) // → AddProgress (with format)
|
||||
blobs.import_byte_stream(format) // → streaming import
|
||||
|
||||
// Export
|
||||
blobs.reader(hash) // → BlobReader (AsyncRead + AsyncSeek)
|
||||
blobs.export(hash, path) // → export to filesystem
|
||||
blobs.export_bao(hash, ranges) // → ExportBao (BLAKE3 verified stream)
|
||||
blobs.export_ranges(hash, ranges) // → ExportRanges (raw data ranges)
|
||||
|
||||
// Observe (subscribe to chunk availability)
|
||||
blobs.observe(hash) // → ObserveAt (bitfield stream)
|
||||
|
||||
// Status
|
||||
blobs.status(hash) // → BlobStatus (NotFound/Partial/Complete)
|
||||
|
||||
// Import BAO-encoded data
|
||||
blobs.import_bao_bytes(hash, ranges, data) // → import verified BAO stream
|
||||
blobs.import_bao_reader(hash, ranges, reader) // → import from async reader
|
||||
|
||||
// Batch operations (scoped temp tags)
|
||||
blobs.batch() // → Batch (auto-cleanup scope)
|
||||
|
||||
// Delete
|
||||
blobs.delete(hashes) // → force delete (use GC normally)
|
||||
```
|
||||
|
||||
## Tags API
|
||||
|
||||
```rust
|
||||
let tags = store.tags();
|
||||
|
||||
tags.set(name, value) // Set a persistent tag
|
||||
tags.create(value) // Auto-generate a tag name, return Tag
|
||||
tags.get(name) // → Option<TagInfo>
|
||||
tags.list() // → Stream<TagInfo>
|
||||
tags.list_hash_seq() // → Stream<TagInfo> (only HashSeq format)
|
||||
tags.delete(name) // Delete a tag
|
||||
tags.delete_range(range) // Delete tags in range
|
||||
tags.delete_prefix(prefix) // Delete tags with prefix
|
||||
tags.rename(from, to) // Atomically rename a tag
|
||||
tags.temp_tag(value) // → TempTag (ephemeral protection)
|
||||
```
|
||||
|
||||
## MemStore Architecture
|
||||
|
||||
The in-memory store uses a simple actor pattern:
|
||||
|
||||
```
|
||||
MemStore (ApiClient)
|
||||
│
|
||||
└── Actor (tokio task)
|
||||
├── State
|
||||
│ ├── data: HashMap<Hash, BaoFileHandle> // All blob data
|
||||
│ ├── tags: BTreeMap<Tag, HashAndFormat> // Persistent tags
|
||||
│ └── empty_hash: BaoFileHandle // Special entry for empty blob
|
||||
├── tasks: JoinSet<TaskResult> // Spawned import/export tasks
|
||||
├── temp_tags: TempTags // Ephemeral protection
|
||||
├── protected: HashSet<Hash> // GC-protected hashes
|
||||
└── idle_waiters: Vec<oneshot::Sender<()>> // Wait-idle notifications
|
||||
```
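
The actor pattern itself can be shown in miniature with a thread and standard-library channels. This is only a sketch of the message-passing shape: the real store runs on tokio and uses `irpc` commands, and `toy_hash` is a stand-in that is explicitly not BLAKE3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

/// Stand-in content address. NOT BLAKE3; `DefaultHasher` is used only so the
/// sketch needs nothing beyond the standard library.
fn toy_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Commands carry their own reply channel, like typed request/response pairs.
enum Command {
    AddBytes { data: Vec<u8>, reply: mpsc::Sender<u64> },
    GetBytes { hash: u64, reply: mpsc::Sender<Option<Vec<u8>>> },
    Shutdown,
}

/// Spawn the actor. All state lives inside the thread; callers only send commands.
fn spawn_store() -> (mpsc::Sender<Command>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Command>();
    let handle = thread::spawn(move || {
        let mut data: HashMap<u64, Vec<u8>> = HashMap::new();
        for command in rx {
            match command {
                Command::AddBytes { data: bytes, reply } => {
                    let hash = toy_hash(&bytes);
                    data.insert(hash, bytes);
                    let _ = reply.send(hash);
                }
                Command::GetBytes { hash, reply } => {
                    let _ = reply.send(data.get(&hash).cloned());
                }
                Command::Shutdown => break,
            }
        }
    });
    (tx, handle)
}
```

Because only the actor touches the map, no locking is needed, and the same command enum could in principle be sent over a network transport instead of a local channel.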
|
||||
|
||||
### BaoFileHandle / BaoFileStorage
|
||||
|
||||
```rust
|
||||
pub enum BaoFileStorage {
|
||||
Partial(PartialMemStorage), // Still downloading
|
||||
Complete(CompleteStorage), // Fully available
|
||||
}
|
||||
|
||||
pub struct PartialMemStorage {
|
||||
data: SparseMemFile, // Sparse byte array for data
|
||||
outboard: SparseMemFile, // Sparse byte array for BLAKE3 hash tree
|
||||
size: SizeInfo, // Known/estimated size
|
||||
bitfield: Bitfield, // Which chunks are verified
|
||||
}
|
||||
|
||||
pub struct CompleteStorage {
|
||||
data: Bytes, // Complete data
|
||||
outboard: Bytes, // Complete outboard (hash tree)
|
||||
}
|
||||
```
|
||||
|
||||
Wrapping the storage in a `watch::Sender<BaoFileStorage>` lets subscribers observe state changes, which is what the `observe` API builds on.
|
||||
|
||||
### Data Flow (Import)
|
||||
|
||||
1. `add_bytes(data)` → compute outboard via `PreOrderMemOutboard::create()` → transition `Partial → Complete`
|
||||
2. `import_bao(hash, size, stream)` → receive `BaoContentItem` stream → write to `PartialMemStorage` → update bitfield → transition to `Complete` when all chunks present
|
||||
|
||||
### Data Flow (Export)
|
||||
|
||||
1. `export_bao(hash, ranges)` → look up `BaoFileHandle` → `traverse_ranges_validated(data, outboard, &ranges, tx)` — streams validated BAO data
|
||||
|
||||
## FsStore Architecture (Hybrid Store)
|
||||
|
||||
The filesystem store uses a **hybrid approach** that stores small data inline in redb and large data as files on disk.
|
||||
|
||||
### Design Rationale (from DESIGN.md)
|
||||
|
||||
- **Databases** are good for small blobs (low per-entry overhead, fast random access)
|
||||
- **Filesystems** are good for large blobs (OS-level caching, direct file access)
|
||||
- **Neither alone** works well for both cases
|
||||
|
||||
### Layout
|
||||
|
||||
```
<data_dir>/
├── db/                         # redb database
│   ├── metadata table          # Hash → EntryState
│   ├── inline_data table       # Hash → Bytes (for small blobs)
│   ├── inline_outboard table   # Hash → Bytes (for small outboards)
│   └── tags table              # Tag → HashAndFormat
├── data/<hash>.data            # Large blob data files
├── data/<hash>.outboard        # Large outboard files
├── data/<hash>.sizes           # Size tracking for partial files
└── data/<hash>.bitfield        # Validated chunk tracking for partial files
```
|
||||
|
||||
### EntryState
|
||||
|
||||
```rust
|
||||
// Simplified from src/store/fs/entry_state.rs
|
||||
pub enum EntryState {
|
||||
Complete(CompleteEntryState),
|
||||
Partial(PartialEntryState),
|
||||
}
|
||||
|
||||
pub struct CompleteEntryState {
|
||||
pub data: DataLocation, // Inline, Owned (canonical path), or External (user path)
|
||||
pub outboard: OutboardLocation, // Inline, Owned, or NotNeeded
|
||||
pub size: u64,
|
||||
}
|
||||
|
||||
pub enum DataLocation {
|
||||
Inline, // Stored in redb inline_data table
|
||||
Owned, // File at canonical path <hash>.data
|
||||
External(Vec<PathBuf>), // User-owned file paths
|
||||
}
|
||||
|
||||
pub enum OutboardLocation {
|
||||
Inline, // Stored in redb inline_outboard table
|
||||
Owned, // File at canonical path <hash>.outboard
|
||||
NotNeeded, // Data ≤ 16 KiB, no outboard needed
|
||||
}
|
||||
|
||||
pub struct PartialEntryState {
|
||||
// Either we know the verified size, or we don't yet
|
||||
pub verified_size: Option<NonZeroU64>,
|
||||
}
|
||||
```
|
||||
|
||||
### Thresholds
|
||||
|
||||
- **Data inline threshold**: 16 KiB (default) — blobs smaller than this are stored entirely in redb
|
||||
- **Outboard inline threshold**: 16 KiB (default) — outboards smaller than this are stored in redb
|
||||
- Data ≤ 16 KiB has no outboard (not needed for verification of a single chunk group)
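
The thresholds above can be turned into a small placement calculation. This is a minimal sketch, not the store's actual code: the constant names, the `Placement` enum, and the `place` function are hypothetical, and it assumes the outboard holds one 64-byte hash pair per internal node of the chunk-group tree with no size prefix, and that a blob of exactly 16 KiB still counts as inline.

```rust
// Hypothetical names; the values mirror the defaults listed above.
const CHUNK_GROUP: u64 = 16 * 1024;
const DATA_INLINE_THRESHOLD: u64 = 16 * 1024;
const OUTBOARD_INLINE_THRESHOLD: u64 = 16 * 1024;

#[derive(Debug, PartialEq)]
enum Placement {
    Inline,    // stored in the database
    File,      // stored as a file on disk
    NotNeeded, // nothing to store
}

/// Outboard size under the stated assumption: a binary tree over `g`
/// chunk groups has `g - 1` internal nodes, each a 64-byte hash pair.
fn outboard_size(data_len: u64) -> u64 {
    let groups = data_len.div_ceil(CHUNK_GROUP).max(1);
    (groups - 1) * 64
}

/// Returns (data placement, outboard placement) for a blob of `data_len` bytes.
fn place(data_len: u64) -> (Placement, Placement) {
    let data = if data_len <= DATA_INLINE_THRESHOLD {
        Placement::Inline
    } else {
        Placement::File
    };
    let outboard = match outboard_size(data_len) {
        0 => Placement::NotNeeded,
        n if n <= OUTBOARD_INLINE_THRESHOLD => Placement::Inline,
        _ => Placement::File,
    };
    (data, outboard)
}
```

Under these assumptions a 1 MiB blob has 64 chunk groups and a 4032-byte outboard, so its data lives in a file while its outboard is still small enough to be inlined.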
|
||||
|
||||
### Blob Lifecycle
|
||||
|
||||
**Adding a local file (known data, unknown hash)**:
|
||||
1. Compute the full BLAKE3 hash and outboard
|
||||
2. Atomically move the file into the store under the hash name
|
||||
3. Apply inlining rules: small files → redb, large files → filesystem
|
||||
|
||||
**Syncing from remote (known hash, unknown data)**:
|
||||
1. Start with no data — keep state in memory (not in database)
|
||||
2. As chunks arrive, write incrementally to partial files
|
||||
3. Once size is known to exceed the inline threshold, create database entry + filesystem files
|
||||
4. On completion, transition to `Complete` state and apply inlining rules
|
||||
|
||||
**Deletion**:
|
||||
- Tags protect content from GC
|
||||
- `TempTag` provides ephemeral (process-lifetime) protection
|
||||
- HashSeq tags protect the root blob AND all referenced child blobs
|
||||
- GC is mark-and-sweep: mark all reachable content via tags → sweep (delete) everything else
|
||||
- Explicit `force` deletion bypasses protection (emergency use only)
|
||||
|
||||
### FsStore Actor Architecture
|
||||
|
||||
```
|
||||
FsStore (ApiClient)
|
||||
│
|
||||
└── MainActor (tokio task)
|
||||
├── TaskContext { config, db_actor_sender }
|
||||
├── EntityMap: HashMap<Hash, ActiveEntityState> // Currently active entities
|
||||
├── JoinSet<TaskResult> // Running tasks
|
||||
├── TempTags // Ephemeral protection
|
||||
├── ProtectedSet // GC protection
|
||||
└── idle_waiters
|
||||
```
|
||||
|
||||
The FsStore uses an **entity manager** pattern where each hash gets a `BaoFileHandle` (like MemStore) when active, and entries are cleaned up when tasks complete.
|
||||
|
||||
## Garbage Collection
|
||||
|
||||
```rust
|
||||
pub struct GcConfig {
|
||||
pub interval: Duration,
|
||||
pub add_protected: Option<ProtectCb>, // Optional callback to add more protected hashes
|
||||
}
|
||||
```
|
||||
|
||||
GC is a two-phase process:
|
||||
1. **Mark**: Walk all tags (persistent + temp), collect reachable hashes. For HashSeq format, traverse the hash sequence to find all child hashes.
|
||||
2. **Sweep**: Delete all blobs not in the reachable set, in batches of 100.
|
||||
|
||||
GC runs automatically at a configurable interval via `run_gc(store, config)`, or manually via `gc_run_once(store, live)`.
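
The two phases can be modelled with plain collections. The sketch below is an illustration under stated assumptions, not the crate's implementation: the store is a `HashMap`, `Format`, `mark`, and `sweep` are hypothetical names, and a HashSeq is treated as a flat list of 32-byte child hashes.

```rust
use std::collections::{HashMap, HashSet};

type Hash = [u8; 32];

#[derive(Clone, Copy)]
enum Format {
    Raw,
    HashSeq,
}

/// Mark phase: every tagged root is live, and for HashSeq roots every
/// 32-byte child hash listed in the root blob is live as well.
fn mark(blobs: &HashMap<Hash, Vec<u8>>, tags: &[(Hash, Format)]) -> HashSet<Hash> {
    let mut live = HashSet::new();
    for &(root, format) in tags {
        live.insert(root);
        if let Format::HashSeq = format {
            if let Some(content) = blobs.get(&root) {
                for child in content.chunks_exact(32) {
                    let child: Hash = child.try_into().expect("chunk is 32 bytes");
                    live.insert(child);
                }
            }
        }
    }
    live
}

/// Sweep phase: delete everything that was not marked, in batches of 100.
/// Returns the number of deleted blobs.
fn sweep(blobs: &mut HashMap<Hash, Vec<u8>>, live: &HashSet<Hash>) -> usize {
    let dead: Vec<Hash> = blobs
        .keys()
        .filter(|h| !live.contains(*h))
        .copied()
        .collect();
    for batch in dead.chunks(100) {
        for hash in batch {
            blobs.remove(hash);
        }
    }
    dead.len()
}
```

Batching keeps each delete step bounded, which matters when the sweep runs against a database rather than an in-memory map.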
|
||||
@@ -0,0 +1,202 @@
|
||||
# iroh-blobs: Remote API and Downloader
|
||||
|
||||
## Remote API
|
||||
|
||||
The `Remote` type (`api::remote::Remote`) provides the client-side interface for interacting with remote iroh-blobs providers. It's a thin wrapper around `ApiClient` that exposes fetch, observe, and push operations.
|
||||
|
||||
```rust
|
||||
let remote = store.remote(); // or Remote::from_sender(client)
|
||||
|
||||
// Get local info about what we already have
|
||||
let local = remote.local(hash_and_format).await?;
|
||||
|
||||
// Compute what we need
|
||||
let missing = local.missing();
|
||||
|
||||
// Execute a download
|
||||
let stats = remote.execute_get(connection, request).await?;
|
||||
|
||||
// Or use the simpler fetch API
|
||||
let progress = remote.fetch(connection, hash, format, store);
|
||||
```
|
||||
|
||||
### LocalInfo
|
||||
|
||||
```rust
|
||||
pub struct LocalInfo {
|
||||
pub size: Option<u64>, // Total size if known
|
||||
pub present: ChunkRanges, // Chunks we already have
|
||||
pub missing: ChunkRanges, // Chunks we still need
|
||||
pub hash_and_format: HashAndFormat,
|
||||
}
|
||||
```
|
||||
|
||||
`LocalInfo` is computed by querying the local store's bitfield for a given hash and comparing it against what a full download would require.
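
The comparison described above is a range complement. As a rough sketch, with `std::ops::Range<u64>` standing in for `ChunkRanges` and the hypothetical helpers `total_chunks` and `missing`, and assuming the present ranges are sorted, non-overlapping, and within bounds:

```rust
use std::ops::Range;

/// Number of 1024-byte BLAKE3 chunks needed to cover `size` bytes.
fn total_chunks(size: u64) -> u64 {
    size.div_ceil(1024)
}

/// Complement of `present` within `0..total`.
/// Assumes `present` is sorted, non-overlapping, and inside `0..total`.
fn missing(present: &[Range<u64>], total: u64) -> Vec<Range<u64>> {
    let mut out = Vec::new();
    let mut cursor = 0;
    for r in present {
        if r.start > cursor {
            out.push(cursor..r.start); // gap before this present range
        }
        cursor = cursor.max(r.end);
    }
    if cursor < total {
        out.push(cursor..total); // tail after the last present range
    }
    out
}
```

For a 16 KiB blob (16 chunks) with chunks `0..4` and `10..12` already stored, `missing` yields `4..10` and `12..16`, which is what a follow-up request would ask for.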
|
||||
|
||||
### Fetch Process
|
||||
|
||||
The `fetch` method handles the complete lifecycle:
|
||||
|
||||
1. **Local check**: Query the store for what we already have
|
||||
2. **Request computation**: If format is HashSeq, read the local HashSeq to compute precise missing ranges
|
||||
3. **Connection**: Open a QUIC stream to the provider
|
||||
4. **Transfer**: Use the get FSM to stream data into the store
|
||||
5. **Verification**: BLAKE3 verification happens in-stream during the transfer
|
||||
|
||||
For HashSeq format:
|
||||
- First fetch the root blob (the HashSeq)
|
||||
- Parse it to get child hashes
|
||||
- For each child, check local availability and compute missing ranges
|
||||
- Fetch only what's missing
|
||||
|
||||
### Observe
|
||||
|
||||
```rust
|
||||
// Subscribe to bitfield updates from a remote provider
|
||||
let mut stream = remote.observe(connection, hash).stream().await?;
|
||||
while let Some(bitfield) = stream.next().await {
|
||||
// Process availability updates
|
||||
}
|
||||
```
|
||||
|
||||
The observe protocol sends `ObserveItem` messages (size + available ranges) whenever new chunks become available on the provider. The initial message contains the full current state; subsequent messages contain deltas.
|
||||
|
||||
### Push
|
||||
|
||||
```rust
|
||||
// Push local data to a remote provider
|
||||
let progress = remote.push(connection, request, store);
|
||||
```
|
||||
|
||||
Push uses the same FSM-style approach but in reverse — the local side reads from the store and writes BLAKE3-verified data to the QUIC stream.
|
||||
|
||||
## Downloader API
|
||||
|
||||
The `Downloader` (`api::downloader::Downloader`) coordinates downloads from multiple sources:
|
||||
|
||||
```rust
|
||||
let downloader = Downloader::new(store, endpoint);
|
||||
|
||||
// Download from specific providers
|
||||
let progress = downloader.download(DownloadRequest {
|
||||
request: FiniteRequest::Get(get_request),
|
||||
providers: vec![endpoint_id_1, endpoint_id_2],
|
||||
strategy: SplitStrategy::Split,
|
||||
}).stream();
|
||||
```
|
||||
|
||||
### SplitStrategy
|
||||
|
||||
```rust
|
||||
pub enum SplitStrategy {
|
||||
Split, // Split the request across multiple providers
|
||||
None, // Use a single provider
|
||||
}
|
||||
```
|
||||
|
||||
When `SplitStrategy::Split` is used, the downloader:
|
||||
1. Splits the `GetRequest` into per-child requests
|
||||
2. Distributes children across available providers
|
||||
3. Downloads in parallel from multiple sources
|
||||
4. Stores each completed child into the local store
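
Steps 1 and 2 can be pictured as building a per-provider work plan. The source does not say how children are assigned to providers, so the round-robin policy below is an assumption, and `distribute` is a hypothetical helper that uses string labels in place of `EndpointId`.

```rust
use std::collections::BTreeMap;

/// Assign child indices `0..child_count` to providers in round-robin order.
/// Round-robin is an illustrative choice, not necessarily what the
/// downloader does.
fn distribute<'a>(child_count: u64, providers: &[&'a str]) -> BTreeMap<&'a str, Vec<u64>> {
    let mut plan = BTreeMap::new();
    if providers.is_empty() {
        return plan;
    }
    for child in 0..child_count {
        let provider = providers[(child % providers.len() as u64) as usize];
        plan.entry(provider).or_insert_with(Vec::new).push(child);
    }
    plan
}
```

Each entry in the resulting plan corresponds to one per-provider request that can run in parallel with the others.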
|
||||
|
||||
### DownloadRequest
|
||||
|
||||
```rust
|
||||
pub struct DownloadRequest {
|
||||
pub request: FiniteRequest, // What to download
|
||||
pub providers: Vec<EndpointId>, // Who to download from
|
||||
pub strategy: SplitStrategy, // How to split work
|
||||
}
|
||||
|
||||
pub enum FiniteRequest {
|
||||
Get(GetRequest),
|
||||
GetMany(GetManyRequest),
|
||||
}
|
||||
```
|
||||
|
||||
### Download Progress
|
||||
|
||||
```rust
|
||||
pub enum DownloadProgressItem {
|
||||
TryProvider { id: EndpointId, request: Arc<GetRequest> },
|
||||
ProviderFailed { id: EndpointId, request: Arc<GetRequest> },
|
||||
PartComplete { request: Arc<GetRequest> },
|
||||
Progress(u64),
|
||||
DownloadError,
|
||||
}
|
||||
```
|
||||
|
||||
## Connection Pooling
|
||||
|
||||
The `util::connection_pool::ConnectionPool` manages reusable QUIC connections:
|
||||
|
||||
```rust
|
||||
let pool = ConnectionPool::new(endpoint, ALPN, options);
|
||||
let connection = pool.connect(endpoint_id).await?;
|
||||
```
|
||||
|
||||
Options include connection timeout, idle timeout, and maximum connections per peer.
|
||||
|
||||
## Integration with iroh
|
||||
|
||||
### BlobsProtocol
|
||||
|
||||
```rust
|
||||
// src/net_protocol.rs
|
||||
pub struct BlobsProtocol {
|
||||
inner: Arc<BlobsInner>, // (Store, EventSender)
|
||||
}
|
||||
|
||||
impl ProtocolHandler for BlobsProtocol {
|
||||
async fn accept(&self, conn: Connection) -> Result<(), AcceptError> {
|
||||
crate::provider::handle_connection(conn, store, events).await;
|
||||
Ok(())
|
||||
}
|
||||
async fn shutdown(&self) { /* shutdown store */ }
|
||||
}
|
||||
```
|
||||
|
||||
Usage with iroh Router:
|
||||
|
||||
```rust
|
||||
let endpoint = Endpoint::bind(presets::N0).await?;
|
||||
let store = MemStore::new(); // or FsStore::load(path).await?
|
||||
let blobs = BlobsProtocol::new(&store, None);
|
||||
let router = Router::builder(endpoint)
|
||||
.accept(iroh_blobs::ALPN, blobs)
|
||||
.spawn();
|
||||
```
|
||||
|
||||
### Creating a BlobTicket
|
||||
|
||||
```rust
|
||||
let endpoint = Endpoint::bind(presets::N0).await?;
|
||||
endpoint.online().await;
|
||||
let addr = endpoint.addr();
|
||||
|
||||
let tag = store.add_slice(b"hello world").await?;
|
||||
let ticket = BlobTicket::new(addr, tag.hash, tag.format);
|
||||
println!("Share this: {ticket}");
|
||||
```
|
||||
|
||||
### Fetching from a Ticket
|
||||
|
||||
```rust
|
||||
// On the requester side
|
||||
let ticket: BlobTicket = ticket_str.parse()?;
|
||||
let (addr, hash, format) = ticket.into_parts();
|
||||
|
||||
let endpoint = Endpoint::bind(presets::N0).await?;
|
||||
let conn = endpoint.connect(addr, iroh_blobs::ALPN).await?;
|
||||
|
||||
let request = match format {
|
||||
BlobFormat::Raw => GetRequest::blob(hash),
|
||||
BlobFormat::HashSeq => GetRequest::all(hash),
|
||||
};
|
||||
|
||||
// Use the get FSM
|
||||
let fsm = get::fsm::start(conn, request, RequestCounters::default());
|
||||
let connected = fsm.next().await?;
|
||||
// ... drive the FSM to completion
|
||||
```
|
||||
@@ -0,0 +1,312 @@
|
||||
# iroh-blobs: Data Flow and Complete Example
|
||||
|
||||
## Complete Data Flow: Provider Side
|
||||
|
||||
```
|
||||
QUIC Connection Arrives
|
||||
│
|
||||
▼
|
||||
handle_connection(conn, store, events)
|
||||
│
|
||||
┌──────────┴──────────┐
|
||||
│ Accept QUIC BIDI │
|
||||
│ streams in loop │
|
||||
└──────────┬──────────┘
|
||||
│
|
||||
handle_stream(pair, store)
|
||||
│
|
||||
┌──────────┴──────────┐
|
||||
│ Read Request type │
|
||||
│ byte + deserialize │
|
||||
└──────────┬──────────┘
|
||||
│
|
||||
┌─────────────┬───────┼───────┬──────────────┐
|
||||
│ │ │ │ │
|
||||
handle_get handle_get handle handle (reserved)
|
||||
_many _observe _push
|
||||
│ │ │ │
|
||||
▼ ▼ ▼ ▼
|
||||
┌─────────────────────────────────────────────────┐
|
||||
│ For each (offset, ranges) in request.ranges: │
|
||||
│ │
|
||||
│ if offset == 0: │
|
||||
│ send_blob(store, 0, hash, ranges, writer) │
|
||||
│ else: │
|
||||
│ lookup hash in HashSeq[offset-1] │
|
||||
│ send_blob(store, offset, child_hash, ranges, writer) │
|
||||
│ │
|
||||
│ send_blob: │
|
||||
│ store.export_bao(hash, ranges) │
|
||||
│ .write_with_progress(writer, ctx, &hash, idx) │
|
||||
└─────────────────────────────────────────────────┘
|
||||
```
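
The offset convention in the diagram (offset 0 is the root, offset `n` is entry `n - 1` of the HashSeq) can be sketched with plain byte slices. `parse_hash_seq` and `resolve` are hypothetical helpers written for this illustration; they are not functions from the crate.

```rust
type Hash = [u8; 32];

/// Split a HashSeq blob into its child hashes.
/// Returns `None` if the length is not a multiple of 32.
fn parse_hash_seq(bytes: &[u8]) -> Option<Vec<Hash>> {
    if bytes.len() % 32 != 0 {
        return None;
    }
    let hashes = bytes
        .chunks_exact(32)
        .map(|chunk| {
            let hash: Hash = chunk.try_into().expect("chunk is 32 bytes");
            hash
        })
        .collect();
    Some(hashes)
}

/// Map a request offset to the hash that should be served for it.
fn resolve(root: Hash, children: &[Hash], offset: u64) -> Option<Hash> {
    match offset {
        0 => Some(root),
        n => children.get((n - 1) as usize).copied(),
    }
}
```

An offset past the end of the sequence resolves to `None`, which a provider would have to treat as a request it cannot serve.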
|
||||
|
||||
## Complete Data Flow: Requester Side (Get FSM)
|
||||
|
||||
```
|
||||
Create GetRequest
|
||||
│
|
||||
▼
|
||||
fsm::start(connection, request, counters)
|
||||
│
|
||||
▼
|
||||
AtInitial.next()
|
||||
│ (open_bi, send request)
|
||||
▼
|
||||
AtConnected.next()
|
||||
│
|
||||
┌───────────┼───────────┐
|
||||
│ │ │
|
||||
StartRoot StartChild Closing
|
||||
(offset=0) (offset>0) (empty)
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
AtBlobHeader AtBlobHeader AtClosing
|
||||
.next() .next(hash) .next()
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
(size, AtBlobContent) Stats
|
||||
│
|
||||
┌────────┴────────┐
|
||||
│ │
|
||||
More(item) Done
|
||||
(loop back to (AtEndBlob)
|
||||
AtBlobContent) │
|
||||
┌─────┼─────┐
|
||||
│ │
|
||||
MoreChildren Closing
|
||||
(AtStartChild) (AtClosing)
|
||||
│ │
|
||||
└───────────┘
|
||||
```
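
The diagram follows a typestate pattern: each state is a value, and advancing consumes it and returns the next one. The toy model below shows only that control flow, with no networking. `Pending`, `Next`, `AtBlob`, `AtClosing`, and `drive` are hypothetical names that do not correspond to the crate's types, and header and content handling are collapsed into a single `finish` step.

```rust
use std::collections::VecDeque;

/// Dispatch point (plays the role of the "connected" / "end of blob" states).
struct Pending {
    offsets: VecDeque<u64>, // offsets still expected in the response
    received: usize,        // blobs fully read so far
}

struct AtBlob {
    offset: u64,
    pending: Pending,
}

struct AtClosing {
    received: usize,
}

enum Next {
    StartRoot(AtBlob),
    StartChild(AtBlob),
    Closing(AtClosing),
}

impl Pending {
    /// Consume the state and decide what comes next.
    fn next(mut self) -> Next {
        match self.offsets.pop_front() {
            None => Next::Closing(AtClosing { received: self.received }),
            Some(0) => Next::StartRoot(AtBlob { offset: 0, pending: self }),
            Some(offset) => Next::StartChild(AtBlob { offset, pending: self }),
        }
    }
}

impl AtBlob {
    /// Stand-in for reading the header and all content items of one blob.
    fn finish(mut self) -> Pending {
        self.pending.received += 1;
        self.pending
    }
}

/// Drive the machine to completion, recording which blobs were visited.
fn drive(offsets: &[u64]) -> (Vec<String>, usize) {
    let mut trace = Vec::new();
    let mut state = Pending { offsets: offsets.iter().copied().collect(), received: 0 };
    loop {
        state = match state.next() {
            Next::StartRoot(blob) => {
                trace.push("root".to_string());
                blob.finish()
            }
            Next::StartChild(blob) => {
                trace.push(format!("child {}", blob.offset));
                blob.finish()
            }
            Next::Closing(closing) => return (trace, closing.received),
        };
    }
}
```

Because every transition takes `self` by value, a stale state cannot be reused by accident, which is the main reason to model a protocol this way.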
|
||||
|
||||
### Blob Content Items
|
||||
|
||||
During `AtBlobContent`, items arrive as `BaoContentItem`:
|
||||
|
||||
```rust
|
||||
pub enum BaoContentItem {
|
||||
Parent(ParentNode), // (node, (left_hash, right_hash)) — 64 bytes
|
||||
Leaf(Leaf), // { offset: u64, data: Bytes } — actual data
|
||||
}
|
||||
```
|
||||
|
||||
- **Parent nodes** contain BLAKE3 hash pairs for tree verification. They're overhead (~64 bytes per internal node).
|
||||
- **Leaf nodes** contain actual data chunks. Each leaf's data is at most `IROH_BLOCK_SIZE` bytes (16 KiB).
|
||||
|
||||
Verification is automatic: the `ResponseDecoder` from `bao-tree` validates each chunk against the expected hash tree rooted at the request hash.
|
||||
|
||||
## Blob Verification and BaoTree Encoding
|
||||
|
||||
### How BLAKE3 Verified Streaming Works
|
||||
|
||||
1. **The hash is the root** of a binary Merkle tree
|
||||
2. **Internal nodes** store `(left_child_hash, right_child_hash)` — 64 bytes each
|
||||
3. **Leaf nodes** store the actual data chunks (up to 1024 bytes each in standard BLAKE3, or 16 KiB in iroh's block size)
|
||||
4. **Chunk groups** (16 chunks = 16 KiB) are the minimum verification unit in iroh-blobs
|
||||
|
||||
For a request with specific ranges:
|
||||
- The provider traverses the tree, yielding only nodes needed to verify the requested ranges
|
||||
- The requester can verify each chunk group independently after receiving its parent hash pair
|
||||
- Corruption is caught within one chunk group: at most 16 KiB of data is received before a hash mismatch is detected
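
The check a requester performs can be sketched as a walk from the root down to one leaf. To stay dependency-free this example uses the standard library's `DefaultHasher` as a stand-in hash, so it is not BLAKE3 and the 64-bit values are not real tree hashes; `h_leaf`, `h_parent`, and `verify_leaf` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in leaf hash (NOT BLAKE3); the tag byte separates leaves from parents.
fn h_leaf(data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    0u8.hash(&mut s);
    data.hash(&mut s);
    s.finish()
}

/// Stand-in parent hash over a (left, right) pair.
fn h_parent(left: u64, right: u64) -> u64 {
    let mut s = DefaultHasher::new();
    1u8.hash(&mut s);
    left.hash(&mut s);
    right.hash(&mut s);
    s.finish()
}

/// Verify one leaf against `root`.
/// `path` lists the parent pairs from the root down, each with a flag
/// telling whether the walk continues into the right child.
fn verify_leaf(root: u64, path: &[(u64, u64, bool)], leaf: &[u8]) -> bool {
    let mut expected = root;
    for &(left, right, go_right) in path {
        if h_parent(left, right) != expected {
            return false; // pair does not match the hash we were promised
        }
        expected = if go_right { right } else { left };
    }
    h_leaf(leaf) == expected
}
```

Each parent pair is checked before anything below it is trusted, so a corrupted leaf or pair is rejected as soon as it arrives rather than at the end of the transfer.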
|
||||
|
||||
### Outboard Storage
|
||||
|
||||
The **outboard** is the BLAKE3 hash tree stored separately from the data. For the provider:
|
||||
- Small blobs (≤16 KiB): outboard is empty (not needed, single chunk group)
|
||||
- Large blobs: outboard stored as `PreOrderMemOutboard` (in-memory) or as a file (filesystem store)
|
||||
|
||||
For the requester, the outboard is built incrementally as data arrives.
|
||||
|
||||
## Import and Export Flows
|
||||
|
||||
### Import Bytes (Local Data)
|
||||
|
||||
```
|
||||
add_bytes(data) / add_slice(data)
|
||||
│
|
||||
▼
|
||||
ImportBytesRequest { data, format, scope }
|
||||
│
|
||||
▼
|
||||
Actor::import_bytes()
|
||||
│ 1. Send AddProgressItem::Size(len)
|
||||
│ 2. Send AddProgressItem::CopyDone
|
||||
│ 3. Compute outboard: PreOrderMemOutboard::create(&data, IROH_BLOCK_SIZE)
|
||||
│ 4. Return ImportEntry { data, outboard, scope, format, tx }
|
||||
│
|
||||
▼
|
||||
Actor::finish_import()
|
||||
│ 1. Get hash from outboard.root()
|
||||
│ 2. Get or create BaoFileHandle for hash
|
||||
│ 3. Transition BaoFileStorage::Partial → Complete
|
||||
│ 4. Create TempTag for the hash_and_format
|
||||
│ 5. Send AddProgressItem::Done(temp_tag)
|
||||
```
|
||||
|
||||
### Import BAO Stream (Remote Data)
|
||||
|
||||
```
|
||||
import_bao_bytes(hash, ranges, data) / import_bao_reader(hash, ranges, reader)
|
||||
│
|
||||
▼
|
||||
ImportBaoRequest { hash, size }
|
||||
│
|
||||
▼
|
||||
Actor::import_bao()
|
||||
│ 1. Set size on partial entry
|
||||
│ 2. Create BaoTree for the size
|
||||
│ 3. For each BaoContentItem from stream:
|
||||
│ - Parent: write hash pair to outboard
|
||||
│ - Leaf: write data to storage, update bitfield
|
||||
│ - If bitfield becomes complete: transition Partial → Complete
|
||||
│ 4. Send result
|
||||
```
|
||||
|
||||
### Export BAO
|
||||
|
||||
```
|
||||
export_bao(hash, ranges) → ExportBao
|
||||
│
|
||||
▼
|
||||
Actor::export_bao()
|
||||
│ 1. Look up BaoFileHandle for hash
|
||||
│ 2. If not found: send EncodeError::NotFound and return
|
||||
│ 3. Create BaoTreeSender from data + outboard readers
|
||||
│ 4. Call traverse_ranges_validated(data, outboard, &ranges, tx)
|
||||
│ → streams validated BAO items to the sender
|
||||
```
|
||||
|
||||
### Export Path (To Filesystem)
|
||||
|
||||
```
|
||||
export(hash, target_path) → ExportPath
|
||||
│
|
||||
▼
|
||||
Actor::export_path()
|
||||
│ 1. Look up BaoFileHandle for hash
|
||||
│ 2. Create parent directories if needed
|
||||
│ 3. Create file at target_path
|
||||
│ 4. Send ExportProgressItem::Size(total_size)
|
||||
│ 5. Read data from store in 64 KiB chunks
|
||||
│ 6. Write to file, yielding ExportProgressItem::CopyProgress(offset)
|
||||
│ 7. Send ExportProgressItem::Done
|
||||
```
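
Steps 5 and 6 amount to a chunked copy that reports the running offset. A minimal sketch using only `std::io`, where `copy_with_progress` and `COPY_CHUNK` are hypothetical names and the callback stands in for sending `ExportProgressItem::CopyProgress`:

```rust
use std::io::{self, Read, Write};

const COPY_CHUNK: usize = 64 * 1024;

/// Copy `src` to `dst` in chunks of at most 64 KiB, calling `on_progress`
/// with the total number of bytes written after each chunk.
fn copy_with_progress<R: Read, W: Write>(
    mut src: R,
    mut dst: W,
    mut on_progress: impl FnMut(u64),
) -> io::Result<u64> {
    let mut buf = vec![0u8; COPY_CHUNK];
    let mut offset = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        dst.write_all(&buf[..n])?;
        offset += n as u64;
        on_progress(offset);
    }
    Ok(offset)
}
```

A reader may return fewer bytes than the buffer holds, so the reported offsets are not guaranteed to be multiples of 64 KiB for arbitrary sources.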
|
||||
|
||||
## Observe Protocol Detail
|
||||
|
||||
```
|
||||
Requester Provider
|
||||
│ │
|
||||
│ ObserveRequest {hash, ranges} │
|
||||
│─────────────────────────────────►│
|
||||
│ │
|
||||
│ ObserveItem {size, ranges} │ (initial state)
|
||||
│◄─────────────────────────────────│
|
||||
│ │
|
||||
│ ... (time passes, more data │
|
||||
│ becomes available) │
|
||||
│ │
|
||||
│ ObserveItem {size, ranges} │ (delta update)
|
||||
│◄─────────────────────────────────│
|
||||
│ │
|
||||
│ ... (continue until │
|
||||
│ requester stops │
|
||||
│ or connection closes) │
|
||||
│ │
|
||||
│ STOP_STREAM │
|
||||
│─────────────────────────────────►│
|
||||
```
|
||||
|
||||
The observe protocol uses `Bitfield::diff()` to send only the new chunks since the last update, minimizing bandwidth.
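
The delta idea can be shown with a set of chunk-group indices. This is a simplified model and not the behaviour of `Bitfield::diff()` itself: `ObserveState` and `update` are hypothetical names, and a `BTreeSet<u64>` stands in for the real range-based bitfield.

```rust
use std::collections::BTreeSet;

/// Tracks what has already been reported to one observer.
struct ObserveState {
    sent: BTreeSet<u64>,
}

impl ObserveState {
    fn new() -> Self {
        ObserveState { sent: BTreeSet::new() }
    }

    /// Return only the indices that are newly available since the last
    /// update, and remember them as sent.
    fn update(&mut self, available: &BTreeSet<u64>) -> Vec<u64> {
        let delta: Vec<u64> = available.difference(&self.sent).copied().collect();
        self.sent.extend(delta.iter().copied());
        delta
    }
}
```

The first call returns everything that is available, which matches the "initial state" message in the diagram; later calls return only what changed.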
|
||||
|
||||
## Full Working Example
|
||||
|
||||
```rust
|
||||
use iroh::{protocol::Router, Endpoint, endpoint::presets};
|
||||
use iroh_blobs::{store::mem::MemStore, BlobsProtocol, ticket::BlobTicket, BlobFormat};
|
||||
|
||||
// === Provider Side ===
|
||||
async fn provider() -> anyhow::Result<()> {
|
||||
let endpoint = Endpoint::bind(presets::N0).await?;
|
||||
let store = MemStore::new();
|
||||
|
||||
// Add some data
|
||||
let tag = store.add_slice(b"Hello, iroh-blobs!").await?;
|
||||
|
||||
let _ = endpoint.online().await;
|
||||
let addr = endpoint.addr();
|
||||
|
||||
// Create ticket for sharing
|
||||
let ticket = BlobTicket::new(addr, tag.hash, BlobFormat::Raw);
|
||||
println!("Ticket: {ticket}");
|
||||
|
||||
// Start serving
|
||||
let blobs = BlobsProtocol::new(&store, None);
|
||||
let router = Router::builder(endpoint)
|
||||
.accept(iroh_blobs::ALPN, blobs)
|
||||
.spawn();
|
||||
|
||||
tokio::signal::ctrl_c().await?;
|
||||
router.shutdown().await?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// === Requester Side ===
|
||||
async fn requester(ticket: BlobTicket) -> anyhow::Result<()> {
|
||||
let (addr, hash, format) = ticket.into_parts();
|
||||
|
||||
let endpoint = Endpoint::bind(presets::N0).await?;
|
||||
let conn = endpoint.connect(addr, iroh_blobs::ALPN).await?;
|
||||
|
||||
// Build request based on format
|
||||
let request = match format {
|
||||
BlobFormat::Raw => iroh_blobs::protocol::GetRequest::blob(hash),
|
||||
BlobFormat::HashSeq => iroh_blobs::protocol::GetRequest::all(hash),
|
||||
};
|
||||
|
||||
// Use the get FSM
|
||||
let start = iroh_blobs::get::fsm::start(conn, request, Default::default());
|
||||
let connected = start.next().await?;
|
||||
let connected = connected.next().await?;
|
||||
|
||||
match connected {
|
||||
iroh_blobs::get::fsm::ConnectedNext::StartRoot(at_root) => {
|
||||
let (at_content, size) = at_root.next().next().await?;
|
||||
let (at_end, data) = at_content.concatenate_into_vec().await?;
|
||||
println!("Got {} bytes: {:?}", size, data);
|
||||
// ...
|
||||
}
|
||||
iroh_blobs::get::fsm::ConnectedNext::StartChild(at_child) => {
|
||||
// Need to know the child hash
|
||||
}
|
||||
iroh_blobs::get::fsm::ConnectedNext::Closing(at_closing) => {
|
||||
println!("Empty response");
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
## Simplified Fetch (Using Store + Remote)
|
||||
|
||||
```rust
|
||||
// The simplest way to download data
|
||||
let store = MemStore::new();
|
||||
let remote = store.remote();
|
||||
|
||||
// Fetch with automatic local availability checking
|
||||
let result = remote.fetch(connection, hash, format, &store).await?;
|
||||
// Result includes Stats with transfer metrics
|
||||
```
|
||||
|
||||
## Key Error Types
|
||||
|
||||
| Error Type | Location | Purpose |
|------------|----------|---------|
| `GetError` | `get::error` | Errors during get FSM |
| `ExportBaoError` | `api` | Errors during BAO export |
| `RequestError` | `api` | Store command errors |
| `DecodeError` | `get::fsm` | BAO stream decode errors |
| `ProgressError` | `provider::events` | Provider event errors |
|
||||
60
docs/research/references/iroh/iroh-blobs/README.md
Normal file
@@ -0,0 +1,60 @@
|
||||
# iroh-blobs Reference Documentation
|
||||
|
||||
This directory contains a comprehensive reference for the `iroh-blobs` crate (v0.100.0), a Rust library for content-addressed blob transfer over QUIC connections using BLAKE3 verified streaming.
|
||||
|
||||
## Documents
|
||||
|
||||
1. **[Overview and Architecture](01-overview-and-architecture.md)** — Core concepts, module structure, feature flags, and architecture diagram. Start here.
|
||||
|
||||
2. **[Key Types and Data Structures](02-key-types.md)** — Detailed reference for `Hash`, `BlobFormat`, `HashAndFormat`, `HashSeq`, `Bitfield`, `Tag`, `TempTag`, `BlobTicket`, `ChunkRanges`/`ChunkRangesSeq`/`RangeSpec`, and the store command protocol.
|
||||
|
||||
3. **[Transfer Protocol](03-transfer-protocol.md)** — Wire protocol specification: request types (`GetRequest`, `GetManyRequest`, `PushRequest`, `ObserveRequest`), response format (BLAKE3 verified streaming), the client-side FSM, provider handling, event system, and the Collection format.
|
||||
|
||||
4. **[Storage Architecture](04-storage.md)** — Store implementations: `MemStore` (in-memory), `FsStore` (hybrid redb + filesystem), `ReadonlyMemStore`. Covers the actor pattern, `BaoFileHandle`/`BaoFileStorage`, partial/complete states, the hybrid inline/file approach, entry states, blob lifecycle, and garbage collection.
|
||||
|
||||
5. **[Remote API and Downloader](05-remote-and-downloader.md)** — `Remote` API for fetching from/observing/pushing to peers, `Downloader` for multi-source downloads, connection pooling, and iroh integration via `BlobsProtocol`.
|
||||
|
||||
6. **[Data Flow and Examples](06-data-flow-and-examples.md)** — End-to-end data flow diagrams for provider and requester sides, BLAKE3 verification mechanics, import/export flows, observe protocol detail, and complete working examples.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Creating a Provider
|
||||
|
||||
```rust
|
||||
use iroh::{protocol::Router, Endpoint, endpoint::presets};
|
||||
use iroh_blobs::{store::mem::MemStore, BlobsProtocol};
|
||||
|
||||
let endpoint = Endpoint::bind(presets::N0).await?;
|
||||
let store = MemStore::new();
|
||||
let tag = store.add_slice(b"data").await?;
|
||||
let blobs = BlobsProtocol::new(&store, None);
|
||||
let router = Router::builder(endpoint)
|
||||
.accept(iroh_blobs::ALPN, blobs)
|
||||
.spawn();
|
||||
```
|
||||
|
||||
### Key Constants
|
||||
|
||||
| Constant | Value | Meaning |
|----------|-------|---------|
| `ALPN` | `b"/iroh-bytes/4"` | QUIC ALPN protocol identifier |
| `IROH_BLOCK_SIZE` | `BlockSize::from_chunk_log(4)` | 16 KiB chunk groups |
| `MAX_MESSAGE_SIZE` | `1 MiB` | Maximum request message size |
| `Hash::EMPTY` | BLAKE3 of `b""` | Hash of the empty blob |
|
||||
|
||||
### Core Crate Exports
|
||||
|
||||
```rust
|
||||
pub use hash::{BlobFormat, Hash, HashAndFormat};
|
||||
pub use hashseq::HashSeq;
|
||||
pub use net_protocol::BlobsProtocol;
|
||||
pub use protocol::ALPN;
|
||||
pub mod api; // Store API, Blobs, Tags, Downloader, Remote
|
||||
pub mod format; // Collection type
|
||||
pub mod get; // Client-side FSM
|
||||
pub mod protocol; // Wire protocol types (GetRequest, etc.)
|
||||
pub mod provider; // Server-side handling
|
||||
pub mod store; // Storage implementations
|
||||
pub mod ticket; // BlobTicket
|
||||
pub mod util; // Connection pool, temp tags, stream helpers
|
||||
```
|
||||