docs(research): add iroh suite deep-dive references for iroh, irpc, iroh-blobs, iroh-gossip, iroh-live, and iroh-docs

2026-06-10 12:34:30 +00:00
parent 6e71d1f306
commit 5bb5e1064c
49 changed files with 9923 additions and 0 deletions


@@ -0,0 +1,138 @@
# iroh-blobs: Overview and Architecture
**Version**: 0.100.0
**Repository**: https://github.com/n0-computer/iroh-blobs
**License**: MIT OR Apache-2.0
**Rust Edition**: 2021
**MSRV**: 1.89
## What It Is
`iroh-blobs` is a Rust crate for content-addressed blob transfer over QUIC connections, built on top of [iroh](https://docs.rs/iroh). It implements a request-response protocol for streaming BLAKE3-verified data between peers, along with store implementations for persisting blobs locally.
The core value proposition: transfer arbitrary-sized data with **cryptographic integrity guaranteed in-stream** — every 16 KiB chunk group can be verified against the BLAKE3 hash tree as it arrives, without waiting for the complete transfer.
## Core Concepts
| Concept | Description |
|---------|-------------|
| **Blob** | A sequence of bytes of arbitrary size, identified by its BLAKE3 hash. No metadata. |
| **Link** | A 32-byte BLAKE3 hash of a blob — the content address. |
| **HashSeq** | A blob whose content is a sequence of BLAKE3 hashes (each 32 bytes). Length must be a multiple of 32. |
| **Provider** | The side serving data. Waits for incoming requests and responds. |
| **Requester** | The side requesting data. Initiates connections and sends requests. |
| **Tag** | A persistent named reference to a `HashAndFormat`, protecting blobs from garbage collection. |
| **TempTag** | An ephemeral in-memory reference that protects content while the process runs. |
| **Chunk** | The fundamental BLAKE3 unit: 1024 bytes. |
| **Chunk Group** | Iroh's grouping of 16 chunks (16 KiB), the minimum granularity for range requests and verification. |
## Architecture Diagram
```
┌─────────────────────────────────────────────────────┐
│ Application │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Blobs │ │ Tags │ │ Downloader │ │
│ │ API │ │ API │ │ API │ │
│ └────┬─────┘ └────┬─────┘ └───────┬──────────┘ │
│ │ │ │ │
│ └──────────────┴────────────────┘ │
│ │ │
│ ┌───────┴───────┐ │
│ │ Store (API) │ ← Actor-based, RPC │
│ │ Commands │ message passing │
│ └───────┬───────┘ │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ │ │ │ │
│ ┌─────┴─────┐ ┌────┴────┐ ┌─────┴─────┐ │
│ │ MemStore │ │ FsStore │ │ Readonly │ │
│ │ │ │ (redb + │ │ MemStore │ │
│ │ │ │ fs) │ │ │ │
│ └────────────┘ └─────────┘ └───────────┘ │
└─────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│ Network Layer │
│ │
│ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ BlobsProtocol │ │ Remote (Client) │ │
│ │ (Provider side) │ │ (Requester side) │ │
│ │ │ │ │ │
│ │ handle_conn() │ │ Remote::fetch() │ │
│ │ handle_stream() │ │ Remote::local() │ │
│ └────────┬─────────┘ └──────────┬───────────┘ │
│ │ │ │
│ └──────── iroh QUIC ───────┘ │
│ ALPN: /iroh-bytes/4 │
└─────────────────────────────────────────────────────┘
```
## Module Structure
```
iroh-blobs/src/
├── lib.rs # Crate root, re-exports
├── hash.rs # Hash, BlobFormat, HashAndFormat
├── hashseq.rs # HashSeq type
├── format.rs # Format module (Collection)
│ └── collection.rs # Collection type with metadata
├── protocol.rs # Wire protocol types (GetRequest, etc.)
│ └── range_spec.rs # ChunkRangesSeq, RangeSpec wire encoding
├── net_protocol.rs # BlobsProtocol (iroh ProtocolHandler)
├── provider.rs # Server-side request handling
│ └── events.rs # Event system (connect/disconnect/progress)
├── get.rs # Client-side FSM for getting data
│ ├── error.rs # GetError, GetResult types
│ └── request.rs # Request execution helpers
├── api/ # High-level store API
│ ├── blobs.rs # Blob operations (add, export, read, etc.)
│ │ └── reader.rs # BlobReader (AsyncRead + AsyncSeek)
│ ├── downloader.rs # Multi-source download coordinator
│ ├── remote.rs # Remote peer interaction (fetch, observe)
│ ├── tags.rs # Tag management API
│ ├── proto.rs # Store command protocol (RPC messages)
│ └── proto/ # Proto sub-modules
│ └── bitfield.rs # Bitfield type for chunk tracking
├── store/ # Storage implementations
│ ├── mod.rs # IROH_BLOCK_SIZE, GcConfig
│ ├── mem.rs # MemStore (in-memory, mutable)
│ ├── fs.rs # FsStore (filesystem + redb hybrid)
│ ├── readonly_mem.rs # Read-only memory store
│ ├── gc.rs # Garbage collection
│ ├── util.rs # Shared utilities (Tag, SparseMemFile, etc.)
│ └── test.rs # Test utilities
├── ticket.rs # BlobTicket (shareable connection info)
├── metrics.rs # Prometheus metrics definitions
└── util/ # Utilities
├── channel.rs # Channel helpers
├── connection_pool.rs # Connection pooling
├── stream.rs # Stream abstractions
└── temp_tag.rs # TempTag, TagCounter, TempTags scope management
```
## Key Dependencies
| Dependency | Purpose |
|------------|---------|
| `bao-tree` | BLAKE3 verified streaming, outboard storage, BaoTree encoding/decoding |
| `iroh` | QUIC networking, endpoint, router |
| `irpc` | RPC framework for store commands |
| `postcard` | Wire serialization (compact, no-schema) |
| `redb` | Embedded key-value database (fs-store feature) |
| `range-collections` | RangeSet2 / ChunkRanges for chunk tracking |
| `bytes` | Efficient byte buffer handling |
## Feature Flags
| Feature | Default | Description |
|---------|---------|-------------|
| `fs-store` | ✅ | Filesystem-based store with redb + file hybrid |
| `rpc` | ✅ | RPC support via `noq` / `irpc` |
| `metrics` | ❌ | Prometheus metrics |
| `hide-proto-docs` | ✅ | Hides protocol docs from rustdocs |
## BLAKE3 Block Size
The crate uses a fixed block size of `IROH_BLOCK_SIZE = BlockSize::from_chunk_log(4)`, which means each chunk group is 2^4 = 16 chunks = 16 × 1024 = 16,384 bytes (16 KiB). This is the minimum granularity for range requests and verification.
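As a quick check of this arithmetic, the sketch below computes how many chunk groups cover a blob of a given size and which group a byte offset falls into. The constant and function names are illustrative assumptions, not the crate's API; only the values 1024 and 2^4 come from the text above.

```rust
/// BLAKE3 chunk size in bytes.
const CHUNK_SIZE: u64 = 1024;
/// log2 of the chunks per chunk group (the `4` in `from_chunk_log(4)`).
const CHUNK_LOG: u32 = 4;
/// Bytes per chunk group: 2^4 chunks * 1024 bytes = 16 KiB.
const GROUP_SIZE: u64 = CHUNK_SIZE << CHUNK_LOG;

/// Number of chunk groups needed to cover `size` bytes.
/// BLAKE3 hashes the empty input as a single empty chunk, so this sketch
/// returns at least 1.
pub fn chunk_groups(size: u64) -> u64 {
    let full = size / GROUP_SIZE;
    let partial = u64::from(size % GROUP_SIZE != 0);
    (full + partial).max(1)
}

/// Index of the chunk group that contains byte `offset`.
pub fn group_index(offset: u64) -> u64 {
    offset / GROUP_SIZE
}
```

For example, a 16,385-byte blob needs two chunk groups, so a range request for its last byte still transfers (and verifies) the whole second group.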


@@ -0,0 +1,195 @@
# iroh-blobs: Key Types and Data Structures
## Hash
```rust
// src/hash.rs
pub struct Hash(blake3::Hash); // 32-byte BLAKE3 hash, wraps blake3::Hash
```
The fundamental content address. Created via `Hash::new(data)` or `Hash::from_bytes([u8; 32])`. The constant `Hash::EMPTY` is the hash of the empty blob. `Hash` supports hex display and serde (compact binary for non-human-readable formats), and is stored as a fixed 32-byte array in redb.
Wire format: 32 raw bytes (postcard serialization). No framing overhead.
## BlobFormat
```rust
pub enum BlobFormat {
Raw, // A single blob
HashSeq, // A sequence of BLAKE3 hashes
}
```
Distinguishes between a raw binary blob and a hash sequence. Wire format: single byte (0 = Raw, 1 = HashSeq).
## HashAndFormat
```rust
pub struct HashAndFormat {
pub hash: Hash,
pub format: BlobFormat,
}
```
Pairs a hash with its format. Wire format: 33 bytes (32 for hash + 1 for format). Display format: hex string, optionally prefixed with 's' for HashSeq.
## HashSeq
```rust
// src/hashseq.rs
pub struct HashSeq(Bytes); // Wrapper around Bytes, length must be multiple of 32
```
A blob interpreted as a sequence of 32-byte BLAKE3 hashes. Created from `Bytes` via `HashSeq::new(bytes)` (returns `None` if length is not a multiple of 32). Iterable, supports `get(index)`, `pop_front()`.
Used extensively: collections are stored as a HashSeq where the first child is metadata and subsequent children are data blobs.
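The "multiple of 32" rule can be illustrated with a standard-library-only sketch. `parse_hash_seq` is a hypothetical helper that works on plain byte slices; it is not the crate's `HashSeq::new`, which operates on `Bytes`.

```rust
/// Splits a raw blob into 32-byte hashes.
/// Returns `None` when the length is not a multiple of 32, mirroring the
/// validation rule described above (hypothetical helper, std only).
pub fn parse_hash_seq(bytes: &[u8]) -> Option<Vec<[u8; 32]>> {
    if bytes.len() % 32 != 0 {
        return None;
    }
    let hashes = bytes
        .chunks_exact(32)
        .map(|chunk| {
            let mut hash = [0u8; 32];
            hash.copy_from_slice(chunk);
            hash
        })
        .collect();
    Some(hashes)
}
```

An empty blob is a valid (empty) hash sequence under this rule, since 0 is a multiple of 32.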
## Bitfield
```rust
// src/api/proto/bitfield.rs
pub struct Bitfield {
pub size: u64, // Total size of the blob in bytes
pub ranges: ChunkRanges, // Which chunks are verified/present
}
```
Tracks which chunks of a blob are present and verified. Key methods:
- `is_complete()` — all chunks present
- `validated_size()` — how many bytes are verified
- `diff(&other)` — compute the delta between two bitfields
Used by the observe protocol and internal state tracking.
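To make the completeness and delta ideas concrete, here is a toy model that stores present chunk numbers in a `BTreeSet` instead of `ChunkRanges`. `ToyBitfield` and `new_since` are hypothetical names, and the exact semantics of the real `diff` method are not confirmed by this sketch.

```rust
use std::collections::BTreeSet;

const CHUNK_SIZE: u64 = 1024;

/// Toy stand-in for `Bitfield`: blob size plus the set of present chunk numbers.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyBitfield {
    pub size: u64,
    pub chunks: BTreeSet<u64>,
}

impl ToyBitfield {
    /// Total number of 1024-byte chunks for `size` (at least one).
    fn total_chunks(&self) -> u64 {
        (self.size / CHUNK_SIZE + u64::from(self.size % CHUNK_SIZE != 0)).max(1)
    }

    /// True when every chunk of the blob is present.
    pub fn is_complete(&self) -> bool {
        (0..self.total_chunks()).all(|c| self.chunks.contains(&c))
    }

    /// Chunks present in `self` but absent from `older`: one possible
    /// reading of a delta between two bitfields.
    pub fn new_since(&self, older: &ToyBitfield) -> BTreeSet<u64> {
        self.chunks.difference(&older.chunks).copied().collect()
    }
}
```

A range-based representation scales far better than a set of chunk numbers; the set is used here only to keep the example short.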
## Tag
```rust
// src/store/util.rs
pub struct Tag(pub Bytes); // Named reference, arbitrary bytes, typically UTF-8
```
A persistent named reference to content in the store. Tags protect content from garbage collection. Auto-generated tags use the format `"auto-2026-01-15T12:34:56.789Z"`. Tags are stored in the store's database and can be listed, created, renamed, and deleted.
## TempTag
```rust
// src/util/temp_tag.rs
pub struct TempTag {
inner: HashAndFormat,
on_drop: Option<Weak<dyn TagDrop>>, // Callback when dropped
}
```
An ephemeral, in-memory tag. While a `TempTag` exists, its referenced content is protected from garbage collection. When dropped, the `TagDrop` callback notifies the store to unprotect. Can be `leak()`ed to make the protection permanent for the process lifetime.
Scopes: `TempTagScope` manages groups of temp tags. `Scope::GLOBAL` is the default scope. Batches of operations can create scoped temp tags that are cleaned up together.
## BlobTicket
```rust
// src/ticket.rs
pub struct BlobTicket {
addr: EndpointAddr, // How to reach the provider (includes EndpointId, relay URL, direct addresses)
format: BlobFormat, // Raw or HashSeq
hash: Hash, // What to retrieve
}
```
A shareable token containing everything needed to retrieve a blob from a provider. Serialized via `iroh_tickets::Ticket` trait (base32-encoded with "blob" prefix). Wire format uses postcard with a variant discriminator.
```rust
// Creating a ticket
let ticket = BlobTicket::new(addr, hash, BlobFormat::Raw);
// From a ticket string
let ticket: BlobTicket = ticket_str.parse()?;
```
## ChunkRanges and ChunkRangesSeq
### ChunkRanges
```rust
pub type ChunkRanges = RangeSet2<ChunkNum>; // From range_collections crate
```
A set of non-overlapping chunk ranges. Supports boolean operations (union, intersection, difference). The fundamental unit is `ChunkNum` (a u64 newtype representing a 1024-byte BLAKE3 chunk).
Helper trait `ChunkRangesExt` provides:
- `ChunkRanges::all()` — all chunks
- `ChunkRanges::bytes(range)` — byte range rounded up to chunk boundaries
- `ChunkRanges::chunks(range)` — chunk range from u64 bounds
- `ChunkRanges::last_chunk()` — the very last chunk (for size verification)
- `ChunkRanges::chunk(n)` — a single chunk
- `ChunkRanges::offset(n)` — a single byte offset rounded to chunk
### ChunkRangesSeq
```rust
// src/protocol/range_spec.rs
pub struct ChunkRangesSeq(SmallVec<[(u64, ChunkRanges); 2]>);
```
A sequence of `ChunkRanges`, one per blob in a HashSeq. Uses run-length encoding: stores `(offset, ranges)` pairs, where offset is the first blob index with that range spec. Unspecified indices default to the most recent range (or empty for finite sequences).
Key methods:
- `ChunkRangesSeq::all()` — request everything (root + all children, forever)
- `ChunkRangesSeq::root()` — request only the root blob
- `ChunkRangesSeq::empty()` — request nothing
- `ChunkRangesSeq::from_ranges(ranges)` — from explicit iterator
- `ChunkRangesSeq::from_ranges_infinite(ranges)` — last range repeats forever
- `.iter_non_empty_infinite()` — iterate only non-empty ranges
- `.is_blob()` — true if requesting a single blob (offset 0 with one entry)
### RangeSpec (Wire Format)
```rust
pub struct RangeSpec(SmallVec<[u64; 2]>);
```
The on-wire encoding of `ChunkRanges`. Uses alternating spans: first span is deselected, second is selected, etc. SmallVec avoids allocation for the common case of a single range.
Examples:
- `[]` — empty (nothing selected)
- `[0]` — everything from chunk 0 selected (entire blob)
- `[2, 5, 3, 1]` selects the half-open chunk ranges 2..7 and 10..11 (chunks 2 through 6, plus chunk 10)
- `[u64::MAX]` — only the last chunk (size proof)
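The alternating-span encoding in these examples can be decoded with a few lines of standard-library Rust. `decode_spans` is a hypothetical helper for illustration, not part of the crate; an end of `None` stands for a range that stays selected to the end of the blob.

```rust
/// Decodes alternating span lengths into half-open chunk ranges.
/// The first span is deselected, the second selected, and so on. If the
/// list ends while "selected", the final range is open-ended (`None`).
pub fn decode_spans(spans: &[u64]) -> Vec<(u64, Option<u64>)> {
    let mut ranges = Vec::new();
    let mut pos: u64 = 0;
    let mut selected = false;
    for &len in spans {
        let next = pos.saturating_add(len);
        if selected {
            ranges.push((pos, Some(next)));
        }
        pos = next;
        selected = !selected;
    }
    if selected {
        ranges.push((pos, None));
    }
    ranges
}
```

With this helper, `[2, 5, 3, 1]` decodes to `(2, Some(7))` and `(10, Some(11))`, while `[0]` decodes to the single open range `(0, None)`.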
### ChunkRangesSeq Wire Format
Serialized as `(SmallVec<[(u64, RangeSpec); 2]>)` where each element is `(delta_offset, rangespec)`. The `delta_offset` is the distance from the previous entry. Uses postcard varint encoding for compact transmission.
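A minimal sketch of the delta-offset idea, assuming each delta is simply added to the previous absolute offset. Both helpers are hypothetical, and the range spec is left generic so the example stays independent of the real types.

```rust
/// Converts `(delta_offset, spec)` pairs into `(absolute_offset, spec)` pairs
/// by accumulating the deltas (hypothetical helper).
pub fn to_absolute<T: Clone>(entries: &[(u64, T)]) -> Vec<(u64, T)> {
    let mut offset = 0u64;
    entries
        .iter()
        .map(|(delta, spec)| {
            offset += delta;
            (offset, spec.clone())
        })
        .collect()
}

/// Finds the spec that applies to blob `index`: the entry with the largest
/// absolute offset that is less than or equal to `index`.
pub fn spec_for<T>(absolute: &[(u64, T)], index: u64) -> Option<&T> {
    absolute
        .iter()
        .rev()
        .find(|(offset, _)| *offset <= index)
        .map(|(_, spec)| spec)
}
```

Because small deltas encode as single-byte varints, a request that changes its range spec only a few times stays compact regardless of how many children it covers.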
## Store Command Protocol
The store API uses an RPC-style command pattern via `irpc`. Each command has a `Command` enum variant with typed request/response channels:
```rust
#[rpc_requests(message = Command, alias = "Msg", rpc_feature = "rpc")]
pub enum Request {
ListBlobs(ListRequest),
Batch(BatchRequest),
DeleteBlobs(BlobDeleteRequest),
ImportBao(ImportBaoRequest), // streaming: rx bao items, tx result
ExportBao(ExportBaoRequest), // streaming: tx encoded items
ExportRanges(ExportRangesRequest), // streaming: tx range data
Observe(ObserveRequest), // streaming: tx bitfield updates
BlobStatus(BlobStatusRequest),
ImportBytes(ImportBytesRequest),
ImportByteStream(ImportByteStreamRequest), // duplex streaming
ImportPath(ImportPathRequest),
ExportPath(ExportPathRequest),
ListTags(ListTagsRequest),
SetTag(SetTagRequest),
DeleteTags(DeleteTagsRequest),
RenameTag(RenameTagRequest),
CreateTag(CreateTagRequest),
CreateTempTag(CreateTempTagRequest),
ListTempTags(ListTempTagsRequest),
SyncDb(SyncDbRequest),
WaitIdle(WaitIdleRequest),
Shutdown(ShutdownRequest),
ClearProtected(ClearProtectedRequest),
}
```
This allows both local (in-process) and remote (RPC) store access through the same API surface.


@@ -0,0 +1,249 @@
# iroh-blobs: Transfer Protocol
## Overview
The transfer protocol is a **request-response** protocol operating over QUIC streams (via iroh). The ALPN is `b"/iroh-bytes/4"`.
The requester opens a bidirectional QUIC stream, sends a request, and the provider responds with BLAKE3-verified streaming data on the same stream.
**Key properties**:
- Data integrity is verified in-stream — every 16 KiB chunk group can be independently verified against the BLAKE3 hash tree
- No upper limit on blob or collection size — streaming design avoids buffering entire transfers
- No extra round trips for multiple small blobs: a single HashSeq `GetRequest` or `GetManyRequest` covers all of them in one request
- Range requests supported at chunk granularity
## Request Types
```rust
pub enum Request {
Get(GetRequest),
Observe(ObserveRequest),
Slot2, Slot3, Slot4, Slot5, Slot6, Slot7, // Reserved
Push(PushRequest),
GetMany(GetManyRequest),
}
```
Wire format: 1-byte discriminator (postcard-encoded `RequestType` enum), followed by postcard-serialized request body.
### GetRequest
```rust
pub struct GetRequest {
pub hash: Hash, // BLAKE3 hash of the root blob
pub ranges: ChunkRangesSeq, // What ranges to request
}
```
The most common request type. The `ranges` field uses `ChunkRangesSeq` to express which parts of the root blob and its children to request.
**Common patterns**:
```rust
// Request an entire single blob
let req = GetRequest::blob(hash);
// -> ChunkRangesSeq with a single element: all chunks of the root
// Request a HashSeq (root + all children)
let req = GetRequest::all(hash);
// -> ChunkRangesSeq::all() - infinite sequence of "all chunks"
// Request parts of a single blob
let req = GetRequest::builder()
.root(ChunkRanges::bytes(0..1000))
.build(hash);
// Request a HashSeq with specific child ranges
let req = GetRequest::builder()
.root(ChunkRanges::all()) // full root (the hash seq)
.child(1, ChunkRanges::bytes(0..100)) // partial child 1
.next(ChunkRanges::all()) // full remaining children
.build_open(hash); // build_open = last range repeats forever
```
### GetManyRequest
```rust
pub struct GetManyRequest {
pub hashes: Vec<Hash>, // Sorted, deduplicated list of hashes
pub ranges: ChunkRangesSeq, // Ranges for each hash (no root entry)
}
```
Like a `GetRequest` for a HashSeq, but the hashes are provided by the requester instead of looked up from the provider. This avoids the provider needing to have a pre-existing HashSeq blob.
```rust
let req = GetManyRequest::builder()
.hash(hash1, ChunkRanges::all())
.hash(hash2, ChunkRanges::all())
.build();
// Deduplicates and sorts hashes automatically
```
### PushRequest
```rust
pub struct PushRequest(GetRequest); // Wraps a GetRequest
```
The inverse of a GetRequest — the requester pushes data to the provider. The request describes what will be sent, followed by the actual data stream. Providers may reject push requests (disabled by default via `EventMask`).
### ObserveRequest
```rust
pub struct ObserveRequest {
pub hash: Hash,
pub ranges: RangeSpec, // Which ranges to observe
}
```
Subscribes to availability changes for a blob's bitfield. The provider sends `ObserveItem` updates as chunks become available.
## Response Format
### For Get/GetMany/Push
The response is BLAKE3-verified streaming data (bao-tree format). For each blob in the request:
1. **8-byte size header** (little-endian u64) — the total size of the blob
2. **BLAKE3 verified stream** — encoded data for the requested ranges, using bao-tree's mixed encoding:
- `BaoContentItem::Parent(node, (left_hash, right_hash))` — internal hash tree nodes (64 bytes each)
- `BaoContentItem::Leaf(Leaf { offset, data })` — actual data chunks
The data is sent in order: ascending chunks for each blob, blobs in HashSeq order.
**Verification**: The requester validates each chunk group against the expected BLAKE3 hash tree. Invalid data is detected within at most 16 KiB of reception. Missing data (provider doesn't have a chunk) causes the provider to close the stream at the point where data becomes unavailable.
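The size header from step 1 can be read with the standard library alone. This sketch uses a blocking `std::io::Read` for brevity; the function name is an assumption, and the real implementation reads from an async QUIC stream.

```rust
use std::io::{self, Read};

/// Reads the 8-byte little-endian size header that precedes a blob's
/// encoded data. Fails with `UnexpectedEof` if fewer than 8 bytes arrive.
pub fn read_size_header<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}
```

The header bytes `00 40 00 00 00 00 00 00` decode to 16,384, exactly one chunk group.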
### For Observe
The provider sends length-prefixed `ObserveItem` messages:
```rust
pub struct ObserveItem {
pub size: u64, // Blob size
pub ranges: ChunkRanges, // Available chunks
}
```
Updates are sent as deltas — only the new chunks that have become available since the last update.
## Error Handling
Error codes for stream/connection closure:
| Code | Name | Meaning |
|------|------|---------|
| 0 | StreamDropped | RecvStream was dropped |
| 1 | ProviderTerminating | Provider is shutting down |
| 2 | RequestReceived | Only one request per stream allowed |
| 1 (application) | ERR_PERMISSION | Permission denied |
| 2 (application) | ERR_LIMIT | Rate limited |
| 3 (application) | ERR_INTERNAL | Internal error |
## Client-Side FSM (Get)
The `get::fsm` module implements the get request as a **finite state machine** for maximum control:
```
AtInitial
│ (open QUIC stream)
AtConnected
│ (send request, drop writer)
ConnectedNext ─┬─ StartRoot(hash, ranges) // offset 0 = root blob
├─ StartChild(offset, ranges) // offset > 0 = child blob
└─ Closing // empty request
AtStartRoot / AtStartChild
│ (determine hash for child)
AtBlobHeader
│ (read 8-byte size)
AtBlobContent
│ (stream BLAKE3-verified items)
├─ More(content_item) → AtBlobContent // loop
└─ Done → AtEndBlob
AtEndBlob
│ (iterate to next blob in sequence)
├─ MoreChildren(AtStartChild)
└─ Closing
│ (drain remaining bytes)
Stats (transfer statistics)
```
Each state transition is explicit. The FSM gives the consumer full control:
- `AtBlobContent::next()` returns `BlobContentNext::More((content, item))` or `BlobContentNext::Done(end)`
- `AtBlobHeader::next()` reads the size header and creates a `ResponseDecoder`
- `AtStartChild::next(hash)` requires the caller to supply the hash (from the HashSeq)
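The "each transition consumes the previous state" style can be sketched as a typestate pattern. The state names echo the diagram, but everything else is a simplification: the input is an in-memory byte slice, items are unverified slices of at most 1024 bytes, and `read_blob` is a hypothetical driver.

```rust
/// Toy typestate model of the header -> content -> end progression.
pub struct AtBlobHeader<'a> { input: &'a [u8] }
pub struct AtBlobContent<'a> { input: &'a [u8], remaining: u64 }
pub struct AtEndBlob<'a> { pub rest: &'a [u8] }

pub enum BlobContentNext<'a> {
    More(AtBlobContent<'a>, &'a [u8]),
    Done(AtEndBlob<'a>),
}

impl<'a> AtBlobHeader<'a> {
    pub fn new(input: &'a [u8]) -> Self { Self { input } }

    /// Consumes the header state; yields the content state and declared size.
    pub fn next(self) -> Option<(AtBlobContent<'a>, u64)> {
        if self.input.len() < 8 { return None; }
        let (head, tail) = self.input.split_at(8);
        let mut buf = [0u8; 8];
        buf.copy_from_slice(head);
        let size = u64::from_le_bytes(buf);
        Some((AtBlobContent { input: tail, remaining: size }, size))
    }
}

impl<'a> AtBlobContent<'a> {
    /// Consumes the content state; yields one item or the end state.
    pub fn next(self) -> Result<BlobContentNext<'a>, &'static str> {
        if self.remaining == 0 {
            return Ok(BlobContentNext::Done(AtEndBlob { rest: self.input }));
        }
        let take = self.remaining.min(1024) as usize;
        if self.input.len() < take {
            return Err("stream ended before the declared size was reached");
        }
        let (item, tail) = self.input.split_at(take);
        let next = AtBlobContent { input: tail, remaining: self.remaining - take as u64 };
        Ok(BlobContentNext::More(next, item))
    }
}

/// Drives the toy state machine for one blob; returns its data and leftover input.
pub fn read_blob(input: &[u8]) -> Result<(Vec<u8>, &[u8]), &'static str> {
    let (mut content, _size) = AtBlobHeader::new(input).next().ok_or("missing size header")?;
    let mut data = Vec::new();
    loop {
        match content.next()? {
            BlobContentNext::More(next, item) => {
                data.extend_from_slice(item);
                content = next;
            }
            BlobContentNext::Done(end) => return Ok((data, end.rest)),
        }
    }
}
```

Because every `next` takes `self` by value, the compiler rejects code that tries to reuse a state after advancing, which is the main benefit of exposing the protocol as explicit states.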
### Stats Tracking
```rust
pub struct Stats {
pub payload_bytes_read: u64, // Actual data bytes
pub other_bytes_read: u64, // Hash pairs, headers
pub payload_bytes_written: u64, // For push
pub other_bytes_written: u64, // For push
pub elapsed: Duration,
}
```
## Provider-Side Handling
```rust
pub async fn handle_connection(connection: Connection, store: Store, events: EventSender);
```
The provider accepts QUIC streams on a connection. For each stream:
1. Read the request type byte
2. Deserialize the request
3. Dispatch to `handle_get`, `handle_get_many`, `handle_observe`, or `handle_push`
4. For `handle_get`: iterate over the `ChunkRangesSeq`, streaming each blob via `store.export_bao(hash, ranges)`
5. For HashSeq requests: load the root blob, parse it as `HashSeq`, then stream each requested child
### Event System
The provider can emit events for monitoring and access control:
```rust
pub struct EventMask {
pub connected: ConnectMode, // None, Notify, Intercept
pub get: RequestMode, // None, Notify, Intercept, NotifyLog, InterceptLog, Disabled
pub get_many: RequestMode,
pub push: RequestMode, // Disabled by default!
pub observe: ObserveMode,
pub throttle: ThrottleMode, // None, Intercept
}
```
- **None**: No events, requests processed normally
- **Notify**: Events sent but cannot block requests
- **Intercept**: Events sent as RPC requests; handler can reject with `AbortReason`
- **Disabled**: All requests of this type rejected
Progress events: `TransferStarted`, `TransferProgress`, `TransferCompleted`, `TransferAborted`.
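The four modes listed above reduce to a small admission decision. The sketch below is a simplified model with hypothetical names: `handler_allows` stands in for whatever an intercepting handler returns, and the `NotifyLog` and `InterceptLog` variants are omitted.

```rust
/// Simplified request mode, covering only the four modes described above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    None,
    Notify,
    Intercept,
    Disabled,
}

/// Decides whether a request proceeds under the given mode.
/// `handler_allows` is only consulted for `Intercept`.
pub fn admit(mode: Mode, handler_allows: bool) -> bool {
    match mode {
        // No events, or fire-and-forget events: the request always proceeds.
        Mode::None | Mode::Notify => true,
        // The handler is asked and may reject.
        Mode::Intercept => handler_allows,
        // Requests of this type are always rejected.
        Mode::Disabled => false,
    }
}
```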
## Collection Format
```rust
pub struct Collection {
blobs: Vec<(String, Hash)>, // Named references to child blobs
}
```
Wire format (as a HashSeq blob):
1. First child blob: `CollectionMeta` serialized with postcard
2. Remaining children: the actual data blobs
```rust
pub struct CollectionMeta {
header: [u8; 13], // Must be b"CollectionV0."
names: Vec<String>, // Names for each child blob
}
```
The header `b"CollectionV0."` is a magic number for format identification. The meta blob's hash becomes the first entry in the HashSeq, followed by the hashes of each data blob. Names correspond 1:1 with data blobs (excluding the meta entry).
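The layout described above (meta hash first, then one hash per data blob) can be sketched with plain byte arrays. Both functions are hypothetical helpers, and hashes are represented as `[u8; 32]` rather than the crate's `Hash` type.

```rust
type Hash = [u8; 32];

/// Builds the hash-sequence bytes for a collection: the meta blob's hash
/// first, followed by the hash of each data blob in order.
pub fn collection_hash_seq(meta_hash: Hash, blobs: &[(String, Hash)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(32 * (blobs.len() + 1));
    out.extend_from_slice(&meta_hash);
    for (_, hash) in blobs {
        out.extend_from_slice(hash);
    }
    out
}

/// Pairs the names from the meta blob with the data hashes (entries 1..).
/// Returns `None` if the sequence is malformed or the counts differ.
pub fn pair_names(names: &[String], hash_seq: &[u8]) -> Option<Vec<(String, Hash)>> {
    if hash_seq.is_empty() || hash_seq.len() % 32 != 0 {
        return None;
    }
    let children: Vec<Hash> = hash_seq
        .chunks_exact(32)
        .skip(1) // entry 0 is the meta blob
        .map(|chunk| {
            let mut hash = [0u8; 32];
            hash.copy_from_slice(chunk);
            hash
        })
        .collect();
    if children.len() != names.len() {
        return None;
    }
    Some(names.iter().cloned().zip(children).collect())
}
```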


@@ -0,0 +1,250 @@
# iroh-blobs: Storage Architecture
## Overview
iroh-blobs provides three store implementations sharing a common `Store` API surface:
| Store | Location | Mutable | Use Case |
|-------|----------|---------|----------|
| `MemStore` | In-memory | ✅ | Small data, testing, WASM |
| `FsStore` | Filesystem + redb | ✅ | Production, large data |
| `ReadonlyMemStore` | In-memory | ❌ | Static data serving |
All stores implement the same RPC-based command protocol (`Command` enum), allowing both local in-process and remote RPC access through the same `Store` type.
## Store API Surface
The `Store` type (from `api::Store`) is the primary interface. It's accessed via typed sub-APIs:
```rust
let store: Store = /* ... */;
// Blob operations
store.blobs() // → Blobs API (add, export, read, delete, observe, etc.)
store.tags() // → Tags API (create, list, set, delete, rename)
// Direct operations
store.add_bytes(data) // → AddProgress
store.add_slice(data) // → TempTag (convenience)
store.get_bytes(hash) // → Result<Bytes>
store.has(hash) // → bool
store.shutdown() // Clean shutdown
store.wait_idle() // Wait for all tasks to complete
store.sync_db() // Sync database to disk (FsStore)
```
## Blobs API
```rust
let blobs = store.blobs();
// Import
blobs.add_slice(data) // → AddProgress (raw format)
blobs.add_bytes(data) // → AddProgress (raw format)
blobs.add_bytes_with_opts(AddBytesOptions{..}) // → AddProgress (with format)
blobs.import_byte_stream(format) // → streaming import
// Export
blobs.reader(hash) // → BlobReader (AsyncRead + AsyncSeek)
blobs.export(hash, path) // → export to filesystem
blobs.export_bao(hash, ranges) // → ExportBao (BLAKE3 verified stream)
blobs.export_ranges(hash, ranges) // → ExportRanges (raw data ranges)
// Observe (subscribe to chunk availability)
blobs.observe(hash) // → ObserveAt (bitfield stream)
// Status
blobs.status(hash) // → BlobStatus (NotFound/Partial/Complete)
// Import BAO-encoded data
blobs.import_bao_bytes(hash, ranges, data) // → import verified BAO stream
blobs.import_bao_reader(hash, ranges, reader) // → import from async reader
// Batch operations (scoped temp tags)
blobs.batch() // → Batch (auto-cleanup scope)
// Delete
blobs.delete(hashes) // → force delete (use GC normally)
```
## Tags API
```rust
let tags = store.tags();
tags.set(name, value) // Set a persistent tag
tags.create(value) // Auto-generate a tag name, return Tag
tags.get(name) // → Option<TagInfo>
tags.list() // → Stream<TagInfo>
tags.list_hash_seq() // → Stream<TagInfo> (only HashSeq format)
tags.delete(name) // Delete a tag
tags.delete_range(range) // Delete tags in range
tags.delete_prefix(prefix) // Delete tags with prefix
tags.rename(from, to) // Atomically rename a tag
tags.temp_tag(value) // → TempTag (ephemeral protection)
```
## MemStore Architecture
The in-memory store uses a simple actor pattern:
```
MemStore (ApiClient)
└── Actor (tokio task)
├── State
│ ├── data: HashMap<Hash, BaoFileHandle> // All blob data
│ ├── tags: BTreeMap<Tag, HashAndFormat> // Persistent tags
│ └── empty_hash: BaoFileHandle // Special entry for empty blob
├── tasks: JoinSet<TaskResult> // Spawned import/export tasks
├── temp_tags: TempTags // Ephemeral protection
├── protected: HashSet<Hash> // GC-protected hashes
└── idle_waiters: Vec<oneshot::Sender<()>> // Wait-idle notifications
```
### BaoFileHandle / BaoFileStorage
```rust
pub enum BaoFileStorage {
Partial(PartialMemStorage), // Still downloading
Complete(CompleteStorage), // Fully available
}
pub struct PartialMemStorage {
data: SparseMemFile, // Sparse byte array for data
outboard: SparseMemFile, // Sparse byte array for BLAKE3 hash tree
size: SizeInfo, // Known/estimated size
bitfield: Bitfield, // Which chunks are verified
}
pub struct CompleteStorage {
data: Bytes, // Complete data
outboard: Bytes, // Complete outboard (hash tree)
}
```
The `watch::Sender<BaoFileStorage>` pattern allows subscribers to observe state changes (for the `observe` API).
### Data Flow (Import)
1. `add_bytes(data)` → compute outboard via `PreOrderMemOutboard::create()` → transition `Partial → Complete`
2. `import_bao(hash, size, stream)` → receive `BaoContentItem` stream → write to `PartialMemStorage` → update bitfield → transition to `Complete` when all chunks present
### Data Flow (Export)
1. `export_bao(hash, ranges)` → look up `BaoFileHandle` → `traverse_ranges_validated(data, outboard, &ranges, tx)`, which streams validated BAO data
## FsStore Architecture (Hybrid Store)
The filesystem store uses a **hybrid approach** that stores small data inline in redb and large data as files on disk.
### Design Rationale (from DESIGN.md)
- **Databases** are good for small blobs (low per-entry overhead, fast random access)
- **Filesystems** are good for large blobs (OS-level caching, direct file access)
- **Neither alone** works well for both cases
### Layout
```
<data_dir>/
├── db/ # redb database
│ ├── metadata table # Hash → EntryState
│ ├── inline_data table # Hash → Bytes (for small blobs)
│ ├── inline_outboard table # Hash → Bytes (for small outboards)
│ └── tags table # Tag → HashAndFormat
├── data/<hash>.data # Large blob data files
├── data/<hash>.outboard # Large outboard files
├── data/<hash>.sizes # Size tracking for partial files
└── data/<hash>.bitfield # Validated chunk tracking for partial files
```
### EntryState
```rust
// Simplified from src/store/fs/entry_state.rs
pub enum EntryState {
Complete(CompleteEntryState),
Partial(PartialEntryState),
}
pub struct CompleteEntryState {
pub data: DataLocation, // Inline, Owned (canonical path), or External (user path)
pub outboard: OutboardLocation, // Inline, Owned, or NotNeeded
pub size: u64,
}
pub enum DataLocation {
Inline, // Stored in redb inline_data table
Owned, // File at canonical path <hash>.data
External(Vec<PathBuf>), // User-owned file paths
}
pub enum OutboardLocation {
Inline, // Stored in redb inline_outboard table
Owned, // File at canonical path <hash>.outboard
NotNeeded, // Data ≤ 16 KiB, no outboard needed
}
pub struct PartialEntryState {
// Either we know the verified size, or we don't yet
pub verified_size: Option<NonZeroU64>,
}
```
### Thresholds
- **Data inline threshold**: 16 KiB (default) — blobs smaller than this are stored entirely in redb
- **Outboard inline threshold**: 16 KiB (default) — outboards smaller than this are stored in redb
- Data ≤ 16 KiB has no outboard (not needed for verification of a single chunk group)
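A rough sketch of how these thresholds might drive placement. It assumes the outboard holds one 64-byte hash pair per internal tree node with chunk groups as leaves, so its size is `(groups - 1) * 64`, and it uses a strict "smaller than" comparison as worded above. The names and the exact boundary behaviour are assumptions, not the crate's code.

```rust
const GROUP_SIZE: u64 = 16 * 1024;
const HASH_PAIR_SIZE: u64 = 64;

#[derive(Debug, PartialEq)]
pub enum Location {
    Inline,
    Owned,
    NotNeeded,
}

/// Assumed outboard size: one hash pair per internal node of a binary tree
/// whose leaves are 16 KiB chunk groups.
pub fn outboard_size(data_size: u64) -> u64 {
    let groups = (data_size / GROUP_SIZE + u64::from(data_size % GROUP_SIZE != 0)).max(1);
    (groups - 1) * HASH_PAIR_SIZE
}

/// Returns `(data location, outboard location)` for a complete blob.
pub fn place(data_size: u64, data_inline_max: u64, outboard_inline_max: u64) -> (Location, Location) {
    let data = if data_size < data_inline_max {
        Location::Inline
    } else {
        Location::Owned
    };
    let outboard_bytes = outboard_size(data_size);
    let outboard = if outboard_bytes == 0 {
        Location::NotNeeded // a single chunk group needs no hash tree
    } else if outboard_bytes < outboard_inline_max {
        Location::Inline
    } else {
        Location::Owned
    };
    (data, outboard)
}
```

Under these assumptions a 1 MiB blob has a 4,032-byte outboard, so its data lives in a file while its outboard stays inline in redb.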
### Blob Lifecycle
**Adding a local file (known data, unknown hash)**:
1. Compute the full BLAKE3 hash and outboard
2. Atomically move the file into the store under the hash name
3. Apply inlining rules: small files → redb, large files → filesystem
**Syncing from remote (known hash, unknown data)**:
1. Start with no data — keep state in memory (not in database)
2. As chunks arrive, write incrementally to partial files
3. Once size is known to exceed the inline threshold, create database entry + filesystem files
4. On completion, transition to `Complete` state and apply inlining rules
**Deletion**:
- Tags protect content from GC
- `TempTag` provides ephemeral (process-lifetime) protection
- HashSeq tags protect the root blob AND all referenced child blobs
- GC is mark-and-sweep: mark all reachable content via tags → sweep (delete) everything else
- Explicit `force` deletion bypasses protection (emergency use only)
### FsStore Actor Architecture
```
FsStore (ApiClient)
└── MainActor (tokio task)
├── TaskContext { config, db_actor_sender }
├── EntityMap: HashMap<Hash, ActiveEntityState> // Currently active entities
├── JoinSet<TaskResult> // Running tasks
├── TempTags // Ephemeral protection
├── ProtectedSet // GC protection
└── idle_waiters
```
The FsStore uses an **entity manager** pattern where each hash gets a `BaoFileHandle` (like MemStore) when active, and entries are cleaned up when tasks complete.
## Garbage Collection
```rust
pub struct GcConfig {
pub interval: Duration,
pub add_protected: Option<ProtectCb>, // Optional callback to add more protected hashes
}
```
GC is a two-phase process:
1. **Mark**: Walk all tags (persistent + temp), collect reachable hashes. For HashSeq format, traverse the hash sequence to find all child hashes.
2. **Sweep**: Delete all blobs not in the reachable set, in batches of 100.
GC runs automatically at a configurable interval via `run_gc(store, config)`, or manually via `gc_run_once(store, live)`.
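The two phases can be illustrated with a toy store. To keep the example short it uses one-byte "hashes", so a hash-sequence blob is just a list of child ids; `mark` and `sweep` are hypothetical functions, not the crate's `run_gc` or `gc_run_once`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy one-byte "hash": a hash-sequence blob is simply a list of child ids.
type Id = u8;

#[derive(Clone, Copy)]
pub enum Format {
    Raw,
    HashSeq,
}

/// Mark phase: every tagged root is live; HashSeq roots also keep the
/// children listed in their content live.
pub fn mark(store: &BTreeMap<Id, Vec<u8>>, tags: &[(Id, Format)]) -> BTreeSet<Id> {
    let mut live = BTreeSet::new();
    for &(root, format) in tags {
        live.insert(root);
        if let Format::HashSeq = format {
            if let Some(children) = store.get(&root) {
                live.extend(children.iter().copied());
            }
        }
    }
    live
}

/// Sweep phase: delete everything not marked live, in batches.
/// Returns the number of deleted blobs.
pub fn sweep(store: &mut BTreeMap<Id, Vec<u8>>, live: &BTreeSet<Id>, batch: usize) -> usize {
    let dead: Vec<Id> = store.keys().copied().filter(|id| !live.contains(id)).collect();
    for chunk in dead.chunks(batch.max(1)) {
        for id in chunk {
            store.remove(id);
        }
    }
    dead.len()
}
```

In this model, temp tags would simply be appended to the `tags` slice before marking, which matches the description of walking both persistent and temporary tags.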


@@ -0,0 +1,202 @@
# iroh-blobs: Remote API and Downloader
## Remote API
The `Remote` type (`api::remote::Remote`) provides the client-side interface for interacting with remote iroh-blobs providers. It's a thin wrapper around `ApiClient` that exposes fetch, observe, and push operations.
```rust
let remote = store.remote(); // or Remote::from_sender(client)
// Get local info about what we already have
let local = remote.local(hash_and_format).await?;
// Compute what we need
let missing = local.missing();
// Execute a download
let stats = remote.execute_get(connection, request).await?;
// Or use the simpler fetch API
let progress = remote.fetch(connection, hash, format, store);
```
### LocalInfo
```rust
pub struct LocalInfo {
pub size: Option<u64>, // Total size if known
pub present: ChunkRanges, // Chunks we already have
pub missing: ChunkRanges, // Chunks we still need
pub hash_and_format: HashAndFormat,
}
```
`LocalInfo` is computed by querying the local store's bitfield for a given hash and comparing it against what a full download would require.
### Fetch Process
The `fetch` method handles the complete lifecycle:
1. **Local check**: Query the store for what we already have
2. **Request computation**: If format is HashSeq, read the local HashSeq to compute precise missing ranges
3. **Connection**: Open a QUIC stream to the provider
4. **Transfer**: Use the get FSM to stream data into the store
5. **Verification**: BLAKE3 verification happens in-stream during the transfer
For HashSeq format:
- First fetch the root blob (the HashSeq)
- Parse it to get child hashes
- For each child, check local availability and compute missing ranges
- Fetch only what's missing
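The HashSeq steps above can be sketched at whole-blob granularity. This is a simplification: the real fetch computes missing chunk ranges per child, whereas the hypothetical `missing_children` below only distinguishes "complete locally" from "not complete".

```rust
use std::collections::HashSet;

type Hash = [u8; 32];

/// Lists the children that still need fetching, as `(offset, hash)` pairs.
/// Offsets start at 1 because offset 0 refers to the root (the HashSeq itself).
pub fn missing_children(hash_seq: &[Hash], complete_locally: &HashSet<Hash>) -> Vec<(usize, Hash)> {
    hash_seq
        .iter()
        .enumerate()
        .filter(|(_, hash)| !complete_locally.contains(*hash))
        .map(|(index, hash)| (index + 1, *hash))
        .collect()
}
```

The resulting offsets map directly onto the per-child entries of a request, so children that are already complete can be skipped entirely.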
### Observe
```rust
// Subscribe to bitfield updates from a remote provider
let mut stream = remote.observe(connection, hash).stream().await?;
while let Some(bitfield) = stream.next().await {
// Process availability updates
}
```
The observe protocol sends `ObserveItem` messages (size + available ranges) whenever new chunks become available on the provider. The initial message contains the full current state, subsequent messages contain deltas.
### Push
```rust
// Push local data to a remote provider
let progress = remote.push(connection, request, store);
```
Push uses the same FSM-style approach but in reverse — the local side reads from the store and writes BLAKE3-verified data to the QUIC stream.
## Downloader API
The `Downloader` (`api::downloader::Downloader`) coordinates downloads from multiple sources:
```rust
let downloader = Downloader::new(store, endpoint);
// Download from specific providers
let progress = downloader.download(DownloadRequest {
request: FiniteRequest::Get(get_request),
providers: vec![endpoint_id_1, endpoint_id_2],
strategy: SplitStrategy::Split,
}).stream();
```
### SplitStrategy
```rust
pub enum SplitStrategy {
Split, // Split the request across multiple providers
None, // Use a single provider
}
```
When `SplitStrategy::Split` is used, the downloader:
1. Splits the `GetRequest` into per-child requests
2. Distributes children across available providers
3. Downloads in parallel from multiple sources
4. Stores each completed child into the local store
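Step 2 above does not say how children are assigned to providers. As a rough illustration only, the sketch below assumes a round-robin assignment; the downloader's actual policy may differ (for example, it may retry a child on another provider after a failure). `assign_round_robin` is a hypothetical helper, and providers are plain strings rather than `EndpointId` values.

```rust
/// Assign each child offset to a provider in round-robin order.
/// Hypothetical policy for illustration; returns an empty plan when
/// there are no providers.
fn assign_round_robin(children: &[u64], providers: &[&str]) -> Vec<(u64, String)> {
    if providers.is_empty() {
        return Vec::new();
    }
    children
        .iter()
        .enumerate()
        .map(|(index, &child)| (child, providers[index % providers.len()].to_string()))
        .collect()
}
```

With three children and two providers, the first and third child go to the first provider and the second child goes to the second, so both connections stay busy.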
### DownloadRequest
```rust
pub struct DownloadRequest {
pub request: FiniteRequest, // What to download
pub providers: Vec<EndpointId>, // Who to download from
pub strategy: SplitStrategy, // How to split work
}
pub enum FiniteRequest {
Get(GetRequest),
GetMany(GetManyRequest),
}
```
### Download Progress
```rust
pub enum DownloadProgressItem {
TryProvider { id: EndpointId, request: Arc<GetRequest> },
ProviderFailed { id: EndpointId, request: Arc<GetRequest> },
PartComplete { request: Arc<GetRequest> },
Progress(u64),
DownloadError,
}
```
## Connection Pooling
The `util::connection_pool::ConnectionPool` manages reusable QUIC connections:
```rust
let pool = ConnectionPool::new(endpoint, ALPN, options);
let connection = pool.connect(endpoint_id).await?;
```
Options include connection timeout, idle timeout, and maximum connections per peer.
## Integration with iroh
### BlobsProtocol
```rust
// src/net_protocol.rs
pub struct BlobsProtocol {
inner: Arc<BlobsInner>, // (Store, EventSender)
}
impl ProtocolHandler for BlobsProtocol {
async fn accept(&self, conn: Connection) -> Result<(), AcceptError> {
crate::provider::handle_connection(conn, store, events).await;
Ok(())
}
async fn shutdown(&self) { /* shutdown store */ }
}
```
Usage with iroh Router:
```rust
let endpoint = Endpoint::bind(presets::N0).await?;
let store = MemStore::new(); // or FsStore::load(path).await?
let blobs = BlobsProtocol::new(&store, None);
let router = Router::builder(endpoint)
.accept(iroh_blobs::ALPN, blobs)
.spawn();
```
### Creating a BlobTicket
```rust
let endpoint = Endpoint::bind(presets::N0).await?;
endpoint.online().await;
let addr = endpoint.addr();
let tag = store.add_slice(b"hello world").await?;
let ticket = BlobTicket::new(addr, tag.hash, tag.format);
println!("Share this: {ticket}");
```
### Fetching from a Ticket
```rust
// On the requester side
let ticket: BlobTicket = ticket_str.parse()?;
let (addr, hash, format) = ticket.into_parts();
let endpoint = Endpoint::bind(presets::N0).await?;
let conn = endpoint.connect(addr, iroh_blobs::ALPN).await?;
let request = match format {
BlobFormat::Raw => GetRequest::blob(hash),
BlobFormat::HashSeq => GetRequest::all(hash),
};
// Use the get FSM
let fsm = get::fsm::start(conn, request, RequestCounters::default());
let connected = fsm.next().await?;
// ... drive the FSM to completion
```


@@ -0,0 +1,312 @@
# iroh-blobs: Data Flow and Complete Example
## Complete Data Flow: Provider Side
```
                   QUIC Connection Arrives
           handle_connection(conn, store, events)
                   ┌──────────┴──────────┐
                   │  Accept QUIC BIDI   │
                   │  streams in loop    │
                   └──────────┬──────────┘
                 handle_stream(pair, store)
                   ┌──────────┴──────────┐
                   │  Read Request type  │
                   │  byte + deserialize │
                   └──────────┬──────────┘
        ┌─────────────┬───────┼───────┬──────────────┐
        │             │       │       │              │
   handle_get  handle_get  handle  handle       (reserved)
                 _many     _observe _push
        │             │       │       │
        ▼             ▼       ▼       ▼
┌──────────────────────────────────────────────────────────┐
│ For each (offset, ranges) in request.ranges:             │
│                                                          │
│   if offset == 0:                                        │
│     send_blob(store, 0, hash, ranges, writer)            │
│   else:                                                  │
│     lookup hash in HashSeq[offset-1]                     │
│     send_blob(store, offset, child_hash, ranges, writer) │
│                                                          │
│ send_blob:                                               │
│   store.export_bao(hash, ranges)                         │
│     .write_with_progress(writer, ctx, &hash, idx)        │
└──────────────────────────────────────────────────────────┘
```
## Complete Data Flow: Requester Side (Get FSM)
```
                Create GetRequest
    fsm::start(connection, request, counters)
                AtInitial.next()
                        │ (open_bi, send request)
               AtConnected.next()
            ┌───────────┼───────────┐
            │           │           │
        StartRoot  StartChild    Closing
       (offset=0)  (offset>0)    (empty)
            │           │           │
            ▼           ▼           ▼
     AtBlobHeader AtBlobHeader  AtClosing
         .next()   .next(hash)   .next()
            │           │           │
            ▼           ▼           ▼
        (size, AtBlobContent)     Stats
         ┌────────┴────────┐
         │                 │
    More(item)           Done
   (loop back to      (AtEndBlob)
    AtBlobContent)         │
                     ┌─────┼─────┐
                     │           │
               MoreChildren   Closing
             (AtStartChild) (AtClosing)
                     │           │
                     └───────────┘
```
### Blob Content Items
During `AtBlobContent`, items arrive as `BaoContentItem`:
```rust
pub enum BaoContentItem {
Parent(ParentNode), // (node, (left_hash, right_hash)) — 64 bytes
Leaf(Leaf), // { offset: u64, data: Bytes } — actual data
}
```
- **Parent nodes** contain BLAKE3 hash pairs for tree verification. They are pure overhead: 64 bytes (two 32-byte hashes) per internal node.
- **Leaf nodes** contain actual data chunks. Each leaf's data is at most `IROH_BLOCK_SIZE` bytes (16 KiB).
Verification is automatic: the `ResponseDecoder` from `bao-tree` validates each leaf against the expected hash tree rooted at the request hash.
## Blob Verification and BaoTree Encoding
### How BLAKE3 Verified Streaming Works
1. **The hash is the root** of a binary Merkle tree
2. **Internal nodes** store `(left_child_hash, right_child_hash)` — 64 bytes each
3. **Leaf nodes** store the actual data chunks (up to 1024 bytes each in standard BLAKE3, or 16 KiB in iroh's block size)
4. **Chunk groups** (16 chunks = 16 KiB) are the minimum verification unit in iroh-blobs
For a request with specific ranges:
- The provider traverses the tree, yielding only nodes needed to verify the requested ranges
- The requester can verify each chunk group independently after receiving its parent hash pair
- At most one chunk group (16 KiB) is received before it can be verified, so corrupted data is detected within 16 KiB
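Because the chunk group is the minimum granularity, a byte range has to be widened to whole chunk groups before it can be requested and verified. The sketch below shows that arithmetic under the 16 KiB group size stated above. `chunk_groups_for` and `CHUNK_GROUP_SIZE` are illustrative names, not crate items.

```rust
use std::ops::Range;

/// 16 chunks of 1024 bytes each, as described above.
const CHUNK_GROUP_SIZE: u64 = 16 * 1024;

/// Map a half-open byte range to the half-open range of chunk group
/// indices that cover it. Empty byte ranges map to an empty range.
fn chunk_groups_for(bytes: Range<u64>) -> Range<u64> {
    if bytes.start >= bytes.end {
        return 0..0;
    }
    let first = bytes.start / CHUNK_GROUP_SIZE;
    // Round the end up so a partially covered group is included.
    let last = bytes.end.div_ceil(CHUNK_GROUP_SIZE);
    first..last
}
```

A two-byte read that straddles the first group boundary (`16383..16385`) therefore costs two full chunk groups on the wire.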
### Outboard Storage
The **outboard** is the BLAKE3 hash tree stored separately from the data. For the provider:
- Small blobs (≤16 KiB): outboard is empty (not needed, single chunk group)
- Large blobs: outboard stored as `PreOrderMemOutboard` (in-memory) or as a file (filesystem store)
For the requester, the outboard is built incrementally as data arrives.
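The size of the outboard follows from the tree shape: a binary tree with `n` leaves has `n - 1` internal nodes, and each internal node stores a 64-byte hash pair. The sketch below applies that to 16 KiB chunk groups. It assumes the outboard holds only the hash pairs, with no size prefix or other header, so treat the result as an estimate; `outboard_size` is a hypothetical helper.

```rust
/// Size of one chunk group in bytes.
const CHUNK_GROUP_SIZE: u64 = 16 * 1024;
/// One parent node: two 32-byte BLAKE3 hashes.
const PARENT_SIZE: u64 = 64;

/// Estimated outboard size in bytes for a blob of `blob_len` bytes,
/// assuming only hash pairs are stored (no header).
fn outboard_size(blob_len: u64) -> u64 {
    // Even an empty blob counts as a single (empty) leaf.
    let leaves = blob_len.div_ceil(CHUNK_GROUP_SIZE).max(1);
    (leaves - 1) * PARENT_SIZE
}
```

Under these assumptions a 1 MiB blob has 64 chunk groups and needs 63 parent nodes, about 4 KiB of hash data, which is roughly 0.4% overhead. Blobs of up to 16 KiB need none, matching the empty-outboard case above.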
## Import and Export Flows
### Import Bytes (Local Data)
```
add_bytes(data) / add_slice(data)
ImportBytesRequest { data, format, scope }
Actor::import_bytes()
│ 1. Send AddProgressItem::Size(len)
│ 2. Send AddProgressItem::CopyDone
│ 3. Compute outboard: PreOrderMemOutboard::create(&data, IROH_BLOCK_SIZE)
│ 4. Return ImportEntry { data, outboard, scope, format, tx }
Actor::finish_import()
│ 1. Get hash from outboard.root()
│ 2. Get or create BaoFileHandle for hash
│ 3. Transition BaoFileStorage::Partial → Complete
│ 4. Create TempTag for the hash_and_format
│ 5. Send AddProgressItem::Done(temp_tag)
```
### Import BAO Stream (Remote Data)
```
import_bao_bytes(hash, ranges, data) / import_bao_reader(hash, ranges, reader)
ImportBaoRequest { hash, size }
Actor::import_bao()
│ 1. Set size on partial entry
│ 2. Create BaoTree for the size
│ 3. For each BaoContentItem from stream:
│ - Parent: write hash pair to outboard
│ - Leaf: write data to storage, update bitfield
│ - If bitfield becomes complete: transition Partial → Complete
│ 4. Send result
```
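The Partial → Complete transition in step 3 can be sketched as a small state machine. This is a simplified model with hypothetical types (`Entry`, `EntryState`, `Item`): it tracks one flag per chunk group instead of a real bitfield, and it skips the actual writes to data and outboard storage.

```rust
const CHUNK_GROUP_SIZE: u64 = 16 * 1024;

#[derive(Debug, PartialEq)]
enum EntryState {
    Partial,
    Complete,
}

/// Simplified stand-in for `BaoContentItem`.
enum Item {
    /// A hash pair; the real store would write it to the outboard.
    Parent,
    /// Data starting at byte `offset`, at most one chunk group long.
    Leaf { offset: u64 },
}

struct Entry {
    have: Vec<bool>, // one flag per chunk group
    state: EntryState,
}

impl Entry {
    fn new(size: u64) -> Self {
        let groups = size.div_ceil(CHUNK_GROUP_SIZE).max(1) as usize;
        Entry { have: vec![false; groups], state: EntryState::Partial }
    }

    /// Record an item; flips to `Complete` once every group is present.
    fn apply(&mut self, item: &Item) {
        if let Item::Leaf { offset } = item {
            let index = (offset / CHUNK_GROUP_SIZE) as usize;
            // Ignore leaves that fall outside the declared size.
            if let Some(slot) = self.have.get_mut(index) {
                *slot = true;
            }
            if self.have.iter().all(|&present| present) {
                self.state = EntryState::Complete;
            }
        }
    }
}
```

Parent items never change the state in this model, because completeness depends only on which data ranges are present.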
### Export BAO
```
export_bao(hash, ranges) → ExportBao
Actor::export_bao()
│ 1. Look up BaoFileHandle for hash
│ 2. If not found: send EncodeError::NotFound and return
│ 3. Create BaoTreeSender from data + outboard readers
│ 4. Call traverse_ranges_validated(data, outboard, &ranges, tx)
│ → streams validated BAO items to the sender
```
### Export Path (To Filesystem)
```
export(hash, target_path) → ExportPath
Actor::export_path()
│ 1. Look up BaoFileHandle for hash
│ 2. Create parent directories if needed
│ 3. Create file at target_path
│ 4. Send ExportProgressItem::Size(total_size)
│ 5. Read data from store in 64 KiB chunks
│ 6. Write to file, yielding ExportProgressItem::CopyProgress(offset)
│ 7. Send ExportProgressItem::Done
```
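Steps 5 to 7 describe a buffered copy loop that reports the running offset. A minimal sketch of that loop with `std::io` follows. The callback stands in for sending `ExportProgressItem::CopyProgress(offset)`, and `copy_with_progress` is a hypothetical helper, not the store's actual implementation.

```rust
use std::io::{self, Read, Write};

/// Copy `src` to `dst` in 64 KiB reads, calling `on_progress` with the
/// total number of bytes written so far after each chunk.
/// Returns the total number of bytes copied.
fn copy_with_progress<R: Read, W: Write>(
    mut src: R,
    mut dst: W,
    mut on_progress: impl FnMut(u64),
) -> io::Result<u64> {
    let mut buf = vec![0u8; 64 * 1024];
    let mut offset = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        dst.write_all(&buf[..n])?;
        offset += n as u64;
        on_progress(offset);
    }
    dst.flush()?;
    Ok(offset)
}
```

Reporting the cumulative offset rather than the chunk length lets a consumer that drops intermediate events still display correct progress.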
## Observe Protocol Detail
```
Requester Provider
│ │
│ ObserveRequest {hash, ranges} │
│─────────────────────────────────►│
│ │
│ ObserveItem {size, ranges} │ (initial state)
│◄─────────────────────────────────│
│ │
│ ... (time passes, more data │
│ becomes available) │
│ │
│ ObserveItem {size, ranges} │ (delta update)
│◄─────────────────────────────────│
│ │
│ ... (continue until │
│ requester stops │
│ or connection closes) │
│ │
│ STOP_STREAM │
│─────────────────────────────────►│
```
The observe protocol uses `Bitfield::diff()` to send only the new chunks since the last update, minimizing bandwidth.
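The idea behind sending a diff is a set difference between the last state sent and the current state. The sketch below shows that idea on sets of chunk group indices. It is not the implementation of `Bitfield::diff()`, whose exact signature and semantics are not reproduced here; `delta` is a hypothetical helper.

```rust
use std::collections::BTreeSet;

/// Chunk groups present in `current` that were not in `previous`.
/// Illustrative set difference, not the crate's `Bitfield::diff()`.
fn delta(previous: &BTreeSet<u64>, current: &BTreeSet<u64>) -> BTreeSet<u64> {
    current.difference(previous).copied().collect()
}
```

If nothing changed between two observations the delta is empty, and the provider has nothing to send.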
## Full Working Example
```rust
use iroh::{protocol::Router, Endpoint, endpoint::presets};
use iroh_blobs::{
    get::fsm::{self, ConnectedNext},
    protocol::GetRequest,
    store::mem::MemStore,
    ticket::BlobTicket,
    BlobFormat, BlobsProtocol,
};

// === Provider Side ===
async fn provider() -> anyhow::Result<()> {
    let endpoint = Endpoint::bind(presets::N0).await?;
    let store = MemStore::new();
    // Add some data
    let tag = store.add_slice(b"Hello, iroh-blobs!").await?;
    // Wait until the endpoint is online before sharing its address
    let _ = endpoint.online().await;
    let addr = endpoint.addr();
    // Create ticket for sharing
    let ticket = BlobTicket::new(addr, tag.hash, BlobFormat::Raw);
    println!("Ticket: {ticket}");
    // Start serving
    let blobs = BlobsProtocol::new(&store, None);
    let router = Router::builder(endpoint)
        .accept(iroh_blobs::ALPN, blobs)
        .spawn();
    // Serve until Ctrl-C, then shut down cleanly
    tokio::signal::ctrl_c().await?;
    router.shutdown().await?;
    Ok(())
}

// === Requester Side ===
async fn requester(ticket: BlobTicket) -> anyhow::Result<()> {
    let (addr, hash, format) = ticket.into_parts();
    let endpoint = Endpoint::bind(presets::N0).await?;
    let conn = endpoint.connect(addr, iroh_blobs::ALPN).await?;
    // Build request based on format
    let request = match format {
        BlobFormat::Raw => GetRequest::blob(hash),
        BlobFormat::HashSeq => GetRequest::all(hash),
    };
    // Use the get FSM: the first step opens the stream and sends the request
    let at_initial = fsm::start(conn, request, Default::default());
    let at_connected = at_initial.next().await?;
    // The second step tells us what the provider will send first
    match at_connected.next().await? {
        ConnectedNext::StartRoot(at_root) => {
            // Read the size header, then collect the verified content
            let (at_content, size) = at_root.next().next().await?;
            let (_at_end, data) = at_content.concatenate_into_vec().await?;
            println!("Got {} bytes: {:?}", size, data);
            // ...
        }
        ConnectedNext::StartChild(_at_child) => {
            // Need to know the child hash
        }
        ConnectedNext::Closing(_at_closing) => {
            println!("Empty response");
        }
    }
    Ok(())
}
```
## Simplified Fetch (Using Store + Remote)
```rust
// The simplest way to download data
let store = MemStore::new();
let remote = store.remote();
// Fetch with automatic local availability checking
let result = remote.fetch(connection, hash, format, &store).await?;
// Result includes Stats with transfer metrics
```
## Key Error Types
| Error Type | Location | Purpose |
|------------|----------|---------|
| `GetError` | `get::error` | Errors during get FSM |
| `ExportBaoError` | `api` | Errors during BAO export |
| `RequestError` | `api` | Store command errors |
| `DecodeError` | `get::fsm` | BAO stream decode errors |
| `ProgressError` | `provider::events` | Provider event errors |


@@ -0,0 +1,60 @@
# iroh-blobs Reference Documentation
This directory contains a comprehensive reference for the `iroh-blobs` crate (v0.100.0), a Rust library for content-addressed blob transfer over QUIC connections using BLAKE3 verified streaming.
## Documents
1. **[Overview and Architecture](01-overview-and-architecture.md)** — Core concepts, module structure, feature flags, and architecture diagram. Start here.
2. **[Key Types and Data Structures](02-key-types.md)** — Detailed reference for `Hash`, `BlobFormat`, `HashAndFormat`, `HashSeq`, `Bitfield`, `Tag`, `TempTag`, `BlobTicket`, `ChunkRanges`/`ChunkRangesSeq`/`RangeSpec`, and the store command protocol.
3. **[Transfer Protocol](03-transfer-protocol.md)** — Wire protocol specification: request types (`GetRequest`, `GetManyRequest`, `PushRequest`, `ObserveRequest`), response format (BLAKE3 verified streaming), the client-side FSM, provider handling, event system, and the Collection format.
4. **[Storage Architecture](04-storage.md)** — Store implementations: `MemStore` (in-memory), `FsStore` (hybrid redb + filesystem), `ReadonlyMemStore`. Covers the actor pattern, `BaoFileHandle`/`BaoFileStorage`, partial/complete states, the hybrid inline/file approach, entry states, blob lifecycle, and garbage collection.
5. **[Remote API and Downloader](05-remote-and-downloader.md)** — `Remote` API for fetching from/observing/pushing to peers, `Downloader` for multi-source downloads, connection pooling, and iroh integration via `BlobsProtocol`.
6. **[Data Flow and Examples](06-data-flow-and-examples.md)** — End-to-end data flow diagrams for provider and requester sides, BLAKE3 verification mechanics, import/export flows, observe protocol detail, and complete working examples.
## Quick Reference
### Creating a Provider
```rust
use iroh::{protocol::Router, Endpoint, endpoint::presets};
use iroh_blobs::{store::mem::MemStore, BlobsProtocol};
let endpoint = Endpoint::bind(presets::N0).await?;
let store = MemStore::new();
let tag = store.add_slice(b"data").await?;
let blobs = BlobsProtocol::new(&store, None);
let router = Router::builder(endpoint)
.accept(iroh_blobs::ALPN, blobs)
.spawn();
```
### Key Constants
| Constant | Value | Meaning |
|----------|-------|---------|
| `ALPN` | `b"/iroh-bytes/4"` | QUIC ALPN protocol identifier |
| `IROH_BLOCK_SIZE` | `BlockSize::from_chunk_log(4)` | 16 KiB chunk groups |
| `MAX_MESSAGE_SIZE` | `1 MiB` | Maximum request message size |
| `Hash::EMPTY` | BLAKE3 of `b""` | Hash of the empty blob |
### Core Crate Exports
```rust
pub use hash::{BlobFormat, Hash, HashAndFormat};
pub use hashseq::HashSeq;
pub use net_protocol::BlobsProtocol;
pub use protocol::ALPN;
pub mod api; // Store API, Blobs, Tags, Downloader, Remote
pub mod format; // Collection type
pub mod get; // Client-side FSM
pub mod protocol; // Wire protocol types (GetRequest, etc.)
pub mod provider; // Server-side handling
pub mod store; // Storage implementations
pub mod ticket; // BlobTicket
pub mod util; // Connection pool, temp tags, stream helpers
```