iroh-blobs: Data Flow and Complete Example

Complete Data Flow: Provider Side

                          QUIC Connection Arrives
                                    │
                                    ▼
                    handle_connection(conn, store, events)
                                    │
                         ┌──────────┴──────────┐
                         │  Accept QUIC BIDI    │
                         │  streams in loop     │
                         └──────────┬──────────┘
                                    │
                          handle_stream(pair, store)
                                    │
                         ┌──────────┴──────────┐
                         │  Read Request type   │
                         │  byte + deserialize  │
                         └──────────┬──────────┘
                                    │
              ┌─────────────┬───────┼───────┬──────────────┐
              │             │       │       │              │
         handle_get   handle_get  handle  handle      (reserved)
                       _many    _observe  _push
              │             │       │       │
              ▼             ▼       ▼       ▼
         ┌─────────────────────────────────────────────────┐
         │  For each (offset, ranges) in request.ranges:   │
         │                                                   │
         │  if offset == 0:                                  │
         │    send_blob(store, 0, hash, ranges, writer)      │
         │  else:                                            │
         │    lookup hash in HashSeq[offset-1]               │
         │    send_blob(store, offset, child_hash, ranges, writer) │
         │                                                   │
         │  send_blob:                                       │
         │    store.export_bao(hash, ranges)                 │
         │      .write_with_progress(writer, ctx, &hash, idx) │
         └─────────────────────────────────────────────────┘
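The offset rule in the box above can be sketched in isolation. This is a minimal illustration, not the provider code: `Hash` is modelled as a plain byte array, the hash sequence as a slice, and `resolve_hash` is a hypothetical helper name.

```rust
/// A 32-byte BLAKE3 hash, modelled as a plain array for this sketch.
type Hash = [u8; 32];

/// Resolve which blob a request offset refers to.
/// Offset 0 is the root blob itself; offset n > 0 is entry n - 1 of the hash sequence.
fn resolve_hash(root: Hash, hash_seq: &[Hash], offset: u64) -> Option<Hash> {
    if offset == 0 {
        return Some(root);
    }
    // Children are stored zero-based, so request offset 1 maps to index 0.
    let index = offset - 1;
    if index < hash_seq.len() as u64 {
        Some(hash_seq[index as usize])
    } else {
        None // the request points past the end of the sequence
    }
}
```

Returning `Option` keeps the out-of-range case explicit; how the real provider reacts to such a request is not covered here.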

Complete Data Flow: Requester Side (Get FSM)

                    Create GetRequest
                          │
                          ▼
               fsm::start(connection, request, counters)
                          │
                          ▼
                    AtInitial.next()
                          │ (open_bi, send request)
                          ▼
                    AtConnected.next()
                          │
              ┌───────────┼───────────┐
              │           │           │
        StartRoot    StartChild    Closing
        (offset=0)   (offset>0)    (empty)
              │           │           │
              ▼           ▼           ▼
        AtBlobHeader  AtBlobHeader  AtClosing
        .next()       .next(hash)   .next()
              │           │           │
              ▼           ▼           ▼
        (size, AtBlobContent)     Stats
              │
     ┌────────┴────────┐
     │                 │
  More(item)          Done
  (loop back to       (AtEndBlob)
  AtBlobContent)           │
                    ┌─────┼─────┐
                    │           │
              MoreChildren  Closing
              (AtStartChild) (AtClosing)
                    │           │
                    └───────────┘
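To make the transitions in the diagram concrete, the following sketch lists the states visited for a given set of requested offsets. The `State` enum and the `trace` function are illustrative stand-ins, not the types from `iroh_blobs::get::fsm`, and the sketch assumes every requested blob is transferred without error.

```rust
/// States of the get state machine, named after the diagram above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    AtInitial,
    AtConnected,
    AtStartRoot,
    AtStartChild,
    AtBlobHeader,
    AtBlobContent,
    AtEndBlob,
    AtClosing,
}

/// List the states visited for a request that asks for the given offsets
/// (0 = root blob, n > 0 = child n - 1), in ascending order.
fn trace(requested_offsets: &[u64]) -> Vec<State> {
    let mut states = vec![State::AtInitial, State::AtConnected];
    for &offset in requested_offsets {
        // The first state of each blob depends on whether it is the root or a child.
        states.push(if offset == 0 {
            State::AtStartRoot
        } else {
            State::AtStartChild
        });
        states.push(State::AtBlobHeader);
        states.push(State::AtBlobContent);
        states.push(State::AtEndBlob);
    }
    // An empty request goes straight from AtConnected to AtClosing.
    states.push(State::AtClosing);
    states
}
```

In the real FSM each state is its own type and each transition consumes the previous state, so an out-of-order call fails to compile rather than at runtime.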

Blob Content Items

During AtBlobContent, items arrive as BaoContentItem:

pub enum BaoContentItem {
    Parent(ParentNode),  // (node, (left_hash, right_hash)) — 64 bytes
    Leaf(Leaf),          // { offset: u64, data: Bytes } — actual data
}
  • Parent nodes contain BLAKE3 hash pairs for tree verification. They carry no blob data; each one adds 64 bytes of overhead.
  • Leaf nodes contain actual data chunks. Each leaf's data is at most IROH_BLOCK_SIZE bytes (16 KiB).

Verification is automatic: the ResponseDecoder from bao-tree validates each chunk against the expected hash tree rooted at the request hash.
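A consumer of these items typically writes leaf data at its offset and treats parent nodes as verification-only. The sketch below shows that split with simplified stand-in types: `ContentItem` and `collect` are assumptions made for illustration, `Vec<u8>` replaces `Bytes`, and no hash checking is performed.

```rust
/// Simplified stand-ins for the bao-tree item types (assumptions for this sketch).
enum ContentItem {
    /// A pair of child hashes, used only for verification.
    Parent { left: [u8; 32], right: [u8; 32] },
    /// A run of blob data starting at `offset`.
    Leaf { offset: u64, data: Vec<u8> },
}

/// Copy leaf data into a buffer of `size` bytes and count verification overhead.
/// Returns the buffer and the number of bytes spent on parent nodes.
/// Panics if a leaf does not fit into `size` bytes.
fn collect(items: &[ContentItem], size: usize) -> (Vec<u8>, usize) {
    let mut buf = vec![0u8; size];
    let mut overhead = 0;
    for item in items {
        match item {
            // Parents carry no blob data, only two 32-byte hashes.
            ContentItem::Parent { left, right } => overhead += left.len() + right.len(),
            ContentItem::Leaf { offset, data } => {
                let start = *offset as usize;
                buf[start..start + data.len()].copy_from_slice(data);
            }
        }
    }
    (buf, overhead)
}
```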

Blob Verification and BaoTree Encoding

How BLAKE3 Verified Streaming Works

  1. The hash is the root of a binary Merkle tree.
  2. Internal nodes store (left_child_hash, right_child_hash), 64 bytes each.
  3. Leaves hold the actual data: a chunk of up to 1024 bytes in standard BLAKE3, or a chunk group of up to 16 KiB at iroh's block size.
  4. Chunk groups (16 chunks = 16 KiB) are the minimum verification unit in iroh-blobs.

For a request with specific ranges:

  • The provider traverses the tree, yielding only nodes needed to verify the requested ranges
  • The requester can verify each chunk group independently after receiving its parent hash pair
  • Corrupted data is detected as soon as the chunk group containing it has been received, so at most 16 KiB (one chunk group) is ever held unverified
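Because the chunk group is the smallest verifiable unit, a byte range effectively selects whole groups. The arithmetic can be sketched as follows, assuming requested ranges are rounded outward to group boundaries; `GROUP_SIZE` and `groups_for_range` are hypothetical names, and the real API expresses ranges in chunks rather than bytes.

```rust
use std::ops::Range;

/// Size of one chunk group: 16 BLAKE3 chunks of 1024 bytes each.
const GROUP_SIZE: u64 = 16 * 1024;

/// Indices of the chunk groups touched by the byte range `start..end`.
/// A group that is only partly covered still counts, since it can only be
/// verified as a whole.
fn groups_for_range(range: Range<u64>) -> Range<u64> {
    if range.start >= range.end {
        return 0..0; // empty request, nothing to send
    }
    let first = range.start / GROUP_SIZE;
    // `end` is exclusive, so the last byte requested is `end - 1`.
    let last = (range.end - 1) / GROUP_SIZE;
    first..last + 1
}
```

For example, the two-byte range 16383..16385 straddles a group boundary and therefore selects groups 0 and 1.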

Outboard Storage

The outboard is the BLAKE3 hash tree stored separately from the data. For the provider:

  • Small blobs (≤16 KiB): outboard is empty (not needed, single chunk group)
  • Large blobs: outboard stored as PreOrderMemOutboard (in-memory) or as a file (filesystem store)

For the requester, the outboard is built incrementally as data arrives.
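The "empty outboard for small blobs" rule follows from tree arithmetic: a binary tree with n leaves has n - 1 internal nodes, and each internal node stores one 64-byte hash pair. The sketch below estimates outboard size on that basis. It assumes the outboard holds only hash pairs (no size prefix or other header), and the function names are illustrative.

```rust
/// Bytes in one chunk group (16 chunks of 1024 bytes).
const BLOCK_BYTES: u64 = 16 * 1024;
/// Bytes in one stored hash pair (two 32-byte BLAKE3 hashes).
const PAIR_BYTES: u64 = 64;

/// Number of chunk groups needed to cover `size` bytes; an empty blob still has one.
fn group_count(size: u64) -> u64 {
    std::cmp::max(1, (size + BLOCK_BYTES - 1) / BLOCK_BYTES)
}

/// Estimated outboard size: one hash pair per internal node of the tree.
fn outboard_bytes(size: u64) -> u64 {
    (group_count(size) - 1) * PAIR_BYTES
}
```

Under these assumptions a 1 MiB blob has 64 chunk groups and needs 63 hash pairs, roughly 4 KiB of outboard, or about 0.4% of the data size.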

Import and Export Flows

Import Bytes (Local Data)

add_bytes(data) / add_slice(data)
       │
       ▼
ImportBytesRequest { data, format, scope }
       │
       ▼
Actor::import_bytes()
  │ 1. Send AddProgressItem::Size(len)
  │ 2. Send AddProgressItem::CopyDone
  │ 3. Compute outboard: PreOrderMemOutboard::create(&data, IROH_BLOCK_SIZE)
  │ 4. Return ImportEntry { data, outboard, scope, format, tx }
       │
       ▼
Actor::finish_import()
  │ 1. Get hash from outboard.root()
  │ 2. Get or create BaoFileHandle for hash
  │ 3. Transition BaoFileStorage::Partial → Complete
  │ 4. Create TempTag for the hash_and_format
  │ 5. Send AddProgressItem::Done(temp_tag)

Import BAO Stream (Remote Data)

import_bao_bytes(hash, ranges, data) / import_bao_reader(hash, ranges, reader)
       │
       ▼
ImportBaoRequest { hash, size }
       │
       ▼
Actor::import_bao()
  │ 1. Set size on partial entry
  │ 2. Create BaoTree for the size
  │ 3. For each BaoContentItem from stream:
  │    - Parent: write hash pair to outboard
  │    - Leaf: write data to storage, update bitfield
  │    - If bitfield becomes complete: transition Partial → Complete
  │ 4. Send result
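Step 3 of the flow above amounts to bookkeeping over chunk groups. The sketch below models it with one flag per group and assumes each leaf covers exactly one chunk group; `Entry`, `Storage`, and `write_leaf` are hypothetical names, and the actual data write and hash verification are omitted.

```rust
/// Bytes in one chunk group.
const GROUP_BYTES: u64 = 16 * 1024;

/// Storage state of an entry, mirroring the Partial → Complete transition.
#[derive(Debug, PartialEq)]
enum Storage {
    Partial,
    Complete,
}

/// A partial entry that tracks which chunk groups have been written.
struct Entry {
    have: Vec<bool>, // one flag per chunk group
}

impl Entry {
    /// Create an entry for a blob of `size` bytes with nothing written yet.
    fn new(size: u64) -> Self {
        let groups = std::cmp::max(1, (size + GROUP_BYTES - 1) / GROUP_BYTES);
        Entry { have: vec![false; groups as usize] }
    }

    /// Record a verified leaf that starts at `offset` and report the resulting state.
    /// Panics if `offset` lies beyond the declared size.
    fn write_leaf(&mut self, offset: u64) -> Storage {
        self.have[(offset / GROUP_BYTES) as usize] = true;
        if self.have.iter().all(|&present| present) {
            Storage::Complete
        } else {
            Storage::Partial
        }
    }
}
```

Leaves may arrive in any order; the entry becomes complete only when the last missing group is written.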

Export BAO

export_bao(hash, ranges) → ExportBao
       │
       ▼
Actor::export_bao()
  │ 1. Look up BaoFileHandle for hash
  │ 2. If not found: send EncodeError::NotFound and return
  │ 3. Create BaoTreeSender from data + outboard readers
  │ 4. Call traverse_ranges_validated(data, outboard, &ranges, tx)
  │    → streams validated BAO items to the sender

Export Path (To Filesystem)

export(hash, target_path) → ExportPath
       │
       ▼
Actor::export_path()
  │ 1. Look up BaoFileHandle for hash
  │ 2. Create parent directories if needed
  │ 3. Create file at target_path
  │ 4. Send ExportProgressItem::Size(total_size)
  │ 5. Read data from store in 64 KiB chunks
  │ 6. Write to file, yielding ExportProgressItem::CopyProgress(offset)
  │ 7. Send ExportProgressItem::Done
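Steps 4 to 7 describe a chunked copy loop with progress reporting. A minimal std-only sketch of that loop follows. The `ExportProgress` enum only mirrors the item names above, the callback stands in for the progress channel, and `export` is a hypothetical function rather than the store's implementation.

```rust
use std::io::{self, Read, Write};

/// Progress events, named after the items in the flow above.
#[derive(Debug, PartialEq)]
enum ExportProgress {
    Size(u64),
    CopyProgress(u64),
    Done,
}

/// Copy everything from `src` to `dst` in chunks of at most 64 KiB.
/// `size` is only reported, not enforced; progress goes to `report`.
fn export<R: Read, W: Write>(
    mut src: R,
    mut dst: W,
    size: u64,
    mut report: impl FnMut(ExportProgress),
) -> io::Result<()> {
    report(ExportProgress::Size(size));
    let mut buf = vec![0u8; 64 * 1024];
    let mut offset = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        dst.write_all(&buf[..n])?;
        offset += n as u64;
        // Report the total number of bytes written so far.
        report(ExportProgress::CopyProgress(offset));
    }
    report(ExportProgress::Done);
    Ok(())
}
```

Reporting the cumulative offset rather than the chunk length lets a consumer compute a percentage from `Size` and the latest `CopyProgress` alone.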

Observe Protocol Detail

Requester                          Provider
    │                                  │
    │  ObserveRequest {hash, ranges}   │
    │─────────────────────────────────►│
    │                                  │
    │  ObserveItem {size, ranges}      │  (initial state)
    │◄─────────────────────────────────│
    │                                  │
    │  ... (time passes, more data      │
    │       becomes available)          │
    │                                  │
    │  ObserveItem {size, ranges}      │  (delta update)
    │◄─────────────────────────────────│
    │                                  │
    │  ... (continue until             │
    │       requester stops            │
    │       or connection closes)       │
    │                                  │
    │  STOP_STREAM                     │
    │─────────────────────────────────►│

The observe protocol uses Bitfield::diff() so that each update after the first carries only the ranges that became available since the previous update, which minimizes bandwidth.
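The delta idea can be illustrated with plain sets. In this sketch availability is modelled as a set of chunk-group indices, and `Observer` is a hypothetical type that remembers the last state it sent. It is not the `Bitfield` type, and it ignores the size field carried by real updates.

```rust
use std::collections::BTreeSet;

/// Remembers what was last sent so that later updates carry only new groups.
struct Observer {
    last: BTreeSet<u64>,
}

impl Observer {
    /// Start with nothing sent, so the first update is the full state.
    fn new() -> Self {
        Observer { last: BTreeSet::new() }
    }

    /// Return the groups present in `current` that were not in the last update,
    /// then remember `current` as the new baseline.
    fn update(&mut self, current: &BTreeSet<u64>) -> BTreeSet<u64> {
        let delta: BTreeSet<u64> = current.difference(&self.last).copied().collect();
        self.last = current.clone();
        delta
    }
}
```

The first call returns the full initial state because the baseline is empty; subsequent calls return only what is new, and an unchanged state yields an empty delta.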

Full Working Example

use iroh::{protocol::Router, Endpoint, endpoint::presets};
use iroh_blobs::{store::mem::MemStore, BlobsProtocol, ticket::BlobTicket, BlobFormat};

// === Provider Side ===
async fn provider() -> anyhow::Result<()> {
    let endpoint = Endpoint::bind(presets::N0).await?;
    let store = MemStore::new();
    
    // Add some data
    let tag = store.add_slice(b"Hello, iroh-blobs!").await?;
    
    let _ = endpoint.online().await;
    let addr = endpoint.addr();
    
    // Create ticket for sharing
    let ticket = BlobTicket::new(addr, tag.hash, BlobFormat::Raw);
    println!("Ticket: {ticket}");
    
    // Start serving
    let blobs = BlobsProtocol::new(&store, None);
    let router = Router::builder(endpoint)
        .accept(iroh_blobs::ALPN, blobs)
        .spawn();
    
    tokio::signal::ctrl_c().await?;
    router.shutdown().await?;
    Ok(())
}

// === Requester Side ===
async fn requester(ticket: BlobTicket) -> anyhow::Result<()> {
    let (addr, hash, format) = ticket.into_parts();
    
    let endpoint = Endpoint::bind(presets::N0).await?;
    let conn = endpoint.connect(addr, iroh_blobs::ALPN).await?;
    
    // Build request based on format
    let request = match format {
        BlobFormat::Raw => iroh_blobs::protocol::GetRequest::blob(hash),
        BlobFormat::HashSeq => iroh_blobs::protocol::GetRequest::all(hash),
    };
    
    // Drive the get FSM: AtInitial -> AtConnected -> ConnectedNext
    let initial = iroh_blobs::get::fsm::start(conn, request, Default::default());
    // Opens a bidirectional stream and sends the request
    let connected = initial.next().await?;
    // Decides whether the response starts with the root, a child, or nothing
    let next = connected.next().await?;
    
    match next {
        iroh_blobs::get::fsm::ConnectedNext::StartRoot(at_root) => {
            // AtStartRoot -> AtBlobHeader -> (AtBlobContent, size)
            let (at_content, size) = at_root.next().next().await?;
            // Read and verify the remaining items, collecting the leaf data
            let (_at_end, data) = at_content.concatenate_into_vec().await?;
            println!("Got {} bytes: {:?}", size, data);
            // ...
        }
        iroh_blobs::get::fsm::ConnectedNext::StartChild(_at_child) => {
            // Need to know the child hash
        }
        iroh_blobs::get::fsm::ConnectedNext::Closing(_at_closing) => {
            println!("Empty response");
        }
    }
    
    Ok(())
}

Simplified Fetch (Using Store + Remote)

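// The simplest way to download data
let store = MemStore::new();
let remote = store.remote();

// Fetch with automatic local availability checking.
// Assumption: fetch takes the connection plus a HashAndFormat; the remote is
// already bound to `store`. Check the signature for your iroh-blobs version.
let stats = remote.fetch(connection, iroh_blobs::HashAndFormat::new(hash, format)).await?;
// `stats` holds the transfer metrics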

Key Error Types

| Error Type | Location | Purpose |
|---|---|---|
| GetError | get::error | Errors during get FSM |
| ExportBaoError | api | Errors during BAO export |
| RequestError | api | Store command errors |
| DecodeError | get::fsm | BAO stream decode errors |
| ProgressError | provider::events | Provider event errors |