# iroh-docs: Range-Based Set Reconciliation (Ranger)

## Overview

The sync protocol in iroh-docs is based on **Range-Based Set Reconciliation**, implementing the algorithm described in [Aljoscha Meyer's paper (arXiv:2212.13567)](https://arxiv.org/abs/2212.13567).

The core idea: two peers can efficiently compute the union of their entry sets by recursively partitioning the sets and comparing **fingerprints** (hashes) of partitions. When fingerprints match, no further work is needed. When they differ, the partition is subdivided until the difference can be resolved by sending the actual entries.

## Key Abstractions

### RangeEntry Trait

```rust
pub trait RangeEntry: Debug + Clone {
    type Key: RangeKey;
    type Value: RangeValue;

    fn key(&self) -> &Self::Key;
    fn value(&self) -> &Self::Value;
    fn as_fingerprint(&self) -> Fingerprint;
}
```

`SignedEntry` implements `RangeEntry`:

- `Key` = `RecordIdentifier` (namespace || author || key bytes)
- `Value` = `Record` (timestamp, hash, len)
- Fingerprint = BLAKE3 hash of (namespace || author || key || timestamp || content_hash)

### RangeKey Trait

```rust
pub trait RangeKey: Sized + Debug + Ord + PartialEq + Clone + 'static {
    fn is_prefix_of(&self, other: &Self) -> bool; // test-only
}
```

`RecordIdentifier` implements this via byte-level prefix matching: `(namespace, author, key)` where key prefix matching supports the hierarchical deletion semantics.

### RangeValue Trait

```rust
pub trait RangeValue: Sized + Debug + Ord + PartialEq + Clone + 'static {}
```

`Record` implements `RangeValue` with ordering by `(timestamp, hash)`, which is the Last-Writer-Wins ordering.

### Fingerprint

```rust
pub struct Fingerprint(pub [u8; 32]); // BLAKE3 hash
```

Fingerprints are computed by XOR-ing the individual entry fingerprints within a range.
This means:

- The fingerprint of the empty set is `BLAKE3([])` (the hash of the empty input)
- Adding/removing an entry toggles its contribution via XOR
- Equal sets produce equal fingerprints

## Range Concept

A `Range` represents a half-open interval `[x, y)` in the key space, with special semantics:

```rust
pub(crate) struct Range<K> {
    x: K,
    y: K,
}
```

- `x == y`: The entire set (all elements)
- `x < y`: Standard half-open interval `[x, y)`, which includes `x` and excludes `y`
- `x > y`: Wrapping range, covering elements from `x` to the end plus elements from the beginning up to (but excluding) `y`

This wrapping range concept allows the algorithm to work with circular key spaces where the "first" element might be anywhere.

## Protocol Messages

```rust
pub type ProtocolMessage = crate::ranger::Message<SignedEntry>;
```

### Message Structure

```rust
pub struct Message<E: RangeEntry> {
    parts: Vec<MessagePart<E>>,
}

pub enum MessagePart<E: RangeEntry> {
    RangeFingerprint(RangeFingerprint<E::Key>), // "Here's a fingerprint for this range"
    RangeItem(RangeItem<E>),                    // "Here are the entries in this range"
}

pub struct RangeFingerprint<K> {
    range: Range<K>,
    fingerprint: Fingerprint,
}

pub struct RangeItem<E: RangeEntry> {
    range: Range<E::Key>,
    values: Vec<(E, ContentStatus)>,
    have_local: bool, // If true, the receiver need not send its own entries back
}
```

The `have_local` flag is an optimization: when a peer sends entries with `have_local` set to `true`, the receiver does not need to send its own entries in that range back. When it is `false`, the receiver replies with the entries it holds in that range that the sender is missing.

### Wire Format

Messages are serialized using `postcard` (a compact serde format) and framed with a 4-byte big-endian length prefix via `SyncCodec`:

```
┌─────────────────┬──────────────────────────────┐
│ u32 BE length   │ postcard-encoded Message     │
└─────────────────┴──────────────────────────────┘
```

Max message size: 1 GiB (`MAX_MESSAGE_SIZE = 1024 * 1024 * 1024`).

## Sync Algorithm Walkthrough

### 1. Initiation (Alice → Bob)

Alice generates the initial message:

```rust
// Inside `impl<E: RangeEntry> Message<E>`
fn init<S: Store<E>>(store: &mut S) -> Result<Self, S::Error> {
    let x = store.get_first()?; // First key, or default
    let range = Range::new(x.clone(), x); // x == y: the "all elements" range
    let fingerprint = store.get_fingerprint(&range)?;
    // Wrap the fingerprint in a message part so it matches `Vec<MessagePart<E>>`
    let part = MessagePart::RangeFingerprint(RangeFingerprint { range, fingerprint });
    Ok(Message { parts: vec![part] })
}
```

This sends a single fingerprint covering the entire set.

### 2. Processing (Bob processes Alice's message)

For each `RangeFingerprint` part in the message:

**Case 1: The received fingerprint matches the local fingerprint** → Nothing to do, the sets are equal in this range.

**Case 2: The received fingerprint is the empty-set fingerprint OR the range has ≤ 1 local entry** → Send all local entries in the range as a `RangeItem`.

**Case 3: Recurse** → Split the range into `split_factor` partitions and, for each partition, send either a `RangeFingerprint` (if the partition is large) or a `RangeItem` (if the partition is small enough, ≤ `max_set_size` entries).

### 3. Processing RangeItem

When a peer receives a `RangeItem`:

1. **Validate** each incoming entry using `validate_cb`
2. **Insert** valid entries via `Store::put()` (which handles prefix deletion)
3. **Notify** via `on_insert_cb` for actually-inserted entries
4. If `have_local` is false, compute the **diff** (entries in the local range not present in the received set) and send those entries back

### Configuration

```rust
struct SyncConfig {
    max_set_size: usize, // Default: 1. Entries to send directly before falling back to fingerprints
    split_factor: usize, // Default: 2. Number of partitions per recursion step
}
```

With `max_set_size = 1` and `split_factor = 2`, the algorithm behaves like a binary search: each fingerprint mismatch splits the range in two and sends a fingerprint for each half, or the entries themselves for a half that holds at most one entry.
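
The three fingerprint cases and the two configuration values can be illustrated with a toy responder. This is a minimal sketch under stated assumptions, not the iroh-docs implementation: keys are plain `u64` values in a non-wrapping range `[lo, hi)`, the standard library's `DefaultHasher` stands in for BLAKE3, `0` stands in for the empty-set fingerprint, and `Part`, `entry_fp`, `range_fp`, and `respond` are hypothetical names. Splitting by equal entry count is also an assumption made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Stand-in per-entry fingerprint (assumption: std hasher instead of BLAKE3).
fn entry_fp(key: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// XOR of the entry fingerprints in [lo, hi); 0 plays the role of the empty-set fingerprint.
fn range_fp(set: &BTreeSet<u64>, lo: u64, hi: u64) -> u64 {
    set.range(lo..hi).fold(0, |acc, key| acc ^ entry_fp(*key))
}

/// Toy counterpart of a message part (hypothetical, simplified).
#[derive(Debug, PartialEq)]
enum Part {
    Fingerprint { lo: u64, hi: u64, fp: u64 },
    Items { lo: u64, hi: u64, keys: Vec<u64> },
}

struct SyncConfig {
    max_set_size: usize,
    split_factor: usize,
}

/// Respond to a remote fingerprint for the range [lo, hi).
fn respond(local: &BTreeSet<u64>, lo: u64, hi: u64, remote_fp: u64, cfg: &SyncConfig) -> Vec<Part> {
    // Case 1: fingerprints match, so there is nothing to send.
    if range_fp(local, lo, hi) == remote_fp {
        return Vec::new();
    }
    let keys: Vec<u64> = local.range(lo..hi).copied().collect();
    // Case 2: remote range is empty or we hold at most one entry, so send entries.
    if keys.len() <= 1 || remote_fp == 0 {
        return vec![Part::Items { lo, hi, keys }];
    }
    // Case 3: split the local entries into `split_factor` chunks of similar size.
    let k = cfg.split_factor.max(2);
    let chunk_len = (keys.len() + k - 1) / k;
    let chunks: Vec<&[u64]> = keys.chunks(chunk_len).collect();
    let mut parts = Vec::new();
    let mut start = lo;
    for (i, chunk) in chunks.iter().enumerate() {
        // Each sub-range ends where the next chunk begins; the last one ends at `hi`.
        let end = if i + 1 == chunks.len() { hi } else { chunks[i + 1][0] };
        if chunk.len() <= cfg.max_set_size {
            parts.push(Part::Items { lo: start, hi: end, keys: chunk.to_vec() });
        } else {
            parts.push(Part::Fingerprint { lo: start, hi: end, fp: range_fp(local, start, end) });
        }
        start = end;
    }
    parts
}
```

With `max_set_size = 1` and `split_factor = 2`, a local set `{1, 2, 3}` that mismatches on `[0, 100)` yields a fingerprint for `[0, 3)` and the single entry `3` for `[3, 100)`, which is the binary-search behaviour described above.
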
## Store Trait

The `Store` trait provides the interface that the reconciliation algorithm needs:

```rust
pub trait Store<E: RangeEntry>: Sized {
    type Error: Debug + Send + Sync + Into<anyhow::Error> + 'static;
    type RangeIterator<'a>: Iterator<Item = Result<E, Self::Error>>
    where
        Self: 'a,
        E: 'a;
    type ParentIterator<'a>: Iterator<Item = Result<E, Self::Error>>
    where
        Self: 'a,
        E: 'a;

    fn get_first(&mut self) -> Result<E::Key, Self::Error>;
    fn get_fingerprint(&mut self, range: &Range<E::Key>) -> Result<Fingerprint, Self::Error>;
    fn entry_put(&mut self, entry: E) -> Result<(), Self::Error>;
    fn get_range(&mut self, range: Range<E::Key>) -> Result<Self::RangeIterator<'_>, Self::Error>;
    fn prefixes_of(&mut self, key: &E::Key) -> Result<Self::ParentIterator<'_>, Self::Error>;
    fn remove_prefix_filtered(
        &mut self,
        prefix: &E::Key,
        predicate: impl Fn(&E::Value) -> bool,
    ) -> Result<usize, Self::Error>;

    fn initial_message(&mut self) -> Result<Message<E>, Self::Error>;
    async fn process_message(...) -> Result<Option<Message<E>>, Self::Error>;
    fn put(&mut self, entry: E) -> Result<InsertOutcome, Self::Error>;
}
```

### Insert Semantics in `Store::put()`

The `put` method implements the CRDT insert logic:

```rust
fn put(&mut self, entry: E) -> Result<InsertOutcome, Self::Error> {
    // 1. Check prefix entries: if any parent entry has value >= new entry, reject
    for prefix_entry in self.prefixes_of(entry.key())? {
        // The iterator yields `Result<E, Self::Error>`, so unwrap each item first
        let prefix_entry = prefix_entry?;
        if entry.value() <= prefix_entry.value() {
            return Ok(InsertOutcome::NotInserted);
        }
    }
    // 2. Remove entries whose key is prefixed by new entry's key AND whose value is <=
    let removed = self.remove_prefix_filtered(entry.key(), |v| entry.value() >= v)?;
    // 3. Insert the new entry
    self.entry_put(entry)?;
    Ok(InsertOutcome::Inserted { removed })
}
```

### InsertOutcome

```rust
enum InsertOutcome {
    NotInserted,                // A newer or equal entry already exists
    Inserted { removed: usize }, // Successfully inserted; reports the number of removed entries
}
```

## Sync Flow at the Protocol Level

The `Replica` type provides the sync interface:

```rust
// Create initial message for sync
fn sync_initial_message(&mut self) -> anyhow::Result<ProtocolMessage>

// Process an incoming message and produce optional reply
async fn sync_process_message(
    &mut self,
    message: ProtocolMessage,
    from_peer: PeerIdBytes,
    state: &mut SyncOutcome,
) -> Result<Option<ProtocolMessage>, anyhow::Error>
```

### SyncOutcome

Tracks the result of a sync session:

```rust
pub struct SyncOutcome {
    pub heads_received: AuthorHeads, // Latest timestamps per author from remote
    pub num_recv: usize,             // Number of entries received
    pub num_sent: usize,             // Number of entries sent
}
```

## Network Protocol (Codec)

The sync protocol operates over a QUIC bidirectional stream:

1. **Alice** (initiator) sends `Message::Init { namespace, message }`
2. **Bob** (responder) validates the namespace and either:
   - Accepts and processes the initial message
   - Rejects with `Message::Abort { reason }`
3. Both peers exchange `Message::Sync(message)` rounds until one side has no reply (convergence reached)

The `BobState` manages the responder side, tracking namespace and `SyncOutcome` progress across message rounds.

### Abort Reasons

```rust
pub enum AbortReason {
    NotFound,       // Namespace not available
    AlreadySyncing, // Already syncing this namespace
    InternalServerError,
}
```

### Concurrent Sync Prevention

When both peers try to sync with each other simultaneously, the system uses a deterministic tiebreaker based on comparing `EndpointId` bytes: the peer with the larger ID accepts the incoming request, and the other peer keeps its outgoing connection.
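
The tiebreaker can be sketched as a pure function over the two IDs. This is a minimal illustration, assuming IDs are raw 32-byte arrays compared lexicographically; `SyncDirection` and `tiebreak_direction` are hypothetical names used here and are not taken from the iroh-docs API.

```rust
/// Role a peer takes when both sides start a sync at the same time (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum SyncDirection {
    /// Accept the incoming request and drop our own outgoing attempt.
    Accept,
    /// Keep our outgoing connection and reject the incoming request.
    Connect,
}

/// Decide the role by comparing raw ID bytes: the larger ID accepts.
/// Byte arrays compare lexicographically, so both peers reach opposite,
/// consistent answers without exchanging any extra messages.
fn tiebreak_direction(our_id: &[u8; 32], their_id: &[u8; 32]) -> SyncDirection {
    if our_id > their_id {
        SyncDirection::Accept
    } else {
        SyncDirection::Connect
    }
}
```

Because the comparison is symmetric, evaluating it on both peers with the arguments swapped always yields one `Accept` and one `Connect`, so exactly one of the two concurrent connections survives.
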