Files
alknet/docs/research/references/iroh/iroh-gossip/05-net-actor.md

9.3 KiB

iroh-gossip: Networking Layer & Actor Model

Overview

The net module (src/net.rs and submodules) provides the async runtime layer that connects the IO-free protocol state machine to real network IO via iroh QUIC connections. It is built around a single Actor that manages all topics and connections.

ALPN Protocol

pub const GOSSIP_ALPN: &[u8] = b"/iroh-gossip/1";

This ALPN identifier is used when establishing QUIC connections through iroh.

Gossip Handle (net::Gossip)

#[derive(Debug, Clone)]
pub struct Gossip {
    pub(crate) inner: Arc<Inner>,
}

Gossip is the primary public handle. It derefs to GossipApi, providing the user-facing interface:

// Subscribe to a topic
let (sender, receiver) = gossip.subscribe(topic_id, bootstrap_peers).await?.split();

// Subscribe and wait for at least one connection
let topic = gossip.subscribe_and_join(topic_id, bootstrap_peers).await?;

// Broadcast a message
sender.broadcast(b"hello world".to_vec().into()).await?;

// Broadcast to neighbors only
sender.broadcast_neighbors(b"local announcement".to_vec().into()).await?;

// Join additional peers
sender.join_peers(vec![peer_id]).await?;

Builder Pattern

let gossip = Gossip::builder()
    .max_message_size(8192)            // Default: 4096
    .membership_config(hyparview_config)  // HyParView settings
    .broadcast_config(plumtree_config)    // PlumTree settings
    .alpn(b"/custom-alpn")               // Custom ALPN (must match across network)
    .spawn(endpoint);

Architecture: The Actor

The core of the networking layer is the Actor struct, which runs as a single async task:

struct Actor {
    alpn: Bytes,
    state: proto::State<PublicKey, StdRng>,     // Protocol state machine
    endpoint: Endpoint,                          // iroh endpoint for connections
    dialer: Dialer,                              // Manages outgoing connections
    rpc_rx: mpsc::Receiver<RpcMessage>,          // API commands
    local_rx: mpsc::Receiver<LocalActorMessage>, // Local commands (connections, shutdown)
    in_event_tx: mpsc::Sender<InEvent>,          // Protocol input channel
    in_event_rx: mpsc::Receiver<InEvent>,        // Protocol input channel (receiver)
    timers: Timers<Timer>,                        // Scheduled timers
    topics: HashMap<TopicId, TopicState>,         // Per-topic subscription state
    peers: HashMap<EndpointId, PeerState>,        // Per-peer connection state
    command_rx: stream_group::Keyed<TopicCommandStream>,  // Per-topic command streams
    quit_queue: VecDeque<TopicId>,               // Topics pending unsubscription
    connection_tasks: JoinSet<...>,               // Running connection loop tasks
    metrics: Arc<Metrics>,
    topic_event_forwarders: JoinSet<TopicId>,    // Tasks forwarding events to subscribers
    address_lookup: GossipAddressLookup,          // Address discovery integration
}

Event Loop

The actor's run() method calls event_loop() in a loop. Each iteration uses tokio::select! to handle:

Source Action
local_rx (local messages) Handle incoming connections or shutdown
rpc_rx (RPC messages) Process Join requests from the API
command_rx (per-topic commands) Process Broadcast, BroadcastNeighbors, JoinPeers, or stream closure
addr_updates (endpoint addr changes) Update our PeerData in the protocol state
dialer (connection establishment) Handle successful/failed outgoing connections
in_event_rx (protocol events from connections) Feed events to the protocol state machine
timers (scheduled timers) Feed timer expirations to the protocol state machine
connection_tasks (connection task completions) Handle peer disconnections
topic_event_forwarders (subscription tasks) Handle topic cleanup when all subscribers drop

Processing InEvents

When an InEvent is processed, the actor calls self.state.handle(event, now, metrics), which returns Vec<OutEvent>. For each OutEvent:

OutEvent Action
SendMessage(peer, message) Send via peer's active connection or queue for pending connection
EmitEvent(topic, event) Forward to topic's broadcast::Sender → subscribers
ScheduleTimer(delay, timer) Schedule timer via Timers data structure
DisconnectPeer(peer) Drop the peer's send channel, removing from peers map
PeerData(endpoint_id, data) Decode AddrInfo from PeerData, add to GossipAddressLookup

Connection Management

Peer States

enum PeerState {
    Pending {
        queue: Vec<ProtoMessage>,   // Messages queued while connecting
    },
    Active {
        active_send_tx: mpsc::Sender<ProtoMessage>,  // Current active send channel
        active_conn_id: ConnId,                       // Stable ID of active connection
        other_conns: Vec<ConnId>,                     // Older connections still closing
    },
}

When a message needs to be sent to a peer:

  • Active: Send immediately via active_send_tx
  • Pending: Queue the message and initiate a dial

Dialer

struct Dialer {
    endpoint: Endpoint,
    pending: JoinSet<(EndpointId, Option<Result<Connection, ConnectError>>)>,
    pending_dials: HashMap<EndpointId, CancellationToken>,
}

The Dialer manages outgoing connections. It:

  1. Checks if a dial is already pending for a peer
  2. Spawns an async connection task with cancellation support
  3. Returns completed connections via next_conn()

Connection Loop

Each peer connection runs a connection_loop task:

async fn connection_loop(
    from: PublicKey,           // Remote peer's public key
    conn: Connection,          // QUIC connection
    origin: ConnOrigin,        // Accept (incoming) or Dial (outgoing)
    send_rx: mpsc::Receiver<ProtoMessage>,  // Messages to send
    in_event_tx: mpsc::Sender<InEvent>,      // Channel to protocol
    max_message_size: usize,                   // Maximum message size
    queue: Vec<ProtoMessage>,                  // Queued messages to send first
) -> Result<(), ConnectionLoopError>

The connection loop:

  1. First sends any queued messages
  2. Runs a send loop and receive loop concurrently (tokio::join!)
  3. Uses iroh QUIC bidirectional streams for communication

Wire Protocol

Messages are serialized with postcard and sent as length-prefixed frames over QUIC unidirectional streams:

┌──────────────┐
│ Stream Header │  ── Contains TopicId (sent once per stream)
├──────────────┤
│ Frame (len)  │  ── u32 length prefix
│ Frame (data) │  ── postcard-encoded topic::Message<PublicKey>
├──────────────┤
│ Frame (len)  │  ── next message...
│ Frame (data) │
└──────────────┘

Each topic gets its own unidirectional stream. The stream header is sent once when the stream is opened. Disconnect messages close the stream after being sent.

The SendLoop manages per-topic streams within a connection:

struct SendLoop {
    conn: Connection,
    streams: HashMap<TopicId, SendStream>,  // One stream per topic
    buffer: Vec<u8>,
    max_message_size: usize,
    send_rx: mpsc::Receiver<ProtoMessage>,
}

When a disconnect message is sent for a topic, the stream for that topic is closed (via finish()).

Topic State (Net Layer)

struct TopicState {
    neighbors: BTreeSet<EndpointId>,          // Current active neighbors (from protocol)
    event_sender: broadcast::Sender<ProtoEvent>,  // Broadcast channel to subscribers
    command_rx_keys: HashSet<stream_group::Key>,   // Active command stream keys
}

A topic is considered "still needed" if it has either:

  • Active command receivers (publishers), or
  • Active event subscribers (subscribers)

When neither exists, the topic is queued for quit/unsubscription.

Address Lookup Integration

The GossipAddressLookup integrates with iroh's address discovery system:

pub(crate) struct GossipAddressLookup {
    endpoints: NodeMap,                      // BTreeMap<EndpointId, StoredEndpointInfo>
    _task_handle: Arc<AbortOnDropHandle<()>>, // Background eviction task
}

It implements iroh's AddressLookup trait, allowing gossip-discovered peer addresses to feed back into iroh's connection establishment. This means that when a peer shares its address information in Join or ForwardJoin messages, that information is used to help iroh connect to that peer.

Entries expire after 5 minutes (configurable via RetentionOpts), with eviction checks every 30 seconds.

Metrics

The Metrics struct tracks various counters:

Metric Description
msgs_ctrl_sent Control messages sent
msgs_ctrl_recv Control messages received
msgs_data_sent Data messages sent
msgs_data_recv Data messages received
msgs_data_sent_size Total size of data messages sent
msgs_data_recv_size Total size of data messages received
msgs_ctrl_sent_size Total size of control messages sent
msgs_ctrl_recv_size Total size of control messages received
neighbor_up Neighbor connections established
neighbor_down Neighbor connections lost
actor_tick_* Various event loop tick counters