async-nats: NATS Protocol & Wire Format

Protocol Overview

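NATS uses a simple, text-based protocol over TCP. Every operation begins with a control line terminated by \r\n; operations that carry a payload (PUB, HPUB, MSG, HMSG) follow the control line with the payload bytes and a closing \r\n. Client and server operations use the same framing.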

Client → Server Operations (ClientOp)

pub(crate) enum ClientOp {
    Publish { subject, payload, respond, headers },
    Subscribe { sid, subject, queue_group },
    Unsubscribe { sid, max },
    Ping,
    Pong,
    Connect(ConnectInfo),
}

Server → Client Operations (ServerOp)

pub(crate) enum ServerOp {
    Ok,
    Info(Box<ServerInfo>),
    Ping,
    Pong,
    Error(ServerError),
    Message { sid, subject, reply, payload, headers, status, description, length },
}

Wire Format: Client Operations

CONNECT

Sent immediately after receiving the first INFO from the server:

CONNECT {"verbose":false,"pedantic":false,...}\r\n

The JSON payload is ConnectInfo serialized inline on the same line.

PUB (Publish without headers)

PUB <subject> [reply-to] <payload-size>\r\n
<payload>\r\n

Example:

PUB events.data INBOX.67 11\r\n
Hello World\r\n
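
As a minimal sketch of the PUB framing above, the following builds the bytes for one frame. `encode_pub` is a hypothetical helper written for illustration; it is not part of the async-nats API, which writes into its own buffers.

```rust
/// Build a PUB frame: control line, payload, closing CRLF.
/// Hypothetical helper for illustration only.
fn encode_pub(subject: &str, reply_to: Option<&str>, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(subject.len() + payload.len() + 32);
    frame.extend_from_slice(b"PUB ");
    frame.extend_from_slice(subject.as_bytes());
    if let Some(reply) = reply_to {
        // The reply-to subject is optional and sits before the size.
        frame.push(b' ');
        frame.extend_from_slice(reply.as_bytes());
    }
    // The size is the payload length in bytes, written as decimal ASCII.
    frame.extend_from_slice(format!(" {}\r\n", payload.len()).as_bytes());
    frame.extend_from_slice(payload);
    frame.extend_from_slice(b"\r\n");
    frame
}
```

Calling `encode_pub("events.data", Some("INBOX.67"), b"Hello World")` yields the same bytes as the example frame above.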

HPUB (Publish with headers)

When headers are present and non-empty:

HPUB <subject> [reply-to] <header-size> <total-size>\r\n
<headers>\r\n
<payload>\r\n

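The <total-size> = <header-size> + <payload-size>. The <header-size> counts the entire header block, including the NATS/1.0 version line and the terminating blank line; the \r\n that follows the payload is not counted in either size.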

Header block format:

NATS/1.0\r\n
Header-Name: Header-Value\r\n
Another-Header: Another-Value\r\n
\r\n
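
The size arithmetic is easy to get wrong, so here is a small sketch that serializes a header block and derives both sizes from it. `encode_header_block` and `encode_hpub` are hypothetical names, and headers are passed as a plain slice of pairs rather than the library's header map type.

```rust
/// Serialize headers as: version line, one line per header, blank line.
/// Hypothetical helper for illustration only.
fn encode_header_block(headers: &[(&str, &str)]) -> Vec<u8> {
    let mut block = b"NATS/1.0\r\n".to_vec();
    for (name, value) in headers {
        block.extend_from_slice(format!("{}: {}\r\n", name, value).as_bytes());
    }
    // The blank line terminates the block and counts toward header-size.
    block.extend_from_slice(b"\r\n");
    block
}

/// Build an HPUB frame. header-size covers the whole header block;
/// total-size adds the payload length. The final CRLF is not counted.
fn encode_hpub(
    subject: &str,
    reply_to: Option<&str>,
    headers: &[(&str, &str)],
    payload: &[u8],
) -> Vec<u8> {
    let block = encode_header_block(headers);
    let header_size = block.len();
    let total_size = header_size + payload.len();

    let mut frame = format!("HPUB {}", subject).into_bytes();
    if let Some(reply) = reply_to {
        frame.push(b' ');
        frame.extend_from_slice(reply.as_bytes());
    }
    frame.extend_from_slice(format!(" {} {}\r\n", header_size, total_size).as_bytes());
    frame.extend_from_slice(&block);
    frame.extend_from_slice(payload);
    frame.extend_from_slice(b"\r\n");
    frame
}
```

With one header `Bar: Baz` and the payload `Hello NATS!`, the header block is 22 bytes and the total is 33, so the control line reads `HPUB FOO 22 33`.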

The version line (NATS/1.0) may include a status code and description:

NATS/1.0 404 No Messages\r\n
\r\n
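
A sketch of how such a version line could be split into its optional status code and description. `parse_version_line` is a hypothetical function, and the status is represented as a plain `u16` here, which is an assumption rather than the library's own status type.

```rust
/// Parse "NATS/1.0[ <status>[ <description>]]".
/// Returns None if the line is not a valid version line.
/// Hypothetical helper for illustration only.
fn parse_version_line(line: &str) -> Option<(Option<u16>, Option<String>)> {
    let rest = line.strip_prefix("NATS/1.0")?;
    // Anything after the version must be separated from it by a space.
    if !rest.is_empty() && !rest.starts_with(' ') {
        return None;
    }
    let rest = rest.trim();
    if rest.is_empty() {
        return Some((None, None));
    }
    // The first token is the status code, the remainder is the description.
    let (code, description) = match rest.split_once(' ') {
        Some((code, desc)) => (code, Some(desc.trim().to_string())),
        None => (rest, None),
    };
    let status = code.parse::<u16>().ok()?;
    Some((Some(status), description))
}
```

For the example above, `parse_version_line("NATS/1.0 404 No Messages")` gives status 404 with the description `No Messages`, while a bare `NATS/1.0` gives neither.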

SUB (Subscribe)

SUB <subject> [queue-group] <sid>\r\n

The sid (subscription ID) is a client-assigned u64, unique per connection.

UNSUB (Unsubscribe)

UNSUB <sid> [max]\r\n

The optional max tells the server to auto-unsubscribe after max messages are delivered.
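
Both control lines are simple enough to show together. The helpers below are hypothetical and only illustrate where the optional fields sit relative to the sid.

```rust
/// SUB control line: the queue group, when present, precedes the sid.
/// Hypothetical helper for illustration only.
fn encode_sub(subject: &str, queue_group: Option<&str>, sid: u64) -> String {
    match queue_group {
        Some(group) => format!("SUB {} {} {}\r\n", subject, group, sid),
        None => format!("SUB {} {}\r\n", subject, sid),
    }
}

/// UNSUB control line: the optional max follows the sid.
/// Hypothetical helper for illustration only.
fn encode_unsub(sid: u64, max: Option<u64>) -> String {
    match max {
        Some(max) => format!("UNSUB {} {}\r\n", sid, max),
        None => format!("UNSUB {}\r\n", sid),
    }
}
```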

PING / PONG

PING\r\n
PONG\r\n

The client sends PING periodically (every 60s by default). If more than 2 pings are pending without a PONG, the connection is considered dead.

Wire Format: Server Operations

INFO

First message sent by the server on connection:

INFO {"server_id":"NATSxxx","version":"2.10",...}\r\n

Also sent asynchronously when cluster topology changes.

MSG (Message without headers)

MSG <subject> <sid> [reply-to] <payload-size>\r\n
<payload>\r\n

HMSG (Message with headers)

HMSG <subject> <sid> [reply-to] <header-size> <total-size>\r\n
<headers + payload>\r\n

+OK / -ERR

+OK\r\n
-ERR <description>\r\n

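+OK is sent only when verbose=true in CONNECT. The client always sets verbose=false, so +OK is not expected. -ERR does not depend on verbose: the server can send it at any time to report a protocol or authorization error.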

Protocol Parser

The Connection struct handles all protocol parsing and serialization:

Read Path (try_read_op)

  1. Search for \r\n in read_buf using memchr::memmem::find
  2. Inspect the first bytes to determine the operation type:
    • +OK → ServerOp::Ok
    • PING → ServerOp::Ping
    • PONG → ServerOp::Pong
    • -ERR → ServerOp::Error(...) (surrounding single quotes are stripped from the description with trim_matches('\''))
    • INFO → ServerOp::Info(...) (serde_json deserialization)
    • MSG → Parse subject/sid/reply/size, then read payload
    • HMSG → Parse subject/sid/reply/header_len/total_len, then read headers + payload
  3. For MSG/HMSG: if the full message body hasn't been read yet, return None (wait for more data)
  4. For HMSG: parse the header block — extract version line (NATS/1.0[ <status>[ <description>]]), then key-value pairs (supports folded/multi-line header values)
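
The read path can be sketched with the standard library alone. This is a simplified stand-in, not the library's code: it searches for \r\n with `windows` instead of memchr, keeps INFO as a raw JSON string instead of deserializing it, and omits HMSG. The `Op` enum and its field types are assumptions.

```rust
#[derive(Debug, PartialEq)]
enum Op {
    Ok,
    Ping,
    Pong,
    Error(String),
    Info(String), // raw JSON; the real client deserializes it
    Message { subject: String, sid: u64, reply: Option<String>, payload: Vec<u8> },
}

/// Try to parse one operation from the front of `buf`.
/// Returns Ok(None) when more bytes are needed; bytes are consumed
/// only when a complete operation has been parsed.
fn try_read_op(buf: &mut Vec<u8>) -> Result<Option<Op>, String> {
    // Step 1: find the end of the control line.
    let line_end = match buf.windows(2).position(|w| w == b"\r\n") {
        Some(pos) => pos,
        None => return Ok(None),
    };
    let line = String::from_utf8_lossy(&buf[..line_end]).into_owned();
    let body_start = line_end + 2;

    // Step 2: dispatch on the leading bytes.
    let op = if line == "+OK" {
        Op::Ok
    } else if line == "PING" {
        Op::Ping
    } else if line == "PONG" {
        Op::Pong
    } else if let Some(rest) = line.strip_prefix("-ERR") {
        Op::Error(rest.trim().trim_matches('\'').to_string())
    } else if let Some(rest) = line.strip_prefix("INFO ") {
        Op::Info(rest.to_string())
    } else if let Some(rest) = line.strip_prefix("MSG ") {
        let args: Vec<&str> = rest.split_whitespace().collect();
        let (subject, sid, reply, size) = match args.as_slice() {
            [subject, sid, size] => (*subject, *sid, None, *size),
            [subject, sid, reply, size] => (*subject, *sid, Some(*reply), *size),
            _ => return Err(format!("malformed MSG line: {}", line)),
        };
        let sid: u64 = sid.parse().map_err(|_| "invalid sid".to_string())?;
        let size: usize = size.parse().map_err(|_| "invalid size".to_string())?;
        // Step 3: wait until the payload and its closing CRLF are buffered.
        if buf.len() < body_start + size + 2 {
            return Ok(None);
        }
        let payload = buf[body_start..body_start + size].to_vec();
        buf.drain(..body_start + size + 2);
        return Ok(Some(Op::Message {
            subject: subject.to_string(),
            sid,
            reply: reply.map(str::to_string),
            payload,
        }));
    } else {
        return Err(format!("unknown operation: {}", line));
    };

    buf.drain(..body_start);
    Ok(Some(op))
}
```

The important property is that an incomplete MSG leaves the buffer untouched, so the same call succeeds once the remaining bytes arrive.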

Write Path (enqueue_write_op)

Each write is buffered in one of two ways, depending on its size:

  • Small writes (< 4096 bytes): flattened into flattened_writes: BytesMut
  • Large writes (≥ 4096 bytes): appended as separate Bytes chunks in write_buf: VecDeque<Bytes>

This enables efficient vectored I/O when the underlying stream supports it.
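
A sketch of the two-tier buffering, using `Vec<u8>` in place of `Bytes`/`BytesMut` so it needs no external crates. The constant name `FLATTEN_THRESHOLD` is hypothetical, and sealing the pending small writes before queuing a large chunk is an assumption made here to keep bytes in order.

```rust
use std::collections::VecDeque;

// Hypothetical name; the text above gives the threshold as 4096 bytes.
const FLATTEN_THRESHOLD: usize = 4096;

struct WriteBuffers {
    /// Queue of chunks ready to be written, in order.
    write_buf: VecDeque<Vec<u8>>,
    /// Small writes are coalesced here.
    flattened_writes: Vec<u8>,
}

impl WriteBuffers {
    fn new() -> Self {
        WriteBuffers { write_buf: VecDeque::new(), flattened_writes: Vec::new() }
    }

    fn enqueue(&mut self, chunk: Vec<u8>) {
        if chunk.len() < FLATTEN_THRESHOLD {
            // Small write: copy into the shared buffer.
            self.flattened_writes.extend_from_slice(&chunk);
        } else {
            // Large write: queue it as its own chunk without copying.
            // Assumption: pending small writes are sealed first so that
            // bytes leave in the order they were enqueued.
            if !self.flattened_writes.is_empty() {
                let sealed = std::mem::take(&mut self.flattened_writes);
                self.write_buf.push_back(sealed);
            }
            self.write_buf.push_back(chunk);
        }
    }
}
```

Copying small writes costs little and produces fewer, larger chunks, while large payloads skip the copy entirely.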

Write Flush Strategy

The should_flush() method returns:

  • Yes — buffers empty but haven't flushed yet
  • May — buffers not empty and haven't flushed
  • No — already flushed or nothing to flush

The ConnectionHandler calls poll_flush() after processing commands, ensuring data is actually sent to the server.
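
The three outcomes can be read as a function of two facts: whether anything has been written since the last flush, and whether the write buffers are empty. The sketch below encodes that reading; the parameter names are hypothetical, and the mapping is inferred from the descriptions above rather than taken from the library source.

```rust
#[derive(Debug, PartialEq)]
enum ShouldFlush {
    Yes,
    May,
    No,
}

/// `written_since_flush`: data was handed to the stream but not flushed.
/// `buffers_empty`: nothing is left waiting in the write buffers.
/// Both parameter names are hypothetical.
fn should_flush(written_since_flush: bool, buffers_empty: bool) -> ShouldFlush {
    match (written_since_flush, buffers_empty) {
        // Everything has been written out; flushing now is worthwhile.
        (true, true) => ShouldFlush::Yes,
        // More data is still queued; a flush can wait for it.
        (true, false) => ShouldFlush::May,
        // Already flushed, or nothing was written.
        (false, _) => ShouldFlush::No,
    }
}
```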

Vectored I/O

When stream.is_write_vectored() returns true, the connection uses poll_write_vectored() to write up to 64 IoSlices in one call. Batching queued chunks this way avoids issuing one write per chunk, which helps most when many operations are enqueued at once, such as a burst of publishes.

const WRITE_VECTORED_CHUNKS: usize = 64;
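
To illustrate the batching, here is a synchronous sketch built on `std::io::Write::write_vectored`. The library itself uses the async `poll_write_vectored`, so this is an analogue under that stated substitution; `write_vectored_once` is a hypothetical function.

```rust
use std::collections::VecDeque;
use std::io::{self, IoSlice, Write};

const WRITE_VECTORED_CHUNKS: usize = 64;

/// Submit up to WRITE_VECTORED_CHUNKS queued chunks in a single call,
/// then remove whatever the writer accepted from the queue.
fn write_vectored_once<W: Write>(
    writer: &mut W,
    chunks: &mut VecDeque<Vec<u8>>,
) -> io::Result<usize> {
    let written = {
        let slices: Vec<IoSlice<'_>> = chunks
            .iter()
            .take(WRITE_VECTORED_CHUNKS)
            .map(|chunk| IoSlice::new(chunk))
            .collect();
        writer.write_vectored(&slices)?
    };

    // A vectored write may be partial: drop chunks that were fully
    // written and trim the front of the first partially written one.
    let mut remaining = written;
    while remaining > 0 {
        let front_len = match chunks.front() {
            Some(front) => front.len(),
            None => break,
        };
        if front_len <= remaining {
            chunks.pop_front();
            remaining -= front_len;
        } else {
            if let Some(front) = chunks.front_mut() {
                front.drain(..remaining);
            }
            remaining = 0;
        }
    }
    Ok(written)
}
```

Handling the partial-write case is the part that is easy to overlook: the return value counts bytes, not slices.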

WebSocket Transport

When the websockets feature is enabled, WebSocketAdapter<T> wraps tokio_websockets::WebSocketStream<T> to implement AsyncRead + AsyncWrite, making WebSocket connections transparent to the protocol layer.

#[cfg(feature = "websockets")]
pub(crate) struct WebSocketAdapter<T> {
    pub(crate) inner: WebSocketStream<T>,
    pub(crate) read_buf: BytesMut,
}

WebSocket connections use ws:// or wss:// scheme in the server URL. TLS for wss:// is handled by the WebSocket library's built-in TLS support.

Connection Lifecycle

Initial Connection Flow

Client                                  Server
  │                                        │
  │──── TCP connect ────────────────────▶  │
  │◀──── INFO {server_id, nonce, ...} ─── │
  │──── CONNECT {auth, ...} ──────────▶  │
  │──── PING ─────────────────────────▶  │
  │◀──── PONG (or -ERR) ─────────────── │
  │                                        │
  │  [connected, ConnectionHandler runs]   │
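
The exchange in the diagram can be sketched as a blocking function over any `Read + Write` stream. This is an illustration only: the library is async, and `handshake` and `MockStream` are hypothetical names. `MockStream` is an in-memory stand-in that makes it possible to try the function without a server.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Write};

/// Read INFO, send CONNECT and PING, then wait for PONG.
/// Returns the raw INFO JSON on success.
fn handshake<S: Read + Write>(stream: S, connect_json: &str) -> io::Result<String> {
    let mut stream = BufReader::new(stream);
    let mut line = String::new();

    // The server speaks first.
    stream.read_line(&mut line)?;
    let info = line
        .trim_end()
        .strip_prefix("INFO ")
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "expected INFO"))?
        .to_string();

    // CONNECT and PING go out back to back; the PONG confirms that
    // the server accepted the CONNECT.
    let out = stream.get_mut();
    write!(out, "CONNECT {}\r\nPING\r\n", connect_json)?;
    out.flush()?;

    line.clear();
    stream.read_line(&mut line)?;
    match line.trim_end() {
        "PONG" => Ok(info),
        other => Err(io::Error::new(
            io::ErrorKind::Other,
            format!("handshake failed: {}", other),
        )),
    }
}

/// In-memory stream: reads from a scripted input, records all output.
struct MockStream {
    input: Cursor<Vec<u8>>,
    output: Vec<u8>,
}

impl MockStream {
    fn new(input: &[u8]) -> Self {
        MockStream { input: Cursor::new(input.to_vec()), output: Vec::new() }
    }
}

impl Read for MockStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.input.read(buf)
    }
}

impl Write for MockStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.output.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Sending PING immediately after CONNECT means a rejected CONNECT surfaces as -ERR in place of the PONG, so the client learns the outcome in one round trip.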

If tls_first is enabled, TLS is established before reading INFO:

Client                                  Server
  │                                        │
  │──── TCP connect ────────────────────▶  │
  │──── TLS handshake ─────────────────▶  │
  │◀──── TLS handshake ──────────────── │
  │◀──── INFO {...} ──────────────────── │
  │──── CONNECT + PING ────────────────▶  │
  │◀──── PONG ────────────────────────── │

Ping/Pong Keepalive

  • Client sends PING every ping_interval (default 60s)
  • Server responds with PONG
  • If pending_pings > MAX_PENDING_PINGS (2), connection is considered dead
  • Any server operation resets the ping interval timer
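
The pending-ping bookkeeping can be sketched as a small counter. The `Keepalive` type and its method names are hypothetical, timers are left out, and decrementing on PONG is an assumption about how the counter recovers.

```rust
const MAX_PENDING_PINGS: usize = 2;

/// Hypothetical bookkeeping type; the real handler keeps this
/// counter alongside its other connection state.
struct Keepalive {
    pending_pings: usize,
}

impl Keepalive {
    fn new() -> Self {
        Keepalive { pending_pings: 0 }
    }

    /// Called when the ping interval fires and a PING is sent.
    /// Returns false once the connection should be treated as dead.
    fn on_ping_tick(&mut self) -> bool {
        self.pending_pings += 1;
        self.pending_pings <= MAX_PENDING_PINGS
    }

    /// Called when a PONG arrives (assumed to clear one pending ping).
    fn on_pong(&mut self) {
        self.pending_pings = self.pending_pings.saturating_sub(1);
    }
}
```

With the strict `>` comparison, two unanswered pings are tolerated and the third one trips the check.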

Reconnection Flow

On disconnect:

  1. handle_disconnect() sends Event::Disconnected and sets state to Disconnected
  2. handle_reconnect() calls connector.connect() which:
    • Shuffles servers (unless retain_servers_order)
    • Sorts by failed_attempts (ascending)
    • Iterates through servers with exponential backoff delay
    • On each server: DNS resolve → TCP connect → INFO → TLS (if needed) → CONNECT+PING → PONG
  3. On success:
    • Sends Event::Connected, sets state to Connected
    • Removes closed subscriptions
    • Re-subscribes all active subscriptions (with adjusted max = max - delivered)
    • Re-subscribes the multiplexer (if active)
  4. On failure with MaxReconnects reached, the handler loop exits
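
Two pieces of this flow are pure logic and can be sketched in isolation: ordering servers by failed attempts, and adjusting a subscription's max on re-subscribe. The `Server` struct and both function names are hypothetical, and the shuffle step is omitted because it would need a random number crate.

```rust
/// Hypothetical server record for illustration.
struct Server {
    addr: String,
    failed_attempts: usize,
}

/// Sort by failed attempts, ascending. The sort is stable, so servers
/// with equal counts keep the order produced by the earlier shuffle.
fn order_servers(servers: &mut [Server]) {
    servers.sort_by_key(|server| server.failed_attempts);
}

/// On re-subscribe, a subscription with a max only needs the messages
/// it has not received yet: max - delivered. Subscriptions without a
/// max stay unlimited.
fn adjusted_max(max: Option<u64>, delivered: u64) -> Option<u64> {
    max.map(|m| m.saturating_sub(delivered))
}
```

Relying on sort stability is what lets the shuffle and the sort compose: the shuffle spreads load among equally healthy servers, and the sort pushes recently failing ones to the back.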

Default Reconnect Delay

Exponential backoff capped at 4 seconds:

fn reconnect_delay_callback_default(attempts: usize) -> Duration {
    if attempts <= 1 {
        Duration::from_millis(0)
    } else {
        let exp: u32 = (attempts - 1).try_into().unwrap_or(u32::MAX);
        cmp::min(Duration::from_millis(2_u64.saturating_pow(exp)), Duration::from_secs(4))
    }
}
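
Resulting delay per attempt (2^(attempts - 1) ms from the second attempt on, capped at 4 seconds):

Attempt   Delay
1         0ms
2         2ms
3         4ms
4         8ms
5         16ms
6         32ms
7         64ms
8         128ms
9         256ms
10        512ms
11        1024ms
12        2048ms
13+       4000ms (cap)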
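
The delays can be checked by evaluating the function directly. The sketch below repeats the function body from above with its imports under the hypothetical name `reconnect_delay`, and adds a hypothetical `delay_table` helper that lists the delay in milliseconds for each attempt.

```rust
use std::cmp;
use std::time::Duration;

/// Same logic as the default callback shown above.
fn reconnect_delay(attempts: usize) -> Duration {
    if attempts <= 1 {
        Duration::from_millis(0)
    } else {
        // 2^(attempts - 1) milliseconds, saturating instead of overflowing.
        let exp: u32 = (attempts - 1).try_into().unwrap_or(u32::MAX);
        cmp::min(
            Duration::from_millis(2_u64.saturating_pow(exp)),
            Duration::from_secs(4),
        )
    }
}

/// Hypothetical helper: (attempt, delay in ms) for attempts 1..=max_attempt.
fn delay_table(max_attempt: usize) -> Vec<(usize, u128)> {
    (1..=max_attempt)
        .map(|attempt| (attempt, reconnect_delay(attempt).as_millis()))
        .collect()
}
```

The delay doubles with each attempt from the second one onward, and the cap takes effect at attempt 13, where 2^12 = 4096ms exceeds 4 seconds.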