# async-nats: NATS Protocol & Wire Format
## Protocol Overview
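NATS uses a simple, text-based protocol over TCP. Every control line is terminated with `\r\n`; when an operation carries a payload, the payload follows the control line and is also terminated with `\r\n`. Client and server each have their own set of operations, modeled in async-nats by the `ClientOp` and `ServerOp` enums (only `PING` and `PONG` are sent in both directions).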
### Client → Server Operations (`ClientOp`)
```rust
pub(crate) enum ClientOp {
    Publish { subject, payload, respond, headers },
    Subscribe { sid, subject, queue_group },
    Unsubscribe { sid, max },
    Ping,
    Pong,
    Connect(ConnectInfo),
}
```
### Server → Client Operations (`ServerOp`)
```rust
pub(crate) enum ServerOp {
    Ok,
    Info(Box<ServerInfo>),
    Ping,
    Pong,
    Error(ServerError),
    Message { sid, subject, reply, payload, headers, status, description, length },
}
```
## Wire Format: Client Operations
### CONNECT
Sent immediately after receiving the first `INFO` from the server:
```
CONNECT {"verbose":false,"pedantic":false,...}\r\n
```
The JSON payload is `ConnectInfo` serialized inline on the same line.
### PUB (Publish without headers)
```
PUB <subject> [reply-to] <payload-size>\r\n
<payload>\r\n
```
Example:
```
PUB events.data INBOX.67 11\r\n
Hello World\r\n
```
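As a minimal sketch of the framing above, the function below builds a `PUB` frame by hand using only the standard library. `encode_pub` is a hypothetical helper written for illustration; it is not part of the async-nats API.

```rust
/// Encode a header-less publish as
/// `PUB <subject> [reply-to] <payload-size>\r\n<payload>\r\n`.
/// Hypothetical helper, not the async-nats implementation.
fn encode_pub(subject: &str, reply_to: Option<&str>, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(subject.len() + payload.len() + 32);
    frame.extend_from_slice(b"PUB ");
    frame.extend_from_slice(subject.as_bytes());
    if let Some(reply) = reply_to {
        frame.push(b' ');
        frame.extend_from_slice(reply.as_bytes());
    }
    // The size is the payload length in bytes, written as decimal ASCII.
    frame.extend_from_slice(format!(" {}\r\n", payload.len()).as_bytes());
    frame.extend_from_slice(payload);
    frame.extend_from_slice(b"\r\n");
    frame
}
```

Calling `encode_pub("events.data", Some("INBOX.67"), b"Hello World")` yields the bytes of the example frame shown above.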
### HPUB (Publish with headers)
When headers are present and non-empty:
```
HPUB <subject> [reply-to] <header-size> <total-size>\r\n
<headers>\r\n
<payload>\r\n
```
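`<total-size>` = `<header-size>` + `<payload-size>`. `<header-size>` counts the whole header block in bytes, including the version line and the empty line (`\r\n`) that terminates the block.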
Header block format:
```
NATS/1.0\r\n
Header-Name: Header-Value\r\n
Another-Header: Another-Value\r\n
\r\n
```
The version line (`NATS/1.0`) may include a status code and description:
```
NATS/1.0 404 No Messages\r\n
\r\n
```
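The size arithmetic for `HPUB` is easy to get wrong, so here is a small sketch that builds the header block and derives both sizes from it. `encode_header_block` and `encode_hpub` are hypothetical names; headers are passed as plain string pairs rather than the crate's `HeaderMap`.

```rust
/// Build the header block: version line, one `Name: Value` line per header,
/// then the empty line that terminates the block.
fn encode_header_block(headers: &[(&str, &str)]) -> Vec<u8> {
    let mut block = b"NATS/1.0\r\n".to_vec();
    for (name, value) in headers {
        block.extend_from_slice(format!("{name}: {value}\r\n").as_bytes());
    }
    block.extend_from_slice(b"\r\n");
    block
}

/// Encode `HPUB <subject> [reply-to] <header-size> <total-size>`.
/// Hypothetical helper, not the async-nats implementation.
fn encode_hpub(
    subject: &str,
    reply_to: Option<&str>,
    headers: &[(&str, &str)],
    payload: &[u8],
) -> Vec<u8> {
    let block = encode_header_block(headers);
    // header-size covers the whole block, including its terminating CRLF.
    let header_size = block.len();
    let total_size = header_size + payload.len();

    let mut frame = format!("HPUB {subject}").into_bytes();
    if let Some(reply) = reply_to {
        frame.extend_from_slice(format!(" {reply}").as_bytes());
    }
    frame.extend_from_slice(format!(" {header_size} {total_size}\r\n").as_bytes());
    frame.extend_from_slice(&block);
    frame.extend_from_slice(payload);
    frame.extend_from_slice(b"\r\n");
    frame
}
```

For a single header `Bar: Baz` and the payload `Hello NATS!`, the header block is 22 bytes and the total is 33, so the control line reads `HPUB FOO 22 33`.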
### SUB (Subscribe)
```
SUB <subject> [queue-group] <sid>\r\n
```
The `sid` (subscription ID) is a client-assigned u64, unique per connection.
### UNSUB (Unsubscribe)
```
UNSUB <sid> [max]\r\n
```
The optional `max` tells the server to auto-unsubscribe after `max` messages are delivered.
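The optional arguments of `SUB` and `UNSUB` can be sketched as follows. `encode_sub` and `encode_unsub` are hypothetical helpers for illustration only.

```rust
/// `SUB <subject> [queue-group] <sid>\r\n`
fn encode_sub(subject: &str, queue_group: Option<&str>, sid: u64) -> String {
    match queue_group {
        Some(group) => format!("SUB {subject} {group} {sid}\r\n"),
        None => format!("SUB {subject} {sid}\r\n"),
    }
}

/// `UNSUB <sid> [max]\r\n`
fn encode_unsub(sid: u64, max: Option<u64>) -> String {
    match max {
        // Ask the server to auto-unsubscribe after `max` delivered messages.
        Some(max) => format!("UNSUB {sid} {max}\r\n"),
        // Unsubscribe immediately.
        None => format!("UNSUB {sid}\r\n"),
    }
}
```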
### PING / PONG
```
PING\r\n
PONG\r\n
```
The client sends PING periodically (every 60s by default). If more than 2 pings are pending without a PONG, the connection is considered dead.
## Wire Format: Server Operations
### INFO
First message sent by the server on connection:
```
INFO {"server_id":"NATSxxx","version":"2.10"...}\r\n
```
Also sent asynchronously when cluster topology changes.
### MSG (Message without headers)
```
MSG <subject> <sid> [reply-to] <payload-size>\r\n
<payload>\r\n
```
### HMSG (Message with headers)
```
HMSG <subject> <sid> [reply-to] <header-size> <total-size>\r\n
<headers + payload>\r\n
```
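To make the framing concrete, the sketch below reads one header-less `MSG` frame from a byte buffer and reports how many bytes it consumed. It is a simplified, standard-library-only illustration: `Msg` and `try_read_msg` are hypothetical names, owned `String`/`Vec<u8>` stand in for the crate's types, and `HMSG` is not handled.

```rust
/// A parsed header-less message (owned types keep the sketch simple).
#[derive(Debug, PartialEq)]
struct Msg {
    subject: String,
    sid: u64,
    reply_to: Option<String>,
    payload: Vec<u8>,
}

/// Try to read one `MSG` frame from the start of `buf`.
/// Returns `Ok(None)` when more bytes are needed, otherwise the message
/// and the number of bytes the frame occupied.
fn try_read_msg(buf: &[u8]) -> Result<Option<(Msg, usize)>, String> {
    // The control line ends at the first CRLF.
    let Some(line_end) = buf.windows(2).position(|w| w == b"\r\n") else {
        return Ok(None);
    };
    let line = std::str::from_utf8(&buf[..line_end]).map_err(|e| e.to_string())?;
    let mut parts = line.split_ascii_whitespace();
    if parts.next() != Some("MSG") {
        return Err(format!("not a MSG control line: {line:?}"));
    }
    let args: Vec<&str> = parts.collect();
    // Three arguments: no reply-to. Four arguments: reply-to is present.
    let (subject, sid, reply_to, size) = match args.as_slice() {
        &[subject, sid, size] => (subject, sid, None, size),
        &[subject, sid, reply, size] => (subject, sid, Some(reply), size),
        _ => return Err(format!("malformed MSG arguments: {line:?}")),
    };
    let sid: u64 = sid.parse().map_err(|_| format!("bad sid: {sid:?}"))?;
    let size: usize = size.parse().map_err(|_| format!("bad size: {size:?}"))?;

    // The payload is followed by its own CRLF.
    let payload_start = line_end + 2;
    let frame_end = payload_start
        .checked_add(size)
        .and_then(|n| n.checked_add(2))
        .ok_or("payload size overflows usize")?;
    if buf.len() < frame_end {
        return Ok(None); // body not fully received yet
    }
    let payload_end = frame_end - 2;
    if &buf[payload_end..frame_end] != b"\r\n" {
        return Err("payload not terminated by CRLF".to_owned());
    }
    let msg = Msg {
        subject: subject.to_owned(),
        sid,
        reply_to: reply_to.map(str::to_owned),
        payload: buf[payload_start..payload_end].to_vec(),
    };
    Ok(Some((msg, frame_end)))
}
```

Because the payload length comes from the control line, the payload itself may contain `\r\n` without confusing the reader.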
### +OK / -ERR
```
+OK\r\n
-ERR <description>\r\n
```
`+OK` is sent only when `verbose=true` in `CONNECT`. The client always sets `verbose=false`, so `+OK` is not expected. `-ERR` is sent regardless of the `verbose` setting.
## Protocol Parser
The `Connection` struct handles all protocol parsing and serialization.
### Read Path (`try_read_op`)
1. Search for `\r\n` in `read_buf` using `memchr::memmem::find`
2. Inspect the first bytes to determine the operation type:
   - `+OK` → `ServerOp::Ok`
   - `PING` → `ServerOp::Ping`
   - `PONG` → `ServerOp::Pong`
   - `-ERR` → `ServerOp::Error(...)` (surrounding single quotes are stripped from the description with `trim_matches('\'')`)
   - `INFO ` → `ServerOp::Info(...)` (serde_json deserialization)
   - `MSG ` → Parse subject/sid/reply/size, then read payload
   - `HMSG ` → Parse subject/sid/reply/header_len/total_len, then read headers + payload
3. For `MSG`/`HMSG`: if the full message body hasn't been read yet, return `None` (wait for more data)
4. For `HMSG`: parse the header block — extract version line (`NATS/1.0[ <status>[ <description>]]`), then key-value pairs (supports folded/multi-line header values)
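Step 4 can be sketched with the standard library alone. The parser below is an illustration, not the crate's code: `ParsedHeaders` and `parse_header_block` are hypothetical names, and joining folded continuation lines with a single space is an assumption about how folding is resolved.

```rust
#[derive(Debug, Default, PartialEq)]
struct ParsedHeaders {
    status: Option<u16>,
    description: Option<String>,
    fields: Vec<(String, String)>,
}

/// Parse `NATS/1.0[ <status>[ <description>]]` followed by `Name: Value` lines.
fn parse_header_block(block: &str) -> Result<ParsedHeaders, String> {
    let mut lines = block.split("\r\n");
    let version_line = lines.next().unwrap_or("");
    let rest = version_line
        .strip_prefix("NATS/1.0")
        .ok_or_else(|| format!("unexpected version line: {version_line:?}"))?
        .trim();

    let mut parsed = ParsedHeaders::default();
    if !rest.is_empty() {
        // The status code comes first; anything after it is the description.
        let (code, description) = match rest.split_once(' ') {
            Some((code, description)) => (code, Some(description.trim())),
            None => (rest, None),
        };
        let code = code
            .parse::<u16>()
            .map_err(|_| format!("bad status code: {code:?}"))?;
        parsed.status = Some(code);
        parsed.description = description.filter(|d| !d.is_empty()).map(str::to_owned);
    }

    for line in lines {
        if line.is_empty() {
            break; // the empty line terminates the header block
        }
        if line.starts_with(' ') || line.starts_with('\t') {
            // Folded line: continue the previous value (assumed: joined by a space).
            match parsed.fields.last_mut() {
                Some((_, value)) => {
                    value.push(' ');
                    value.push_str(line.trim());
                }
                None => return Err("continuation line without a header".to_owned()),
            }
            continue;
        }
        let (name, value) = line
            .split_once(':')
            .ok_or_else(|| format!("missing ':' in header line: {line:?}"))?;
        parsed.fields.push((name.to_owned(), value.trim().to_owned()));
    }
    Ok(parsed)
}
```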
### Write Path (`enqueue_write_op`)
Serialized operations are buffered in one of two ways, depending on their size:
- **Small writes** (< 4096 bytes): flattened into `flattened_writes: BytesMut`
- **Large writes** (≥ 4096 bytes): appended as separate `Bytes` chunks in `write_buf: VecDeque<Bytes>`
This enables efficient vectored I/O when the underlying stream supports it.
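The two-tier buffering can be sketched as below. This is a simplified model, not the crate's code: `Vec<u8>` stands in for `BytesMut` and `Bytes`, the names `WriteBuffers`, `LARGE_WRITE_THRESHOLD`, `enqueue`, and `seal_flattened` are hypothetical, and moving the accumulated small writes into the queue before a large chunk is an assumption made here to preserve write order.

```rust
use std::collections::VecDeque;

/// Size at which a write is queued as its own chunk (hypothetical constant name).
const LARGE_WRITE_THRESHOLD: usize = 4096;

#[derive(Default)]
struct WriteBuffers {
    /// Small writes are copied back to back into one contiguous buffer.
    flattened_writes: Vec<u8>,
    /// Large writes are queued as separate chunks, ready for vectored I/O.
    write_buf: VecDeque<Vec<u8>>,
}

impl WriteBuffers {
    fn enqueue(&mut self, data: Vec<u8>) {
        if data.len() < LARGE_WRITE_THRESHOLD {
            self.flattened_writes.extend_from_slice(&data);
        } else {
            // Queue pending small writes first so bytes leave in order.
            self.seal_flattened();
            self.write_buf.push_back(data);
        }
    }

    /// Move the accumulated small writes into the chunk queue.
    fn seal_flattened(&mut self) {
        if !self.flattened_writes.is_empty() {
            let chunk = std::mem::take(&mut self.flattened_writes);
            self.write_buf.push_back(chunk);
        }
    }
}
```

The design trades one copy for small operations (cheap, and it avoids a long queue of tiny slices) against zero copies for large payloads.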
### Write Flush Strategy
The `should_flush()` method returns:
- `Yes` — buffers empty but haven't flushed yet
- `May` — buffers not empty and haven't flushed
- `No` — already flushed or nothing to flush
The `ConnectionHandler` calls `poll_flush()` after processing commands, ensuring data is actually sent to the server.
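The three outcomes above amount to a decision on two booleans. The sketch below restates them as a free function; the parameter names `buffers_empty` and `flushed` are assumptions, and the real method reads this state from the `Connection` itself.

```rust
#[derive(Debug, PartialEq)]
enum ShouldFlush {
    Yes,
    May,
    No,
}

/// `buffers_empty`: nothing is left to write.
/// `flushed`: everything written so far has already been flushed.
fn should_flush(buffers_empty: bool, flushed: bool) -> ShouldFlush {
    match (flushed, buffers_empty) {
        // All data was written but not flushed yet: flush now.
        (false, true) => ShouldFlush::Yes,
        // Data is still queued: write first, a flush may follow.
        (false, false) => ShouldFlush::May,
        // Already flushed, or nothing to flush.
        (true, _) => ShouldFlush::No,
    }
}
```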
## Vectored I/O
When `stream.is_write_vectored()` returns true, the connection uses `poll_write_vectored()` to write up to 64 `IoSlice`s at once. For bursty publish patterns this lets many queued chunks go out in a single write call instead of one call per chunk.
```rust
const WRITE_VECTORED_CHUNKS: usize = 64;
```
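The same idea can be shown with the blocking `std::io::Write::write_vectored`, which is the synchronous counterpart of `poll_write_vectored()`. This is a sketch under that substitution: `write_all_vectored` is a hypothetical helper, and `Vec<u8>` chunks stand in for `Bytes`.

```rust
use std::collections::VecDeque;
use std::io::{self, IoSlice, Write};

const WRITE_VECTORED_CHUNKS: usize = 64;

/// Drain `queue` into `writer`, handing over up to 64 chunks per call.
fn write_all_vectored<W: Write>(
    writer: &mut W,
    queue: &mut VecDeque<Vec<u8>>,
) -> io::Result<()> {
    loop {
        // Empty chunks carry no data; drop them so a 0-byte write means failure.
        while matches!(queue.front(), Some(chunk) if chunk.is_empty()) {
            queue.pop_front();
        }
        if queue.is_empty() {
            return Ok(());
        }
        let mut written = {
            let slices: Vec<IoSlice<'_>> = queue
                .iter()
                .take(WRITE_VECTORED_CHUNKS)
                .map(|chunk| IoSlice::new(chunk))
                .collect();
            writer.write_vectored(&slices)?
        };
        if written == 0 {
            return Err(io::ErrorKind::WriteZero.into());
        }
        // Remove fully written chunks and trim a partially written one.
        while written > 0 {
            let Some(front) = queue.front_mut() else { break };
            if written >= front.len() {
                written -= front.len();
                queue.pop_front();
            } else {
                front.drain(..written);
                written = 0;
            }
        }
    }
}
```

A writer may accept fewer bytes than offered, so the loop keeps the unwritten tail at the front of the queue and retries.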
## WebSocket Transport
When the `websockets` feature is enabled, `WebSocketAdapter<T>` wraps `tokio_websockets::WebSocketStream<T>` to implement `AsyncRead + AsyncWrite`, making WebSocket connections transparent to the protocol layer.
```rust
#[cfg(feature = "websockets")]
pub(crate) struct WebSocketAdapter<T> {
    pub(crate) inner: WebSocketStream<T>,
    pub(crate) read_buf: BytesMut,
}
```
WebSocket connections use `ws://` or `wss://` scheme in the server URL. TLS for `wss://` is handled by the WebSocket library's built-in TLS support.
## Connection Lifecycle
### Initial Connection Flow
```
Client Server
│ │
│──── TCP connect ────────────────────▶ │
│◀──── INFO {server_id, nonce, ...} ─── │
│──── CONNECT {auth, ...} ──────────▶ │
│──── PING ─────────────────────────▶ │
│◀──── PONG (or -ERR) ─────────────── │
│ │
│ [connected, ConnectionHandler runs] │
```
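The exchange in the diagram can be sketched as a blocking function over any reader/writer pair. This is an illustration only: `handshake` is a hypothetical name, the `CONNECT` JSON is passed in as a preformatted string rather than a serialized `ConnectInfo`, and TLS upgrade and authentication details are left out.

```rust
use std::io::{self, BufRead, Write};

fn invalid(msg: String) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Perform INFO -> CONNECT + PING -> PONG and return the INFO JSON text.
fn handshake<R: BufRead, W: Write>(
    reader: &mut R,
    writer: &mut W,
    connect_json: &str,
) -> io::Result<String> {
    // 1. The server speaks first with INFO.
    let mut line = String::new();
    reader.read_line(&mut line)?;
    let info_json = line
        .strip_prefix("INFO ")
        .ok_or_else(|| invalid(format!("expected INFO, got {line:?}")))?
        .trim_end()
        .to_owned();

    // 2. CONNECT and PING are sent back to back.
    write!(writer, "CONNECT {connect_json}\r\nPING\r\n")?;
    writer.flush()?;

    // 3. PONG confirms the connection; -ERR (for example, bad auth) rejects it.
    line.clear();
    reader.read_line(&mut line)?;
    match line.trim_end() {
        "PONG" => Ok(info_json),
        other => Err(invalid(format!("handshake rejected: {other:?}"))),
    }
}
```

Sending `PING` right after `CONNECT` gives the client a definite answer even with `verbose=false`: the server replies with either `PONG` or `-ERR`.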
If `tls_first` is enabled, TLS is established before reading INFO:
```
Client Server
│ │
│──── TCP connect ────────────────────▶ │
│──── TLS handshake ─────────────────▶ │
│◀──── TLS handshake ──────────────── │
│◀──── INFO {...} ──────────────────── │
│──── CONNECT + PING ────────────────▶ │
│◀──── PONG ────────────────────────── │
```
### Ping/Pong Keepalive
- Client sends PING every `ping_interval` (default 60s)
- Server responds with PONG
- If `pending_pings > MAX_PENDING_PINGS (2)`, connection is considered dead
- Any server operation resets the ping interval timer
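A minimal model of the pending-ping accounting is shown below. `Keepalive`, `TickOutcome`, `on_ping_interval`, and `on_pong` are hypothetical names, and two details are assumptions: the counter is incremented before the comparison, and each PONG decrements it by one. The interval timer itself is not modeled.

```rust
const MAX_PENDING_PINGS: usize = 2;

#[derive(Debug, PartialEq)]
enum TickOutcome {
    SendPing,
    ConnectionDead,
}

#[derive(Default)]
struct Keepalive {
    pending_pings: usize,
}

impl Keepalive {
    /// Called each time the ping interval elapses.
    fn on_ping_interval(&mut self) -> TickOutcome {
        self.pending_pings += 1;
        if self.pending_pings > MAX_PENDING_PINGS {
            TickOutcome::ConnectionDead
        } else {
            TickOutcome::SendPing
        }
    }

    /// Called when a PONG arrives (assumed: one PONG answers one PING).
    fn on_pong(&mut self) {
        self.pending_pings = self.pending_pings.saturating_sub(1);
    }
}
```

Under these assumptions a silent server is detected on the third tick, that is, after two PINGs have gone unanswered.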
### Reconnection Flow
On disconnect:
1. `handle_disconnect()` sends `Event::Disconnected` and sets state to `Disconnected`
2. `handle_reconnect()` calls `connector.connect()` which:
   - Shuffles servers (unless `retain_servers_order`)
   - Sorts by `failed_attempts` (ascending)
   - Iterates through servers with exponential backoff delay
   - On each server: DNS resolve → TCP connect → INFO → TLS (if needed) → CONNECT+PING → PONG
3. On success:
   - Sends `Event::Connected`, sets state to `Connected`
   - Removes closed subscriptions
   - Re-subscribes all active subscriptions (with adjusted `max = max - delivered`)
   - Re-subscribes the multiplexer (if active)
4. On failure with `MaxReconnects` reached, the handler loop exits
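Two pieces of the flow above are simple enough to sketch in isolation: ordering servers by `failed_attempts`, and adjusting `max` for a re-subscription. `ServerEntry`, `order_servers`, and `remaining_max` are hypothetical names. The shuffle step is omitted because the standard library has no random number generator, and using saturating subtraction for `max - delivered` is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ServerEntry {
    addr: String,
    failed_attempts: usize,
}

/// Try servers with the fewest failures first. The sort is stable, so a
/// preceding shuffle still decides the order among equally failed servers.
fn order_servers(servers: &mut [ServerEntry]) {
    servers.sort_by_key(|server| server.failed_attempts);
}

/// Messages the server may still deliver after re-subscribing.
/// `None` means the subscription has no auto-unsubscribe limit.
fn remaining_max(max: Option<u64>, delivered: u64) -> Option<u64> {
    max.map(|max| max.saturating_sub(delivered))
}
```

For example, a subscription created with `max = 10` that received 4 messages before the disconnect is re-subscribed with a limit of 6.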
### Default Reconnect Delay
Exponential backoff of `2^(attempts - 1)` milliseconds, capped at 4 seconds:
```rust
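use std::cmp;
use std::time::Duration;

fn reconnect_delay_callback_default(attempts: usize) -> Duration {
    if attempts <= 1 {
        // The first attempt is made immediately.
        Duration::from_millis(0)
    } else {
        // 2^(attempts - 1) milliseconds, saturating, capped at 4 seconds.
        let exp: u32 = (attempts - 1).try_into().unwrap_or(u32::MAX);
        cmp::min(Duration::from_millis(2_u64.saturating_pow(exp)), Duration::from_secs(4))
    }
}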
```
| Attempt | Delay |
|---------|-------|
| 1 | 0ms |
| 2 | 2ms |
| 3 | 4ms |
| 4 | 8ms |
| 5 | 16ms |
| 6 | 32ms |
| 7 | 64ms |
| 8 | 128ms |
| 9 | 256ms |
| 10 | 512ms |
| 11 | 1024ms |
| 12 | 2048ms |
| 13+ | 4000ms (cap) |