# Connection and Reconnection

This document covers how connections are established, TLS handling, the server pool, and the reconnection mechanism.

## Connector

**Location**: `connector.rs`

The `Connector` manages the server pool and handles connection establishment and reconnection.

```rust
pub(crate) struct Connector {
    servers: Vec<Server>,           // Server pool with per-server metadata
    options: ConnectorOptions,      // Connection configuration
    connect_stats: Arc,             // Shared statistics
    attempts: usize,                // Global reconnection attempt counter
    events_tx: mpsc::Sender<Event>, // Event channel
    state_tx: watch::Sender<State>, // Connection state watcher
    max_payload: Arc,               // Server's max payload
    last_info: ServerInfo,          // Last known server info
}
```

### Server Pool

Each server in the pool carries metadata:

```rust
#[derive(Debug, Clone)]
pub struct Server {
    pub addr: ServerAddr,
    pub failed_attempts: usize, // Consecutive failed attempts
    pub did_connect: bool,      // Ever successfully connected?
    pub is_discovered: bool,    // Discovered via INFO, not user-configured
    pub last_error: Option,     // Last connection error
}
```

### ConnectorOptions

```rust
pub(crate) struct ConnectorOptions {
    pub tls_required: bool,
    pub certificates: Vec,
    pub client_cert: Option,
    pub client_key: Option,
    pub tls_client_config: Option,
    pub tls_first: bool,
    pub auth: Auth,
    pub no_echo: bool,
    pub connection_timeout: Duration, // Default: 5 seconds
    pub name: Option,
    pub ignore_discovered_servers: bool,
    pub retain_servers_order: bool,
    pub read_buffer_capacity: u16, // Default: 65535
    pub reconnect_delay_callback: Arc<dyn Fn(usize) -> Duration>,
    pub auth_callback: Option, Result>>,
    pub max_reconnects: Option,
    pub local_address: Option,
    pub reconnect_to_server_callback: Option,
}
```

## Connection Establishment Flow

```
Connector::try_connect_to_server(addr)
│
├── 1. DNS resolution
│   server_addr.socket_addrs()
│
├── 2. For each resolved address:
│   │
│   ├── 2a. Connect with timeout
│   │   tokio::time::timeout(connection_timeout, try_connect_to(socket_addr, ...))
│   │
│   └── 2b. try_connect_to():
│       │
│       ├── Select transport:
│       │   ├── "ws" → WebSocket (tokio_websockets)
│       │   ├── "wss" → WebSocket over TLS
│       │   └── default → TCP (TcpStream)
│       │
│       ├── Optional: bind to local_address
│       ├── Set TCP_NODELAY
│       ├── Create Connection with read_buffer_capacity
│       │
│       ├── If tls_first: upgrade to TLS before INFO
│       │
│       ├── Read INFO from server
│       │
│       ├── If TLS required (by option, server, or URL scheme):
│       │   upgrade to TLS (rustls)
│       │
│       ├── Discover servers from INFO.connect_urls
│       │   (unless ignore_discovered_servers)
│       │
│       ├── Build ConnectInfo with auth:
│       │   ├── username/password (from Auth or URL)
│       │   ├── token (from Auth)
│       │   ├── nkey + signed nonce (feature: nkeys)
│       │   ├── JWT + signature callback (feature: nkeys)
│       │   └── auth_callback (custom async callback)
│       │
│       ├── Send CONNECT + PING
│       │
│       └── Wait for response:
│           ├── -ERR (authorization violation) → error
│           ├── PONG or +OK → success
│           └── EOF → error
│
└── 3. On success:
    ├── Reset attempt counter
    ├── Increment connects statistic
    ├── Emit Event::Connected
    ├── Update State::Connected
    ├── Store max_payload
    ├── Update per-server metadata (did_connect, failed_attempts)
    └── Return (ServerInfo, Connection)
```

## TLS Handling

The client supports three TLS modes:

### 1. Standard TLS (INFO → TLS)

Default behavior. The client receives the `INFO` message in plaintext, then upgrades to TLS if any of the following holds:

- The `tls_required` option is set
- The server's `INFO.tls_required` is true
- The URL scheme is `tls://`

### 2. TLS First (TLS → INFO)

When `ConnectOptions::tls_first()` is enabled, the client establishes TLS before reading INFO. This requires the server to have `handshake_first` enabled. It is useful in environments where a plaintext INFO is not acceptable.

### 3. WebSocket TLS

For `wss://` URLs, TLS is handled directly by the WebSocket library (`tokio-websockets`), not by the client's TLS layer.

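
The choice between the three modes can be summarised as a small decision function. The following is only a sketch: `TlsPath` and `tls_path` are hypothetical names that do not exist in the client, and the precedence between the checks (`wss` first, then `tls_first`, then the three standard conditions) is an assumption made for illustration.

```rust
/// Which TLS path a connection attempt takes (illustrative names only).
#[derive(Debug, PartialEq, Eq)]
enum TlsPath {
    /// No TLS upgrade is performed.
    Plain,
    /// Standard mode: read INFO in plaintext, then upgrade.
    AfterInfo,
    /// TLS First mode: upgrade before reading INFO.
    BeforeInfo,
    /// `wss://`: TLS is left to the WebSocket library.
    WebSocket,
}

/// Hypothetical helper mirroring the conditions described above.
/// The order of the checks is an assumption of this sketch.
fn tls_path(
    scheme: &str,
    tls_first: bool,
    tls_required_option: bool,
    info_tls_required: bool,
) -> TlsPath {
    if scheme == "wss" {
        return TlsPath::WebSocket;
    }
    if tls_first {
        return TlsPath::BeforeInfo;
    }
    // Standard mode upgrades if any one of the three conditions holds.
    if tls_required_option || info_tls_required || scheme == "tls" {
        TlsPath::AfterInfo
    } else {
        TlsPath::Plain
    }
}
```

In a real connection the `INFO.tls_required` flag is only known after INFO has been read, so in TLS First mode it cannot influence the decision; the sketch reflects this by checking `tls_first` before the standard conditions.
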
### TLS Configuration

The client uses `rustls` via `tokio-rustls`. Configuration steps:

1. Load root certificates from the system store (`rustls-native-certs`)
2. Optionally add custom root certificates from PEM files
3. Optionally configure a client certificate and key for mTLS
4. Optionally pass a custom `rustls::ClientConfig`

The crypto backend is selectable via feature flags:

- `ring` (default)
- `aws-lc-rs`
- `fips` (requires aws-lc-rs)

## Reconnection

### Reconnection Trigger

Reconnection is triggered by any of the following:

1. An I/O error during read or write (`ExitReason::Disconnected`)
2. Too many pending PINGs (no PONG received)
3. A user call to `Client::force_reconnect()` (`ExitReason::ReconnectRequested`)

### Reconnection Flow

```
ConnectionHandler::handle_disconnect()
│
├── Reset pending_pings to 0
├── Emit Event::Disconnected
├── Update State::Disconnected
│
└── handle_reconnect()
    │
    └── Connector::connect()
        │
        └── Loop: try_connect()
            │
            ├── If reconnect_to_server_callback is set:
            │   │ Call callback with (server_pool, server_info)
            │   │ If returns Some(ReconnectToServer):
            │   │   Validate server is in pool
            │   │   Use callback's delay or default backoff
            │   │   Try connecting to selected server
            │   └── If None or invalid: fall through to default
            │
            ├── Default selection:
            │   ├── Shuffle servers (unless retain_servers_order)
            │   ├── Sort by failed_attempts (ascending)
            │   └── Try each server in order
            │
            ├── For each server:
            │   ├── Increment attempts counter
            │   ├── Check max_reconnects limit
            │   ├── Apply reconnect delay (exponential backoff)
            │   └── try_connect_to_server(addr)
            │
            ├── On success:
            │   ├── Reset attempts to 0
            │   ├── Re-subscribe all active subscriptions
            │   │   (filter out closed subscription channels)
            │   ├── Re-subscribe multiplexer wildcard
            │   └── Return (ServerInfo, Connection)
            │
            └── On failure:
                ├── Update per-server metadata (failed_attempts, last_error)
                ├── Auth errors → propagate immediately
                └── Other errors → continue to next server
```

### Exponential Backoff

Default reconnect delay function:

```rust
fn reconnect_delay_callback_default(attempts: usize) -> Duration {
    if attempts <= 1 {
        Duration::from_millis(0)
    } else {
        let exp: u32 = (attempts - 1).try_into().unwrap_or(u32::MAX);
        let max = Duration::from_secs(4);
        cmp::min(Duration::from_millis(2_u64.saturating_pow(exp)), max)
    }
}
```

For `attempts >= 2` the delay is `2^(attempts - 1)` milliseconds, capped at 4 seconds:

| Attempt | Delay |
|---------|-------|
| 1 | 0ms |
| 2 | 2ms |
| 3 | 4ms |
| 4 | 8ms |
| 5 | 16ms |
| ... | ... |
| 12 | 2048ms |
| 13+ | 4000ms (capped) |

Custom delay functions can be provided via `ConnectOptions::reconnect_delay_callback()`.

### Server Pool Updates

The server pool is dynamic:

1. **Initial pool**: from `connect()` / `ConnectOptions::connect()` URL(s)
2. **Discovered servers**: added from `INFO.connect_urls` on each connection (unless `ignore_discovered_servers` is set)
3. **Runtime updates**: via `Client::set_server_pool()`, which replaces the entire pool while preserving per-server state for servers that appear in both the old and new pools
4. **Order**: servers are shuffled by default (random selection), unless `retain_servers_order` is set

### Max Reconnects

The `max_reconnects` option limits total reconnection attempts:

- `None` or `0` → unlimited (default)
- `Some(n)` → give up after `n` total attempts
- The counter is reset on successful connection and when `set_server_pool()` is called

## ConnectOptions Defaults

| Option | Default |
|--------|---------|
| `connection_timeout` | 5 seconds |
| `ping_interval` | 60 seconds |
| `sender_capacity` | 2048 |
| `subscription_capacity` | 65536 |
| `inbox_prefix` | `"_INBOX"` |
| `request_timeout` | 10 seconds |
| `retry_on_initial_connect` | false |
| `ignore_discovered_servers` | false |
| `retain_servers_order` | false |
| `read_buffer_capacity` | 65535 |
| `skip_subject_validation` | false |
| `no_echo` | false |
| `tls_required` | false |
| `tls_first` | false |
| `max_reconnects` | None (unlimited) |

## Background Connection

When `ConnectOptions::retry_on_initial_connect()` is enabled, the `connect()` function returns a `Client` immediately, before the connection is established. The connection is then established in a background Tokio task. This means:

- `client.server_info()` returns `ServerInfo::default()` until connected
- `client.connection_state()` returns `State::Pending`
- Operations like `publish()` will queue in the command channel
- The `Client` becomes usable once the background task connects
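
The queueing behaviour described above can be illustrated with a toy model. This is a sketch under loud assumptions, not the client's implementation: `ToyClient`, `ToyHandler`, and `toy_pair` are hypothetical names, and it uses an unbounded `std::sync::mpsc` channel in place of the bounded Tokio command channel (`sender_capacity`) that the real client uses.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Pending,
    Connected,
}

/// Toy stand-in for the client handle: it owns only the sending half.
struct ToyClient {
    commands: Sender<String>,
}

impl ToyClient {
    /// Queues a payload; this succeeds even while the connection is pending.
    fn publish(&self, payload: &str) {
        self.commands
            .send(payload.to_string())
            .expect("handler was dropped");
    }
}

/// Toy stand-in for the background task that owns the connection.
struct ToyHandler {
    commands: Receiver<String>,
    state: State,
    written: Vec<String>,
}

impl ToyHandler {
    /// Marks the connection as established.
    fn on_connected(&mut self) {
        self.state = State::Connected;
    }

    /// Drains queued commands, but only once connected.
    fn flush(&mut self) {
        if self.state != State::Connected {
            return;
        }
        while let Ok(cmd) = self.commands.try_recv() {
            self.written.push(cmd);
        }
    }
}

/// Builds a connected client/handler pair that starts in `State::Pending`.
fn toy_pair() -> (ToyClient, ToyHandler) {
    let (tx, rx) = channel();
    let handler = ToyHandler {
        commands: rx,
        state: State::Pending,
        written: Vec::new(),
    };
    (ToyClient { commands: tx }, handler)
}
```

In this model, messages published while the handler is `Pending` stay in the channel and are written in order after `on_connected()` is called. With a bounded channel, as in the real client, a full queue would apply backpressure to the publisher instead of growing without limit.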