# russh-sftp: Wire Protocol and Codec
## SFTP v3 Wire Format
The SFTP protocol (draft-ietf-secsh-filexfer-02) transmits packets over the SSH channel as:
```
┌────────────┬──────────┬──────────────────┐
│ length     │ type     │ payload          │
│ (u32 BE)   │ (u8)     │ (variable)       │
│ 4 bytes    │ 1 byte   │ length-1 bytes   │
└────────────┴──────────┴──────────────────┘
```
- `length` counts the type byte and the payload, but not the 4-byte length field itself
- All multi-byte integers are **big-endian** (network byte order)
- Strings are encoded as `u32 length + UTF-8 bytes`
- Byte arrays are encoded as `u32 length + raw bytes`
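
The framing rules above can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: `encode_string` and `frame` are hypothetical helper names, and `Vec<u8>` stands in for the `bytes` buffers the crate uses.

```rust
/// Encode a string field: u32 big-endian length followed by the string bytes.
fn encode_string(s: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + s.len());
    out.extend_from_slice(&(s.len() as u32).to_be_bytes());
    out.extend_from_slice(s.as_bytes());
    out
}

/// Wrap a payload in an SFTP frame: length, type byte, payload.
fn frame(packet_type: u8, payload: &[u8]) -> Vec<u8> {
    // The length counts the type byte and payload, not the length field itself.
    let length = payload.len() as u32 + 1;
    let mut out = Vec::with_capacity(4 + length as usize);
    out.extend_from_slice(&length.to_be_bytes());
    out.push(packet_type);
    out.extend_from_slice(payload);
    out
}
```

For example, framing type `1` (`SSH_FXP_INIT`) with a 4-byte version payload yields a length field of 5, followed by the type byte and the four payload bytes.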
### Packet Type Constants
Defined in `protocol/mod.rs`:
| Constant | Value | Direction | Description |
|----------|-------|-----------|-------------|
| `SSH_FXP_INIT` | 1 | C→S | Client initialization |
| `SSH_FXP_VERSION` | 2 | S→C | Server version response |
| `SSH_FXP_OPEN` | 3 | C→S | Open a file |
| `SSH_FXP_CLOSE` | 4 | C→S | Close a handle |
| `SSH_FXP_READ` | 5 | C→S | Read from a handle |
| `SSH_FXP_WRITE` | 6 | C→S | Write to a handle |
| `SSH_FXP_LSTAT` | 7 | C→S | Stat a path (no follow) |
| `SSH_FXP_FSTAT` | 8 | C→S | Stat an open handle |
| `SSH_FXP_SETSTAT` | 9 | C→S | Set file attributes by path |
| `SSH_FXP_FSETSTAT` | 10 | C→S | Set file attributes by handle |
| `SSH_FXP_OPENDIR` | 11 | C→S | Open a directory |
| `SSH_FXP_READDIR` | 12 | C→S | Read directory entries |
| `SSH_FXP_REMOVE` | 13 | C→S | Remove a file |
| `SSH_FXP_MKDIR` | 14 | C→S | Create a directory |
| `SSH_FXP_RMDIR` | 15 | C→S | Remove a directory |
| `SSH_FXP_REALPATH` | 16 | C→S | Canonicalize a path |
| `SSH_FXP_STAT` | 17 | C→S | Stat a path (follow symlinks) |
| `SSH_FXP_RENAME` | 18 | C→S | Rename a file |
| `SSH_FXP_READLINK` | 19 | C→S | Read a symbolic link |
| `SSH_FXP_SYMLINK` | 20 | C→S | Create a symbolic link |
| `SSH_FXP_STATUS` | 101 | S→C / C→S | Status response |
| `SSH_FXP_HANDLE` | 102 | S→C | Handle response |
| `SSH_FXP_DATA` | 103 | S→C | Data response |
| `SSH_FXP_NAME` | 104 | S→C | Name list response |
| `SSH_FXP_ATTRS` | 105 | S→C | File attributes response |
| `SSH_FXP_EXTENDED` | 200 | C→S | Extended request |
| `SSH_FXP_EXTENDED_REPLY` | 201 | S→C | Extended reply |
## Packet Reading
Wire I/O is handled by `utils::read_packet()`:
```rust
pub(crate) async fn read_packet<S: AsyncRead + Unpin>(
    stream: &mut S,
    max_length: u32,
) -> Result<Bytes, Error> {
    let length = stream.read_u32().await?;
    if length > max_length {
        return Err(Error::BadMessage("packet length limit exceeded".to_owned()));
    }
    let mut buf = vec![0; length as usize];
    stream.read_exact(&mut buf).await?;
    Ok(Bytes::from(buf))
}
```
The read packet buffer **includes the type byte** as the first byte, followed by the payload. This design means the caller can distinguish packet types before full deserialization.
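
To make the buffer layout concrete, here is a blocking, standard-library analogue of the same read. It is a sketch under stated assumptions: `read_packet_blocking` and `split_type` are hypothetical names, and `std::io::Read` with `io::Error` replaces the async stream and the crate's `Error` type.

```rust
use std::io::{self, Read};

/// Read one frame; the returned buffer holds the type byte followed by the payload.
fn read_packet_blocking<R: Read>(stream: &mut R, max_length: u32) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let length = u32::from_be_bytes(len_buf);
    // Reject oversized frames before allocating the buffer.
    if length > max_length {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "packet length limit exceeded",
        ));
    }
    let mut buf = vec![0u8; length as usize];
    stream.read_exact(&mut buf)?;
    Ok(buf)
}

/// Split a packet buffer into (type byte, payload); `None` if the buffer is empty.
fn split_type(packet: &[u8]) -> Option<(u8, &[u8])> {
    packet.split_first().map(|(t, rest)| (*t, rest))
}
```

Checking the length against `max_length` before allocating keeps a malformed or hostile length field from forcing a large allocation.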
## Packet Enum and Dispatch
All packets are unified into a single `Packet` enum:
```rust
pub enum Packet {
    Init(Init), Version(Version), Open(Open),
    Close(Close), Read(Read), Write(Write),
    Lstat(Lstat), Fstat(Fstat), SetStat(SetStat),
    FSetStat(FSetStat), OpenDir(OpenDir), ReadDir(ReadDir),
    Remove(Remove), MkDir(MkDir), RmDir(RmDir),
    RealPath(RealPath), Stat(Stat), Rename(Rename),
    ReadLink(ReadLink), Symlink(Symlink), Status(Status),
    Handle(Handle), Data(Data), Name(Name),
    Attrs(Attrs), Extended(Extended), ExtendedReply(ExtendedReply),
}
```
### Deserialization (`TryFrom<&mut Bytes> for Packet`)
Reads the type byte first, then delegates to the custom serde deserializer:
```rust
fn try_from(bytes: &mut Bytes) -> Result<Self, Self::Error> {
    // The first byte selects the variant; the rest is the variant's payload.
    let r#type = bytes.try_get_u8()?;
    let packet = match r#type {
        SSH_FXP_INIT => Self::Init(de::from_bytes(bytes)?),
        SSH_FXP_OPEN => Self::Open(de::from_bytes(bytes)?),
        // ... remaining variants follow the same pattern
        _ => return Err(Error::BadMessage("unknown type".to_owned())),
    };
    Ok(packet)
}
```
### Serialization (`TryFrom<Packet> for Bytes`)
Converts each variant to bytes via `ser::to_bytes()`, prepends type byte, and wraps with the 4-byte length:
```rust
fn try_from(packet: Packet) -> Result<Self, Self::Error> {
    let (r#type, payload): (u8, Bytes) = match packet {
        Packet::Init(init) => (SSH_FXP_INIT, ser::to_bytes(&init)?),
        Packet::Open(open) => (SSH_FXP_OPEN, ser::to_bytes(&open)?),
        // ... all variants
    };
    let length = payload.len() as u32 + 1;
    let mut bytes = BytesMut::new();
    bytes.put_u32(length);
    bytes.put_u8(r#type);
    bytes.put_slice(&payload);
    Ok(bytes.freeze())
}
```
## Custom Serde Wire Codec
The crate implements a **custom serde `Serializer` and `Deserializer`** that directly maps Rust types to the SFTP binary format. This is NOT JSON, Bincode, or any standard serde format — it is a bespoke binary encoding matching the SFTP v3 wire specification.
### Serializer (`ser.rs`)
The `Serializer` writes directly into a `BytesMut` buffer:
| Rust Type | Wire Encoding |
|-----------|---------------|
| `u8` | 1 byte raw |
| `u32` | 4 bytes big-endian |
| `u64` | 8 bytes big-endian |
| `str` / `String` | `u32 length` + UTF-8 bytes |
| `bytes` | `u32 length` + raw bytes |
| `struct` | Fields concatenated in order (no field names) |
| `seq` | `u32 count` + elements |
| `map` | Key-value pairs (no length prefix) |
| `enum` | Variant index as `u32` + variant content |
| `None` | Nothing (zero bytes) |
| `Some(T)` | Serialized as `T` |
| `bool`, `i8` to `i64`, `u16`, `f32`/`f64`, `char` | **Not supported**: returns a `BadMessage` error |
Key detail: `struct` serialization uses `serialize_struct` which delegates to `serialize_tuple` — fields are written in declaration order with **no field names or tags**. This matches SFTP's positional binary layout.
The `data_serialize` helper serializes `Vec<u8>` as a raw byte sequence **without** a length prefix (used for `Extended.data` and `ExtendedReply.data`).
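
The positional struct layout and the unprefixed trailing data can be shown with a hand-written encoder. This is only a sketch: `ExtendedSketch` and `encode_extended` are hypothetical names chosen for illustration, and the code bypasses serde entirely to show the resulting bytes.

```rust
/// Hypothetical stand-in for an extended request; not the crate's own type.
struct ExtendedSketch {
    id: u32,
    request: String,
    data: Vec<u8>,
}

/// Write the fields in declaration order, with no field names or tags.
fn encode_extended(p: &ExtendedSketch) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&p.id.to_be_bytes()); // u32, big-endian
    out.extend_from_slice(&(p.request.len() as u32).to_be_bytes()); // string length prefix
    out.extend_from_slice(p.request.as_bytes()); // string bytes
    out.extend_from_slice(&p.data); // trailing data: raw, no length prefix
    out
}
```

Because the trailing data has no length prefix, it can only be the last field: a reader recovers its extent from the enclosing packet length.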
### Deserializer (`de.rs`)
The `Deserializer` reads from a `&mut Bytes` buffer, consuming bytes as it goes:
| Wire Pattern | Rust Deserialize Target |
|--------------|------------------------|
| 1 byte | `u8` |
| 4 bytes BE | `u32` |
| 8 bytes BE | `u64` |
| `u32 len` + bytes | `String` / `str` |
| `u32 len` + bytes | `Vec<u8>` / byte buf |
| `u32 count` + elements | `Vec<T>` / seq |
| Positional fields | struct (tuple-like) |
| `u32 variant` + content | enum |
| Key-value pairs | `HashMap` |
The `data_deserialize` helper reads all remaining bytes into a `Vec<u8>` (no length prefix) — used for `Extended.data` and `ExtendedReply.data`.
### TryBuf Helper (`buf.rs`)
A small extension trait on `bytes::Buf`:
```rust
pub trait TryBuf: Buf {
    fn try_get_bytes(&mut self) -> Result<Vec<u8>, Error>;  // u32-length-prefixed
    fn try_get_string(&mut self) -> Result<String, Error>;  // u32-length-prefixed UTF-8
}
```
These are used internally by the deserializer for reading SFTP's length-prefixed byte and string fields.
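
The behavior of such helpers can be approximated over a plain byte slice. The sketch below is an assumption-laden analogue, not the trait's implementation: `take`, `get_u32`, `get_bytes`, and `get_string` are hypothetical names, and `Option` replaces the crate's `Error`.

```rust
/// Take `n` bytes from the front of `buf`, advancing it; `None` if too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let current: &'a [u8] = *buf;
    if current.len() < n {
        return None;
    }
    let (head, tail) = current.split_at(n);
    *buf = tail;
    Some(head)
}

/// Read a big-endian u32.
fn get_u32(buf: &mut &[u8]) -> Option<u32> {
    let b = take(buf, 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Read a u32-length-prefixed byte array.
fn get_bytes(buf: &mut &[u8]) -> Option<Vec<u8>> {
    let len = get_u32(buf)? as usize;
    take(buf, len).map(|b| b.to_vec())
}

/// Read a u32-length-prefixed string, rejecting invalid UTF-8.
fn get_string(buf: &mut &[u8]) -> Option<String> {
    String::from_utf8(get_bytes(buf)?).ok()
}
```

Returning `None` on a short buffer mirrors the fallible `try_` naming: a truncated packet becomes an error value rather than a panic.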
## FileAttributes Serialization
`FileAttributes` has a custom `Serialize`/`Deserialize` implementation because the SFTP wire format uses a **flags bitmask** to indicate which optional fields are present. This is fundamentally different from serde's typical self-describing formats.
### Serialization Flow
1. Compute `FileAttr` flags bitmask based on which `Option` fields are `Some`:
- `SIZE` (0x1) — `size` is present
- `UIDGID` (0x2) — `uid`/`gid` are present
- `PERMISSIONS` (0x4) — `permissions` is present
- `ACMODTIME` (0x8) — `atime`/`mtime` are present
- `EXTENDED` (0x80000000) — extended fields (not yet implemented)
2. Write flags as `u32`
3. Write fields conditionally based on flags
### Deserialization Flow
1. Read `u32` flags bitmask
2. Conditionally read fields based on which bits are set:
- If `SIZE`: read `u64` for `size`
- If `UIDGID`: read `u32` for `uid`, `u32` for `gid`
- If `PERMISSIONS`: read `u32` for `permissions`
- If `ACMODTIME`: read `u32` for `atime`, `u32` for `mtime`
This ensures that fields not flagged are left as `None` in the `FileAttributes` struct.
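
Both flows can be sketched end to end with the standard library. The code below is illustrative only: `AttrsSketch`, `encode_attrs`, `decode_attrs`, and `read_u32` are hypothetical names, the struct is a simplified stand-in for `FileAttributes`, and the `EXTENDED` flag is ignored.

```rust
const SIZE: u32 = 0x1;
const UIDGID: u32 = 0x2;
const PERMISSIONS: u32 = 0x4;
const ACMODTIME: u32 = 0x8;

/// Hypothetical, simplified stand-in for the crate's `FileAttributes`.
#[derive(Debug, Default, PartialEq)]
struct AttrsSketch {
    size: Option<u64>,
    uid: Option<u32>,
    gid: Option<u32>,
    permissions: Option<u32>,
    atime: Option<u32>,
    mtime: Option<u32>,
}

/// Compute the flags, write them, then write only the flagged fields.
fn encode_attrs(a: &AttrsSketch) -> Vec<u8> {
    // Paired fields are written only when both halves are present (sketch choice).
    let ids = a.uid.and_then(|u| a.gid.map(|g| (u, g)));
    let times = a.atime.and_then(|t| a.mtime.map(|m| (t, m)));
    let mut flags = 0u32;
    if a.size.is_some() { flags |= SIZE; }
    if ids.is_some() { flags |= UIDGID; }
    if a.permissions.is_some() { flags |= PERMISSIONS; }
    if times.is_some() { flags |= ACMODTIME; }

    let mut out = flags.to_be_bytes().to_vec();
    if let Some(size) = a.size {
        out.extend_from_slice(&size.to_be_bytes());
    }
    if let Some((uid, gid)) = ids {
        out.extend_from_slice(&uid.to_be_bytes());
        out.extend_from_slice(&gid.to_be_bytes());
    }
    if let Some(perm) = a.permissions {
        out.extend_from_slice(&perm.to_be_bytes());
    }
    if let Some((atime, mtime)) = times {
        out.extend_from_slice(&atime.to_be_bytes());
        out.extend_from_slice(&mtime.to_be_bytes());
    }
    out
}

/// Read a big-endian u32 from the front of `buf`, advancing it.
fn read_u32(buf: &mut &[u8]) -> Option<u32> {
    let cur: &[u8] = *buf;
    if cur.len() < 4 {
        return None;
    }
    *buf = &cur[4..];
    Some(u32::from_be_bytes([cur[0], cur[1], cur[2], cur[3]]))
}

/// Read the flags, then read only the fields whose bits are set.
fn decode_attrs(mut buf: &[u8]) -> Option<AttrsSketch> {
    let flags = read_u32(&mut buf)?;
    let mut a = AttrsSketch::default();
    if flags & SIZE != 0 {
        let hi = read_u32(&mut buf)? as u64;
        let lo = read_u32(&mut buf)? as u64;
        a.size = Some((hi << 32) | lo);
    }
    if flags & UIDGID != 0 {
        a.uid = Some(read_u32(&mut buf)?);
        a.gid = Some(read_u32(&mut buf)?);
    }
    if flags & PERMISSIONS != 0 {
        a.permissions = Some(read_u32(&mut buf)?);
    }
    if flags & ACMODTIME != 0 {
        a.atime = Some(read_u32(&mut buf)?);
        a.mtime = Some(read_u32(&mut buf)?);
    }
    Some(a)
}
```

With only `size` and `permissions` set, the flags word is `0x5` and the encoding is 16 bytes: 4 for the flags, 8 for the size, and 4 for the permissions.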
## Request ID Tracking
All request packets (except `Init`) carry a `u32 id` field used as a request identifier. The `RequestId` trait and macro provide uniform access:
```rust
pub(crate) trait RequestId: Sized {
    fn get_request_id(&self) -> u32;
}

macro_rules! impl_request_id {
    ($packet:ty) => {
        impl RequestId for $packet {
            fn get_request_id(&self) -> u32 { self.id }
        }
    };
}
```
This is used by the server to extract the request ID for constructing status responses on error, and by the client for demultiplexing responses.
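
Client-side demultiplexing by request ID might look roughly like the following. This is a hedged sketch rather than the crate's client: `Pending`, `register`, and `complete` are hypothetical names, and a static string label stands in for whatever completion handle a real client would store per request.

```rust
use std::collections::HashMap;

/// Hypothetical table of in-flight requests, keyed by request ID.
struct Pending {
    next_id: u32,
    in_flight: HashMap<u32, &'static str>,
}

impl Pending {
    fn new() -> Self {
        Pending { next_id: 0, in_flight: HashMap::new() }
    }

    /// Allocate an ID for a new request and remember what it was for.
    fn register(&mut self, what: &'static str) -> u32 {
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        self.in_flight.insert(id, what);
        id
    }

    /// Match a response's ID back to its request, removing the entry.
    fn complete(&mut self, response_id: u32) -> Option<&'static str> {
        self.in_flight.remove(&response_id)
    }
}
```

Keying on the ID lets responses arrive in any order; a response whose ID is not in the table is surfaced as `None` instead of being matched to the wrong request.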