glm-5.1 9fadc5d1af Add rudolfs Git LFS server reference research

Deep dive into rudolfs architecture, LFS protocol implementation,
storage backend decorator pattern (Verify→Encrypted→Cached→Retrying→S3),
caching layer, encryption, and relevance to the alknet git hosting
stack (gitserver + rudolfs + rustfs).

2026-06-08 15:26:56 +00:00

Rudolfs Reference Document

Source: https://github.com/jasonwhite/rudolfs (cloned at /workspace/rudolfs/)
Version: 0.3.8 (Cargo.toml)
License: MIT
Date researched: 2026-06-08
Purpose: Evaluate rudolfs as a Git LFS server for the alknet git hosting stack (gitserver + rudolfs + rustfs)

1. Architecture Overview

1.1 What is Rudolfs?

Rudolfs is a high-performance, caching Git LFS server written in Rust with an AWS S3 storage backend. It implements the Git LFS batch API specification, providing a clean separation between the LFS protocol layer and pluggable storage backends. The storage system uses a decorator (Russian doll) pattern where backends are composed in layers — each adding a capability (encryption, verification, caching, retrying) — resulting in a flexible and composable architecture.

Key design principles:

Modular storage backends: S3, local disk, and any combination with caching
Streaming everywhere: All data flows as async byte streams; nothing is buffered entirely in memory
Decorator composition: Storage capabilities are stacked via composable wrapper types
Corruption resilience: SHA256 verification on upload and download; corrupted objects are auto-purged from cache
Optional encryption: XChaCha20 stream cipher for data at rest (both cache and permanent storage)

1.2 Source Structure

src/
├── main.rs              # CLI entry point (structopt), server startup
├── lib.rs               # Public API: S3ServerBuilder, LocalServerBuilder, Cache, Server trait
├── app.rs               # Hyper service: HTTP routing, batch API, upload/download endpoints
├── lfs.rs               # LFS protocol types (BatchRequest/Response, Action, Oid, Transfer)
├── sha256.rs            # Sha256 type, hex serde, VerifyStream (streaming SHA256 checker)
├── lru.rs               # In-memory LRU cache (LinkedHashMap-based) for cache metadata
├── error.rs             # Error type (re-export of anyhow::Error)
├── hyperext.rs          # RequestExt trait (X-Forwarded-Proto, X-Forwarded-Host, Host headers)
├── logger.rs            # Logger middleware (request/response logging)
├── util.rs              # NamedTempFile (async temp file with auto-cleanup + atomic rename)
└── storage/
    ├── mod.rs            # Core traits: Storage, LFSObject, StorageKey, Namespace, ByteStream
    ├── s3.rs             # S3 backend (rusoto): multipart upload, head/get/put, presigned URLs
    ├── disk.rs           # Local disk backend: file-based storage with temp file pattern
    ├── cached.rs         # Caching decorator: LRU + disk cache → permanent storage
    ├── encrypt.rs        # Encryption decorator: XChaCha20 stream cipher
    ├── verify.rs         # Verification decorator: streaming SHA256 with auto-delete on corruption
    ├── retrying.rs       # Retry decorator: exponential backoff for size/delete operations
    └── faulty.rs         # Fault injection decorator (test-only, gated by "faulty" feature)

1.3 Key Dependencies

Dependency	Version	Purpose
`hyper`	0.14	HTTP server (no frameworks like Axum/Actix)
`rusoto_s3`	0.48	S3 API client (PutObject, GetObject, HeadObject, multipart upload)
`rusoto_core`	0.48	AWS credential provider, HTTP client
`rusoto_sts`	0.48	Kubernetes WebIdentity credential provider
`rusoto_credential`	0.48	DefaultCredentialsProvider chain (env → file → IAM)
`tokio`	1	Async runtime (full features)
`tokio-util`	0.7	BytesCodec for file I/O streaming
`futures`	0.3	Stream combinators, try_join_all, channel mpsc
`serde` / `serde_json`	1 / 1	JSON serialization for LFS batch API
`sha2`	0.11.0-rc.3	SHA256 hashing for LFS object verification
`chacha`	0.4	XChaCha20 stream cipher for encryption
`linked-hash-map`	0.5	LRU cache data structure with O(1) get_refresh
`askama`	0.14	HTML template engine (index page)
`structopt`	0.3	CLI argument parsing
`backoff`	0.4	Exponential backoff for S3 retries
`uuid`	1.1	Temp file naming (v4)
`bytes`	1	Bytes/BytesMut for streaming
`derive_more`	2	Display/From derives for error types
`human-size`	0.4	Human-readable size parsing (CLI)
`humansize`	2	Human-readable size formatting (logging)

Note: rudolfs uses rusoto (the older Rust AWS SDK), not aws-sdk-rust. This is important for compatibility with S3-compatible stores like MinIO and, by extension, rustfs.

1.4 Request Flow

LFS Batch API (Upload)

Client → POST /api/{org}/{project}/objects/batch
       → { "operation": "upload", "objects": [{ "oid": "...", "size": N }] }
       → Server: parse request, check storage.size() for each object
       ← { "objects": [{ "oid": "...", "size": N, "actions": { "upload": { "href": "..." }, "verify": { "href": "..." } } }] }

Client → PUT /api/{org}/{project}/object/{oid}
       → Raw LFS object data (Content-Length header required)
       → Server: stream through Verify→Encrypted→Cached→S3 (or disk)
       ← 200 OK (empty body)

LFS Batch API (Download)

Client → POST /api/{org}/{project}/objects/batch
       → { "operation": "download", "objects": [...] }
       ← { "objects": [{ "oid": "...", "actions": { "download": { "href": "..." } } }] }

Client → GET /api/{org}/{project}/object/{oid}
       → Server: lookup in cache (LRU metadata), fetch from storage if needed
       ← Streamed LFS object data with Content-Length

CDN Mode (Presigned URLs)

When --cdn is configured, the batch response for downloads returns a presigned S3 URL directly instead of routing through rudolfs. Uploads also use presigned S3 URLs. This bypasses encryption (since data never touches rudolfs).

2. Git LFS Protocol Implementation

2.1 LFS Protocol Overview

Git LFS uses an HTTP-based API where the client first contacts the LFS server to discover how to transfer objects, then performs the actual transfers. Rudolfs implements the "basic" transfer adapter exclusively.

Specification reference: git-lfs/docs/api/batch.md

2.2 Endpoints Implemented

Route	Method	Purpose
`/`	GET	HTML index page with setup instructions
`/api/{org}/{project}/objects/batch`	POST	LFS batch API — core protocol endpoint
`/api/{org}/{project}/object/{oid}`	GET	Direct download of a single LFS object
`/api/{org}/{project}/object/{oid}`	PUT	Direct upload of a single LFS object
`/api/{org}/{project}/objects/verify`	POST	Verify object exists with correct size

2.3 Batch API Detail

Request (BatchRequest):

pub struct BatchRequest {
    pub operation: Operation,       // "upload" or "download"
    pub transfers: Option<Vec<Transfer>>,  // Transfer adapters client supports
    pub refs: Option<BTreeMap<String, String>>,  // Git ref context (v2.4+)
    pub objects: Vec<RequestObject>,  // Objects to transfer
}

Response (BatchResponse):

pub struct BatchResponse {
    pub transfer: Option<Transfer>,  // Always Transfer::Basic
    pub objects: Vec<ResponseObject>,
}

Each ResponseObject contains:

oid: SHA256 hash of the object
size: Byte size
error: Optional error (code + message)
authenticated: Some(true) — signals the client that auth is handled
actions: Optional Actions with href, header, expires_in/expires_at

2.4 Transfer Adapters

pub enum Transfer {
    Basic,                    // The basic HTTP transfer adapter
    LfsStandaloneFile,       // Parsed but not implemented
    Custom,                  // Catch-all via #[serde(other)]
}

Rudolfs always responds with Transfer::Basic. It parses LfsStandaloneFile and Custom from client requests but never selects them.

2.5 Upload Expiration

Upload actions include an expires_in field set to 30 minutes (1800 seconds):

const UPLOAD_EXPIRATION: Duration = Duration::from_secs(30 * 60);

2.6 Auth Header Reflection

Rudolfs has no built-in authentication. However, it reflects any Authorization header from the incoming batch request back into the header field of all action URLs. This allows a reverse proxy (e.g., nginx with basic auth) to provide authentication:

fn extract_auth_header(headers: &HeaderMap) -> Option<BTreeMap<String, String>> {
    // Filters for "authorization" header and reflects it into action.header
}

This is critical for the alknet integration model — authentication is delegated entirely to an outer layer.
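The reflection step can be sketched with the standard library alone. This is a minimal illustration, not the rudolfs implementation: the name `reflect_auth_header` and the use of a plain slice of `(name, value)` pairs instead of hyper's `HeaderMap` are assumptions made to keep the example self-contained.

```rust
use std::collections::BTreeMap;

/// Returns the headers to copy into each action's `header` field, or `None`
/// when the request carried no Authorization header.
/// Hypothetical helper: headers are modelled as (name, value) pairs.
pub fn reflect_auth_header(headers: &[(&str, &str)]) -> Option<BTreeMap<String, String>> {
    let reflected: BTreeMap<String, String> = headers
        .iter()
        // HTTP header names are case-insensitive.
        .filter(|(name, _)| name.eq_ignore_ascii_case("authorization"))
        .map(|(name, value)| (name.to_ascii_lowercase(), value.to_string()))
        .collect();

    if reflected.is_empty() {
        None
    } else {
        Some(reflected)
    }
}
```

Returning `None` rather than an empty map lets the `header` field be omitted from the serialized action when there is nothing to reflect.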

2.7 Namespace (org/project) Extraction

The URL path /api/{org}/{project}/... is parsed to extract the namespace:

let namespace = match (parts.next(), parts.next()) {
    (Some(org), Some(project)) => Namespace::new(org.into(), project.into()),
    _ => // 400 Bad Request
};

This provides multi-tenancy by URL path. Each org/project pair maps to its own storage location (S3 key prefix or disk directory), so objects are deduplicated by OID only within a namespace: two projects that push identical content each store their own copy, while repositories pointed at the same namespace path share one.
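As a rough illustration of the path handling, the sketch below splits a request path into namespace and remainder. The function name `parse_namespace` and its return shape are hypothetical; rudolfs builds a `Namespace` value and answers 400 Bad Request where this sketch returns `None`.

```rust
/// Splits `/api/{org}/{project}/{rest...}` into (org, project, rest).
/// Returns `None` when the prefix, org, or project is missing.
/// Hypothetical helper for illustration only.
pub fn parse_namespace(path: &str) -> Option<(String, String, String)> {
    // Everything after "/api/" is org, project, then the endpoint path.
    let mut parts = path.strip_prefix("/api/")?.splitn(3, '/');
    let org = parts.next().filter(|s| !s.is_empty())?;
    let project = parts.next().filter(|s| !s.is_empty())?;
    let rest = parts.next().unwrap_or("");
    Some((org.to_string(), project.to_string(), rest.to_string()))
}
```

For example, `/api/myorg/myproject/objects/batch` yields the namespace `myorg/myproject` and the remainder `objects/batch`, which the router can then match against the endpoint table in section 2.2.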

2.8 Verify Endpoint

pub struct VerifyRequest {
    pub oid: Oid,
    pub size: u64,
}

The verify endpoint (POST .../objects/verify) checks whether an uploaded object exists and has the correct size. It calls storage.size(&key) and compares against the request's size field.

3. Storage Architecture

3.1 The Storage Trait

The core abstraction is the Storage trait:

#[async_trait]
pub trait Storage {
    type Error: fmt::Display + Send;

    async fn get(&self, key: &StorageKey) -> Result<Option<LFSObject>, Self::Error>;
    async fn put(&self, key: StorageKey, value: LFSObject) -> Result<(), Self::Error>;
    async fn size(&self, key: &StorageKey) -> Result<Option<u64>, Self::Error>;
    async fn delete(&self, key: &StorageKey) -> Result<(), Self::Error>;
    fn list(&self) -> StorageStream<(StorageKey, u64), Self::Error>;
    async fn total_size(&self) -> Option<u64>;
    async fn max_size(&self) -> Option<u64>;
    fn public_url(&self, key: &StorageKey) -> Option<String>;
    async fn upload_url(&self, key: &StorageKey, expires_in: Duration) -> Option<String>;
}

Key types:

StorageKey = (Namespace, Oid) — composite key for all storage operations
Namespace = (org: String, project: String) — tenant/organization isolation
Oid = Sha256 — 32-byte SHA256 hash, displayed as hex
LFSObject — holds (len: u64, stream: ByteStream) where ByteStream is a Pin<Box<dyn Stream<Item = Result<Bytes, io::Error>> + Send>>
ByteStream — pinned, boxed, async byte stream

3.2 LFSObject — Fanout Pattern

LFSObject has a critical fanout() method that duplicates the byte stream into two identical streams using mpsc::channel(0):

pub fn fanout(self) -> (impl Future<Output = Result<(), io::Error>>, Self, Self)

This is used in two places:

Caching on download: Stream flows to both the client and the cache simultaneously
Caching on upload: Stream flows to both the permanent storage and the cache, with a oneshot signal ensuring the cache only persists after the upload succeeds
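The fanout idea can be approximated with threads and zero-capacity channels from the standard library. This is only an analogue under stated assumptions: rudolfs uses async streams and futures' `mpsc::channel(0)`, whereas the hypothetical `fanout_collect` below uses `std::sync::mpsc::sync_channel(0)` and collects both copies so the result can be inspected.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Duplicates a sequence of chunks to two consumers and returns what each saw.
/// Hypothetical, thread-based stand-in for the async fanout described above.
pub fn fanout_collect(chunks: Vec<Vec<u8>>) -> (Vec<u8>, Vec<u8>) {
    // Capacity 0 makes every send a rendezvous: the producer cannot run
    // ahead of the slower consumer, so at most one chunk is in flight.
    let (tx_a, rx_a) = sync_channel::<Vec<u8>>(0);
    let (tx_b, rx_b) = sync_channel::<Vec<u8>>(0);

    let producer = thread::spawn(move || {
        for chunk in chunks {
            // Stop early if either consumer has gone away.
            if tx_a.send(chunk.clone()).is_err() || tx_b.send(chunk).is_err() {
                break;
            }
        }
        // Both senders are dropped here, which ends the consumers' loops.
    });

    let consumer_a = thread::spawn(move || rx_a.into_iter().flatten().collect::<Vec<u8>>());
    let consumer_b = thread::spawn(move || rx_b.into_iter().flatten().collect::<Vec<u8>>());

    producer.join().expect("producer panicked");
    (
        consumer_a.join().expect("consumer A panicked"),
        consumer_b.join().expect("consumer B panicked"),
    )
}
```

The zero-capacity channel is the relevant design choice: it bounds memory use to a single chunk regardless of object size, at the cost of tying throughput to the slower of the two sinks.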

3.3 Storage Composition

The decorator pattern produces these composition chains depending on configuration:

S3 with Cache and Encryption (fullest stack):

Client ↔ Verify ↔ Encrypted ↔ Cached(Disk, Retrying(S3))
                                     ↑cache ↑permanent

Actual call order during put:

Verify: wraps stream with SHA256 verification; rejects upload if hash mismatch
Encrypted: applies XChaCha20 cipher to the stream (nonce = first 24 bytes of OID)
Cached: fans out stream to both cache and storage; uses LRU to manage cache
Retrying: wraps S3 operations with exponential backoff (only for size and delete)
S3: multipart uploads to s3://{bucket}/{prefix}/{org}/{project}/{sha256_path}

S3 with CDN (least processing):

Client ↔ App (batch API generates presigned S3 URLs; no data flows through rudolfs)

Local Disk with Cache and Encryption:

Client ↔ Verify ↔ Encrypted ↔ Cached(Disk_cache → Disk_storage)

Local Disk only:

Client ↔ Verify ↔ Disk

3.4 S3 Backend Deep Dive

Object Key Format

fn key_to_path(&self, key: &StorageKey) -> String {
    // With prefix "lfs":   "lfs/{org}/{project}/{sha256_path}"
    // Without prefix:       "{org}/{project}/{sha256_path}"
}

Where sha256_path is formatted as {first_two_hex}/{next_two_hex}/{full_hex}:

// Sha256Path Display:
write!(f, "{:02x}/{:02x}/{}", self.0.bytes()[0], self.0.bytes()[1], self.0)

Example: an OID b1fbeefc23e6a1496f7d0c2fb635bfc78f7ddc2da963ea9c6a63eb324260e6d in namespace myorg/myproject with default prefix becomes:

lfs/myorg/myproject/b1/fb/b1fbeefc23e6a1496f7d0c2fb635bfc78f7ddc2da963ea9c6a63eb324260e6d

The two-level hex prefix distributes objects across S3's flat namespace for better I/O performance.
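The key layout described above can be reproduced in a few lines. This is a sketch, assuming the OID arrives as a 64-character hex string; the function name `object_key` is hypothetical and the real code formats the raw hash bytes instead.

```rust
/// Builds `{prefix}/{org}/{project}/{aa}/{bb}/{full_hex}` for an OID given as
/// 64 hex characters. Returns `None` for anything that is not a valid OID.
/// Hypothetical helper for illustration only.
pub fn object_key(prefix: &str, org: &str, project: &str, oid_hex: &str) -> Option<String> {
    // A SHA256 OID is 32 bytes, i.e. 64 hex characters.
    if oid_hex.len() != 64 || !oid_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // The byte-wise formatting in rudolfs yields lowercase hex.
    let oid = oid_hex.to_ascii_lowercase();
    // First byte and second byte of the hash become two directory levels.
    let sharded = format!("{}/{}/{}", &oid[0..2], &oid[2..4], oid);

    Some(if prefix.is_empty() {
        format!("{org}/{project}/{sharded}")
    } else {
        format!("{prefix}/{org}/{project}/{sharded}")
    })
}
```

With prefix `lfs`, namespace `myorg/myproject`, and an OID starting `b1fb`, the key begins `lfs/myorg/myproject/b1/fb/`.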

S3 Operations

Operation	S3 API	Notes
Get object	`GetObject`	Streams response body
Put object	`CreateMultipartUpload` → `UploadPart` (100MB chunks) → `CompleteMultipartUpload`	Handles files >5GB
Check size	`HeadObject`	Uses content_length; handles Rusoto bug for 404
Delete	No-op	Always returns `Ok(())` — never deletes from S3
List	Empty stream	Returns no entries (S3 listing not implemented)
Presigned download	`GetPreSignedUrl`	Only when CDN is configured
Presigned upload	`PutObject` presigned URL	Only when CDN is configured

HeadObject 404 Bug Workaround

Rusoto has a known bug where HeadObject for a missing key returns RusotoError::Unknown with status 404 instead of HeadObjectError::NoSuchKey. Rudolfs checks for this:

Err(RusotoError::Unknown(e)) if e.status == 404 => Ok(None)

Initialization Validation

On startup, the S3 backend performs a HeadBucket request with exponential backoff to validate:

The bucket exists
Credentials are valid

This ensures fast failures for misconfiguration.

Credential Provider Chain

// Try Kubernetes WebIdentity first
let k8s_provider = WebIdentityProvider::from_k8s_env();
if k8s_provider.credentials().await.is_ok() {
    // Use K8s credentials
} else {
    // Fall back to default provider chain:
    // 1. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars
    // 2. ~/.aws/credentials file
    // 3. IAM instance profile
}

Custom S3 Endpoints

The AWS_S3_ENDPOINT environment variable enables custom S3-compatible endpoints:

let region = if let Ok(endpoint) = std::env::var("AWS_S3_ENDPOINT") {
    // Must also set AWS_DEFAULT_REGION or AWS_REGION
    Region::Custom { name, endpoint }
} else {
    Region::default()
}

This is how MinIO (and rustfs) integration works — set AWS_S3_ENDPOINT to the MinIO/rustfs URL.
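The endpoint selection can be sketched without rusoto. The `RegionChoice` enum and `resolve_region` function are hypothetical stand-ins for rusoto's `Region`, and preferring `AWS_DEFAULT_REGION` over `AWS_REGION` when both are set is an assumption, not something confirmed from the rudolfs source.

```rust
/// Hypothetical stand-in for rusoto's `Region` type.
#[derive(Debug, PartialEq)]
pub enum RegionChoice {
    /// No custom endpoint: use the SDK's default region resolution.
    Default,
    /// Custom S3-compatible endpoint (MinIO, rustfs, ...).
    Custom { name: String, endpoint: String },
}

/// Decides the region from already-read environment values.
/// Assumption: AWS_DEFAULT_REGION wins over AWS_REGION when both are set.
pub fn resolve_region(
    endpoint: Option<&str>,
    default_region: Option<&str>,
    region: Option<&str>,
) -> Result<RegionChoice, String> {
    match endpoint {
        None => Ok(RegionChoice::Default),
        Some(endpoint) => {
            // A custom endpoint still needs a region name for request signing.
            let name = default_region.or(region).ok_or_else(|| {
                "AWS_S3_ENDPOINT is set but AWS_DEFAULT_REGION/AWS_REGION is not".to_string()
            })?;
            Ok(RegionChoice::Custom {
                name: name.to_string(),
                endpoint: endpoint.to_string(),
            })
        }
    }
}
```

A caller would pass `std::env::var("AWS_S3_ENDPOINT").ok().as_deref()` and the two region variables in the same way; taking the values as arguments keeps the decision logic testable without touching the process environment.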

3.5 Disk Backend Deep Dive

On-Disk Layout

{root}/
├── objects/
│   └── {org}/
│       └── {project}/
│           └── {first_two_hex}/
│               └── {next_two_hex}/
│                   └── {full_hex}              # Actual object file
└── incomplete/                                  # Temp directory for in-progress uploads
    └── {uuid}                                  # NamedTempFile, auto-cleaned on failure

Write Pattern (Atomic Put)

Create a NamedTempFile in {root}/incomplete/{uuid}
Stream the entire LFS object to the temp file via Framed codec
Verify byte count matches expected len
fs::create_dir_all for the target directory
file.persist(path) — atomic rename() from temp to final location

This ensures that partially written objects never appear at their final path.
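The write pattern above can be sketched synchronously with `std::fs`. This is an illustration under assumptions: rudolfs streams asynchronously and names temp files with a UUID, while the hypothetical `atomic_put` below takes the whole payload as a slice and a caller-chosen temp name.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Writes `data` to `{root}/incomplete/{temp_name}`, checks its length, then
/// renames it to `{root}/objects/{rel_path}`. Hypothetical, synchronous sketch.
pub fn atomic_put(
    root: &Path,
    temp_name: &str,
    rel_path: &Path,
    data: &[u8],
    expected_len: u64,
) -> io::Result<()> {
    let incomplete = root.join("incomplete");
    fs::create_dir_all(&incomplete)?;
    let temp_path = incomplete.join(temp_name);

    // Steps 1 and 2: write everything to the temp file first.
    let mut file = fs::File::create(&temp_path)?;
    file.write_all(data)?;
    file.sync_all()?;
    drop(file);

    // Step 3: reject (and clean up) if the byte count is not what was promised.
    if fs::metadata(&temp_path)?.len() != expected_len {
        fs::remove_file(&temp_path)?;
        return Err(io::Error::new(io::ErrorKind::InvalidData, "length mismatch"));
    }

    // Steps 4 and 5: create the target directory, then rename into place.
    let final_path = root.join("objects").join(rel_path);
    if let Some(parent) = final_path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::rename(&temp_path, &final_path)
}
```

The rename is only atomic when the temp directory and the final location are on the same filesystem, which is why both live under the same `{root}`.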

SHA256 Path Structure

The same {first_two_hex}/{next_two_hex}/{full_hex} structure as S3 is used, creating a 2-level directory hierarchy that prevents any single directory from having too many entries.

4. Caching Layer

4.1 Architecture

The Cached backend wraps an inner cache storage (typically Disk) and an outer permanent storage (typically S3), connected by an in-memory LRU index:

        ┌─────────────┐
get()──→│ LRU Index    │──→ cache hit? ──→ Disk.get()
        │ (in-memory)  │                   └── miss? ──→ S3.get() + background cache
        └─────────────┘

pub struct Backend<C, S> {
    lru: Arc<Mutex<Cache>>,   // In-memory LRU metadata index
    max_size: u64,             // Maximum cache size in bytes (0 = unlimited)
    cache: Arc<C>,             // Cache storage (usually Disk)
    storage: Arc<S>,           // Permanent storage (usually S3)
}

4.2 LRU Data Structure

pub struct Cache<K> {
    map: LinkedHashMap<K, u64>,  // Key → size in bytes
    size: u64,                     // Total size of all entries
}

Uses linked_hash_map which maintains insertion order and provides get_refresh() to move an accessed entry to the most-recently-used position. This gives O(1) access and O(1) eviction from the front.

4.3 Cache Hit/Miss Behavior

`get()` — Download

LRU hit: If lru.get_refresh(key) returns size → query cache.get(key)
- If cache has the object → return it (fast path)
- If cache miss (file deleted but LRU not updated) → remove from LRU, fall through to storage
LRU miss: Query storage.get(key)
- If storage has it → fanout() the stream: one copy to client, one copy to cache in background
- If storage doesn't have it → return None

Background caching is fire-and-forget — errors are logged but don't affect the client response.
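The hit/miss decision tree can be modelled with in-memory maps. This is a simplified sketch, not the rudolfs code: `CachedStore` and `Source` are hypothetical names, the maps stand in for the LRU index, the Disk cache, and S3, and the cache fill happens inline rather than in the background.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of the cached backend.
#[derive(Default)]
pub struct CachedStore {
    /// In-memory index: key -> size in bytes (stands in for the LRU).
    pub index: HashMap<String, u64>,
    /// Cache storage (stands in for the Disk backend).
    pub cache: HashMap<String, Vec<u8>>,
    /// Permanent storage (stands in for S3).
    pub storage: HashMap<String, Vec<u8>>,
}

/// Where a `get` was served from.
#[derive(Debug, PartialEq)]
pub enum Source {
    Cache,
    Storage,
}

impl CachedStore {
    pub fn get(&mut self, key: &str) -> Option<(Vec<u8>, Source)> {
        if self.index.contains_key(key) {
            if let Some(bytes) = self.cache.get(key) {
                return Some((bytes.clone(), Source::Cache)); // fast path
            }
            // The index said "cached" but the object is gone: repair the index
            // and fall through to permanent storage.
            self.index.remove(key);
        }

        let bytes = self.storage.get(key)?.clone();
        // rudolfs fills the cache in the background via fanout(); this sketch
        // does it inline for simplicity.
        self.cache.insert(key.to_string(), bytes.clone());
        self.index.insert(key.to_string(), bytes.len() as u64);
        Some((bytes, Source::Storage))
    }
}
```

A first `get` for a key present only in `storage` reports `Source::Storage` and populates the cache; the next `get` for the same key reports `Source::Cache`.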

`put()` — Upload

fanout() the uploaded stream into two copies
One copy goes to permanent storage (storage.put())
When permanent storage finishes receiving, a oneshot signal is sent
The cache copy appends an empty Bytes::new() chunk that only resolves after this signal
The fanout future and both writes are driven concurrently via try_join3(f, cache, store)

This ensures the cache only persists data that was successfully stored permanently.

`size()` — Batch API Check

Checks the LRU index first, without perturbing LRU order, and falls through to permanent storage if the key is not cached. This matters for the batch API, which calls size() for every object in a request to find out which ones already exist.

`delete()` — Cache Only

Only deletes from the cache, never from permanent storage. Called by the Verify decorator when corruption is detected.

4.4 Eviction Policy

When the cache exceeds max_size bytes:

pop() the least-recently-used entry from the LRU
Delete the corresponding file from cache storage
Repeat until lru.size() <= max_size

Important: max_size = 0 means unlimited cache — no eviction occurs.
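A size-tracking LRU with this eviction rule can be sketched as follows. It is a deliberately simple stand-in: the hypothetical `Lru` below uses a `VecDeque` with O(n) lookups, whereas rudolfs uses `LinkedHashMap` for O(1) refresh, and `prune` only returns the evicted keys instead of deleting files.

```rust
use std::collections::VecDeque;

/// Hypothetical size-tracking LRU; front of the deque = least recently used.
pub struct Lru {
    entries: VecDeque<(String, u64)>,
    size: u64,
}

impl Lru {
    pub fn new() -> Self {
        Lru { entries: VecDeque::new(), size: 0 }
    }

    /// Total bytes across all entries.
    pub fn size(&self) -> u64 {
        self.size
    }

    /// Inserts (or replaces) a key as the most recently used entry.
    pub fn insert(&mut self, key: &str, bytes: u64) {
        self.remove(key);
        self.entries.push_back((key.to_string(), bytes));
        self.size += bytes;
    }

    pub fn remove(&mut self, key: &str) -> Option<u64> {
        let pos = self.entries.iter().position(|(k, _)| k.as_str() == key)?;
        let (_, bytes) = self.entries.remove(pos)?;
        self.size -= bytes;
        Some(bytes)
    }

    /// Marks `key` as most recently used and returns its size.
    pub fn get_refresh(&mut self, key: &str) -> Option<u64> {
        let bytes = self.remove(key)?;
        self.insert(key, bytes);
        Some(bytes)
    }

    /// Removes and returns the least recently used entry.
    pub fn pop(&mut self) -> Option<(String, u64)> {
        let entry = self.entries.pop_front()?;
        self.size -= entry.1;
        Some(entry)
    }
}

/// Evicts least recently used entries until the index fits in `max_size`.
/// Returns the evicted keys; `max_size == 0` means unlimited, so nothing is evicted.
pub fn prune(lru: &mut Lru, max_size: u64) -> Vec<String> {
    let mut evicted = Vec::new();
    if max_size == 0 {
        return evicted;
    }
    while lru.size() > max_size {
        match lru.pop() {
            Some((key, _)) => evicted.push(key),
            None => break,
        }
    }
    evicted
}
```

In a real backend each key returned by `prune` would also be deleted from cache storage, as in step 2 of the policy above.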

4.5 Startup Prepopulation

On startup, the LRU index is rebuilt from the cache storage's list() method:

let lru = Cache::from_stream(cache.list()).await?;

Then immediately pruned if the current cache exceeds the configured max_size:

let count = prune_cache(lru.clone(), max_size, cache.clone()).await?;

For S3 backend: list() returns an empty stream (not implemented), so an index built over S3 would start empty. This does not affect the usual S3 + cache stack, where the cache storage passed to Cache::from_stream is Disk.

For Disk backend: list() walks the filesystem to rebuild the LRU index, making the disk cache survive restarts.

4.6 Corruption Resilience via Verify

The Verify decorator wraps the byte stream with a streaming SHA256 hash:

// On download:
let stream = VerifyStream::new(stream.map_err(Error::from), len, *key.oid());
// If SHA256 mismatches → automatically delete corrupted object from cache
//                                    → return error to client

// On upload:
let stream = VerifyStream::new(stream.map_err(Error::from), len, *key.oid());
// If SHA256 mismatches → reject the upload entirely

The VerifyStream hashes every chunk and, once the running byte count reaches the expected length, compares the final digest against the expected OID. This catches:

Bit rot in the cache
Encryption key changes (decrypting with the wrong key yields bytes whose hash does not match the OID)
Network transfer errors

5. Encryption

5.1 Algorithm

XChaCha20 stream cipher (extended nonce variant of ChaCha20), implemented via the chacha crate.

5.2 Key and Nonce

Key: 32 bytes, provided via --key CLI flag or RUDOLFS_KEY env var (hex-encoded)
Nonce: First 24 bytes of the SHA256 OID used as the XChaCha20 nonce

let mut nonce: [u8; 24] = [0; 24];
nonce.copy_from_slice(&key.oid().bytes()[0..24]);
let chacha = ChaCha::new_xchacha20(&self.key, &nonce);

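Security implication: The nonce is deterministic (derived from the OID). Since OIDs are content addresses (SHA256 of the content), identical content always produces the same nonce. Under a single key this avoids the classic stream-cipher failure of encrypting two different plaintexts with the same (key, nonce) pair, because a repeated nonce implies identical plaintext (barring a collision on the first 24 bytes of the SHA256). The trade-off is that encryption is deterministic: identical objects yield identical ciphertext.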

5.3 Streaming Encryption

Encryption/decryption is applied to the byte stream in-flight using xor_stream():

fn xor_stream<S>(mut chacha: ChaCha, stream: S) -> impl Stream<Item = Result<Bytes, io::Error>>

No buffering of the entire object — each chunk is encrypted/decrypted as it passes through. This means encryption works on objects of arbitrary size (including those exceeding available memory).
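The chunk-by-chunk property can be demonstrated with a toy keystream. To be clear about the assumptions: `ToyKeystream` is an xorshift generator, NOT XChaCha20 and not secure in any way; it only mimics the shape of a stream cipher (stateful keystream, XOR per byte). The names `ToyKeystream` and `xor_chunks` are hypothetical.

```rust
/// Toy keystream generator. NOT XChaCha20 and NOT secure; it exists only to
/// show how a stateful keystream is XORed over a stream of chunks.
pub struct ToyKeystream {
    state: u64,
}

impl ToyKeystream {
    /// `seed` must be non-zero, otherwise xorshift stays stuck at zero.
    pub fn new(seed: u64) -> Self {
        ToyKeystream { state: seed }
    }

    fn next_byte(&mut self) -> u8 {
        // One xorshift64 step, then take the low byte.
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        (self.state & 0xff) as u8
    }

    /// XORs one chunk in place, advancing the keystream position.
    pub fn xor_chunk(&mut self, chunk: &mut [u8]) {
        for byte in chunk.iter_mut() {
            *byte ^= self.next_byte();
        }
    }
}

/// Processes chunks one at a time, as a streaming adapter would, and
/// concatenates the output so it can be compared.
pub fn xor_chunks(seed: u64, chunks: &[&[u8]]) -> Vec<u8> {
    let mut keystream = ToyKeystream::new(seed);
    let mut out = Vec::new();
    for chunk in chunks {
        let mut buf = chunk.to_vec();
        keystream.xor_chunk(&mut buf);
        out.extend_from_slice(&buf);
    }
    out
}
```

Two properties carry over to a real stream cipher: the output does not depend on where chunk boundaries fall, and applying the same keystream a second time restores the original bytes, so one function serves for both encryption and decryption.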

5.4 Encryption + CDN Incompatibility

From lib.rs:

if self.cdn.is_some() {
    tracing::warn!("A CDN was specified. Since uploads and downloads do not flow \
                     through Rudolfs in this case, they will *not* be encrypted.");
    // Cache is also disabled when CDN is used
}

When CDN mode is active, presigned URLs bypass rudolfs entirely, so encryption cannot be applied.

6. Retry and Fault Injection

6.1 Retry (Retrying Decorator)

Only applies exponential backoff to size() and delete() operations. get() and put() are passed straight through: they carry byte streams, and a stream that has been partially consumed cannot be replayed for a second attempt:

async fn get(&self, key: &StorageKey) -> ... {
    // Cannot retry — stream already consumed
    self.storage.get(key).await
}

async fn put(&self, key: StorageKey, value: LFSObject) -> ... {
    // Cannot retry — stream already consumed
    self.storage.put(key, value).await
}

async fn size(&self, key: &StorageKey) -> ... {
    retry(ExponentialBackoff::default(), || async {
        Ok(self.storage.size(key).await?)
    }).await
}
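For readers unfamiliar with the pattern, exponential backoff can be hand-rolled in a few lines. This is a simplified stand-in for the `backoff` crate, which additionally applies jitter and a total elapsed-time limit; the function name `retry_with_backoff` and its attempt-count parameter are assumptions for this sketch.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `op` until it succeeds or `max_attempts` is reached, doubling the
/// delay after every failure. Hypothetical, blocking sketch.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

This shape only works for operations that can be re-invoked from scratch, such as a `HeadObject` size check, which is the same reason the decorator limits retries to `size()` and `delete()`.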

6.2 Fault Injection (Faulty Decorator)

Gated behind the faulty feature flag. Injects random stream errors: each chunk is replaced by an error whenever a random u8 equals 0, i.e. with probability 1/256 per chunk:

fn faulty_stream(stream: ByteStream) -> ByteStream {
    Box::pin(stream.map(|item| {
        if rand::thread_rng().random::<u8>() == 0 {
            Err(io::Error::other("injected fault"))
        } else {
            item
        }
    }))
}

7. Authentication

Rudolfs has no built-in authentication. This is explicitly stated as a "non-feature" in the README:

There is no client authentication. This is meant to be run in an internal network with clients you trust, not on the internet with malicious actors.

However, the design intentionally supports proxy-based authentication:

The extract_auth_header() function reflects the Authorization header from incoming requests into LFS action URLs
The hyperext.rs module supports X-Forwarded-Proto and X-Forwarded-Host headers for reverse proxy deployment
The authenticated: Some(true) field in batch responses tells the git-lfs client that auth has been handled

This means authentication is expected to be provided by an outer layer (nginx, a reverse proxy, or in our case, alknet's identity system).

8. Locking API

Rudolfs does NOT implement the LFS Locking API. The endpoints POST /api/{org}/{project}/locks, GET /api/{org}/{project}/locks, etc. are not implemented, so git lfs lock and git lfs unlock do not work against it. (Transfer::LfsStandaloneFile, which also exists in the enum without ever being selected or handled, is a transfer adapter and unrelated to locking.)

9. Configuration

9.1 CLI Arguments (structopt)

Flag	Env Var	Default	Description
`--host`	`RUDOLFS_HOST`	—	Host/address to bind (e.g., `0.0.0.0:8080`)
`--port`	`PORT`	8080	Port (only used if `--host` not set)
`--key`	`RUDOLFS_KEY`	None	32-byte hex encryption key; omit for no encryption
`--cache-dir`	`RUDOLFS_CACHE_DIR`	None	Local disk cache directory; omit for no cache
`--max-cache-size`	`RUDOLFS_MAX_CACHE_SIZE`	50 GiB	Maximum cache size (0 = unlimited)
`--log-level`	`RUDOLFS_LOG`	info	Log level (trace/debug/info/warn/error)

S3 subcommand:

Flag	Env Var	Default	Description
`--bucket`	`RUDOLFS_S3_BUCKET`	(required)	S3 bucket name
`--prefix`	`RUDOLFS_S3_PREFIX`	`lfs`	S3 key prefix
`--cdn`	`RUDOLFS_S3_CDN`	None	CDN base URL for presigned URLs

Local subcommand:

Flag	Env Var	Default	Description
`--path`	`RUDOLFS_LOCAL_PATH`	(required)	Local directory for LFS data

9.2 AWS Credential Chain

Kubernetes WebIdentity (WebIdentityProvider::from_k8s_env()), tried first
AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY environment variables
~/.aws/credentials file
IAM instance profile (EC2)

Custom endpoints: AWS_S3_ENDPOINT + AWS_DEFAULT_REGION (for MinIO/rustfs)

10. Docker / Deployment

10.1 Dockerfile

Multi-stage build producing a scratch-based image (very small, <10MB):

Build stage: rust:1.91.1 with x86_64-unknown-linux-musl target
Uses tini as PID 1
Copies ca-certificates.crt for TLS
Final image: FROM scratch with only the binary + certs + tini

10.2 Docker Compose Variants

File	Backend	Notes
`docker-compose.yml`	S3	AWS S3 backend with cache and encryption
`docker-compose.local.yml`	Local disk	Local storage with cache and encryption
`docker-compose.minio.yml`	S3 (MinIO)	Uses `AWS_S3_ENDPOINT=http://minio:9000`

All variants:

Expose port 8080 (mapped to 8081)
Use a Docker volume data for cache
Pass encryption key and config via environment variables
Recommend nginx as a TLS-terminating reverse proxy

11. Relevance to Alknet

11.1 The Complete Git Hosting Stack

┌──────────────────────────────────────────────────────┐
│                    Alknet Layer                       │
│  ┌─────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │ Identity │  │ Call Protocol│  │ Operation Reg.  │  │
│  └────┬─────┘  └──────┬──────┘  └────────┬────────┘  │
│       │               │                   │           │
│  ┌────▼───────────────▼───────────────────▼────────┐ │
│  │              HTTP MessageInterface               │ │
│  └────┬───────────────┬──────────────────┬─────────┘ │
│       │               │                  │           │
└───────┼───────────────┼──────────────────┼───────────┘
        │               │                  │
  ┌─────▼─────┐  ┌──────▼──────┐  ┌────────▼────────┐
  │ gitserver  │  │  rudolfs    │  │     rustfs      │
  │ (git HTTP) │  │ (Git LFS)  │  │   (S3 storage)  │
  └─────┬─────┘  └──────┬──────┘  └────────┬────────┘
        │               │                  │
        │           ┌───▼──────────────────▼───┐
        │           │        Object Store       │
        │           │    (S3 API / rustfs)      │
        │           └──────────────────────────┘
        │
  ┌─────▼─────┐
  │   Git Repo │
  │  (bare)    │
  └───────────┘

gitserver: Handles git-upload-pack, git-receive-pack, info/refs (smart HTTP)
rudolfs: Handles POST /objects/batch, GET/PUT /object/{oid}, POST /objects/verify (LFS)
rustfs: Provides the S3-compatible object store that rudolfs uses as backend

11.2 Authentication: Alknet Identity → Rudolfs Auth

Rudolfs has no auth, but it reflects auth headers. This fits alknet's approach:

alknet HTTP MessageInterface terminates the connection, authenticates via alknet Identity
The authenticated request is reverse-proxied to rudolfs
Alknet injects an Authorization header (e.g., a Bearer token or internal auth)
Rudolfs reflects this header in LFS action URLs, propagating auth to subsequent client requests

Integration approach:

Add a middleware layer before App that validates alknet Identity tokens
The extract_auth_header() function already handles the Authorization → action.header propagation
For fine-grained access control (per-org/project), add namespace validation against alknet's permission model

Code location to modify: app.rs — add auth middleware in the Service::call() method, or wrap App in a new AuthMiddleware service.

11.3 Embedding Rudolfs as a Library

Rudolfs exposes a clean library API via lib.rs:

// S3 backend
let mut builder = S3ServerBuilder::new(bucket);
builder.prefix("lfs".into());
builder.key(encryption_key);
builder.cache(Cache::new(cache_dir, max_cache_size));
builder.run(addr).await?;

// Local backend
let mut builder = LocalServerBuilder::new(path);
builder.key(encryption_key);
builder.run(addr).await?;

Feasibility: The Storage trait is fully abstract, and the App struct is generic over any S: Storage. This means:

Option A — Run as a separate service: Use S3ServerBuilder or LocalServerBuilder directly; minimal integration work, just deploy alongside gitserver and rustfs
Option B — Embed as a library: Import rudolfs as a crate dependency, construct the storage stack, and call spawn_server(storage, &addr).
Option C — Extract the storage layer: The storage module and lfs module can be extracted and used within alknet's own HTTP handler (e.g., integrating LFS endpoints into the same Axum server as gitserver)

Recommendation: Option A is simplest for initial deployment. Option C is best for long-term integration — merge LFS endpoints into alknet's unified HTTP server so there's one service, one port, one auth layer.

11.4 LFS HTTP Endpoints → Alknet MessageInterface Mapping

Rudolfs Endpoint	Alknet Mapping
`POST /api/{org}/{project}/objects/batch`	`http.handle("lfs.batch", org, project, batch_request)`
`GET /api/{org}/{project}/object/{oid}`	`http.handle("lfs.download", org, project, oid)`
`PUT /api/{org}/{project}/object/{oid}`	`http.handle("lfs.upload", org, project, oid)`
`POST /api/{org}/{project}/objects/verify`	`http.handle("lfs.verify", org, project, verify_request)`
`GET /` (index page)	Could be served via alknet's web UI or removed

The namespace extraction ({org}/{project}) maps directly to alknet's namespace model. In the LFS URL format http://gitlfs.example.com:8080/api/my-org/my-project, the my-org/my-project path naturally aligns with alknet's org/project hierarchy.

11.5 Caching Layer: Alknet-Managed vs Rudolfs-Managed

Current rudolfs approach: Each rudolfs instance manages its own LRU cache in memory + local disk. There's no distributed cache coordination.

Options for alknet:

Approach	Pros	Cons
Let rudolfs manage its cache	Simple, battle-tested, no coordination overhead	No cross-instance cache sharing; each node has its own cache
Alknet-managed distributed cache	Shared cache across nodes; better hit rates	Significant complexity; rudolfs' `Cached` is tightly coupled to its `Disk` backend
Replace local cache with rustfs-backed cache	Unified storage; no local disk dependency	Adds S3 round-trip latency for "cache" reads; defeats purpose of local cache

Recommendation: Start with rudolfs-managed local caching. Each alknet node runs a local rudolfs instance with its own cache on local SSD. Because the LRU index is rebuilt from the Disk cache's list() on startup, a node keeps its cache across restarts; a freshly provisioned node starts cold and warms up as objects are accessed. If cross-node cache sharing is needed later, consider a Redis/memcached metadata layer while keeping local disk for byte storage.

11.6 CredentialProvider::S3AccessKey → Rudolfs S3 Configuration

rustfs uses a CredentialProvider enum that includes S3AccessKey. The mapping to rudolfs:

rustfs CredentialProvider	rudolfs Equivalent	How
`S3AccessKey { access_key, secret_key }`	`AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` env vars	Set these in the environment before starting rudolfs
`Keystone { ... }`	Not directly supported	Would need custom credential provider for rudolfs
`OIDC { ... }`	Kubernetes WebIdentity	Already supported via `WebIdentityProvider::from_k8s_env()`
`IAMRole`	Default credential chain	Falls through to IAM instance profile

For integration with rustfs: Set AWS_S3_ENDPOINT=http://rustfs:9000 (or wherever rustfs listens) and AWS_DEFAULT_REGION=us-east-1 along with the rustfs access credentials. This mirrors the MinIO compose file, with the endpoint and credentials swapped for rustfs:

environment:
  - AWS_S3_ENDPOINT=http://rustfs:9000
  - AWS_DEFAULT_REGION=us-east-1
  - AWS_ACCESS_KEY_ID=${RUSTFS_ACCESS_KEY}
  - AWS_SECRET_ACCESS_KEY=${RUSTFS_SECRET_KEY}

Note: rudolfs uses rusoto (not aws-sdk-rust), and rusoto is in maintenance mode. For long-term maintenance, migrating to aws-sdk-rust or a generic S3 client would be necessary. However, rusoto works well with rustfs/MinIO.

11.7 Encryption Considerations for Alknet

If rudolfs encryption is enabled:

LFS objects are encrypted before they reach S3/rustfs
This means rustfs cannot deduplicate across different encryption keys
If the encryption key rotates, all existing objects become invalid (SHA256 verification fails)
The nonce is derived from the OID, so same content → same encrypted form (no random IV)

For alknet: Consider whether encryption at the rudolfs layer is needed if rustfs already provides encryption at rest. Running encryption twice is wasteful. If rustfs provides server-side encryption (SSE), disable rudolfs --key and rely on rustfs's encryption instead.

11.8 Key Gaps and Limitations

| Gap | Impact | Mitigation |
| --- | --- | --- |
| No authentication | Must be behind a trusted network or reverse proxy | Alknet provides auth at the MessageInterface layer |
| No Locking API | Cannot use `git lfs lock` | Would need to implement LFS Locking API |
| No LFS listing API | `list()` returns empty for S3 backend | The LFS protocol does not need listing, but its absence prevents cache prepopulation |
| No rate limiting | Vulnerable to abuse | Add rate limiting middleware |
| No multi-instance coordination | Each instance has independent cache | Accept as-is for now; add Redis metadata layer later |
| rusoto is in maintenance mode | Long-term maintenance risk | Migrate to aws-sdk-rust or generic S3 client |
| No content-type detection on responses | `GET /object/{oid}` always returns `application/octet-stream` | Fine for LFS; browsers wouldn't render binary |
| No HTTPS/TLS | Must use reverse proxy for TLS | Fine for deployment behind nginx/alknet |
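
As one possible shape for the rate limiting mitigation, here is a minimal token-bucket sketch. It is not taken from rudolfs or alknet: `TokenBucket` is a hypothetical type, and a real middleware would keep one bucket per client identity and call `try_acquire` before forwarding each request.

```rust
use std::time::Instant;

/// Hypothetical token bucket: holds up to `capacity` tokens and refills at
/// `refill_per_sec`. Each request consumes one token.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    /// Starts full so that an initial burst of `capacity` requests passes.
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: capacity as f64,
            tokens: capacity as f64,
            refill_per_sec,
            last: now,
        }
    }

    /// Returns true if the request may proceed at time `now`.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        // Refill for the elapsed time, capped at the bucket capacity.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        if now > self.last {
            self.last = now;
        }
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Passing `now` in explicitly keeps the bucket deterministic and easy to test. For LFS traffic it may make more sense to charge tokens per batch request than per object transfer, since transfers are already bounded by bandwidth.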

11.9 Integration Blueprint

Phase 1 — Sidecar Deployment:

# docker-compose.yml (development)
services:
  rustfs:
    image: rustfs:latest
    ports: ["9000:9000"]
    
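  rudolfs:
    image: jasonwhite0/rudolfs:latest
    # Assumption: the rudolfs binary needs a backend subcommand ("s3" or
    # "local"); the bucket is then read from RUDOLFS_S3_BUCKET below.
    command: ["s3"]
    environment:
      - AWS_S3_ENDPOINT=http://rustfs:9000
      - AWS_DEFAULT_REGION=us-east-1
      - AWS_ACCESS_KEY_ID=minioadmin
      - AWS_SECRET_ACCESS_KEY=minioadmin
      - RUDOLFS_S3_BUCKET=lfs-data
    ports: ["8080:8080"]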
    
  gitserver:
    image: gitserver:latest
    ports: ["8081:8080"]

Phase 2 — Library Integration:

// In alknet's git service module
use rudolfs::{S3ServerBuilder, Cache};

async fn start_lfs(config: &LfsConfig) -> Result<()> {
    let mut builder = S3ServerBuilder::new(config.s3_bucket.clone());
    builder.prefix(config.s3_prefix.clone());
    
    if let Some(key) = &config.encryption_key {
        builder.key(*key);
    }
    
    if let Some(cache) = &config.cache {
        builder.cache(Cache::new(cache.dir.clone(), cache.max_size));
    }
    
    builder.run(config.listen_addr).await?;
    Ok(())
}
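
The snippet above reads several fields from an `LfsConfig` that rudolfs itself does not define. A possible alknet-side shape is sketched below. Every name here is hypothetical; the field types are chosen only to line up with how the snippet uses them (a 32-byte key that is copied, and a cache directory plus a maximum size, assumed to be in bytes).

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// Hypothetical cache settings matching the two arguments given to
/// `Cache::new` in the snippet above.
#[derive(Debug, Clone)]
pub struct CacheConfig {
    pub dir: PathBuf,
    pub max_size: u64, // assumed to be bytes
}

/// Hypothetical alknet-side configuration for the embedded LFS server.
#[derive(Debug, Clone)]
pub struct LfsConfig {
    pub s3_bucket: String,
    pub s3_prefix: String,
    /// `None` disables rudolfs-level encryption (e.g. when rustfs SSE is used).
    pub encryption_key: Option<[u8; 32]>,
    /// `None` disables the local disk cache.
    pub cache: Option<CacheConfig>,
    pub listen_addr: SocketAddr,
}

impl LfsConfig {
    /// Illustrative development defaults: no encryption, a 1 GiB cache in a
    /// temporary directory, and a loopback-only listener.
    pub fn development(bucket: &str) -> Self {
        Self {
            s3_bucket: bucket.to_string(),
            s3_prefix: "lfs".to_string(),
            encryption_key: None,
            cache: Some(CacheConfig {
                dir: PathBuf::from("/tmp/rudolfs-cache"),
                max_size: 1 << 30,
            }),
            listen_addr: SocketAddr::from(([127, 0, 0, 1], 8080)),
        }
    }
}
```

Keeping the key and the cache optional mirrors the two `if let Some(...)` branches in `start_lfs`, so a deployment that relies on rustfs encryption simply leaves `encryption_key` unset.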

Phase 3 — Deep Integration:

Extract storage/ and lfs.rs from rudolfs
Integrate LFS endpoints into alknet's HTTP router (alongside gitserver's git endpoints)
Replace rusoto with a generic S3 client targeting rustfs
Add alknet Identity-based auth middleware
Implement LFS Locking API using alknet's operation registry
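
For the locking item, the core state behind the LFS File Locking API is small: one lock per path, a conflict when the path is already locked, and unlock restricted to the owner unless forced. The sketch below models only that state in memory. `LockRegistry` and its types are hypothetical, integer IDs are a simplification, and how this would map onto alknet's operation registry is left open.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Lock {
    pub id: u64,
    pub path: String,
    pub owner: String,
}

#[derive(Debug, PartialEq)]
pub enum LockError {
    /// The path is already locked; carries the existing lock.
    Conflict(Lock),
    NotFound,
    NotOwner,
}

/// Hypothetical in-memory lock registry keyed by file path.
#[derive(Default)]
pub struct LockRegistry {
    next_id: u64,
    by_path: HashMap<String, Lock>,
}

impl LockRegistry {
    /// Creates a lock, or reports the existing one as a conflict.
    pub fn lock(&mut self, path: &str, owner: &str) -> Result<Lock, LockError> {
        if let Some(existing) = self.by_path.get(path) {
            return Err(LockError::Conflict(existing.clone()));
        }
        self.next_id += 1;
        let lock = Lock {
            id: self.next_id,
            path: path.to_string(),
            owner: owner.to_string(),
        };
        self.by_path.insert(path.to_string(), lock.clone());
        Ok(lock)
    }

    /// Removes a lock by ID. Non-owners must pass `force`.
    pub fn unlock(&mut self, id: u64, requester: &str, force: bool) -> Result<Lock, LockError> {
        let path = self
            .by_path
            .values()
            .find(|l| l.id == id)
            .map(|l| l.path.clone())
            .ok_or(LockError::NotFound)?;
        if self.by_path[&path].owner != requester && !force {
            return Err(LockError::NotOwner);
        }
        Ok(self.by_path.remove(&path).expect("lock was just found"))
    }
}
```

In an HTTP handler, `Conflict` would correspond to a 409 response carrying the existing lock and `NotOwner` to a 403. The `owner` value would come from the alknet identity attached by the auth middleware.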

12. Test Coverage

12.1 Local Backend Tests

tests/test_local.rs: Exercises the local disk backend with encryption enabled and disabled. Creates temp git repos, pushes LFS objects (4MB, 8MB, 16MB), pulls them, clones repos, and verifies data integrity.

12.2 S3 Backend Tests

tests/test_s3.rs: Same test pattern as local but targets S3. Requires tests/.test_credentials.toml to run. Skips silently if credentials are missing.

12.3 Test Infrastructure

tests/common/mod.rs provides:

GitRepo — temp directory with a git repo configured for LFS
GitRepo::init(addr) — initializes repo, sets lfs.url to the test server
GitRepo::add_random(path, size, rng) — creates random binary files
GitRepo::lfs_push() / lfs_pull() — push/pull LFS objects
GitRepo::clean_lfs() — clears .git/lfs/ to force re-download
init_logger() — sets up tracing for test output

13. Summary

Rudolfs is a well-architected, composable Git LFS server that maps cleanly onto alknet's requirements:

| Criterion | Assessment |
| --- | --- |
| Protocol correctness | Implements LFS batch API correctly; basic transfer adapter |
| Storage composability | Excellent: the decorator pattern allows arbitrary stacking |
| Caching | Solid LRU + disk cache with background writes and corruption resilience |
| Encryption | XChaCha20 streaming, deterministic nonce from OID, optional |
| S3 compatibility | Works with any S3-compatible store (MinIO, rustfs) via `AWS_S3_ENDPOINT` |
| Auth story | No auth, but reflects headers, which suits proxy-based auth |
| Embeddability | Good library API via `S3ServerBuilder`/`LocalServerBuilder` |
| Maintenance risk | `rusoto` is in maintenance mode; need migration path |
| Missing features | Locking API, listing API, rate limiting, multi-instance cache |

Bottom line: Rudolfs is a strong foundation for the LFS layer in the alknet git hosting stack. Its decorator pattern, streaming architecture, and S3 compatibility make it an excellent match for rustfs. The main gaps (auth, locking, rusoto maintenance) are well-understood and have clear integration paths within alknet.
