Deep dive into rudolfs architecture, LFS protocol implementation, storage backend decorator pattern (Verify→Encrypted→Cached→Retrying→S3), caching layer, encryption, and relevance to the alknet git hosting stack (gitserver + rudolfs + rustfs).
Rudolfs Reference Document
Source: https://github.com/jasonwhite/rudolfs (cloned at /workspace/rudolfs/)
Version: 0.3.8 (Cargo.toml)
License: MIT
Date researched: 2026-06-08
Purpose: Evaluate rudolfs as a Git LFS server for the alknet git hosting stack (gitserver + rudolfs + rustfs)
1. Architecture Overview
1.1 What is Rudolfs?
Rudolfs is a high-performance, caching Git LFS server written in Rust with an AWS S3 storage backend. It implements the Git LFS batch API specification, providing a clean separation between the LFS protocol layer and pluggable storage backends. The storage system uses a decorator (Russian doll) pattern where backends are composed in layers — each adding a capability (encryption, verification, caching, retrying) — resulting in a flexible and composable architecture.
Key design principles:
- Modular storage backends: S3, local disk, and any combination with caching
- Streaming everywhere: All data flows as async byte streams; nothing is buffered entirely in memory
- Decorator composition: Storage capabilities are stacked via composable wrapper types
- Corruption resilience: SHA256 verification on upload and download; corrupted objects are auto-purged from cache
- Optional encryption: XChaCha20 stream cipher for data at rest (both cache and permanent storage)
1.2 Source Structure
```text
src/
├── main.rs        # CLI entry point (structopt), server startup
├── lib.rs         # Public API: S3ServerBuilder, LocalServerBuilder, Cache, Server trait
├── app.rs         # Hyper service: HTTP routing, batch API, upload/download endpoints
├── lfs.rs         # LFS protocol types (BatchRequest/Response, Action, Oid, Transfer)
├── sha256.rs      # Sha256 type, hex serde, VerifyStream (streaming SHA256 checker)
├── lru.rs         # In-memory LRU cache (LinkedHashMap-based) for cache metadata
├── error.rs       # Error type (re-export of anyhow::Error)
├── hyperext.rs    # RequestExt trait (X-Forwarded-Proto, X-Forwarded-Host, Host headers)
├── logger.rs      # Logger middleware (request/response logging)
├── util.rs        # NamedTempFile (async temp file with auto-cleanup + atomic rename)
└── storage/
    ├── mod.rs     # Core traits: Storage, LFSObject, StorageKey, Namespace, ByteStream
    ├── s3.rs      # S3 backend (rusoto): multipart upload, head/get/put, presigned URLs
    ├── disk.rs    # Local disk backend: file-based storage with temp file pattern
    ├── cached.rs  # Caching decorator: LRU + disk cache → permanent storage
    ├── encrypt.rs # Encryption decorator: XChaCha20 stream cipher
    ├── verify.rs  # Verification decorator: streaming SHA256 with auto-delete on corruption
    ├── retrying.rs # Retry decorator: exponential backoff for size/delete operations
    └── faulty.rs  # Fault injection decorator (test-only, gated by "faulty" feature)
```
1.3 Key Dependencies
| Dependency | Version | Purpose |
|---|---|---|
| hyper | 0.14 | HTTP server (no frameworks like Axum/Actix) |
| rusoto_s3 | 0.48 | S3 API client (PutObject, GetObject, HeadObject, multipart upload) |
| rusoto_core | 0.48 | AWS credential provider, HTTP client |
| rusoto_sts | 0.48 | Kubernetes WebIdentity credential provider |
| rusoto_credential | 0.48 | DefaultCredentialsProvider chain (env → file → IAM) |
| tokio | 1 | Async runtime (full features) |
| tokio-util | 0.7 | BytesCodec for file I/O streaming |
| futures | 0.3 | Stream combinators, try_join_all, channel mpsc |
| serde / serde_json | 1 / 1 | JSON serialization for LFS batch API |
| sha2 | 0.11.0-rc.3 | SHA256 hashing for LFS object verification |
| chacha | 0.4 | XChaCha20 stream cipher for encryption |
| linked-hash-map | 0.5 | LRU cache data structure with O(1) get_refresh |
| askama | 0.14 | HTML template engine (index page) |
| structopt | 0.3 | CLI argument parsing |
| backoff | 0.4 | Exponential backoff for S3 retries |
| uuid | 1.1 | Temp file naming (v4) |
| bytes | 1 | Bytes/BytesMut for streaming |
| derive_more | 2 | Display/From derives for error types |
| human-size | 0.4 | Human-readable size parsing (CLI) |
| humansize | 2 | Human-readable size formatting (logging) |
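Note: rudolfs uses rusoto (the older Rust AWS SDK, now in maintenance mode), not aws-sdk-rust. For S3-compatible stores like MinIO and, by extension, rustfs, the relevant point is that rusoto accepts a custom endpoint (via AWS_S3_ENDPOINT, described below). The choice of SDK is mainly a long-term maintenance concern.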
1.4 Request Flow
LFS Batch API (Upload)
```text
Client → POST /api/{org}/{project}/objects/batch
  → { "operation": "upload", "objects": [{ "oid": "...", "size": N }] }
  → Server: parse request, check storage.size() for each object
  ← { "objects": [{ "oid": "...", "size": N, "actions": { "upload": { "href": "..." }, "verify": { "href": "..." } } }] }
Client → PUT /api/{org}/{project}/object/{oid}
  → Raw LFS object data (Content-Length header required)
  → Server: stream through Verify→Encrypted→Cached→S3 (or disk)
  ← 200 OK (empty body)
```
LFS Batch API (Download)
```text
Client → POST /api/{org}/{project}/objects/batch
  → { "operation": "download", "objects": [...] }
  ← { "objects": [{ "oid": "...", "actions": { "download": { "href": "..." } } }] }
Client → GET /api/{org}/{project}/object/{oid}
  → Server: lookup in cache (LRU metadata), fetch from storage if needed
  ← Streamed LFS object data with Content-Length
```
CDN Mode (Presigned URLs)
When --cdn is configured, the batch response for downloads returns a presigned S3 URL directly instead of routing through rudolfs. Uploads also use presigned S3 URLs. This bypasses encryption (since data never touches rudolfs).
2. Git LFS Protocol Implementation
2.1 LFS Protocol Overview
Git LFS uses an HTTP-based API where the client first contacts the LFS server to discover how to transfer objects, then performs the actual transfers. Rudolfs implements the "basic" transfer adapter exclusively.
Specification reference: git-lfs/docs/api/batch.md
2.2 Endpoints Implemented
| Route | Method | Purpose |
|---|---|---|
| / | GET | HTML index page with setup instructions |
| /api/{org}/{project}/objects/batch | POST | LFS batch API (core protocol endpoint) |
| /api/{org}/{project}/object/{oid} | GET | Direct download of a single LFS object |
| /api/{org}/{project}/object/{oid} | PUT | Direct upload of a single LFS object |
| /api/{org}/{project}/objects/verify | POST | Verify object exists with correct size |
2.3 Batch API Detail
Request (BatchRequest):
```rust
pub struct BatchRequest {
    pub operation: Operation,                   // "upload" or "download"
    pub transfers: Option<Vec<Transfer>>,       // Transfer adapters client supports
    pub refs: Option<BTreeMap<String, String>>, // Git ref context (v2.4+)
    pub objects: Vec<RequestObject>,            // Objects to transfer
}
```
Response (BatchResponse):
```rust
pub struct BatchResponse {
    pub transfer: Option<Transfer>, // Always Transfer::Basic
    pub objects: Vec<ResponseObject>,
}
```
Each ResponseObject contains:
- oid: SHA256 hash of the object
- size: byte size
- error: optional error (code + message)
- authenticated: Some(true), which signals the client that auth is handled
- actions: optional Actions with href, header, expires_in/expires_at
2.4 Transfer Adapters
```rust
pub enum Transfer {
    Basic,             // The basic HTTP transfer adapter
    LfsStandaloneFile, // Parsed but not implemented
    Custom,            // Catch-all via #[serde(other)]
}
```
Rudolfs always responds with Transfer::Basic. It parses LfsStandaloneFile and Custom from client requests but never selects them.
2.5 Upload Expiration
Upload actions include an expires_in field set to 30 minutes (1800 seconds):
```rust
const UPLOAD_EXPIRATION: Duration = Duration::from_secs(30 * 60);
```
2.6 Auth Header Reflection
Rudolfs has no built-in authentication. However, it reflects any Authorization header from the incoming batch request back into the header field of all action URLs. This allows a reverse proxy (e.g., nginx with basic auth) to provide authentication:
```rust
fn extract_auth_header(headers: &HeaderMap) -> Option<BTreeMap<String, String>> {
    // Filters for "authorization" header and reflects it into action.header
}
```
This is critical for the alknet integration model — authentication is delegated entirely to an outer layer.
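The reflection step can be sketched with the standard library alone. This is a minimal sketch that assumes headers arrive as (name, value) pairs instead of hyper's `HeaderMap`. The function name mirrors rudolfs', but this signature is hypothetical. The sketch emits lowercase names, which is how hyper stores them.

```rust
use std::collections::BTreeMap;

/// Hypothetical std-only stand-in for rudolfs' `extract_auth_header()`.
/// Returns the map to copy into every action's `header` field, or `None`
/// when the batch request carried no `Authorization` header.
pub fn extract_auth_header(headers: &[(&str, &str)]) -> Option<BTreeMap<String, String>> {
    let mut reflected = BTreeMap::new();
    for (name, value) in headers {
        // HTTP header names are case-insensitive.
        if name.eq_ignore_ascii_case("authorization") {
            reflected.insert("authorization".to_string(), value.to_string());
        }
    }
    if reflected.is_empty() {
        None
    } else {
        Some(reflected)
    }
}
```

Each action in the batch response then carries this map. As a result, the client's follow-up GET/PUT sends the same credential that the proxy already accepted.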
2.7 Namespace (org/project) Extraction
The URL path /api/{org}/{project}/... is parsed to extract the namespace:
```rust
let namespace = match (parts.next(), parts.next()) {
    (Some(org), Some(project)) => Namespace::new(org.into(), project.into()),
    _ => // 400 Bad Request
};
```
This provides multi-tenancy by URL path. Different orgs/projects are isolated in storage (S3 key prefix or disk directory). The namespace is part of every storage key, so there is no cross-project deduplication: the same OID uploaded to two projects is stored twice.
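Below is a std-only sketch of the path split and of why the namespace prevents cross-project sharing. `parse_lfs_path` and `storage_key` are hypothetical helpers, and the flat key format is simplified compared with rudolfs' real object keys.

```rust
/// Split `/api/{org}/{project}/...` into (org, project, remaining segments).
/// `None` corresponds to the 400 Bad Request branch above.
pub fn parse_lfs_path(path: &str) -> Option<(String, String, Vec<String>)> {
    let mut parts = path.trim_start_matches('/').split('/');
    if parts.next()? != "api" {
        return None;
    }
    match (parts.next(), parts.next()) {
        (Some(org), Some(project)) if !org.is_empty() && !project.is_empty() => {
            let rest = parts.map(str::to_string).collect();
            Some((org.to_string(), project.to_string(), rest))
        }
        _ => None,
    }
}

/// Simplified key: the namespace comes first, so one OID in two projects
/// yields two distinct keys.
pub fn storage_key(org: &str, project: &str, oid_hex: &str) -> String {
    format!("{}/{}/{}", org, project, oid_hex)
}
```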
2.8 Verify Endpoint
```rust
pub struct VerifyRequest {
    pub oid: Oid,
    pub size: u64,
}
```
The verify endpoint (POST .../objects/verify) checks whether an uploaded object exists and has the correct size. It calls storage.size(&key) and compares against the request's size field.
3. Storage Architecture
3.1 The Storage Trait
The core abstraction is the Storage trait:
```rust
#[async_trait]
pub trait Storage {
    type Error: fmt::Display + Send;

    async fn get(&self, key: &StorageKey) -> Result<Option<LFSObject>, Self::Error>;
    async fn put(&self, key: StorageKey, value: LFSObject) -> Result<(), Self::Error>;
    async fn size(&self, key: &StorageKey) -> Result<Option<u64>, Self::Error>;
    async fn delete(&self, key: &StorageKey) -> Result<(), Self::Error>;
    fn list(&self) -> StorageStream<(StorageKey, u64), Self::Error>;
    async fn total_size(&self) -> Option<u64>;
    async fn max_size(&self) -> Option<u64>;
    fn public_url(&self, key: &StorageKey) -> Option<String>;
    async fn upload_url(&self, key: &StorageKey, expires_in: Duration) -> Option<String>;
}
```
Key types:
- StorageKey = (Namespace, Oid): composite key for all storage operations
- Namespace = (org: String, project: String): tenant/organization isolation
- Oid = Sha256: 32-byte SHA256 hash, displayed as hex
- LFSObject: holds (len: u64, stream: ByteStream)
- ByteStream: pinned, boxed async byte stream, Pin<Box<dyn Stream<Item = Result<Bytes, io::Error>> + Send>>
3.2 LFSObject — Fanout Pattern
LFSObject has a critical fanout() method that duplicates the byte stream into two identical streams using mpsc::channel(0):
```rust
pub fn fanout(self) -> (impl Future<Output = Result<(), io::Error>>, Self, Self)
```
This is used in two places:
- Caching on download: the stream flows to both the client and the cache simultaneously
- Caching on upload: the stream flows to both the permanent storage and the cache, with a oneshot signal ensuring the cache only persists after the upload succeeds
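The fanout idea can be reproduced without crates. This sketch assumes threads and std rendezvous channels (`sync_channel(0)`) stand in for rudolfs' async streams and `futures::channel::mpsc::channel(0)`. The returned thread handle plays the role of the driver future that `fanout()` hands back.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Duplicate a chunk source into two receivers. With capacity 0, each send
/// blocks until that consumer takes the chunk, so the producer never runs
/// ahead of the slower side (backpressure, no whole-object buffering).
pub fn fanout(source: Vec<Vec<u8>>) -> (thread::JoinHandle<()>, Receiver<Vec<u8>>, Receiver<Vec<u8>>) {
    let (tx_a, rx_a) = sync_channel::<Vec<u8>>(0);
    let (tx_b, rx_b) = sync_channel::<Vec<u8>>(0);
    let driver = thread::spawn(move || {
        for chunk in source {
            if tx_a.send(chunk.clone()).is_err() || tx_b.send(chunk).is_err() {
                break; // a consumer hung up; stop producing
            }
        }
        // Dropping both senders here ends both output streams.
    });
    (driver, rx_a, rx_b)
}
```

Both receivers must be drained concurrently, for example one on the request path and one in a cache writer. Draining them one after the other would deadlock on the rendezvous channel.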
3.3 Storage Composition
The decorator pattern produces these composition chains depending on configuration:
S3 with Cache and Encryption (fullest stack):
```text
Client ↔ Verify ↔ Encrypted ↔ Cached ─┬─ Disk (cache)
                                      └─ Retrying ↔ S3 (permanent)
```
Actual call order during put:
- Verify: wraps the stream with SHA256 verification; rejects the upload on hash mismatch
- Encrypted: applies the XChaCha20 cipher to the stream (nonce = first 24 bytes of the OID)
- Cached: fans the stream out to both cache and storage; uses the LRU to manage the cache
- Retrying: wraps S3 operations with exponential backoff (only for size and delete)
- S3: multipart uploads to s3://{bucket}/{prefix}/{org}/{project}/{sha256_path}

S3 with CDN (least processing):
```text
Client ↔ App (batch API generates presigned S3 URLs; no data flows through rudolfs)
```
Local Disk with Cache and Encryption:
```text
Client ↔ Verify ↔ Encrypted ↔ Cached(Disk_cache → Disk_storage)
```
Local Disk only:
```text
Client ↔ Verify ↔ Disk
```
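The composition can be modeled with a synchronous, simplified trait so that the layering is visible in a few lines. Several substitutions are assumptions of this sketch and differ from rudolfs: whole buffers replace streams, a one-byte XOR stands in for `Encrypted` (it is not encryption), and FNV-1a stands in for SHA256.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for rudolfs' async, streaming `Storage`.
pub trait Storage {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<(), String>;
}

/// Innermost layer: plays the role of `Disk` or `S3`.
#[derive(Default)]
pub struct Memory(pub HashMap<String, Vec<u8>>);

impl Storage for Memory {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<(), String> {
        self.0.insert(key.to_string(), value);
        Ok(())
    }
}

/// Stand-in for `Encrypted`: one-byte XOR, NOT real encryption.
pub struct Xor<S> {
    pub key: u8,
    pub inner: S,
}

impl<S: Storage> Storage for Xor<S> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let k = self.key;
        self.inner.get(key).map(|v| v.into_iter().map(|b| b ^ k).collect())
    }
    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<(), String> {
        let k = self.key;
        self.inner.put(key, value.into_iter().map(|b| b ^ k).collect())
    }
}

/// Content ID stand-in: 64-bit FNV-1a instead of SHA256.
pub fn content_id(data: &[u8]) -> String {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    format!("{:016x}", h)
}

/// Stand-in for `Verify`: outermost, so it always checks plaintext.
pub struct Verify<S> {
    pub inner: S,
}

impl<S: Storage> Storage for Verify<S> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        // A digest mismatch is treated like a missing object.
        self.inner.get(key).filter(|v| content_id(v) == key)
    }
    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<(), String> {
        if content_id(&value) != key {
            return Err("digest mismatch; upload rejected".to_string());
        }
        self.inner.put(key, value)
    }
}
```

Because every layer is generic over `S: Storage`, stacks such as `Verify { inner: Xor { key: 0x5a, inner: Memory::default() } }` nest in the same order as the chains above.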
3.4 S3 Backend Deep Dive
Object Key Format
```rust
fn key_to_path(&self, key: &StorageKey) -> String {
    // With prefix "lfs": "lfs/{org}/{project}/{sha256_path}"
    // Without prefix:    "{org}/{project}/{sha256_path}"
}
```
Where sha256_path is formatted as {first_two_hex}/{next_two_hex}/{full_hex}:
```rust
// Sha256Path Display:
write!(f, "{:02x}/{:02x}/{}", self.0.bytes()[0], self.0.bytes()[1], self.0)
```
Example: an empty file has OID e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 (the SHA256 of zero bytes). In namespace myorg/myproject with the default prefix, its key becomes:
```text
lfs/myorg/myproject/e3/b0/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```
The two-level hex prefix spreads keys across many prefixes. This can help S3 request-rate scaling, since S3 rate limits apply per prefix.
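The key layout can be reproduced in a few lines. This is a std-only sketch: the function names echo rudolfs, but the signatures are hypothetical, and `parse_oid` is an illustrative helper that is not part of rudolfs.

```rust
/// `{first_two_hex}/{next_two_hex}/{full_hex}` for a raw 32-byte SHA256.
pub fn sha256_path(oid: &[u8; 32]) -> String {
    let hex: String = oid.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{:02x}/{:02x}/{}", oid[0], oid[1], hex)
}

/// `prefix` mirrors `--prefix` (default "lfs"); `None` means no prefix.
pub fn key_to_path(prefix: Option<&str>, org: &str, project: &str, oid: &[u8; 32]) -> String {
    match prefix {
        Some(p) => format!("{}/{}/{}/{}", p, org, project, sha256_path(oid)),
        None => format!("{}/{}/{}", org, project, sha256_path(oid)),
    }
}

/// Parse a 64-character hex OID into bytes.
pub fn parse_oid(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(hex.get(2 * i..2 * i + 2)?, 16).ok()?;
    }
    Some(out)
}
```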
S3 Operations
| Operation | S3 API | Notes |
|---|---|---|
| Get object | GetObject | Streams response body |
| Put object | CreateMultipartUpload → UploadPart (100MB chunks) → CompleteMultipartUpload | Handles files >5GB |
| Check size | HeadObject | Uses content_length; handles Rusoto bug for 404 |
| Delete | No-op | Always returns Ok(()); never deletes from S3 |
| List | Empty stream | Returns no entries (S3 listing not implemented) |
| Presigned download | GetPreSignedUrl | Only when CDN is configured |
| Presigned upload | PutObject presigned URL | Only when CDN is configured |
HeadObject 404 Bug Workaround
Rusoto has a known bug where HeadObject for a missing key returns RusotoError::Unknown with status 404 instead of HeadObjectError::NoSuchKey. Rudolfs checks for this:
```rust
Err(RusotoError::Unknown(e)) if e.status == 404 => Ok(None)
```
Initialization Validation
On startup, the S3 backend performs a HeadBucket request with exponential backoff to validate:
- The bucket exists
- Credentials are valid
This ensures fast failures for misconfiguration.
Credential Provider Chain
```rust
// Try Kubernetes WebIdentity first
let k8s_provider = WebIdentityProvider::from_k8s_env();
if k8s_provider.credentials().await.is_ok() {
    // Use K8s credentials
} else {
    // Fall back to default provider chain:
    // 1. AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars
    // 2. ~/.aws/credentials file
    // 3. IAM instance profile
}
```
Custom S3 Endpoints
The AWS_S3_ENDPOINT environment variable enables custom S3-compatible endpoints:
```rust
let region = if let Ok(endpoint) = std::env::var("AWS_S3_ENDPOINT") {
    // Must also set AWS_DEFAULT_REGION or AWS_REGION; `name` is taken from it
    Region::Custom { name, endpoint }
} else {
    Region::default()
};
```
This is how MinIO (and rustfs) integration works — set AWS_S3_ENDPOINT to the MinIO/rustfs URL.
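Below is a std-only model of that selection logic, useful for testing a rustfs deployment's environment. The `Region` enum here is a local stand-in for `rusoto_core::Region`. Reading `AWS_DEFAULT_REGION` before `AWS_REGION` is an assumption of this sketch. `lookup` replaces `std::env::var` so that the process environment is not touched.

```rust
#[derive(Debug, PartialEq)]
pub enum Region {
    Custom { name: String, endpoint: String },
    Default,
}

pub fn region_from_env(lookup: impl Fn(&str) -> Option<String>) -> Result<Region, String> {
    match lookup("AWS_S3_ENDPOINT") {
        Some(endpoint) => {
            // A custom endpoint needs an explicit region name alongside it.
            let name = lookup("AWS_DEFAULT_REGION")
                .or_else(|| lookup("AWS_REGION"))
                .ok_or("AWS_S3_ENDPOINT requires AWS_DEFAULT_REGION or AWS_REGION")?;
            Ok(Region::Custom { name, endpoint })
        }
        None => Ok(Region::Default),
    }
}
```

In production, the same call would be `region_from_env(|k| std::env::var(k).ok())`.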
3.5 Disk Backend Deep Dive
On-Disk Layout
```text
{root}/
├── objects/
│   └── {org}/
│       └── {project}/
│           └── {first_two_hex}/
│               └── {next_two_hex}/
│                   └── {sha256_hex_full}   # Actual object file
└── incomplete/                             # Temp directory for in-progress uploads
    └── {uuid}                              # NamedTempFile, auto-cleaned on failure
```
Write Pattern (Atomic Put)
- Create a NamedTempFile in {root}/incomplete/{uuid}
- Stream the entire LFS object to the temp file via a Framed codec
- Verify that the byte count matches the expected len
- fs::create_dir_all for the target directory
- file.persist(path): atomic rename() from the temp file to the final location
This ensures that partially written objects never appear at their final path.
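The same write-then-rename pattern works with blocking std I/O. In this sketch the caller passes the temp file name, whereas rudolfs generates a UUID and uses async I/O with `NamedTempFile`. The rename is only atomic when `incomplete/` and `objects/` are on the same filesystem, which holds here because both sit under `root`.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

pub fn atomic_put(root: &Path, rel: &Path, tmp_name: &str, mut data: impl Read, expected_len: u64) -> io::Result<PathBuf> {
    let incomplete = root.join("incomplete");
    fs::create_dir_all(&incomplete)?;
    let tmp = incomplete.join(tmp_name);

    // Stream the object into the temp file and flush it to disk.
    let mut file = File::create(&tmp)?;
    let written = io::copy(&mut data, &mut file)?;
    file.sync_all()?;
    drop(file);

    // A short or long upload never reaches the final path.
    if written != expected_len {
        fs::remove_file(&tmp)?;
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("expected {} bytes, got {}", expected_len, written),
        ));
    }

    // Create the target directory, then move the file into place atomically.
    let dest = root.join("objects").join(rel);
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::rename(&tmp, &dest)?;
    Ok(dest)
}
```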
SHA256 Path Structure
The same {first_two_hex}/{next_two_hex}/{full_hex} structure as S3 is used, creating a 2-level directory hierarchy that prevents any single directory from having too many entries.
4. Caching Layer
4.1 Architecture
The Cached backend pairs a fast cache storage (typically local Disk) with a permanent storage (typically S3), connected by an in-memory LRU index:
```text
          ┌─────────────┐
get() ──→ │  LRU Index  │──→ cache hit? ──→ Disk.get()
          │ (in-memory) │    └── miss? ──→ S3.get() + background cache
          └─────────────┘
```
```rust
pub struct Backend<C, S> {
    lru: Arc<Mutex<Cache>>, // In-memory LRU metadata index
    max_size: u64,          // Maximum cache size in bytes (0 = unlimited)
    cache: Arc<C>,          // Cache storage (usually Disk)
    storage: Arc<S>,        // Permanent storage (usually S3)
}
```
4.2 LRU Data Structure
```rust
pub struct Cache<K> {
    map: LinkedHashMap<K, u64>, // Key → size in bytes
    size: u64,                  // Total size of all entries
}
```
Uses linked_hash_map which maintains insertion order and provides get_refresh() to move an accessed entry to the most-recently-used position. This gives O(1) access and O(1) eviction from the front.
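To show the same operations without the linked-hash-map crate, this std-only sketch orders entries by a monotonically increasing tick stored in a `BTreeMap`. That makes refresh and eviction O(log n) here, rather than the O(1) of `LinkedHashMap`. The method names echo the description above, but this is not the rudolfs type.

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::Hash;

pub struct Lru<K> {
    entries: HashMap<K, (u64, u64)>, // key -> (tick, size in bytes)
    order: BTreeMap<u64, K>,         // tick -> key, oldest first
    next_tick: u64,
    size: u64,
}

impl<K: Hash + Eq + Clone> Lru<K> {
    pub fn new() -> Self {
        Lru { entries: HashMap::new(), order: BTreeMap::new(), next_tick: 0, size: 0 }
    }

    pub fn size(&self) -> u64 {
        self.size
    }

    /// Insert or replace an entry as most recently used.
    pub fn push(&mut self, key: K, len: u64) {
        self.remove(&key);
        let tick = self.bump();
        self.order.insert(tick, key.clone());
        self.entries.insert(key, (tick, len));
        self.size += len;
    }

    /// Like `get_refresh`: return the size and mark the entry most recently used.
    pub fn get_refresh(&mut self, key: &K) -> Option<u64> {
        let (old_tick, len) = *self.entries.get(key)?;
        self.order.remove(&old_tick);
        let tick = self.bump();
        self.order.insert(tick, key.clone());
        self.entries.insert(key.clone(), (tick, len));
        Some(len)
    }

    /// Read the size without changing recency (what `size()` needs).
    pub fn peek(&self, key: &K) -> Option<u64> {
        self.entries.get(key).map(|&(_, len)| len)
    }

    pub fn remove(&mut self, key: &K) -> Option<u64> {
        let (tick, len) = self.entries.remove(key)?;
        self.order.remove(&tick);
        self.size -= len;
        Some(len)
    }

    /// Evict the least recently used entry.
    pub fn pop(&mut self) -> Option<(K, u64)> {
        let (_, key) = self.order.pop_first()?;
        let (_, len) = self.entries.remove(&key)?;
        self.size -= len;
        Some((key, len))
    }

    fn bump(&mut self) -> u64 {
        self.next_tick += 1;
        self.next_tick
    }
}
```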
4.3 Cache Hit/Miss Behavior
get() — Download
- LRU hit: if lru.get_refresh(key) returns a size, query cache.get(key)
  - If the cache has the object, return it (fast path)
  - If the cache misses (file deleted but LRU not updated), remove the entry from the LRU and fall through to storage
- LRU miss: query storage.get(key)
  - If storage has it, fanout() the stream: one copy to the client, one copy to the cache in the background
  - If storage doesn't have it, return None

Background caching is fire-and-forget: errors are logged but don't affect the client response.
put() — Upload
- fanout() the uploaded stream into two copies
- One copy goes to permanent storage (storage.put())
- When permanent storage finishes receiving, a oneshot signal is sent
- The cache copy appends an empty Bytes::new() chunk that only resolves after this signal
- Both streams complete in parallel via try_join3(f, cache, store)
This ensures the cache only persists data that was successfully stored permanently.
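The ordering guarantee can be sketched with a thread and a channel used as a oneshot. This is a hypothetical simplification. It stages a full in-memory copy, which rudolfs avoids by streaming. The callbacks `store` and `persist_cache` stand in for `storage.put()` and the cache write.

```rust
use std::sync::mpsc;
use std::thread;

pub fn cached_put<F, G>(data: Vec<u8>, store: F, persist_cache: G) -> Result<(), String>
where
    F: FnOnce(Vec<u8>) -> Result<(), String> + Send + 'static,
    G: FnOnce(Vec<u8>) + Send + 'static,
{
    let (signal_tx, signal_rx) = mpsc::channel::<()>();
    let staged = data.clone();

    // Cache side: hold the bytes until permanent storage confirms.
    let cache = thread::spawn(move || {
        if signal_rx.recv().is_ok() {
            persist_cache(staged);
        } // sender dropped without a signal: upload failed, discard
    });

    // Permanent-storage side runs concurrently with the cache thread.
    let result = store(data);
    if result.is_ok() {
        let _ = signal_tx.send(());
    }
    drop(signal_tx);
    cache.join().map_err(|_| "cache thread panicked".to_string())?;
    result
}
```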
size() — Batch API Check
First checks the LRU index (without perturbing LRU order), then falls through to permanent storage if the object is not cached. This matters for the batch API, which calls size() to decide which objects already exist.
delete() — Cache Only
Only deletes from the cache, never from permanent storage. Called by the Verify decorator when corruption is detected.
4.4 Eviction Policy
When the cache exceeds max_size bytes:
- pop() the least-recently-used entry from the LRU
- Delete the corresponding file from cache storage
- Repeat until lru.size() <= max_size
Important: max_size = 0 means unlimited cache — no eviction occurs.
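The eviction loop, including the unlimited case, looks roughly like this. It is a hypothetical std-only sketch: a `VecDeque` (least recently used at the front) replaces the LRU, and a `HashMap` replaces the cache storage.

```rust
use std::collections::{HashMap, VecDeque};

/// Returns the number of evicted objects.
pub fn prune_cache(lru: &mut VecDeque<(String, u64)>, disk: &mut HashMap<String, Vec<u8>>, max_size: u64) -> usize {
    if max_size == 0 {
        return 0; // 0 means unlimited: never evict
    }
    let mut total: u64 = lru.iter().map(|(_, len)| len).sum();
    let mut evicted = 0;
    while total > max_size {
        match lru.pop_front() {
            Some((key, len)) => {
                disk.remove(&key); // delete the cached file
                total -= len;
                evicted += 1;
            }
            None => break,
        }
    }
    evicted
}
```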
4.5 Startup Prepopulation
On startup, the LRU index is rebuilt from the cache storage's list() method:
```rust
let lru = Cache::from_stream(cache.list()).await?;
```
Then immediately pruned if the current cache exceeds the configured max_size:
```rust
let count = prune_cache(lru.clone(), max_size, cache.clone()).await?;
```
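In rudolfs' builders, the cache storage is always a local Disk backend (the cache is configured with a directory). As a result, list() walks the cache directory and the LRU index is rebuilt from what is already on disk. The disk cache therefore survives restarts whether the permanent storage is S3 or local disk. The S3 backend's empty list() stream does not affect prepopulation.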
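Below is a std-only sketch of rebuilding the index by walking the cache's `objects/` tree. `list_cache` is a hypothetical helper that returns (relative path, size) pairs, whereas rudolfs streams `(StorageKey, u64)` from `Disk::list()`. The output is sorted only to make it deterministic.

```rust
use std::fs;
use std::io;
use std::path::Path;

pub fn list_cache(root: &Path) -> io::Result<Vec<(String, u64)>> {
    let mut out = Vec::new();
    let objects = root.join("objects");
    if objects.is_dir() {
        walk(&objects, &objects, &mut out)?;
    }
    out.sort();
    Ok(out)
}

fn walk(base: &Path, dir: &Path, out: &mut Vec<(String, u64)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let meta = entry.metadata()?;
        if meta.is_dir() {
            walk(base, &path, out)?; // descend org/project/xx/yy levels
        } else if meta.is_file() {
            let rel = path.strip_prefix(base).expect("path is under base");
            out.push((rel.to_string_lossy().replace('\\', "/"), meta.len()));
        }
    }
    Ok(())
}
```

Summing the sizes gives the starting total that the prune step compares against max_size.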
4.6 Corruption Resilience via Verify
The Verify decorator wraps the byte stream with a streaming SHA256 hash:
```rust
// On download:
let stream = VerifyStream::new(stream.map_err(Error::from), len, *key.oid());
// If SHA256 mismatches → automatically delete corrupted object from cache
//                      → return error to client

// On upload:
let stream = VerifyStream::new(stream.map_err(Error::from), len, *key.oid());
// If SHA256 mismatches → reject the upload entirely
```
The VerifyStream hashes every chunk. Once the number of bytes seen reaches the expected len, it compares the final digest against the expected OID. This catches:
- Bit rot in the cache
- Encryption key changes (decrypting with the wrong key yields bytes whose hash does not match the OID)
- Network transfer errors
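The control flow of a streaming verifier can be shown without crates. This sketch is illustrative only: it hashes with 64-bit FNV-1a instead of SHA256 so that it stays dependency-free. What it illustrates is the structure: hash each chunk as it passes, compare once all `len` bytes have been seen, and flag truncated input at end of stream.

```rust
#[derive(Debug, PartialEq)]
pub enum VerifyError {
    TooLong,
    TooShort,
    Mismatch,
}

pub struct Verifier {
    hash: u64,
    seen: u64,
    len: u64,
    expected: u64,
}

pub fn fnv1a(data: &[u8]) -> u64 {
    data.iter()
        .fold(0xcbf2_9ce4_8422_2325, |h, &b| (h ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3))
}

impl Verifier {
    pub fn new(len: u64, expected: u64) -> Self {
        Verifier { hash: 0xcbf2_9ce4_8422_2325, seen: 0, len, expected }
    }

    /// Hash one chunk and pass it through unchanged.
    pub fn update<'a>(&mut self, chunk: &'a [u8]) -> Result<&'a [u8], VerifyError> {
        self.seen += chunk.len() as u64;
        if self.seen > self.len {
            return Err(VerifyError::TooLong);
        }
        for &b in chunk {
            self.hash = (self.hash ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
        // Compare only when the final byte has arrived.
        if self.seen == self.len && self.hash != self.expected {
            return Err(VerifyError::Mismatch);
        }
        Ok(chunk)
    }

    /// Call at end of stream to catch truncated input.
    pub fn finish(&self) -> Result<(), VerifyError> {
        if self.seen < self.len { Err(VerifyError::TooShort) } else { Ok(()) }
    }
}
```

On a download mismatch, rudolfs additionally deletes the cached copy. In this sketch, that would be the caller's reaction to `VerifyError::Mismatch`.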
5. Encryption
5.1 Algorithm
XChaCha20 stream cipher (extended nonce variant of ChaCha20), implemented via the chacha crate.
5.2 Key and Nonce
- Key: 32 bytes, provided via the --key CLI flag or the RUDOLFS_KEY env var (hex-encoded)
- Nonce: the first 24 bytes of the SHA256 OID, used as the XChaCha20 nonce
```rust
let mut nonce: [u8; 24] = [0; 24];
nonce.copy_from_slice(&key.oid().bytes()[0..24]);
let chacha = ChaCha::new_xchacha20(&self.key, &nonce);
```
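Security implication: The nonce is deterministic (derived from the OID). OIDs are content-addressable (the SHA256 of the plaintext), so a given (key, nonce) pair only ever encrypts one plaintext. Re-encrypting the same object reproduces the same ciphertext instead of leaking the XOR of two different plaintexts, which is the usual danger of nonce reuse with a stream cipher. Two distinct objects would share a nonce only if their SHA256 digests matched in the first 24 bytes, which is negligible.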
5.3 Streaming Encryption
Encryption/decryption is applied to the byte stream in-flight using xor_stream():
```rust
fn xor_stream<S>(mut chacha: ChaCha, stream: S) -> impl Stream<Item = Result<Bytes, io::Error>>
```
No buffering of the entire object — each chunk is encrypted/decrypted as it passes through. This means encryption works on objects of arbitrary size (including those exceeding available memory).
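Below is a sketch of the two ideas together: nonce derivation from the OID, and a keystream whose position carries across chunk boundaries. One assumption here is loud and deliberate: `ToyKeystream` is an xorshift64* generator used only so that the example runs without crates. It is NOT XChaCha20 and it is NOT secure. Only the structure carries over.

```rust
/// nonce = first 24 bytes of the 32-byte OID.
pub fn nonce_from_oid(oid: &[u8; 32]) -> [u8; 24] {
    let mut nonce = [0u8; 24];
    nonce.copy_from_slice(&oid[..24]);
    nonce
}

/// Toy keystream (insecure placeholder for XChaCha20).
pub struct ToyKeystream {
    state: u64,
    buf: [u8; 8],
    pos: usize,
}

impl ToyKeystream {
    pub fn new(key: &[u8; 32], nonce: &[u8; 24]) -> Self {
        // Fold key and nonce into a nonzero seed (toy only).
        let mut state = 0x9e37_79b9_7f4a_7c15u64;
        for &b in key.iter().chain(nonce.iter()) {
            state = (state ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
        ToyKeystream { state: state | 1, buf: [0; 8], pos: 8 }
    }

    fn next_byte(&mut self) -> u8 {
        if self.pos == 8 {
            // xorshift64* step refills 8 keystream bytes.
            self.state ^= self.state >> 12;
            self.state ^= self.state << 25;
            self.state ^= self.state >> 27;
            self.buf = self.state.wrapping_mul(0x2545_f491_4f6c_dd1d).to_le_bytes();
            self.pos = 0;
        }
        let b = self.buf[self.pos];
        self.pos += 1;
        b
    }

    /// XOR one chunk in place; the keystream position persists between calls,
    /// so chunked processing equals one-shot processing.
    pub fn apply(&mut self, chunk: &mut [u8]) {
        for byte in chunk.iter_mut() {
            *byte ^= self.next_byte();
        }
    }
}
```

Encryption and decryption are the same XOR, which is why one `xor_stream`-style function serves both directions.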
5.4 Encryption + CDN Incompatibility
From lib.rs:
```rust
if self.cdn.is_some() {
    tracing::warn!("A CDN was specified. Since uploads and downloads do not flow \
                    through Rudolfs in this case, they will *not* be encrypted.");
    // Cache is also disabled when CDN is used
}
```
When CDN mode is active, presigned URLs bypass rudolfs entirely, so encryption cannot be applied.
6. Retry and Fault Injection
6.1 Retry (Retrying Decorator)
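Only size() and delete() get exponential backoff. get() and put() pass through without retries. put() consumes the caller's byte stream, so a failed attempt cannot be replayed, and a get() that fails partway through its response stream cannot be resumed transparently: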
```rust
async fn get(&self, key: &StorageKey) -> ... {
    // Cannot retry: stream already consumed
    self.storage.get(key).await
}

async fn put(&self, key: StorageKey, value: LFSObject) -> ... {
    // Cannot retry: stream already consumed
    self.storage.put(key, value).await
}

async fn size(&self, key: &StorageKey) -> ... {
    retry(ExponentialBackoff::default(), || async {
        Ok(self.storage.size(key).await?)
    }).await
}
```
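For idempotent calls, the retry policy reduces to a small loop. This std-only sketch uses illustrative parameters (initial delay, multiplier, attempt cap). By contrast, rudolfs uses `backoff::ExponentialBackoff::default()`, which also randomizes delays and bounds total elapsed time.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry `op` with exponentially growing delays; suitable only for
/// idempotent operations like `size()` and `delete()`.
pub fn retry<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    initial: Duration,
    factor: u32,
    max_attempts: u32,
) -> Result<T, E> {
    let mut delay = initial;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay *= factor;
                attempt += 1;
            }
        }
    }
}
```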
6.2 Fault Injection (Faulty Decorator)
Gated behind the faulty feature flag. For each chunk it draws a random u8 and injects an error when the value is 0, i.e., with probability 1/256 per chunk:
```rust
fn faulty_stream(stream: ByteStream) -> ByteStream {
    Box::pin(stream.map(|item| {
        if rand::thread_rng().random::<u8>() == 0 {
            Err(io::Error::other("injected fault"))
        } else {
            item
        }
    }))
}
```
7. Authentication
Rudolfs has no built-in authentication. This is explicitly stated as a "non-feature" in the README:
There is no client authentication. This is meant to be run in an internal network with clients you trust, not on the internet with malicious actors.
However, the design intentionally supports proxy-based authentication:
- The extract_auth_header() function reflects the Authorization header from incoming requests into the header field of LFS actions
- The hyperext.rs module supports X-Forwarded-Proto and X-Forwarded-Host headers for reverse proxy deployment
- The authenticated: Some(true) field in batch responses tells the git-lfs client that auth has been handled
This means authentication is expected to be provided by an outer layer (nginx, a reverse proxy, or in our case, alknet's identity system).
8. Locking API
Rudolfs does NOT implement the LFS Locking API. Its lock endpoints (POST /api/{org}/{project}/locks, GET /api/{org}/{project}/locks, POST /api/{org}/{project}/locks/verify, POST /api/{org}/{project}/locks/{id}/unlock) are not routed. Separately, the Transfer::LfsStandaloneFile variant exists in the enum but is never selected or handled (see 2.4).
9. Configuration
9.1 CLI Arguments (structopt)
| Flag | Env Var | Default | Description |
|---|---|---|---|
| --host | RUDOLFS_HOST | None | Host/address to bind (e.g., 0.0.0.0:8080) |
| --port | PORT | 8080 | Port (only used if --host not set) |
| --key | RUDOLFS_KEY | None | 32-byte hex encryption key; omit for no encryption |
| --cache-dir | RUDOLFS_CACHE_DIR | None | Local disk cache directory; omit for no cache |
| --max-cache-size | RUDOLFS_MAX_CACHE_SIZE | 50 GiB | Maximum cache size (0 = unlimited) |
| --log-level | RUDOLFS_LOG | info | Log level (trace/debug/info/warn/error) |
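Two of these options involve small bits of logic that are easy to get wrong when wrapping rudolfs in other tooling. The first is turning "50 GiB" into bytes, and the second is the --host/--port fallback. Both functions below are hypothetical std-only sketches. rudolfs itself parses sizes with the `human-size` crate, which accepts more formats. The 0.0.0.0 fallback host is an assumption.

```rust
/// Parse "<integer> [unit]" with binary (KiB..TiB) or decimal (KB..TB) units.
pub fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().ok()?;
    let mult: u64 = match unit.trim() {
        "" | "B" => 1,
        "KB" => 1_000,
        "MB" => 1_000_000,
        "GB" => 1_000_000_000,
        "TB" => 1_000_000_000_000,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        "TiB" => 1 << 40,
        _ => return None,
    };
    n.checked_mul(mult) // None on overflow
}

/// `--host` wins; otherwise listen on all interfaces at `--port`.
pub fn bind_addr(host: Option<&str>, port: u16) -> String {
    match host {
        Some(h) => h.to_string(),
        None => format!("0.0.0.0:{}", port),
    }
}
```

With these definitions, the default `--max-cache-size` of 50 GiB is 50 × 2^30 = 53,687,091,200 bytes.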
S3 subcommand:
| Flag | Env Var | Default | Description |
|---|---|---|---|
| --bucket | RUDOLFS_S3_BUCKET | (required) | S3 bucket name |
| --prefix | RUDOLFS_S3_PREFIX | lfs | S3 key prefix |
| --cdn | RUDOLFS_S3_CDN | None | CDN base URL for presigned URLs |
Local subcommand:
| Flag | Env Var | Default | Description |
|---|---|---|---|
| --path | RUDOLFS_LOCAL_PATH | (required) | Local directory for LFS data |
9.2 AWS Credential Chain
Credentials are resolved in this order:
- Kubernetes WebIdentity (WebIdentityProvider::from_k8s_env()), tried first
- Otherwise, the default provider chain:
  - AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY environment variables
  - ~/.aws/credentials file
  - IAM instance profile (EC2)

Custom endpoints: AWS_S3_ENDPOINT + AWS_DEFAULT_REGION (for MinIO/rustfs)
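To make the precedence explicit (for example, when debugging why a pod ignores static keys), here is a hypothetical model of the order above. Each source's availability is a predicate, so no AWS calls are made. The real default chain may consult additional sources that this simplified list omits.

```rust
#[derive(Debug, PartialEq)]
pub enum Source {
    K8sWebIdentity,
    EnvVars,
    ProfileFile,
    InstanceProfile,
}

/// Return the first available source in precedence order.
pub fn pick_credentials(available: impl Fn(&Source) -> bool) -> Option<Source> {
    let order = vec![Source::K8sWebIdentity, Source::EnvVars, Source::ProfileFile, Source::InstanceProfile];
    order.into_iter().find(|s| available(s))
}
```

In a Kubernetes pod with WebIdentity configured, static AWS_ACCESS_KEY_ID values are therefore never consulted.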
10. Docker / Deployment
10.1 Dockerfile
Multi-stage build producing a scratch-based image (very small, <10MB):
- Build stage: rust:1.91.1 with the x86_64-unknown-linux-musl target
- Uses tini as PID 1
- Copies ca-certificates.crt for TLS
- Final image: FROM scratch with only the binary + certs + tini
10.2 Docker Compose Variants
| File | Backend | Notes |
|---|---|---|
| docker-compose.yml | S3 | AWS S3 backend with cache and encryption |
| docker-compose.local.yml | Local disk | Local storage with cache and encryption |
| docker-compose.minio.yml | S3 (MinIO) | Uses AWS_S3_ENDPOINT=http://minio:9000 |
All variants:
- Expose port 8080 (mapped to 8081)
- Use a Docker volume data for the cache
- Pass encryption key and config via environment variables
- Recommend nginx as a TLS-terminating reverse proxy
11. Relevance to Alknet
11.1 The Complete Git Hosting Stack
```text
┌────────────────────────────────────────────────────────┐
│                      Alknet Layer                      │
│  ┌──────────┐   ┌───────────────┐   ┌────────────────┐ │
│  │ Identity │   │ Call Protocol │   │ Operation Reg. │ │
│  └────┬─────┘   └───────┬───────┘   └───────┬────────┘ │
│       │                 │                   │          │
│  ┌────▼─────────────────▼───────────────────▼────────┐ │
│  │               HTTP MessageInterface               │ │
│  └────┬─────────────────┬───────────────────┬────────┘ │
│       │                 │                   │          │
└───────┼─────────────────┼───────────────────┼──────────┘
        │                 │                   │
  ┌─────▼──────┐   ┌──────▼──────┐    ┌───────▼──────┐
  │ gitserver  │   │   rudolfs   │    │    rustfs    │
  │ (git HTTP) │   │  (Git LFS)  │    │ (S3 storage) │
  └─────┬──────┘   └──────┬──────┘    └───────┬──────┘
        │                 │                   │
        │          ┌──────▼───────────────────▼──────┐
        │          │          Object Store           │
        │          │        (S3 API / rustfs)        │
        │          └─────────────────────────────────┘
        │
  ┌─────▼──────┐
  │  Git Repo  │
  │   (bare)   │
  └────────────┘
```
- gitserver: handles git-upload-pack, git-receive-pack, info/refs (smart HTTP)
- rudolfs: handles POST /objects/batch, GET/PUT /object/{oid}, POST /objects/verify (LFS)
- rustfs: provides the S3-compatible object store that rudolfs uses as its backend
11.2 Authentication: Alknet Identity → Rudolfs Auth
Rudolfs has no auth, but it reflects auth headers. This maps perfectly to alknet's approach:
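- alknet's HTTP MessageInterface terminates the connection and authenticates the caller via alknet Identity
- The authenticated request is reverse-proxied to rudolfs
- Alknet injects an Authorization header (e.g., a Bearer token or internal auth)
- Rudolfs reflects this header into the LFS actions, propagating auth to subsequent client requests. The reflected header is returned to the client, so the injected credential must be one the client may hold (e.g., a short-lived, namespace-scoped token), never an internal service secret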
Integration approach:
- Add a middleware layer before App that validates alknet Identity tokens
- The extract_auth_header() function already handles the Authorization → action.header propagation
- For fine-grained access control (per-org/project), add namespace validation against alknet's permission model
Code location to modify: app.rs — add auth middleware in the Service::call() method, or wrap App in a new AuthMiddleware service.
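Below is a hypothetical, std-only sketch of the check such a middleware would run before forwarding to rudolfs. It validates a bearer token, then checks the requested org/project against grants. `Identity`, `Denied`, and the grant tuple are illustrative names, not alknet or rudolfs APIs.

```rust
use std::collections::HashSet;

pub struct Identity {
    pub tokens: HashSet<String>,
    pub grants: HashSet<(String, String, String)>, // (token, org, project)
}

#[derive(Debug, PartialEq)]
pub enum Denied {
    Unauthorized, // missing or unknown token → 401
    Forbidden,    // valid token, wrong namespace → 403
    BadRequest,   // path is not /api/{org}/{project}/... → 400
}

pub fn authorize(id: &Identity, path: &str, authorization: Option<&str>) -> Result<(String, String), Denied> {
    let token = authorization
        .and_then(|v| v.strip_prefix("Bearer "))
        .filter(|t| id.tokens.contains(*t))
        .ok_or(Denied::Unauthorized)?;
    let mut parts = path.trim_start_matches('/').split('/');
    let (org, project) = match (parts.next(), parts.next(), parts.next()) {
        (Some("api"), Some(o), Some(p)) if !o.is_empty() && !p.is_empty() => (o, p),
        _ => return Err(Denied::BadRequest),
    };
    if id.grants.contains(&(token.to_string(), org.to_string(), project.to_string())) {
        Ok((org.to_string(), project.to_string()))
    } else {
        Err(Denied::Forbidden)
    }
}
```

Only requests that return `Ok` would be proxied to rudolfs, with the Authorization header left intact so that rudolfs can reflect it.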
11.3 Embedding Rudolfs as a Library
Rudolfs exposes a clean library API via lib.rs:
```rust
// S3 backend
let mut builder = S3ServerBuilder::new(bucket);
builder.prefix("lfs".into());
builder.key(encryption_key);
builder.cache(Cache::new(cache_dir, max_cache_size));
builder.run(addr).await?;

// Local backend
let mut builder = LocalServerBuilder::new(path);
builder.key(encryption_key);
builder.run(addr).await?;
```
Feasibility: The Storage trait is fully abstract, and the App struct is generic over any S: Storage. This means:
- Option A: Run as a separate service. Use S3ServerBuilder or LocalServerBuilder directly; minimal integration work, just deploy alongside gitserver and rustfs
- Option B: Embed as a library. Import rudolfs as a crate dependency, construct the storage stack, and call spawn_server(storage, &addr).
- Option C: Extract the storage layer. The storage and lfs modules can be extracted and used within alknet's own HTTP handler (e.g., integrating LFS endpoints into the same Axum server as gitserver)
Recommendation: Option A is simplest for initial deployment. Option C is best for long-term integration — merge LFS endpoints into alknet's unified HTTP server so there's one service, one port, one auth layer.
11.4 LFS HTTP Endpoints → Alknet MessageInterface Mapping
| Rudolfs Endpoint | Alknet Mapping |
|---|---|
| POST /api/{org}/{project}/objects/batch | http.handle("lfs.batch", org, project, batch_request) |
| GET /api/{org}/{project}/object/{oid} | http.handle("lfs.download", org, project, oid) |
| PUT /api/{org}/{project}/object/{oid} | http.handle("lfs.upload", org, project, oid) |
| POST /api/{org}/{project}/objects/verify | http.handle("lfs.verify", org, project, verify_request) |
| GET / (index page) | Could be served via alknet's web UI or removed |
The namespace extraction ({org}/{project}) maps directly to alknet's namespace model. In the LFS URL format http://gitlfs.example.com:8080/api/my-org/my-project, the my-org/my-project path naturally aligns with alknet's org/project hierarchy.
11.5 Caching Layer: Alknet-Managed vs Rudolfs-Managed
Current rudolfs approach: Each rudolfs instance manages its own LRU cache in memory + local disk. There's no distributed cache coordination.
Options for alknet:
| Approach | Pros | Cons |
|---|---|---|
| Let rudolfs manage its cache | Simple, battle-tested, no coordination overhead | No cross-instance cache sharing; each node has its own cache |
| Alknet-managed distributed cache | Shared cache across nodes; better hit rates | Significant complexity; the LRU index is per-process, and rudolfs' builders always back Cached with a local Disk cache (Cached itself is generic over its cache storage) |
| Replace local cache with rustfs-backed cache | Unified storage; no local disk dependency | Adds S3 round-trip latency for "cache" reads; defeats purpose of local cache |
Recommendation: Start with rudolfs-managed local caching. Each alknet node runs a local rudolfs instance with its own cache on local SSD. The LRU index is rebuilt from the local disk cache on startup, so a restarted instance keeps its warm cache, while a brand-new node starts cold and warms up as objects are accessed. If cross-node cache sharing is needed later, consider a Redis/memcached metadata layer while keeping local disk for byte storage.
11.6 CredentialProvider::S3AccessKey → Rudolfs S3 Configuration
rustfs uses a CredentialProvider enum that includes S3AccessKey. The mapping to rudolfs:
| rustfs CredentialProvider | rudolfs Equivalent | How |
|---|---|---|
| S3AccessKey { access_key, secret_key } | AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY env vars | Set these in the environment before starting rudolfs |
| Keystone { ... } | Not directly supported | Would need custom credential provider for rudolfs |
| OIDC { ... } | Kubernetes WebIdentity | Already supported via WebIdentityProvider::from_k8s_env() |
| IAMRole | Default credential chain | Falls through to IAM instance profile |
For integration with rustfs, set AWS_S3_ENDPOINT=http://rustfs:9000 (or wherever rustfs listens) and AWS_DEFAULT_REGION=us-east-1, along with the rustfs access credentials. This mirrors the MinIO compose file, with the endpoint pointed at rustfs:
```yaml
environment:
  - AWS_S3_ENDPOINT=http://rustfs:9000
  - AWS_DEFAULT_REGION=us-east-1
  - AWS_ACCESS_KEY_ID=${RUSTFS_ACCESS_KEY}
  - AWS_SECRET_ACCESS_KEY=${RUSTFS_SECRET_KEY}
```
Note: rudolfs uses rusoto (not aws-sdk-rust), and rusoto is in maintenance mode. For long-term maintenance, migrating to aws-sdk-rust or a generic S3 client would be necessary. However, rusoto works well with rustfs/MinIO.
11.7 Encryption Considerations for Alknet
If rudolfs encryption is enabled:
- LFS objects are encrypted before they reach S3/rustfs
- This means rustfs cannot deduplicate across different encryption keys
- If the encryption key rotates, all existing objects become invalid (SHA256 verification fails)
- The nonce is derived from the OID, so same content → same encrypted form (no random IV)
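For alknet: Consider whether encryption at the rudolfs layer is needed if rustfs already provides encryption at rest, since encrypting twice costs CPU for little gain. If rustfs provides server-side encryption (SSE), disabling rudolfs' --key and relying on rustfs is reasonable, with one caveat. The rudolfs key also encrypts the local disk cache, which rustfs SSE does not cover, so without --key the cache directory holds plaintext and needs its own protection.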
11.8 Key Gaps and Limitations
| Gap | Impact | Mitigation |
|---|---|---|
| No authentication | Must be behind a trusted network or reverse proxy | Alknet provides auth at the MessageInterface layer |
| No Locking API | Cannot use git lfs lock | Would need to implement LFS Locking API |
| No LFS listing API | list() returns an empty stream for the S3 backend | Not needed by the LFS protocol, and cache prepopulation reads the local disk cache, so it is unaffected; it does prevent enumerating stored objects through rudolfs |
| No rate limiting | Vulnerable to abuse | Add rate limiting middleware |
| No multi-instance coordination | Each instance has independent cache | Accept as-is for now; add Redis metadata layer later |
| rusoto is unmaintained | Long-term maintenance risk | Migrate to aws-sdk-rust or generic S3 client |
| Generic content type on responses | GET /object/{oid} returns application/octet-stream | Fine for LFS; browsers won't render binary |
| No HTTPS/TLS | Must use reverse proxy for TLS | Fine for deployment behind nginx/alknet |
11.9 Integration Blueprint
Phase 1 — Sidecar Deployment:
```yaml
# docker-compose.yml (development)
services:
  rustfs:
    image: rustfs:latest
    ports: ["9000:9000"]
  rudolfs:
    image: jasonwhite0/rudolfs:latest
    environment:
      - AWS_S3_ENDPOINT=http://rustfs:9000
      - AWS_DEFAULT_REGION=us-east-1
      - AWS_ACCESS_KEY_ID=minioadmin
      - AWS_SECRET_ACCESS_KEY=minioadmin
      - RUDOLFS_S3_BUCKET=lfs-data
    ports: ["8080:8080"]
  gitserver:
    image: gitserver:latest
    ports: ["8081:8080"]
```
Phase 2 — Library Integration:
```rust
// In alknet's git service module.
// `LfsConfig` is a hypothetical alknet config type; `Result` is assumed to be
// an alias such as `anyhow::Result`.
use rudolfs::{Cache, S3ServerBuilder};

async fn start_lfs(config: &LfsConfig) -> Result<()> {
    let mut builder = S3ServerBuilder::new(config.s3_bucket.clone());
    builder.prefix(config.s3_prefix.clone());
    if let Some(key) = &config.encryption_key {
        builder.key(*key); // 32-byte XChaCha20 key
    }
    if let Some(cache) = &config.cache {
        builder.cache(Cache::new(cache.dir.clone(), cache.max_size));
    }
    // Binds the listener and serves LFS requests.
    builder.run(config.listen_addr).await?;
    Ok(())
}
```
Phase 3 — Deep Integration:
- Extract storage/ and lfs.rs from rudolfs
- Integrate LFS endpoints into alknet's HTTP router (alongside gitserver's git endpoints)
- Replace rusoto with a generic S3 client targeting rustfs
- Add alknet Identity-based auth middleware
- Implement LFS Locking API using alknet's operation registry
12. Test Coverage
12.1 Local Backend Tests
tests/test_local.rs: Exercises the local disk backend with encryption enabled and disabled. Creates temp git repos, pushes LFS objects (4MB, 8MB, 16MB), pulls them, clones repos, and verifies data integrity.
12.2 S3 Backend Tests
tests/test_s3.rs: Same test pattern as local but targets S3. Requires tests/.test_credentials.toml to run. Skips silently if credentials are missing.
12.3 Test Infrastructure
tests/common/mod.rs provides:
- GitRepo: temp directory with a git repo configured for LFS
- GitRepo::init(addr): initializes the repo and sets lfs.url to the test server
- GitRepo::add_random(path, size, rng): creates random binary files
- GitRepo::lfs_push() / lfs_pull(): push/pull LFS objects
- GitRepo::clean_lfs(): clears .git/lfs/ to force re-download
- init_logger(): sets up tracing for test output
13. Summary
Rudolfs is a well-architected, composable Git LFS server that maps cleanly onto alknet's requirements:
| Criterion | Assessment |
|---|---|
| Protocol correctness | Implements LFS batch API correctly; basic transfer adapter |
| Storage composability | Excellent — decorator pattern allows arbitrary stacking |
| Caching | Solid LRU + disk cache with background writes and corruption resilience |
| Encryption | XChaCha20 streaming, deterministic nonce from OID, optional |
| S3 compatibility | Works with any S3-compatible store (MinIO, rustfs) via AWS_S3_ENDPOINT |
| Auth story | No auth, but reflects headers — perfect for proxy-based auth |
| Embeddability | Good library API via S3ServerBuilder/LocalServerBuilder |
| Maintenance risk | rusoto is unmaintained; need migration path |
| Missing features | Locking API, listing API, rate limiting, multi-instance cache |
Bottom line: Rudolfs is a strong foundation for the LFS layer in the alknet git hosting stack. Its decorator pattern, streaming architecture, and S3 compatibility make it an excellent match for rustfs. The main gaps (auth, locking, rusoto maintenance) are well-understood and have clear integration paths within alknet.