ADR-013: Fail2ban-Friendly Server Logging
Status
Accepted
Context
The server needs to handle abuse on public-facing deployments. Our production infrastructure uses fail2ban on Linux (documented in fail2ban.md) with nftables and systemd journal backend. fail2ban needs structured, parseable logs to identify abusive IP addresses.
However, fail2ban is Linux-specific. On other platforms (macOS, Windows, BSD), users need a different approach to reject abusive connections. The server should provide enough logging for fail2ban on Linux and enough built-in protection for other platforms.
Decision
The server logs connection and authentication events at INFO level with structured fields, and provides a configurable connection rate limiter as a built-in defense.
Logging (for fail2ban integration on Linux):
- Log auth attempts:
  level=INFO, msg="auth attempt", remote_addr=<ip>, user=<user>, key_fingerprint=<sha256>, result=<accept|reject>
- Log new connections:
  level=INFO, msg="connection opened", remote_addr=<ip>, transport=<tcp|tls|iroh>
- Log disconnections:
  level=INFO, msg="connection closed", remote_addr=<ip>, duration=<secs>
- Do NOT log: channel open targets, DNS resolutions, bytes transferred
This matches what fail2ban needs: source IP + failure indicator. Our existing fail2ban setup filters on similar fields for SSH and nginx.
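As a rough illustration of the logging rules above, the sketch below renders events in the comma-separated key=value shape shown in the list and refuses fields that must not be logged. The function names and the forbidden field names (`channel_target`, `dns_name`, `bytes_transferred`) are hypothetical, and the exact serialization the server uses is an assumption.

```python
# Hypothetical field names for the data this ADR says must never be logged.
FORBIDDEN_FIELDS = {"channel_target", "dns_name", "bytes_transferred"}


def format_event(msg: str, **fields: object) -> str:
    """Render one INFO event as comma-separated key=value pairs."""
    parts = ["level=INFO", f'msg="{msg}"']
    # Keyword arguments keep their call order, so the field order is stable.
    parts += [f"{key}={value}" for key, value in fields.items()]
    return ", ".join(parts)


def log_event(msg: str, **fields: object) -> str:
    """Format an event, rejecting any privacy-sensitive field."""
    leaked = FORBIDDEN_FIELDS.intersection(fields)
    if leaked:
        raise ValueError(f"refusing to log fields: {sorted(leaked)}")
    return format_event(msg, **fields)


if __name__ == "__main__":
    print(log_event(
        "auth attempt",
        remote_addr="203.0.113.7",
        user="alice",
        key_fingerprint="SHA256:abc",
        result="reject",
    ))
```

Rejecting forbidden fields at the formatting boundary is one way to keep tunnel destinations out of the logs by construction; a real server might instead enforce this through the types passed to its logger.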
Built-in rate limiting (for all platforms):
- --max-connections-per-ip <n> (default: 0 = unlimited): reject new connections from an IP that already has N active connections
- --max-auth-attempts <n> (default: 10): disconnect after N failed auth attempts from one connection
- Rate limiting happens at the SSH layer, before channels are opened
This ensures that even without fail2ban, the server rejects obviously abusive connections.
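The two limits can be sketched as a pair of small counters. This is a minimal illustration, not the server's implementation: the class and method names are hypothetical, and it assumes that "disconnect after N failed auth attempts" means disconnecting once the Nth failure is recorded.

```python
class ConnectionLimiter:
    """Tracks active connections per IP (sketch of --max-connections-per-ip)."""

    def __init__(self, max_connections_per_ip: int = 0) -> None:
        # 0 means unlimited, matching the documented default.
        self.max_connections_per_ip = max_connections_per_ip
        self._active: dict[str, int] = {}

    def try_open(self, ip: str) -> bool:
        """Return True and count the connection, or False to reject it."""
        current = self._active.get(ip, 0)
        limit = self.max_connections_per_ip
        if limit > 0 and current >= limit:
            return False
        self._active[ip] = current + 1
        return True

    def close(self, ip: str) -> None:
        """Release one connection slot for this IP."""
        remaining = self._active.get(ip, 0) - 1
        if remaining > 0:
            self._active[ip] = remaining
        else:
            self._active.pop(ip, None)


class AuthAttempts:
    """Counts failed auth attempts on one connection (sketch of --max-auth-attempts)."""

    def __init__(self, max_attempts: int = 10) -> None:
        self.max_attempts = max_attempts
        self.failed = 0

    def record_failure(self) -> bool:
        """Return True when the connection should be disconnected."""
        self.failed += 1
        return self.failed >= self.max_attempts
```

In this sketch the limiter would be consulted when a connection is accepted, before any channel is opened, and `close` would be called from the same place that logs "connection closed".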
Consequences
- Positive: fail2ban can parse alknet logs the same way it parses SSH and nginx logs on our production systems.
- Positive: Built-in rate limiting provides protection on platforms without fail2ban.
- Positive: No privacy-sensitive data in logs (no tunnel destinations).
- Negative: Slightly more code in the server for connection tracking per IP.
- Negative: Users with custom fail2ban filters need to write regex for alknet's log format (documented examples provided).
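To give a sense of what a custom filter has to match, the sketch below uses a Python regular expression to pull the source address out of rejected auth attempts. It assumes log lines are serialized exactly as the comma-separated key=value pairs shown in the Decision section, which may differ from the real output. In an actual fail2ban filter, the failregex would typically use fail2ban's `<HOST>` tag in place of the named group.

```python
import re
from typing import Optional

# Matches only rejected auth attempts and captures the remote address.
# The field layout is an assumption based on the format in this ADR.
REJECT_RE = re.compile(
    r'level=INFO, msg="auth attempt", remote_addr=(?P<host>[^,\s]+), '
    r'user=[^,]*, key_fingerprint=[^,]*, result=reject$'
)


def rejected_host(line: str) -> Optional[str]:
    """Return the offending address for a rejected auth attempt, else None."""
    match = REJECT_RE.search(line)
    return match.group("host") if match else None


if __name__ == "__main__":
    sample = (
        'level=INFO, msg="auth attempt", remote_addr=203.0.113.7, '
        'user=root, key_fingerprint=SHA256:abc, result=reject'
    )
    print(rejected_host(sample))
```

Anchoring on `result=reject` at the end of the line keeps accepted logins and connection open/close events from counting as failures.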
References
- server.md
- OQ-08 — resolved by this ADR
- Production fail2ban setup: fail2ban.md