alkdev/hub

glm-5.1 a248698f40 ADR-018: Remove AI SDK, use openai SDK directly with hub-own streaming

Replace the Vercel AI SDK with direct OpenAI SDK calls and a custom
AgentLoop. The AI SDK has zero runtime integration today, so removing
it costs nothing. Supply chain risk (2-5 releases/day, April 2026
Vercel breach, bus factor of 1) makes it a liability we don't need.

Key changes:
- ADR-018 accepted: openai package (zero runtime deps) replaces ai SDK
- AgentLoop handles multi-step tool execution explicitly (~300 LOC vs
  AI SDK's ~2700 LOC streamText)
- Hub owns UIMessage/UIPart/ToolCallState types (extends ADR-016)
- Hub owns streaming protocol (subset of AI SDK's UIMessageChunk wire
  format with step boundaries, error handling, usage tracking)
- operationToOpenAITool() maps TypeBox schemas directly, no adapter
- Trade-off: ~1100 LOC total new code for the savings of 6+ transitive
  deps, supply chain risk, and release cadence coupling

Updates AGENTS.md constraints and dependencies, adds OQ-63/OQ-64/OQ-65
and Theme 11 (Inference & LLM Integration) to open questions.

2026-05-26 08:55:52 +00:00

status	last_updated
reviewed	2026-05-26

Open Questions Tracker

Cross-cutting compilation of all unresolved questions across the hub architecture documents, organized by theme. Questions that appear in multiple documents are unified here with cross-references.

How to Use This Document

Each question has an ID (e.g., OQ-01), status, origin (which doc(s)), and priority assessment
Cross-references link related questions that may conflict or answer each other
When a question is resolved, update its status to resolved and add a resolution note
Once all questions in a theme are resolved, the theme section can be removed

Theme	Questions	Focus
1. Authentication & Authorization	OQ-01–OQ-05	Auth models, permissions, SSO
2. Spoke Connectivity & Lifecycle	OQ-06–OQ-11, OQ-61–OQ-62	Spoke provisioning, WebSocket, concurrent ops, dev spoke
3. Data Integrity & Lifecycle	OQ-12–OQ-15	Deletion, retention, truncation, FK enforcement
4. Session & Schema Design	OQ-16–OQ-19	Message schema, compaction, versioning, nesting
5. Configuration & Infrastructure	OQ-20–OQ-25	Config reload, CI/CD, SSL, tokenEnv, secret refs
6. Role & Identity Management	OQ-26–OQ-29	Role sync, inheritance, dynamic creation, switching
7. Task Management	OQ-30–OQ-33	Task storage, bulk updates, embeddings
8. Deployment & Operations	OQ-34–OQ-37	Migrations, hot spare, observability, Redis topology
9. Cross-Cutting Implementation Gaps	OQ-38–OQ-50	Startup, config, logger, Gitea, keypal, auth, schemas
10. Future / Low Priority	OQ-51–OQ-60	Phase 3+, memory, versioning, visualization
11. Inference & LLM Integration	OQ-63–OQ-65	Streaming protocol, SDK choice, part persistence

Resolved by ADRs

ADR	Questions Resolved
ADR-014	OQ-21, OQ-22, OQ-23, OQ-34, OQ-35, OQ-36, OQ-37
ADR-015	OQ-16, OQ-17, OQ-26, OQ-28, OQ-51, OQ-55
ADR-016	OQ-18, OQ-19 (confirmed)
ADR-017	OQ-26, OQ-28, OQ-51 (overlaps with ADR-015)
ADR-018	OQ-16 (extended to TypeScript types)

Theme 1: Authentication & Authorization

OQ-01: What is the API authentication model?

Origin: hub-architecture.md OQ-2
Status: open
Priority: high — blocks all authenticated endpoints
Question: Should the hub use API keys with the keypal pattern, a simpler token auth stopgap, or something else? This affects every authenticated endpoint in the system.
Cross-references: OQ-02 (WebSocket auth), OQ-43 (MCP auth)

OQ-02: How does WebSocket authentication work for spoke connections?

Origin: spoke-runner.md OQ-4
Status: open
Priority: high — blocks all spoke connections
Question: Should the spoke authenticate via token in the first message after connect, token in the query string, or token in the subprotocol header? This also affects the SpokeConfig.auth format — the config system currently supports tokenFile but the actual auth protocol is undefined.
Cross-references: OQ-01 (API auth model), OQ-46 (spoke config auth field), infrastructure.md Security section

OQ-03: How are permissions enforced at the call protocol layer?

Origin: agent-roles.md OQ-2
Status: resolved
Priority: high
Resolution: OperationContext.identity carries the resolved permissions from sessions.data.scope. The CallHandler evaluates AccessControl.requiredScopes against the session's resolved scope. The principal-agent framework ensures delegated permissions are properly intersected.
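The scope check described in this resolution can be sketched roughly as follows. This is a minimal illustration, not the CallHandler's code: the `Identity` and `AccessControl` shapes, the helper names `intersectScopes` and `isAllowed`, the use of operation names as scope strings, and the "every required scope must be granted" rule are all assumptions.

```typescript
// Hypothetical shapes: only requiredScopes and the scope source are named in the text.
interface AccessControl {
  requiredScopes: string[];
}

interface Identity {
  accountId: string;
  scope: string[]; // resolved scope, as loaded from sessions.data.scope
}

// Delegation: the agent receives only the scopes that the principal also holds.
function intersectScopes(principal: string[], agent: string[]): string[] {
  const held = new Set(principal);
  return agent.filter((scope) => held.has(scope));
}

// Assumed rule: a call is allowed only if every required scope was granted.
function isAllowed(identity: Identity, access: AccessControl): boolean {
  const granted = new Set(identity.scope);
  return access.requiredScopes.every((scope) => granted.has(scope));
}

const delegated: Identity = {
  accountId: "llm-service-account",
  scope: intersectScopes(
    ["dev.fs.read", "dev.fs.write"],
    ["dev.fs.read", "dev.bash.exec"],
  ),
};
const canRead = isAllowed(delegated, { requiredScopes: ["dev.fs.read"] });
const canExec = isAllowed(delegated, { requiredScopes: ["dev.bash.exec"] });
```

Here the delegated identity ends up holding only `dev.fs.read`, so the read check passes and the exec check fails.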

OQ-04: How are service accounts provisioned?

Origin: agent-roles.md OQ-6
Status: narrowed
Priority: medium
Question: Does the hub need a hub.createAccount operation for programmatic service account creation, or is manual creation (with keypal CLI) sufficient for v1? LLM-specific email conventions (e.g., glm-5.1@alk.dev) are deployment-specific, not core architecture. Git attribution for LLM accounts uses the account's giteaUsername — this is a config concern, not an auth architecture concern.
Narrowed by: ADR-017 — LLM accounts are service accounts with specific scopes, same pattern as any other automated identity. V1: manual creation. Future: hub.createAccount operation.

OQ-05: Should the hub integrate with git providers via SSO?

Origin: hub-architecture.md OQ-3
Status: narrowed
Priority: low
Question: Originally: should api.alk.dev share sessions with Gitea? Narrowed: git provider integration (Gitea, GitHub, etc.) is a spoke concern via operations, not through SSO. The dev spoke exposes git operations. SSO with any specific git provider is out of scope for a generalized hub. For v1, git access is through the dev spoke's git operations, not shared auth.
Narrowed by: ADR-015 and ADR-017

Theme 2: Spoke Connectivity & Lifecycle

OQ-06: How does a spoke receive its project context?

Origin: spoke-runner.md OQ-1
Status: open
Priority: high — blocks spoke provisioning
Question: Does the hub tell the spoke which git repo to clone, or does the spoke come pre-configured with a project? For the dev spoke specifically: the spoke binary connects to the hub, receives project/workspace context via hub.register, and clones/checks out the relevant repo. The exact protocol needs specification.
Cross-references: OQ-07 (source sync), OQ-61 (dev spoke operations)

OQ-07: How does source sync work for external compute?

Origin: spoke-runner.md OQ-3
Status: open
Priority: medium
Question: For GPU compute spokes on vast.ai — does the spoke clone from Gitea automatically, or does the hub push source to it?
Cross-references: OQ-06 (project context)

OQ-08: Can a spoke handle concurrent operations?

Origin: spoke-runner.md OQ-5
Status: narrowed
Priority: medium → low
Question: Originally: can a spoke handle multiple call.requested events concurrently? Narrowed: the hub doesn't decide this — the spoke does. The hub dispatches call.requested and the spoke processes it. A spoke can process concurrently (multiple handlers) or serially (queue). The only hub-side question is whether the hub should respect a spoke-advertised concurrency limit in its hub.register payload (default: 1). This is a minor spoke registration enhancement, not an architectural question.
Cross-references: OQ-09 (operation list freshness)
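If the hub does honor a spoke-advertised limit, the hub-side bookkeeping could be as small as the sketch below. The `RegisterPayload` shape, the `concurrency` field name, and `createDispatchGate` are hypothetical; only the idea of a limit carried in hub.register with a default of 1 comes from the text above.

```typescript
// Hypothetical hub.register payload; only the limit and its default of 1 come from the text.
interface RegisterPayload {
  spokeId: string;
  concurrency?: number;
}

function createDispatchGate(payload: RegisterPayload) {
  const limit = Math.max(1, payload.concurrency ?? 1);
  let inFlight = 0;
  return {
    // Reserves a slot and returns true if the spoke can take another call.requested.
    tryAcquire(): boolean {
      if (inFlight >= limit) return false;
      inFlight += 1;
      return true;
    },
    // Frees a slot once a dispatched call has finished.
    release(): void {
      if (inFlight > 0) inFlight -= 1;
    },
    inFlight: (): number => inFlight,
  };
}

// A spoke that advertises nothing is treated as serial.
const serialGate = createDispatchGate({ spokeId: "dev-1" });
const firstSlot = serialGate.tryAcquire();
const secondSlot = serialGate.tryAcquire();
serialGate.release();
const thirdSlot = serialGate.tryAcquire();
```

With the default limit, the second acquire is refused until the first call is released.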

OQ-09: When does a spoke re-register its operations?

Origin: spoke-runner.md OQ-6
Status: narrowed
Priority: low
Resolution: For v1, re-register on reconnect only. Spokes disconnect and reconnect (the call protocol handles abort cascading for in-flight calls). Push-based registry updates are a v2 enhancement. The hub.register call on reconnect is sufficient for v1.

OQ-10: What is the design for the hub-side WebSocket handler?

Origin: spoke-runner.md Hub-Side WebSocket Handling section
Status: open
Priority: high — blocks all spoke functionality
Question: What is the full design for the hub-side WebSocket handler? This includes: Hono WebSocket upgrade handler, per-connection WebSocketEventTarget, per-connection PendingRequestMap, spoke lifecycle management (connect/register/heartbeat/disconnect), identity/authentication integration, and reconnection state recovery. Currently described as "an architectural task that needs deeper design" with no spec.
Cross-references: OQ-02 (WebSocket auth), OQ-06 (spoke project context — constrains handler message types), OQ-08 (concurrent operations)

OQ-11: Dev spoke and compute spoke lifecycle

Origin: spoke-runner.md OQ-2, hub-architecture.md Components table
Status: narrowed
Priority: low
Question: Originally: "Container spoke extends base spoke with Docker container lifecycle management and opencode integration." Narrowed by ADR-015: the dev spoke is a compiled Deno binary (not an opencode container) that exposes dev operations over the standard call protocol. Compute spokes (GPU, vast.ai) are separate spoke types. The container spoke concept is deferred — the dev spoke replaces it for v1.
Cross-references: OQ-06 (project context), OQ-61 (dev spoke operations)

OQ-61: What operations does the dev spoke expose?

Origin: ADR-015
Status: open
Priority: medium
Question: The dev spoke replaces opencode's tool suite. What operations does it expose? Minimum: dev.bash.exec, dev.fs.read, dev.fs.write, dev.fs.list, dev.git.status, dev.git.diff, dev.git.commit, dev.git.clone, dev.git.checkout. Web search may be a hub-native operation or a separate spoke. The exact operation set and their input/output schemas need specification.
Cross-references: OQ-06 (project context), OQ-11 (dev spoke lifecycle)

OQ-62: How is the dev spoke distributed and configured?

Origin: ADR-015
Status: open
Priority: medium
Question: The dev spoke is a compiled Deno binary. How is it distributed (Docker image, binary download, package manager)? How is it configured (hub URL, auth token, project context)? Does it use SpokeConfig from hub-config.md or a separate config format? The spoke doesn't have Postgres or Redis — just a WebSocket connection to the hub and local tools.
Cross-references: OQ-06 (project context), OQ-46 (spoke config auth field)

Theme 3: Data Integrity & Lifecycle

OQ-12: Operation deletion and call graph referential integrity

Origin: call-graph.md OQ-1, storage/spokes.md OQ-1
Status: open
Priority: high — blocks operation lifecycle management
Question: The call_graph_nodes.operationId column has a RESTRICT FK to operations.id. An operation cannot be deleted while any call records reference it. Two strategies proposed: (a) deny removal while call records exist, or (b) reassign referencing call records to a sentinel __removed__ operation. Making operationId nullable in flowgraph's CallNodeAttrs is another option. This needs coordination with the @alkdev/flowgraph package.
Cross-references: OQ-13 (call graph retention interacts with deletion constraints)

OQ-13: Call graph retention policy

Origin: storage/call-graph.md Retention Policy section, storage/README.md OQ-3
Status: open
Priority: medium
Question: Need TTL-based cleanup of completed/failed call graph records older than N days, with aggregation for observability. Default 90 days is specified but no config field exists yet. Aggregation for observability (dashboards, metrics) is deferred to Phase 2.
Cross-references: OQ-12 (operation deletion)
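As a rough illustration of the TTL rule, the selection step might look like the following. The `CallRecord` shape, the `finishedAt` field, the non-terminal status names, and `selectExpired` are assumptions; the 90-day default and the completed/failed filter come from the text above.

```typescript
// Status names other than completed/failed are assumptions.
type CallStatus = "pending" | "running" | "completed" | "failed";

interface CallRecord {
  id: string;
  status: CallStatus;
  finishedAt?: Date;
}

const DEFAULT_RETENTION_DAYS = 90; // would come from config once a field exists
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns terminal records that finished before the retention cutoff.
function selectExpired(
  records: CallRecord[],
  now: Date,
  retentionDays: number = DEFAULT_RETENTION_DAYS,
): CallRecord[] {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return records.filter(
    (record) =>
      (record.status === "completed" || record.status === "failed") &&
      record.finishedAt !== undefined &&
      record.finishedAt.getTime() < cutoff,
  );
}

const sweepTime = new Date("2026-05-26T00:00:00Z");
const expired = selectExpired(
  [
    { id: "a", status: "completed", finishedAt: new Date("2026-01-01T00:00:00Z") },
    { id: "b", status: "failed", finishedAt: new Date("2026-05-01T00:00:00Z") },
    { id: "c", status: "running" },
  ],
  sweepTime,
);
```

Only record `a` is selected here: `b` finished within the window and `c` has not reached a terminal state.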

OQ-14: Call graph payload truncation strategy

Origin: storage/call-graph.md
Status: open
Priority: medium
Question: Strategy defined (10KB threshold, { _truncated } format) but no config field exists for the threshold. Payload redaction strategy also needs config fields. Object storage for payloads exceeding the truncation threshold is Phase 2 and not yet implemented.
Cross-references: OQ-13 (retention policy)
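A minimal sketch of the truncation step, assuming the threshold is measured against the UTF-8 size of the serialized payload. Only the `_truncated` key and the 10KB figure come from the text; the `originalBytes` and `preview` fields and the `truncatePayload` name are hypothetical.

```typescript
// 10KB, from the text; hard-coded here because no config field exists yet.
const TRUNCATION_THRESHOLD_BYTES = 10 * 1024;

// Hypothetical marker shape: only the `_truncated` key is named in the source.
interface TruncatedPayload {
  _truncated: true;
  originalBytes: number;
  preview: string;
}

function truncatePayload<T extends object>(
  payload: T,
  thresholdBytes: number = TRUNCATION_THRESHOLD_BYTES,
): T | TruncatedPayload {
  const json = JSON.stringify(payload);
  const bytes = new TextEncoder().encode(json).length;
  // Payloads at or under the threshold are stored unchanged.
  if (bytes <= thresholdBytes) return payload;
  // Larger payloads are replaced by a marker plus a short preview.
  return { _truncated: true, originalBytes: bytes, preview: json.slice(0, 256) };
}

const smallPayload = truncatePayload({ ok: true });
const largePayload = truncatePayload({ blob: "x".repeat(20000) });
```

Making `thresholdBytes` a parameter keeps the function usable once the threshold moves into config.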

OQ-15: Polymorphic FK enforcement for `providerId`

Origin: storage/spokes.md OQ-2
Status: open
Priority: medium
Question: providerId in spokes table references different parent tables depending on spokeType (either dev_env_spokes or compute_spokes). Current approach is application-layer enforcement. Alternatives (two nullable FK columns, DB triggers) are deferred.

Theme 4: Session & Schema Design

OQ-16: Session/message schema finalization

Origin: agent-sessions.md Schema Research Needed section, storage/sessions.md
Status: resolved by ADR-016
Priority: high → medium (unblocked, narrower scope)
Resolution: The hub defines its own canonical message/part format based on AI SDK UIMessage + parts. Opencode's format is an import concern, not an architectural constraint. The hub's format stays close to opencode's for import compatibility but is self-determined. The remaining design work is specifying the hub's exact part types and session data shapes — this is an implementation task, not an open architectural question.
Cross-references: OQ-17 (compaction), OQ-19 (part nesting)

OQ-17: Session message compaction

Origin: agent-sessions.md, storage/README.md OQ-2
Status: resolved by ADR-016
Priority: medium → low
Resolution: Compaction is an opencode concept (LLM-driven summarization). The hub may need message pruning (server-side truncation of old messages for API response size), but this is different from compaction. For v1, full message history is served. Pruning is a potential future optimization, not a current design concern.

OQ-18: Message data versioning

Origin: storage/README.md OQ-1
Status: resolved by ADR-016
Priority: medium
Resolution: The hub's data JSONB columns are implicitly versioned by their TypeBox schema evolution. No separate version column is needed. Each schema change is documented in migration history, and the TypeBox schemas in the codebase are the source of truth. Opencode's version column was an opencode concern, not a hub pattern.

OQ-19: Part nesting

Origin: storage/sessions.md
Status: resolved by ADR-016
Priority: low
Resolution: Flat parts with messageId FK for v1. If nesting becomes necessary for the hub's own use cases (e.g., tool results containing sub-parts), a parentId column can be added. No need to carry opencode's tree structure.

Theme 5: Configuration & Infrastructure

OQ-20: Config reload without restart

Origin: hub-config.md OQ-1, hub-startup.md OQ-2
Status: open
Priority: medium
Question: For non-encrypted fields (logLevel, cache TTLs), should SIGHUP or an API call trigger re-reading the config file? Encrypted fields would need the master key to remain in memory, which the current design explicitly avoids after startup. Currently config is read-once; changes require restart.
Cross-references: OQ-21 (CI/CD config generation), OQ-22 (config layers)

OQ-21: Config file generation for CI/CD

Origin: hub-config.md OQ-2
Status: resolved by ADR-014
Priority: high → resolved
Resolution: In the Docker deployment model, config files are pre-encrypted by the operator (using alkhub-config) and mounted at runtime. The Docker secret provides the master key. CI/CD doesn't need the master key — it doesn't encrypt. Config files are built into the Docker image or mounted as volumes. The alkhub-config CLI runs on the operator's machine, not in CI/CD.

OQ-22: Multiple config file layers

Origin: hub-config.md OQ-4
Status: resolved by ADR-014
Priority: low → resolved
Resolution: Docker Compose handles environment variation via different config files mounted at different paths. Dev: local decrypted config. Prod: pre-encrypted config mounted as read-only volume. No overlay system needed — the Docker model makes this straightforward with volume mounts.

OQ-23: What are the production SSL/TLS requirements for PostgresConfig?

Origin: hub-config.md PostgresConfig section
Status: resolved by ADR-014
Priority: medium → low
Resolution: In the Docker deployment model, Postgres and the hub run in the same Docker network. TLS between containers in the same network is not required — Docker network policies handle isolation. TLS termination happens at the reverse proxy (nginx/caddy) for external traffic. PostgresConfig.ssl: boolean is sufficient for v1. If a future deployment topology puts Postgres on a different network, a PostgresSslConfig object can be added, but same-network Docker deployment doesn't need it.

OQ-24: HTTPServiceConfig.auth.tokenEnv deprecation

Origin: hub-config.md, operations.md
Status: open
Priority: high — security violation
Question: HTTPServiceConfig.auth.tokenEnv is deprecated and should be removed. The from_openapi.ts line Deno.env.get(config.auth.tokenEnv) is a bug that violates the "no secrets in env vars" rule. All outbound auth tokens should be resolved from client_secrets via secretKey wiring. This needs to be removed and replaced before any production use.

OQ-25: Secret reference resolution ordering

Origin: hub-config.md OQ-7
Status: open
Priority: medium
Question: Should resolveSecretRefs fail at startup if a referenced secretKey doesn't exist in client_secrets yet? Current preference: fail at startup for clients that are enabled: true. If a client is disabled, the missing secret is logged as a warning and left unresolved.
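The stated preference could be expressed along these lines. This is a sketch under assumptions: the signature of `resolveSecretRefs`, the `ClientConfig` and `ResolveResult` shapes, and the in-memory `Map` standing in for client_secrets are all hypothetical.

```typescript
// Hypothetical shapes; only resolveSecretRefs, secretKey, enabled, and client_secrets are named in the text.
interface ClientConfig {
  name: string;
  enabled: boolean;
  secretKey?: string;
}

interface ResolveResult {
  tokens: Map<string, string>; // client name -> resolved secret
  warnings: string[];
}

function resolveSecretRefs(
  clients: ClientConfig[],
  clientSecrets: Map<string, string>,
): ResolveResult {
  const tokens = new Map<string, string>();
  const warnings: string[] = [];
  for (const client of clients) {
    if (client.secretKey === undefined) continue; // nothing to resolve
    const secret = clientSecrets.get(client.secretKey);
    if (secret !== undefined) {
      tokens.set(client.name, secret);
    } else if (client.enabled) {
      // Enabled client with a dangling reference: abort startup.
      throw new Error(`client "${client.name}" references missing secretKey "${client.secretKey}"`);
    } else {
      // Disabled client: record a warning and leave the reference unresolved.
      warnings.push(`client "${client.name}" is disabled; secretKey "${client.secretKey}" not found`);
    }
  }
  return { tokens, warnings };
}

const secretsTable = new Map([["gitea-token", "s3cr3t"]]);
const startupResult = resolveSecretRefs(
  [
    { name: "gitea", enabled: true, secretKey: "gitea-token" },
    { name: "legacy", enabled: false, secretKey: "legacy-token" },
  ],
  secretsTable,
);
```

Throwing for enabled clients surfaces misconfiguration at startup, while disabled clients can be configured ahead of their secrets.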

Theme 6: Role & Identity Management

OQ-26: Role import/sync operation

Origin: agent-roles.md OQ-1, storage/README.md OQ-9 (partial), storage/roles.md
Status: resolved by ADR-017
Priority: medium → resolved
Resolution: The hub is database-first for roles from day one. No roles.sync from .opencode/agents/*.md is needed. Role definitions are seeded by migrations for built-in roles. Custom roles are created via hub.createRole. Opencode's .opencode/agents/ file format is an opencode concern, not a hub concern.

OQ-27: Role inheritance with permission resolution

Origin: agent-roles.md OQ-8
Status: open
Priority: medium
Question: When a role has a parentId, its permissions are unioned with the parent's, with the child's rules taking priority in case of conflict. Max depth: 3 levels. Circular inheritance is prevented at role creation time. The description exists but the implementation is not yet specified.
Cross-references: OQ-26 (role sync — resolved, but inheritance still needs implementation)
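Since the implementation is unspecified, the following is only one possible reading of the rules above. The `Role` shape, the `allow`/`deny` rule values, and the interpretation of "max depth 3" as a chain of at most three roles (including the role itself) are assumptions. The cycle check here is a defensive read-time guard; the text places the actual prevention at role creation time.

```typescript
type Rule = "allow" | "deny"; // rule values are an assumption

interface Role {
  id: string;
  parentId?: string;
  permissions: Record<string, Rule>;
}

const MAX_INHERITANCE_DEPTH = 3; // assumed to count the role itself

function resolvePermissions(roleId: string, roles: Map<string, Role>): Record<string, Rule> {
  const chain: Role[] = [];
  const seen = new Set<string>();
  let currentId: string | undefined = roleId;
  while (currentId !== undefined) {
    if (seen.has(currentId)) throw new Error(`circular inheritance at role "${currentId}"`);
    const role: Role | undefined = roles.get(currentId);
    if (role === undefined) throw new Error(`unknown role "${currentId}"`);
    seen.add(currentId);
    chain.push(role);
    if (chain.length > MAX_INHERITANCE_DEPTH) {
      throw new Error(`inheritance chain exceeds ${MAX_INHERITANCE_DEPTH} levels`);
    }
    currentId = role.parentId;
  }
  // Merge from the root ancestor down so that child rules overwrite parent rules.
  const merged: Record<string, Rule> = {};
  for (let i = chain.length - 1; i >= 0; i -= 1) {
    Object.assign(merged, chain[i].permissions);
  }
  return merged;
}

const roleTable = new Map<string, Role>([
  ["base", { id: "base", permissions: { "fs.read": "allow", "fs.write": "deny" } }],
  ["dev", { id: "dev", parentId: "base", permissions: { "fs.write": "allow" } }],
]);
const devPermissions = resolvePermissions("dev", roleTable);
```

In this example `dev` inherits `fs.read` from `base` and overrides `fs.write` with its own rule.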

OQ-28: Dynamic role creation

Origin: agent-roles.md OQ-3
Status: resolved by ADR-017
Priority: low → resolved
Resolution: The hub supports hub.createRole for programmatic role creation. Opencode's Agent.generate() pattern (on-the-fly LLM-driven role creation) is not a hub concern. Roles are DB records, created via hub operations.

OQ-29: Per-session role switching

Origin: agent-roles.md OQ-4
Status: narrowed
Priority: medium
Question: Originally: should a session be able to change roles mid-conversation, like opencode supports? Narrowed by ADR-015: this is a hub-only concern. The hub binds role at session creation. session.updateRole is a potential operation if needed, but v1 roles are bound at creation. No opencode agent model to reconcile.

Theme 7: Task Management

OQ-30: Task storage and sync implementation

Origin: storage/README.md OQ-9
Status: open
Priority: high
Question: The database is the source of truth for tasks; markdown files are the authoring surface. The sync operation (files → database) exists conceptually but is not yet implemented. This blocks the SDD workflow from using database-backed task tracking.
Cross-references: OQ-26 (role sync — resolved, similar pattern)

OQ-31: Bulk task status updates

Origin: storage/tasks.md OQ-2
Status: open
Priority: medium
Question: Should completing a meta task auto-mark all sub-tasks as completed? Likely yes, but this is application-level logic that needs implementation.

OQ-32: Cross-project task dependencies

Origin: storage/tasks.md OQ-3
Status: open
Priority: low
Question: Not supported for v1. Application-layer validation prevents cross-project references. DB trigger guard deferred to Phase 2.

OQ-33: Task embeddings

Origin: storage/tasks.md OQ-1
Status: open
Priority: low
Question: Vector embeddings for similarity search. metadata JSONB can hold an embedding reference later, or a separate task_embeddings table can be added. Deferred.

Theme 8: Deployment & Operations

OQ-34: Background migration vs. startup migration

Origin: hub-startup.md OQ-1
Status: resolved by ADR-014
Priority: medium → resolved
Resolution: Single-container deployment means migrations must complete before the hub serves. Background migration requires schema version negotiation that adds complexity without benefit. Migrations block startup. Docker's restart policy handles failures.

OQ-35: Hot spare / zero-downtime restart

Origin: hub-startup.md OQ-3
Status: resolved by ADR-014
Priority: low → resolved
Resolution: v1 is single-container deployment with Docker restart policy. Zero-downtime restart requires connection draining and session transfer, which is Phase 2. For v1, Docker restart with health checks is sufficient.

OQ-36: Startup observability

Origin: hub-startup.md OQ-4
Status: resolved by ADR-014
Priority: low → resolved
Resolution: The /health endpoint with step-level progress and structured JSON logging to stdout is sufficient. Docker's HEALTHCHECK directive and log aggregation handle the rest. No pub/sub startup events needed.

OQ-37: Redis deployment topology

Origin: hub-architecture.md OQ-1
Status: resolved by ADR-014
Priority: medium → resolved
Resolution: Redis runs in the same Docker network as the hub. Spokes connect via WebSocket, not Redis. Redis is hub-internal only. Latency between hub and Redis is negligible within the same Docker network. If a future topology needs Redis closer to remote spokes, that would be a spoke-level concern (a spoke-side Redis), not the hub's Redis.

Theme 9: Cross-Cutting Implementation Gaps

OQ-38: Hub startup implementation

Origin: hub-startup.md — full startup sequence spec, no implementation yet
Status: open
Priority: high — blocks all functionality
Question: src/main.ts and startHub() are not yet implemented. The full 11-step startup sequence is specified in hub-startup.md. This is the single most blocking implementation gap.
Cross-references: OQ-20 (config reload), OQ-24 (tokenEnv deprecation)

OQ-39: Hub-specific config in operations package

Origin: operations.md Known Gaps
Status: open
Priority: high — blocks hub startup
Question: core/config/types.ts in the operations package has spoke-only config. Hub-specific config (postgres, redis, auth) needs to be added. This overlaps with the hub-config.md spec but the actual code doesn't exist yet.
Cross-references: OQ-38 (startup implementation)

OQ-40: Logger configuration

Origin: operations.md Known Gaps
Status: open
Priority: medium
Question: core/logger/mod.ts is a stub that only logs the ["logtape", "meta"] category. Needs proper config for app-level loggers. Hub startup Step 3 configures the logger, but the implementation is stub-level.

OQ-41: Gitea operations at startup

Origin: storage/README.md OQ-7
Status: narrowed
Priority: medium
Question: Originally: load Gitea swagger spec at startup and register ~300 operations via FromOpenAPI. Narrowed by ADR-015: Gitea integration is no longer a core hub dependency. Git operations come from the dev spoke (or a separate Gitea spoke via FromOpenAPI). Loading Gitea's OpenAPI spec at startup is optional — a future spoke can provide it. For v1, git operations are exposed through the dev spoke, not hub-native Gitea operations.

OQ-42: Keypal adapter testing

Origin: storage/README.md OQ-4
Status: open
Priority: medium
Question: HubKeyStorage (the Drizzle adapter for keypal) needs comprehensive tests before production use.

OQ-43: MCP endpoint authentication detail

Origin: mcp-server.md Auth section
Status: open
Priority: medium
Question: "The MCP endpoint uses bearer token auth. Each runner gets a token at registration." No detail on token format, rotation, issuance, or how tokens are validated. This connects to OQ-01 (API auth model) and OQ-02 (WebSocket auth).
Cross-references: OQ-01 (API auth model), OQ-02 (WebSocket auth)

OQ-44: Reactive vs. call graph `requested` semantics

Origin: call-graph.md OQ-2
Status: open
Priority: medium
Question: In FlowGraph, call.requested creates a node in pending state. In WorkflowReactiveRoot, call.requested maps to NodeStatus.running. This is a deliberate semantic difference — the reactive model tracks execution progress while the call graph model tracks protocol state. But implementers must be aware that feeding the same event to both models produces different initial statuses.

OQ-45: Client config schema evolution

Origin: storage/README.md OQ-8
Status: open
Priority: medium
Question: Existing DB rows in clients.config may fail validation if the TypeBox schema changes. Using Type.Optional() for new fields helps, but breaking changes need a strategy. Full contract migration protocol is a pending task.

OQ-46: Spoke auth field format in config

Origin: hub-config.md OQ-3
Status: open
Priority: high
Question: The SpokeConfig.auth field format is blocked on the spoke-runner WebSocket auth design (OQ-02). Config system supports tokenFile but actual protocol is TBD. The dev spoke (ADR-015) will use tokenFile to read its auth token from a Docker secret or mounted file.
Cross-references: OQ-02 (WebSocket auth), OQ-62 (dev spoke distribution)

OQ-47: Config schema version

Origin: hub-config.md OQ-5
Status: open
Priority: low
Question: BaseConfig.$schema is optional. alkhub-config init should generate it. Implementation detail — doesn't block anything but supports forward compatibility and editor validation.

OQ-48: Cross-doc terminology migration

Origin: storage/README.md OQ-5
Status: open
Priority: low
Question: The ADR-005 rename from "runner" to "spoke" is done in primary specs but "runner/runnerId" references still exist in other architecture docs. Need updating for consistency.

OQ-49: ADR-012 migration

Origin: ADR-012 — Proposed, not Accepted
Status: open
Priority: medium
Question: ADR-012 proposes terminology changes (sessions.agentName → roleName, etc.). The ADR is in "Proposed" status, not "Accepted". The storage docs already use "role" terminology, but the rename needs a migration plan and the ADR needs to be accepted or rejected.

OQ-50: Key rotation background sweep implementation

Origin: storage-spec-phase1-resolutions.md D4
Status: open
Priority: high
Question: Task specify-key-rotation-protocol addresses key rotation, and the protocol is described in storage/services.md, but the background sweep implementation (cron job that re-encrypts client_secrets rows with the current key version) is not yet implemented.
Cross-references: OQ-24 (tokenEnv deprecation — more secrets flow through client_secrets after this), OQ-25 (secret reference resolution — sweep depends on correct secret ref wiring)

Theme 10: Future / Low Priority

OQ-51: Role database-authoritative (Phase 3)

Origin: agent-roles.md Phase 3, storage/roles.md
Status: resolved by ADR-017
Priority: low → resolved
Resolution: The hub is database-first from day one. There is no Phase 1 (file-based) or Phase 2 (file sync). Roles are defined in the roles table from the start, seeded by migrations. Markdown files are not part of the hub's role system.

OQ-52: Memory across sessions

Origin: agent-roles.md OQ-7
Status: deferred
Priority: low
Question: Should LLM accounts have persistent memory across sessions? This is separate from session message history. Could be a memories table or vector store. Deferred — separate feature with no current requirement.

OQ-53: Task versioning

Origin: storage/tasks.md OQ-4
Status: open
Priority: low
Question: Should previous versions of task body be kept? Decision for v1: no versioning, just update in place.

OQ-54: High-contention task notes

Origin: storage/tasks.md
Status: open
Priority: low
Question: DB-level concatenation for append is specified, but consider separating into task_notes table for high-contention scenarios.

OQ-55: Anthropic conversation import

Origin: storage/README.md OQ-6
Status: deferred by ADR-015
Priority: low
Question: Import script for Anthropic conversations. This is a nice-to-have research tool for re-importing past conversations, not a core feature. The format has likely changed since it was last relevant. Not shipped with the codebase.

OQ-56: ADR-013 out-of-scope items

Origin: ADR-013
Status: open
Priority: low
Question: Several items explicitly out of scope for ADR-013: bidirectional Zod ↔ TypeBox sync, runtime schema migration, auto-generation of TypeScript types from wire schemas, converting Zod .transform() / .pipe() output types. May revisit if needed.

OQ-57: Call graph visualization

Origin: call-graph.md What We Defer #2
Status: open
Priority: low
Question: API only, no Sigma.js UI for v1.

OQ-58: Stream deduplication

Origin: call-graph.md What We Defer #3
Status: open
Priority: medium
Question: Value.Hash({operationId, input}) deduplication for multiple subscribers to the same stream. May be needed for subscription scalability.
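The deduplication idea can be sketched as a registry that shares one upstream per key. To keep the example dependency-free it uses `JSON.stringify` as a stand-in for `Value.Hash({operationId, input})`. That stand-in is sensitive to property order, so it only illustrates the keying and is not a substitute for a structural hash. `createStreamRegistry` and its method names are hypothetical.

```typescript
// Stand-in for Value.Hash({ operationId, input }); key-order sensitive, illustration only.
function streamKey(operationId: string, input: unknown): string {
  return JSON.stringify({ operationId, input });
}

interface SharedStream<T> {
  upstream: T;
  subscribers: number;
}

function createStreamRegistry<T>(openUpstream: (operationId: string, input: unknown) => T) {
  const active = new Map<string, SharedStream<T>>();
  return {
    // Reuses the existing upstream when another subscriber already opened the same key.
    subscribe(operationId: string, input: unknown): T {
      const key = streamKey(operationId, input);
      let entry = active.get(key);
      if (entry === undefined) {
        entry = { upstream: openUpstream(operationId, input), subscribers: 0 };
        active.set(key, entry);
      }
      entry.subscribers += 1;
      return entry.upstream;
    },
    // Drops the shared upstream once its last subscriber leaves.
    unsubscribe(operationId: string, input: unknown): void {
      const key = streamKey(operationId, input);
      const entry = active.get(key);
      if (entry === undefined) return;
      entry.subscribers -= 1;
      if (entry.subscribers <= 0) active.delete(key);
    },
    activeCount: (): number => active.size,
  };
}

let upstreamsOpened = 0;
const registry = createStreamRegistry((operationId) => ({ operationId, seq: ++upstreamsOpened }));
const subA = registry.subscribe("dev.git.status", { path: "." });
const subB = registry.subscribe("dev.git.status", { path: "." });
const subC = registry.subscribe("dev.git.status", { path: "src" });
```

The first two subscriptions share one upstream; the third has a different input and opens its own.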

OQ-59: `requested_by` edge in flowgraph

Origin: call-graph.md What We Defer #4
Status: open
Priority: low
Question: The requested_by edge type is a storage-layer concept for identity tracing. It's persisted in call_graph_edges but not modeled in @alkdev/flowgraph's CallEdgeAttrs. May be added to flowgraph in the future.

OQ-60: Full ujsx call templates

Origin: call-graph.md What We Defer #1
Status: open
Priority: low
Question: Currently using hardcoded workflow sequences. @alkdev/flowgraph/component provides Operation, Sequential, Parallel, Conditional, Map components for declarative template definition. Will adopt when workflow complexity justifies it.

Theme 11: Inference & LLM Integration

OQ-63: What is the exact subset of UIMessageChunk types the hub proxy emits?

Origin: ADR-018
Status: open
Priority: medium
Question: ADR-018 defines an initial subset of AI SDK's UIMessageChunk protocol for the hub's SSE streaming format. The initial set covers text, reasoning, tool call lifecycle, step boundaries, and error events. As features are added (e.g., source URLs, file attachments, dynamic tools), new chunk types need to be specified. Should the hub define a formal schema for its streaming protocol, or document it informally? How do we version the protocol if chunk types change?
Cross-references: OQ-64 (raw HTTP vs SDK streaming)

OQ-64: Should the direct agent use the openai SDK's streaming API or raw HTTP?

Origin: ADR-018
Status: open
Priority: low
Question: The direct agent path can use the openai SDK's typed streaming API (client.chat.completions.create({ stream: true })) or raw HTTP for more control over SSE parsing. The SDK provides convenience (typed responses, automatic tool call accumulation) but adds abstraction. The proxy path must use raw HTTP (Hono SSE handler). Should both paths use the same approach for consistency, or is it acceptable to use the SDK for the direct agent and raw HTTP for the proxy?
Cross-references: OQ-63 (streaming protocol)

OQ-65: What is the buffered write strategy for part persistence?

Origin: ADR-018
Status: open
Priority: medium
Question: Streaming LLM responses produce many part updates (text deltas, state transitions, tool call results). Persisting every delta individually would issue one database write per streamed chunk, which is too expensive. Options: (a) flush on *-end events (per-part commits: text parts are committed when done, tool parts when complete), (b) flush on step-finish (per-step commits: all parts in a step are committed together), (c) flush on finish (per-message commits: all parts are committed when the agent turn is complete). Per-part commits (a) offer the best balance between latency and write volume for real-time SSE updates.
Cross-references: OQ-63 (streaming protocol defines when *-end events fire)
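Option (a) can be illustrated for text parts with a small in-memory buffer. This is a sketch, not the hub's protocol: the chunk shapes follow a start/delta/end lifecycle but are hypothetical (the exact chunk subset is still open under OQ-63), and `createPartBuffer` and the `persist` callback are illustrative names.

```typescript
// Hypothetical chunk shapes; the hub's actual chunk subset is still open (OQ-63).
type TextChunk =
  | { type: "text-start"; id: string }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "text-end"; id: string };

interface TextPart {
  id: string;
  text: string;
}

function createPartBuffer(persist: (part: TextPart) => void) {
  const pending = new Map<string, string>();
  return function handle(chunk: TextChunk): void {
    switch (chunk.type) {
      case "text-start":
        pending.set(chunk.id, "");
        break;
      case "text-delta":
        // Accumulate in memory only; no database write per delta.
        pending.set(chunk.id, (pending.get(chunk.id) ?? "") + chunk.delta);
        break;
      case "text-end": {
        // One write per completed part.
        const text = pending.get(chunk.id) ?? "";
        pending.delete(chunk.id);
        persist({ id: chunk.id, text });
        break;
      }
    }
  };
}

const writes: TextPart[] = [];
const handleChunk = createPartBuffer((part) => {
  writes.push(part);
});
const incoming: TextChunk[] = [
  { type: "text-start", id: "p1" },
  { type: "text-delta", id: "p1", delta: "Hello, " },
  { type: "text-delta", id: "p1", delta: "hub" },
  { type: "text-end", id: "p1" },
];
for (const chunk of incoming) handleChunk(chunk);
```

Four chunks arrive but `persist` runs once, with the concatenated text. Options (b) and (c) would move the `persist` call to a later boundary.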

Cross-Cutting Dependencies

These questions block each other or share resolution paths:

API Auth Chain: OQ-01 → OQ-02 → OQ-43 → OQ-46 — The API auth model determines WebSocket auth, which determines MCP auth and spoke config format. Resolve top-down.
Spoke Connectivity Chain: OQ-10 → OQ-06. Spoke provisioning can't work without the hub-side WebSocket handler, so resolve OQ-10 first.
Implementation Bootstrap: OQ-38 → OQ-39 → OQ-40 — Hub startup implementation needs hub config types and proper logger config. These are the minimum viable path to a running hub.
Config Security Chain: OQ-24 → OQ-25 → OQ-50 — Token env deprecation and secret reference resolution are intertwined. OQ-24 must be resolved (remove tokenEnv) before OQ-25 can be validated. After OQ-25, the key rotation background sweep (OQ-50) becomes more important because more secrets flow through client_secrets.
Data Lifecycle Chain: OQ-12 → OQ-13 → OQ-14 — Operation deletion strategy, call graph retention, and payload truncation interact. OQ-12 determines whether operations can be removed at all.
Inference Chain: OQ-63 → OQ-64, OQ-65 — The streaming protocol subset (OQ-63) determines what the direct agent and proxy need to produce. The SDK vs. raw HTTP choice (OQ-64) and the persistence strategy (OQ-65) depend on the protocol definition.

Summary Table

ID	Question	Origin	Priority	Status
OQ-01	API authentication model	hub-architecture	high	open
OQ-02	WebSocket auth for spokes	spoke-runner	high	open
OQ-03	Permission enforcement at call protocol	agent-roles	high	resolved
OQ-04	Service account provisioning	agent-roles	medium	narrowed
OQ-05	Git provider SSO integration	hub-architecture	low	narrowed
OQ-06	Spoke project context	spoke-runner	high	open
OQ-07	Source sync for external compute	spoke-runner	medium	open
OQ-08	Concurrent spoke operations	spoke-runner	low	narrowed
OQ-09	Spoke operation list freshness	spoke-runner	low	narrowed
OQ-10	Hub-side WebSocket handler design	spoke-runner	high	open
OQ-11	Dev spoke and compute spoke lifecycle	spoke-runner, hub-architecture	low	narrowed
OQ-12	Operation deletion vs. call graph FK	call-graph, storage/spokes	high	open
OQ-13	Call graph retention policy	storage/call-graph, storage/README	medium	open
OQ-14	Call graph payload truncation config	storage/call-graph	medium	open
OQ-15	Polymorphic FK for `providerId`	storage/spokes	medium	open
OQ-16	Session/message schema finalization	agent-sessions, storage/sessions	medium	resolved (ADR-016)
OQ-17	Session message compaction	agent-sessions, storage/README	low	resolved (ADR-016)
OQ-18	Message data versioning	storage/README	medium	resolved (ADR-016)
OQ-19	Part nesting	storage/sessions	low	resolved (ADR-016)
OQ-20	Config reload without restart	hub-config, hub-startup	medium	open
OQ-21	CI/CD config generation	hub-config	high	resolved (ADR-014)
OQ-22	Multiple config file layers	hub-config	low	resolved (ADR-014)
OQ-23	PostgresConfig SSL details	hub-config	low	resolved (ADR-014)
OQ-24	HTTPServiceConfig.auth.tokenEnv deprecation	hub-config, operations	high	open
OQ-25	Secret reference resolution ordering	hub-config	medium	open
OQ-26	Role import/sync operation	agent-roles, storage/README, storage/roles	medium	resolved (ADR-017)
OQ-27	Role inheritance with permission resolution	agent-roles	medium	open
OQ-28	Dynamic role creation	agent-roles	low	resolved (ADR-017)
OQ-29	Per-session role switching	agent-roles	medium	narrowed
OQ-30	Task storage and sync implementation	storage/README	high	open
OQ-31	Bulk task status updates	storage/tasks	medium	open
OQ-32	Cross-project task dependencies	storage/tasks	low	open
OQ-33	Task embeddings	storage/tasks	low	open
OQ-34	Background vs. startup migration	hub-startup	medium	resolved (ADR-014)
OQ-35	Hot spare / zero-downtime restart	hub-startup	low	resolved (ADR-014)
OQ-36	Startup observability	hub-startup	low	resolved (ADR-014)
OQ-37	Redis deployment topology	hub-architecture	medium	resolved (ADR-014)
OQ-38	Hub startup implementation	hub-startup	high	open
OQ-39	Hub-specific config in operations package	operations	high	open
OQ-40	Logger configuration	operations	medium	open
OQ-41	Gitea operations at startup	storage/README	medium	narrowed
OQ-42	Keypal adapter testing	storage/README	medium	open
OQ-43	MCP endpoint authentication detail	mcp-server	medium	open
OQ-44	Reactive vs. call graph requested semantics	call-graph	medium	open
OQ-45	Client config schema evolution	storage/README	medium	open
OQ-46	Spoke auth field format in config	hub-config	high	open
OQ-47	Config schema version	hub-config	low	open
OQ-48	Cross-doc terminology migration	storage/README	low	open
OQ-49	ADR-012 migration	decisions/ADR-012	medium	open
OQ-50	Key rotation background sweep	decisions/storage-spec-phase1	high	open
OQ-51	Role database-authoritative (Phase 3)	agent-roles, storage/roles	low	resolved (ADR-017)
OQ-52	Memory across sessions	agent-roles	low	deferred
OQ-53	Task versioning	storage/tasks	low	open
OQ-54	High-contention task notes	storage/tasks	low	open
OQ-55	Anthropic conversation import	storage/README	low	deferred
OQ-56	ADR-013 out-of-scope items	decisions/ADR-013	low	open
OQ-57	Call graph visualization	call-graph	low	open
OQ-58	Stream deduplication	call-graph	medium	open
OQ-59	`requested_by` edge in flowgraph	call-graph	low	open
OQ-60	Full ujsx call templates	call-graph	low	open
OQ-61	Dev spoke operations	ADR-015	medium	open
OQ-62	Dev spoke distribution and config	ADR-015	medium	open
OQ-63	Hub proxy SSE chunk type subset	ADR-018	medium	open
OQ-64	Direct agent: openai SDK vs raw HTTP	ADR-018	low	open
OQ-65	Part persistence buffered write strategy	ADR-018	medium	open

High Priority Open Questions (Blocking)

These questions block core functionality and should be resolved first:

ID	Question	Blocks
OQ-01	API authentication model	All authenticated endpoints
OQ-02	WebSocket auth for spokes	All spoke connections
OQ-06	Spoke project context	Spoke provisioning
OQ-10	Hub-side WebSocket handler design	All spoke functionality
OQ-12	Operation deletion vs. call graph FK	Operation lifecycle
OQ-24	HTTPServiceConfig.auth.tokenEnv deprecation	Security (env var leak)
OQ-38	Hub startup implementation	All functionality
OQ-39	Hub-specific config in operations package	Hub startup
OQ-46	Spoke auth field format in config	Spoke config
OQ-50	Key rotation background sweep	Production secret management

Resolution Priority Order

Suggested order for resolving the high-priority questions, based on dependency chains:

OQ-38 + OQ-39 — Hub startup implementation + config types (enables everything)
OQ-01 — API auth model (unblocks OQ-02, OQ-43, OQ-46)
OQ-02 — WebSocket auth (unblocks OQ-10, OQ-46)
OQ-10 — Hub-side WebSocket handler (enables spokes)
OQ-24 — tokenEnv deprecation (security fix)
OQ-12 — Operation deletion strategy (data integrity)
OQ-06 — Spoke project context (spoke provisioning)
OQ-50 — Key rotation sweep (production secret management)
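
The ordering above can be sanity-checked mechanically by treating each "unblocks" note as a directed edge and running a topological sort. The sketch below uses Kahn's algorithm; the function name `resolutionOrder` and the edge list are illustrative assumptions drawn from the notes in this section and the dependency chains, not an authoritative dependency graph.

```typescript
// An edge [a, b] means "resolve a before b".
type Edge = readonly [string, string];

// Edges taken from the "unblocks" notes and the chains in this document.
const blockingEdges: Edge[] = [
  ["OQ-01", "OQ-02"],
  ["OQ-01", "OQ-43"],
  ["OQ-01", "OQ-46"],
  ["OQ-02", "OQ-10"],
  ["OQ-02", "OQ-46"],
  ["OQ-10", "OQ-06"],
  ["OQ-24", "OQ-25"],
  ["OQ-25", "OQ-50"],
];

// Kahn's algorithm: repeatedly emit questions with no unresolved blockers.
function resolutionOrder(edges: readonly Edge[]): string[] {
  const indegree = new Map<string, number>();
  const unblocks = new Map<string, string[]>();
  for (const [before, after] of edges) {
    if (!indegree.has(before)) indegree.set(before, 0);
    indegree.set(after, (indegree.get(after) ?? 0) + 1);
    const list = unblocks.get(before) ?? [];
    list.push(after);
    unblocks.set(before, list);
  }

  const ready: string[] = [];
  indegree.forEach((count, id) => {
    if (count === 0) ready.push(id);
  });

  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of unblocks.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }

  // Any question left unemitted sits on a cycle and needs manual review.
  if (order.length !== indegree.size) throw new Error("dependency cycle detected");
  return order;
}
```

Calling `resolutionOrder(blockingEdges)` yields one valid order; a thrown error would indicate that two questions block each other and the chains need to be revisited.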

Questions Resolved by ADRs

ADR	Questions Resolved	Key Decisions
ADR-014	OQ-21, OQ-22, OQ-23, OQ-34, OQ-35, OQ-36, OQ-37	Docker as primary deployment; Redis/Postgres same network; config via mounted volumes; single-container restart; migrations block startup
ADR-015	OQ-16, OQ-17, OQ-26, OQ-28, OQ-51, OQ-55	Dev spoke replaces opencode integration; hub owns session format; opencode compat is import tool; opencode-spoke is optional future
ADR-016	OQ-18, OQ-19	Hub defines own canonical schema; JSONB implicitly versioned; flat parts for v1; import is compat tool
ADR-017	OQ-26, OQ-28, OQ-51	Database-first roles; seeded by migrations; `hub.createRole` for custom roles; no `.opencode/agents/` file sync
