Add centralized open questions tracker for hub architecture
60 open questions across 10 themes, synthesized from 20+ architecture and storage documents. Follows the flowgraph open-questions.md pattern with ID tracking, cross-references, dependency chains, and a priority resolution order for the 11 high-priority blocking questions.
637
docs/architecture/open-questions.md
Normal file
@@ -0,0 +1,637 @@
---
status: reviewed
last_updated: 2026-05-25
---

# Open Questions Tracker

Cross-cutting compilation of all unresolved questions across the hub architecture documents, organized by theme. Questions that appear in multiple documents are unified here with cross-references.

## How to Use This Document

- Each question has an **ID** (e.g., OQ-01), **status**, **origin** (which doc(s)), and **priority** assessment
- **Cross-references** link related questions that may conflict or answer each other
- When a question is resolved, update its status to `resolved` and add a resolution note
- Once all questions in a theme are resolved, the theme section can be removed

## Table of Contents

| Theme | Questions | Focus |
|-------|-----------|-------|
| 1. Authentication & Authorization | OQ-01–OQ-05 | Auth models, permissions, SSO |
| 2. Spoke Connectivity & Lifecycle | OQ-06–OQ-11 | Spoke provisioning, WebSocket, concurrent ops |
| 3. Data Integrity & Lifecycle | OQ-12–OQ-15 | Deletion, retention, truncation, FK enforcement |
| 4. Session & Schema Design | OQ-16–OQ-19 | Message schema, compaction, versioning, nesting |
| 5. Configuration & Infrastructure | OQ-20–OQ-25 | Config reload, CI/CD, SSL, tokenEnv, secret refs |
| 6. Role & Identity Management | OQ-26–OQ-29 | Role sync, inheritance, dynamic creation, switching |
| 7. Task Management | OQ-30–OQ-33 | Task storage, bulk updates, embeddings |
| 8. Deployment & Operations | OQ-34–OQ-37 | Migrations, hot spare, observability, Redis topology |
| 9. Cross-Cutting Implementation Gaps | OQ-38–OQ-50 | Startup, config, logger, Gitea, keypal, auth, schemas |
| 10. Future / Low Priority | OQ-51–OQ-60 | Phase 3+, memory, versioning, visualization |

---

## Theme 1: Authentication & Authorization

### OQ-01: What is the API authentication model?

- **Origin**: [hub-architecture.md](hub-architecture.md) OQ-2
- **Status**: open
- **Priority**: high — blocks all authenticated endpoints
- **Question**: Should the hub use API keys with the keypal pattern, a simpler token auth stopgap, or something else? This affects every authenticated endpoint in the system.
- **Cross-references**: OQ-02 (WebSocket auth), OQ-43 (MCP auth)

### OQ-02: How does WebSocket authentication work for spoke connections?

- **Origin**: [spoke-runner.md](spoke-runner.md) OQ-4
- **Status**: open
- **Priority**: high — blocks all spoke connections
- **Question**: Should the spoke authenticate via token in the first message after connect, token in the query string, or token in the subprotocol header? This also affects the `SpokeConfig.auth` format — the config system currently supports `tokenFile` but the actual auth protocol is undefined.
- **Cross-references**: OQ-01 (API auth model), OQ-46 (spoke config auth field), [infrastructure.md](infrastructure.md) Security section

### OQ-03: How are permissions enforced at the call protocol layer?

- **Origin**: [agent-roles.md](agent-roles.md) OQ-2
- **Status**: resolved
- **Priority**: high
- **Resolution**: `OperationContext.identity` carries the resolved permissions from `sessions.data.scope`. The `CallHandler` evaluates `AccessControl.requiredScopes` against the session's resolved scope. The principal-agent framework ensures delegated permissions are properly intersected.
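
The resolution above can be illustrated with a minimal sketch. The source names `AccessControl.requiredScopes` and `sessions.data.scope` but does not give their types, so the string-based `Scope` type, the helper names `intersectScopes` and `isAuthorized`, and the scope strings below are all assumptions for illustration only.

```typescript
// Hypothetical shapes: scopes are modeled as plain strings for this sketch.
type Scope = string;

interface AccessControl {
  requiredScopes: Scope[];
}

// Delegation: the agent keeps only the scopes that the principal also holds.
function intersectScopes(principal: Scope[], agent: Scope[]): Scope[] {
  const allowed = new Set(principal);
  return agent.filter((scope) => allowed.has(scope));
}

// A call is permitted only when every required scope is in the resolved scope.
function isAuthorized(access: AccessControl, resolvedScope: Scope[]): boolean {
  const granted = new Set(resolvedScope);
  return access.requiredScopes.every((scope) => granted.has(scope));
}

// Illustrative scope names, not taken from the hub.
const resolved = intersectScopes(
  ["tasks:read", "tasks:write", "roles:read"], // principal
  ["tasks:read", "tasks:write", "spokes:admin"], // agent
);
const canWriteTasks = isAuthorized({ requiredScopes: ["tasks:write"] }, resolved);
const canAdminSpokes = isAuthorized({ requiredScopes: ["spokes:admin"] }, resolved);
```

In this sketch the agent's `spokes:admin` scope is dropped because the principal does not hold it, so a call requiring it is rejected.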

### OQ-04: How are LLM service accounts provisioned?

- **Origin**: [agent-roles.md](agent-roles.md) OQ-6
- **Status**: open
- **Priority**: medium
- **Question**: LLM accounts currently require manual creation (e.g., `glm-5.1@alk.dev`). Should there be an automated provisioning flow (`hub.createAccount` operation), or is manual provisioning sufficient for v1?
- **Cross-references**: OQ-03 (permission enforcement)

### OQ-05: Should SSO be shared with Gitea?

- **Origin**: [hub-architecture.md](hub-architecture.md) OQ-3
- **Status**: open
- **Priority**: medium
- **Question**: Gitea at `git.alk.dev` uses its own auth. Should `api.alk.dev` share sessions with Gitea, or maintain separate auth? This affects user experience but is not blocking for v1.

---

## Theme 2: Spoke Connectivity & Lifecycle

### OQ-06: How does a spoke receive its project context?

- **Origin**: [spoke-runner.md](spoke-runner.md) OQ-1
- **Status**: open
- **Priority**: high — blocks spoke provisioning
- **Question**: Does the hub tell the spoke which git repo to clone, or does the spoke come pre-configured with a project? This is fundamental to the spoke provisioning workflow.
- **Cross-references**: OQ-07 (source sync)

### OQ-07: How does source sync work for external compute?

- **Origin**: [spoke-runner.md](spoke-runner.md) OQ-3
- **Status**: open
- **Priority**: medium
- **Question**: For GPU compute spokes on vast.ai — does the spoke clone from Gitea automatically, or does the hub push source to it?
- **Cross-references**: OQ-06 (project context)

### OQ-08: Can a spoke handle concurrent operations?

- **Origin**: [spoke-runner.md](spoke-runner.md) OQ-5
- **Status**: open
- **Priority**: medium
- **Question**: Can a spoke handle multiple `call.requested` events concurrently? Concurrent processing is better for SUBSCRIPTION operations but introduces complexity in state management on the spoke side.
- **Cross-references**: OQ-09 (operation list freshness)

### OQ-09: When does a spoke re-register its operations?

- **Origin**: [spoke-runner.md](spoke-runner.md) OQ-6
- **Status**: open
- **Priority**: medium
- **Question**: Does the spoke re-register on reconnect only, or does it push updates when its local registry changes? This affects the hub's operation routing and the RunnerPool design.

### OQ-10: What is the design for the hub-side WebSocket handler?

- **Origin**: [spoke-runner.md](spoke-runner.md) Hub-Side WebSocket Handling section
- **Status**: open
- **Priority**: high — blocks all spoke functionality
- **Question**: What is the full design for the hub-side WebSocket handler? This includes: Hono WebSocket upgrade handler, per-connection `WebSocketEventTarget`, per-connection `PendingRequestMap`, spoke lifecycle management (connect/register/heartbeat/disconnect), identity/authentication integration, and reconnection state recovery. Currently described as "an architectural task that needs deeper design" with no spec.
- **Cross-references**: OQ-02 (WebSocket auth), OQ-06 (spoke project context — constrains handler message types), OQ-08 (concurrent operations)

### OQ-11: Container spoke lifecycle

- **Origin**: [spoke-runner.md](spoke-runner.md) OQ-2, [hub-architecture.md](hub-architecture.md) Components table
- **Status**: open
- **Priority**: low
- **Question**: Container spoke extends base spoke with Docker container lifecycle management and opencode integration. Design is deferred until base spoke is working.
- **Cross-references**: OQ-06 (project context), OQ-07 (source sync)

---

## Theme 3: Data Integrity & Lifecycle

### OQ-12: Operation deletion and call graph referential integrity

- **Origin**: [call-graph.md](call-graph.md) OQ-1, [storage/spokes.md](storage/spokes.md) OQ-1
- **Status**: open
- **Priority**: high — blocks operation lifecycle management
- **Question**: The `call_graph_nodes.operationId` column has a RESTRICT FK to `operations.id`. An operation cannot be deleted while any call records reference it. Two strategies proposed: (a) deny removal while call records exist, or (b) reassign referencing call records to a sentinel `__removed__` operation. Making `operationId` nullable in flowgraph's `CallNodeAttrs` is another option. This needs coordination with the `@alkdev/flowgraph` package.
- **Cross-references**: OQ-13 (call graph retention interacts with deletion constraints)
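
To compare strategies (a) and (b), here is a minimal in-memory sketch. It uses a `Set` and an array as stand-ins for the `operations` and `call_graph_nodes` tables; the function names, the `DeleteResult` type, and the `git.clone` operation ID are hypothetical and not part of the hub or `@alkdev/flowgraph`.

```typescript
// In-memory stand-in for rows of `call_graph_nodes`.
interface CallNode {
  id: string;
  operationId: string;
}

const SENTINEL_OPERATION_ID = "__removed__";

type DeleteResult = { deleted: true } | { deleted: false; reason: string };

// Strategy (a): deny removal while any call record references the operation.
function deleteOperationDeny(
  operationId: string,
  operations: Set<string>,
  nodes: CallNode[],
): DeleteResult {
  const refs = nodes.filter((node) => node.operationId === operationId).length;
  if (refs > 0) {
    return { deleted: false, reason: `${refs} call record(s) reference ${operationId}` };
  }
  operations.delete(operationId);
  return { deleted: true };
}

// Strategy (b): reassign referencing call records to the sentinel, then delete.
function deleteOperationReassign(
  operationId: string,
  operations: Set<string>,
  nodes: CallNode[],
): DeleteResult {
  operations.add(SENTINEL_OPERATION_ID); // the sentinel row must exist to satisfy the FK
  for (const node of nodes) {
    if (node.operationId === operationId) node.operationId = SENTINEL_OPERATION_ID;
  }
  operations.delete(operationId);
  return { deleted: true };
}

const operations = new Set(["git.clone", "git.push"]);
const nodes: CallNode[] = [{ id: "n1", operationId: "git.clone" }];
const denied = deleteOperationDeny("git.clone", operations, nodes);
const reassigned = deleteOperationReassign("git.clone", operations, nodes);
```

Strategy (b) keeps the call history but loses the link to the original operation, which is one reason the choice interacts with the retention policy in OQ-13.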

### OQ-13: Call graph retention policy

- **Origin**: [storage/call-graph.md](storage/call-graph.md) Retention Policy section, [storage/README.md](storage/README.md) OQ-3
- **Status**: open
- **Priority**: medium
- **Question**: The hub needs TTL-based cleanup of completed/failed call graph records older than N days. A default of 90 days is specified, but no config field exists yet. Aggregation for observability (dashboards, metrics) is deferred to Phase 2.
- **Cross-references**: OQ-12 (operation deletion)
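
The cleanup rule can be sketched as a pure selection function. The record shape, the `finishedAt` field, the status names other than completed/failed, and the `selectExpired` helper are assumptions made for this example; only the 90-day default and the completed/failed restriction come from the text above.

```typescript
// Assumed record shape; only "completed" and "failed" are named in the source.
type CallStatus = "pending" | "running" | "completed" | "failed";

interface CallRecord {
  id: string;
  status: CallStatus;
  finishedAt: Date | null;
}

const DEFAULT_RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns terminal records whose finish time is older than the retention window.
function selectExpired(
  records: CallRecord[],
  now: Date,
  retentionDays: number = DEFAULT_RETENTION_DAYS,
): CallRecord[] {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return records.filter(
    (record) =>
      (record.status === "completed" || record.status === "failed") &&
      record.finishedAt !== null &&
      record.finishedAt.getTime() < cutoff,
  );
}

const now = new Date("2026-05-25T00:00:00Z");
const records: CallRecord[] = [
  { id: "old-done", status: "completed", finishedAt: new Date("2026-01-01T00:00:00Z") },
  { id: "recent-failed", status: "failed", finishedAt: new Date("2026-05-01T00:00:00Z") },
  { id: "old-running", status: "running", finishedAt: null },
];
const expiredIds = selectExpired(records, now).map((record) => record.id);
```

Passing `retentionDays` as a parameter mirrors the missing config field: once that field exists, its value would be supplied here instead of the default.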

### OQ-14: Call graph payload truncation strategy

- **Origin**: [storage/call-graph.md](storage/call-graph.md)
- **Status**: open
- **Priority**: medium
- **Question**: Strategy defined (10 KB threshold, `{ _truncated }` format) but no config field exists for the threshold. Payload redaction strategy also needs config fields. Object storage for payloads exceeding the truncation threshold is Phase 2 and not yet implemented.
- **Cross-references**: OQ-13 (retention policy)
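
As a rough illustration of the truncation strategy, the sketch below replaces an oversized payload with a marker object. The exact `{ _truncated }` shape is not spelled out in this document, so the `originalSize` field, the `truncatePayload` name, and the use of serialized string length as a size estimate are all assumptions.

```typescript
const DEFAULT_TRUNCATION_THRESHOLD = 10 * 1024; // 10 KB, as named above

// Hypothetical marker shape: only the `_truncated` key comes from the source.
interface TruncationMarker {
  _truncated: true;
  originalSize: number;
}

function truncatePayload(
  payload: unknown,
  threshold: number = DEFAULT_TRUNCATION_THRESHOLD,
): unknown {
  // JSON.stringify returns undefined for values such as `undefined` or functions.
  const serialized: string | undefined = JSON.stringify(payload);
  // String length counts UTF-16 code units, which only approximates byte size.
  const size = serialized === undefined ? 0 : serialized.length;
  if (size <= threshold) return payload;
  const marker: TruncationMarker = { _truncated: true, originalSize: size };
  return marker;
}

const small = truncatePayload({ ok: true });
const large = truncatePayload({ blob: "x".repeat(20000) });
```

The `threshold` parameter stands in for the config field that does not exist yet.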

### OQ-15: Polymorphic FK enforcement for `providerId`

- **Origin**: [storage/spokes.md](storage/spokes.md) OQ-2
- **Status**: open
- **Priority**: medium
- **Question**: `providerId` in `spokes` table references different parent tables depending on `spokeType` (either `dev_env_spokes` or `compute_spokes`). Current approach is application-layer enforcement. Alternatives (two nullable FK columns, DB triggers) are deferred.
- **Cross-references**: None currently
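
Application-layer enforcement might look like the following sketch. The `spokeType` values `"dev_env"` and `"compute"`, the `SpokeRow` shape, and the `assertProviderExists` helper are assumptions; the source names only the two parent tables. Sets stand in for database lookups.

```typescript
// Assumed discriminator values; the source names only the parent tables.
type SpokeType = "dev_env" | "compute";

interface SpokeRow {
  id: string;
  spokeType: SpokeType;
  providerId: string;
}

// Stand-in for ID lookups against the two parent tables.
interface ParentTables {
  dev_env_spokes: Set<string>;
  compute_spokes: Set<string>;
}

// Which parent table `providerId` must reference for each spoke type.
const PARENT_TABLE: Record<SpokeType, keyof ParentTables> = {
  dev_env: "dev_env_spokes",
  compute: "compute_spokes",
};

// Throws when `providerId` does not exist in the table implied by `spokeType`.
function assertProviderExists(spoke: SpokeRow, tables: ParentTables): void {
  const table = PARENT_TABLE[spoke.spokeType];
  if (!tables[table].has(spoke.providerId)) {
    throw new Error(`providerId ${spoke.providerId} not found in ${table}`);
  }
}

const tables: ParentTables = {
  dev_env_spokes: new Set(["dev-1"]),
  compute_spokes: new Set(["gpu-1"]),
};
// Passes: a compute spoke that references an existing compute_spokes row.
assertProviderExists({ id: "s1", spokeType: "compute", providerId: "gpu-1" }, tables);
```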

---

## Theme 4: Session & Schema Design

### OQ-16: Session/message schema finalization

- **Origin**: [agent-sessions.md](agent-sessions.md) Schema Research Needed section, [storage/sessions.md](storage/sessions.md)
- **Status**: open
- **Priority**: high — blocks session storage implementation
- **Question**: The message/part schema needs more iteration. Opencode's drizzle+sqlite schema uses a message tree format with parent/child parts that needs reconciliation with AI SDK `UIMessage` part types. Which exact subset of opencode's part types does the hub use? How do we handle the session `data` column shapes (formally type-constrained or application-layer guidance)?
- **Cross-references**: OQ-17 (compaction), OQ-19 (part nesting)

### OQ-17: Session message compaction

- **Origin**: [agent-sessions.md](agent-sessions.md), [storage/README.md](storage/README.md) OQ-2
- **Status**: open
- **Priority**: medium
- **Question**: Need to define what compaction means for hub-direct AI SDK sessions. Opencode has a `compaction` agent/part type. The hub needs a strategy for long-running sessions that accumulate many messages.
- **Cross-references**: OQ-16 (schema finalization)

### OQ-18: Message data versioning

- **Origin**: [storage/README.md](storage/README.md) OQ-1
- **Status**: open
- **Priority**: medium
- **Question**: Should the `data` column format be versioned for forward compatibility? Opencode has a `version` column on sessions. If the data shape evolves, old records need to be readable.
- **Cross-references**: OQ-45 (client config schema evolution), OQ-16 (schema finalization)

### OQ-19: Part nesting

- **Origin**: [storage/sessions.md](storage/sessions.md)
- **Status**: open
- **Priority**: low
- **Question**: Currently flat parts with `messageId` FK. If nesting becomes necessary (e.g., tool results containing sub-parts), it would require a `parentId` column on `parts`. Not needed for v1 but should be considered in schema design.
- **Cross-references**: OQ-16 (schema finalization)

---

## Theme 5: Configuration & Infrastructure

### OQ-20: Config reload without restart

- **Origin**: [hub-config.md](hub-config.md) OQ-1, [hub-startup.md](hub-startup.md) OQ-2
- **Status**: open
- **Priority**: medium
- **Question**: For non-encrypted fields (logLevel, cache TTLs), should SIGHUP or an API call trigger re-reading the config file? Encrypted fields would need the master key to remain in memory, which the current design explicitly avoids after startup. Currently config is read-once; changes require restart.
- **Cross-references**: OQ-21 (CI/CD config generation), OQ-22 (config layers)

### OQ-21: Config file generation for CI/CD

- **Origin**: [hub-config.md](hub-config.md) OQ-2
- **Status**: open
- **Priority**: high — blocks deployment
- **Question**: The `alkhub-config` CLI requires the master key to encrypt values. How does CI/CD get the master key? Options: (a) CI has access to the master key secret, (b) config files are pre-encrypted and stored in a private repo, (c) encryption happens at deploy time on the host.
- **Cross-references**: OQ-20 (config reload)

### OQ-22: Multiple config file layers

- **Origin**: [hub-config.md](hub-config.md) OQ-4
- **Status**: open
- **Priority**: low
- **Question**: Should the config loader support a base config + overlay pattern (e.g., `/etc/alkhub/config.json` + `/etc/alkhub/config.local.json`)? Useful for dev vs. prod.
- **Cross-references**: OQ-20 (config reload)
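
One possible reading of the base + overlay pattern is a recursive merge in which overlay values win and nested objects are merged key by key. The sketch below assumes that semantics; the `mergeConfig` helper and the `postgres.host` field are hypothetical, and arrays are replaced wholesale rather than concatenated.

```typescript
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Overlay keys override base keys; nested objects are merged recursively.
function mergeConfig(base: ConfigObject, overlay: ConfigObject): ConfigObject {
  const merged: ConfigObject = { ...base };
  for (const [key, value] of Object.entries(overlay)) {
    const existing = merged[key];
    merged[key] =
      isObject(existing) && isObject(value) ? mergeConfig(existing, value) : value;
  }
  return merged;
}

// Stand-ins for the parsed contents of config.json and config.local.json.
const base: ConfigObject = {
  logLevel: "info",
  postgres: { host: "db.internal", ssl: true },
};
const overlay: ConfigObject = {
  logLevel: "debug",
  postgres: { host: "localhost" },
};
const merged = mergeConfig(base, overlay);
```

Here the overlay changes `logLevel` and `postgres.host` while `postgres.ssl` is inherited from the base file. How encrypted fields should behave across layers is not addressed by this sketch.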

### OQ-23: What are the production SSL/TLS requirements for PostgresConfig?

- **Origin**: [hub-config.md](hub-config.md) PostgresConfig section
- **Status**: open
- **Priority**: medium
- **Question**: `PostgresConfig.ssl` is currently `Type.Optional(Type.Boolean())` — "true = enable SSL with default CA verification". For production, TLS to Postgres is essential. What detailed SSL config is needed (CA certs, client certs, verify modes, custom CA)? Should we use a `PostgresSslConfig` object or a connection string-based approach?

### OQ-24: HTTPServiceConfig.auth.tokenEnv deprecation

- **Origin**: [hub-config.md](hub-config.md), [operations.md](operations.md)
- **Status**: open
- **Priority**: high — security violation
- **Question**: `HTTPServiceConfig.auth.tokenEnv` is deprecated and should be removed. The `from_openapi.ts` line `Deno.env.get(config.auth.tokenEnv)` is a bug that violates the "no secrets in env vars" rule. All outbound auth tokens should be resolved from `client_secrets` via `secretKey` wiring. This needs to be removed and replaced before any production use.

### OQ-25: Secret reference resolution ordering

- **Origin**: [hub-config.md](hub-config.md) OQ-7
- **Status**: open
- **Priority**: medium
- **Question**: Should `resolveSecretRefs` fail at startup if a referenced `secretKey` doesn't exist in `client_secrets` yet? Current preference: fail at startup for clients that are `enabled: true`. If a client is disabled, the missing secret is logged as a warning and left unresolved.
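
The stated preference can be sketched as follows. The real `resolveSecretRefs` signature is not given in this document, so the `ClientConfig` shape, the `Map` used in place of the `client_secrets` table, and the returned `warnings` array (instead of a logger call) are assumptions for illustration.

```typescript
// Hypothetical client shape: `enabled` and `secretKey` are the fields named above.
interface ClientConfig {
  name: string;
  enabled: boolean;
  secretKey: string;
}

interface ResolutionResult {
  resolved: Map<string, string>; // client name -> secret value
  warnings: string[];
}

// Fails for enabled clients with a missing secret; warns for disabled ones.
function resolveSecretRefs(
  clients: ClientConfig[],
  clientSecrets: Map<string, string>,
): ResolutionResult {
  const resolved = new Map<string, string>();
  const warnings: string[] = [];
  for (const client of clients) {
    const secret = clientSecrets.get(client.secretKey);
    if (secret !== undefined) {
      resolved.set(client.name, secret);
      continue;
    }
    if (client.enabled) {
      throw new Error(`missing secret "${client.secretKey}" for enabled client ${client.name}`);
    }
    warnings.push(`missing secret "${client.secretKey}" for disabled client ${client.name}`);
  }
  return { resolved, warnings };
}

// Illustrative data only.
const secrets = new Map([["gitea-token", "example-value"]]);
const result = resolveSecretRefs(
  [
    { name: "gitea", enabled: true, secretKey: "gitea-token" },
    { name: "legacy", enabled: false, secretKey: "legacy-token" },
  ],
  secrets,
);
```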

---

## Theme 6: Role & Identity Management

### OQ-26: Role import/sync operation

- **Origin**: [agent-roles.md](agent-roles.md) OQ-1, [storage/README.md](storage/README.md) OQ-9 (partial), [storage/roles.md](storage/roles.md)
- **Status**: open
- **Priority**: medium
- **Question**: Should there be a `roles.sync` operation that reads `.opencode/agents/*.md` and syncs them to the `roles` table? Phase 2 of the role transition plan. Files are the authoring surface; database is the source of truth at runtime.
- **Cross-references**: OQ-27 (role inheritance resolution), OQ-28 (dynamic roles)

### OQ-27: Role inheritance with permission resolution

- **Origin**: [agent-roles.md](agent-roles.md) OQ-8
- **Status**: open
- **Priority**: medium
- **Question**: When a role has a `parentId`, its permissions are unioned with the parent's, with the child's rules taking priority in case of conflict. Max depth: 3 levels. Circular inheritance is prevented at role creation time. The description exists but the implementation is not yet specified.
- **Cross-references**: OQ-26 (role sync)
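
Since the implementation is unspecified, the following is only one possible sketch of the described rules. It assumes permissions are a map from a rule key to `"allow"` or `"deny"`, interprets "3 levels" as a chain of at most three roles including the starting role, and adds a defensive cycle check at resolution time even though the text says cycles are prevented at creation. All names are hypothetical.

```typescript
type Effect = "allow" | "deny";

// Assumed role shape; only `parentId` is named in the source.
interface Role {
  id: string;
  parentId?: string;
  permissions: Record<string, Effect>;
}

const MAX_DEPTH = 3;

function resolvePermissions(roleId: string, roles: Map<string, Role>): Record<string, Effect> {
  const chain: Role[] = [];
  const seen = new Set<string>();
  let currentId: string | undefined = roleId;
  while (currentId !== undefined) {
    if (seen.has(currentId)) throw new Error(`circular inheritance at ${currentId}`);
    seen.add(currentId);
    const role: Role | undefined = roles.get(currentId);
    if (role === undefined) throw new Error(`unknown role ${currentId}`);
    chain.push(role);
    if (chain.length > MAX_DEPTH) throw new Error(`inheritance deeper than ${MAX_DEPTH} levels`);
    currentId = role.parentId;
  }
  // Apply the root ancestor first so each child overrides its parent on conflict.
  const resolved: Record<string, Effect> = {};
  for (const role of chain.reverse()) Object.assign(resolved, role.permissions);
  return resolved;
}

const roles = new Map<string, Role>([
  ["base", { id: "base", permissions: { read: "allow", write: "deny" } }],
  ["editor", { id: "editor", parentId: "base", permissions: { write: "allow" } }],
]);
const editorPermissions = resolvePermissions("editor", roles);
```

In this example `editor` inherits `read` from `base` and overrides `write`, which matches the union-with-child-priority rule described above.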

### OQ-28: Dynamic role creation

- **Origin**: [agent-roles.md](agent-roles.md) OQ-3
- **Status**: open
- **Priority**: low
- **Question**: Opencode supports `Agent.generate()` for on-the-fly role creation. The hub currently only supports predefined roles. Should dynamic role creation be supported? Decision: start with predefined, add later if needed.
- **Cross-references**: OQ-26 (role sync)

### OQ-29: Per-session role switching

- **Origin**: [agent-roles.md](agent-roles.md) OQ-4
- **Status**: open
- **Priority**: medium
- **Question**: Should a session be able to change roles mid-conversation? Opencode supports this. Our current model binds role at session creation. Decision: support `session.updateRole` operation, but this requires re-evaluating and storing new resolved permissions in `sessions.data.scope`.

---

## Theme 7: Task Management

### OQ-30: Task storage and sync implementation

- **Origin**: [storage/README.md](storage/README.md) OQ-9
- **Status**: open
- **Priority**: high
- **Question**: The database is the source of truth for tasks; markdown files are the authoring surface. The sync operation (files → database) exists conceptually but is not yet implemented. This blocks the SDD workflow from using database-backed task tracking.
- **Cross-references**: OQ-26 (role sync — similar pattern)

### OQ-31: Bulk task status updates

- **Origin**: [storage/tasks.md](storage/tasks.md) OQ-2
- **Status**: open
- **Priority**: medium
- **Question**: Should completing a meta task auto-mark all sub-tasks as completed? Likely yes, but this is application-level logic that needs implementation.

### OQ-32: Cross-project task dependencies

- **Origin**: [storage/tasks.md](storage/tasks.md) OQ-3
- **Status**: open
- **Priority**: low
- **Question**: Not supported for v1. Application-layer validation prevents cross-project references. DB trigger guard deferred to Phase 2.

### OQ-33: Task embeddings

- **Origin**: [storage/tasks.md](storage/tasks.md) OQ-1
- **Status**: open
- **Priority**: low
- **Question**: Vector embeddings for similarity search. `metadata` JSONB can hold an embedding reference later, or a separate `task_embeddings` table can be added. Deferred.

---

## Theme 8: Deployment & Operations

### OQ-34: Background migration vs. startup migration

- **Origin**: [hub-startup.md](hub-startup.md) OQ-1
- **Status**: open
- **Priority**: medium
- **Question**: Should migrations block startup, or should they run in the background while the hub serves with the old schema? Recommendation: block for now (simpler, safer). Revisit if startup latency becomes a problem with large migrations.

### OQ-35: Hot spare / zero-downtime restart

- **Origin**: [hub-startup.md](hub-startup.md) OQ-3
- **Status**: open
- **Priority**: low
- **Question**: For production deployments, can we start a new hub process before shutting down the old one? Requires connection draining and session transfer. Deferred — hub is single-instance for now.
- **Cross-references**: [infrastructure.md](infrastructure.md) single-instance model

### OQ-36: Startup observability

- **Origin**: [hub-startup.md](hub-startup.md) OQ-4
- **Status**: open
- **Priority**: low
- **Question**: Should the startup sequence emit pub/sub events so monitoring systems can track progress, or is the `/health` endpoint plus structured logs sufficient? Recommendation: `/health` for now.

### OQ-37: Redis deployment topology

- **Origin**: [hub-architecture.md](hub-architecture.md) OQ-1
- **Status**: open
- **Priority**: medium
- **Question**: Redis is deployed on the hub server. For production with many spokes on a compute server, may want Redis closer to containers for lower pub/sub latency. Current approach works for v1 but may need topology changes at scale.

---

## Theme 9: Cross-Cutting Implementation Gaps

### OQ-38: Hub startup implementation

- **Origin**: [hub-startup.md](hub-startup.md) — full startup sequence spec, no implementation yet
- **Status**: open
- **Priority**: high — blocks all functionality
- **Question**: `src/main.ts` and `startHub()` are not yet implemented. The full 11-step startup sequence is specified in hub-startup.md. This is the single most blocking implementation gap.
- **Cross-references**: OQ-20 (config reload), OQ-24 (tokenEnv deprecation)

### OQ-39: Hub-specific config in operations package

- **Origin**: [operations.md](operations.md) Known Gaps
- **Status**: open
- **Priority**: high — blocks hub startup
- **Question**: `core/config/types.ts` in the operations package has spoke-only config. Hub-specific config (postgres, redis, auth) needs to be added. This overlaps with the hub-config.md spec but the actual code doesn't exist yet.
- **Cross-references**: OQ-38 (startup implementation)

### OQ-40: Logger configuration

- **Origin**: [operations.md](operations.md) Known Gaps
- **Status**: open
- **Priority**: medium
- **Question**: `core/logger/mod.ts` is a stub that only logs the `["logtape", "meta"]` category. Needs proper config for app-level loggers. Hub startup Step 3 configures the logger, but the implementation is stub-level.

### OQ-41: Gitea operations at startup

- **Origin**: [storage/README.md](storage/README.md) OQ-7
- **Status**: open
- **Priority**: medium
- **Question**: Load Gitea swagger spec at startup and register ~300 operations via FromOpenAPI. This wires the hub to the Gitea API for repository operations but is not yet implemented.

### OQ-42: Keypal adapter testing

- **Origin**: [storage/README.md](storage/README.md) OQ-4
- **Status**: open
- **Priority**: medium
- **Question**: `HubKeyStorage` (the Drizzle adapter for keypal) needs comprehensive tests before production use.

### OQ-43: MCP endpoint authentication detail

- **Origin**: [mcp-server.md](mcp-server.md) Auth section
- **Status**: open
- **Priority**: medium
- **Question**: "The MCP endpoint uses bearer token auth. Each runner gets a token at registration." No detail on token format, rotation, issuance, or how tokens are validated. This connects to OQ-01 (API auth model) and OQ-02 (WebSocket auth).

### OQ-44: Reactive vs. call graph `requested` semantics

- **Origin**: [call-graph.md](call-graph.md) OQ-2
- **Status**: open
- **Priority**: medium
- **Question**: In `FlowGraph`, `call.requested` creates a node in `pending` state. In `WorkflowReactiveRoot`, `call.requested` maps to `NodeStatus.running`. This is a deliberate semantic difference — the reactive model tracks execution progress while the call graph model tracks protocol state. But implementers must be aware that feeding the same event to both models produces different initial statuses.

### OQ-45: Client config schema evolution

- **Origin**: [storage/README.md](storage/README.md) OQ-8
- **Status**: open
- **Priority**: medium
- **Question**: Existing DB rows in `clients.config` may fail validation if the TypeBox schema changes. Using `Type.Optional()` for new fields helps, but breaking changes need a strategy. Full contract migration protocol is a pending task.

### OQ-46: Spoke auth field format in config

- **Origin**: [hub-config.md](hub-config.md) OQ-3
- **Status**: open
- **Priority**: high
- **Question**: The `SpokeConfig.auth` field format is blocked on the spoke-runner WebSocket auth design (OQ-02). Config system supports `tokenFile` but actual protocol is TBD.
- **Cross-references**: OQ-02 (WebSocket auth)

### OQ-47: Config schema version

- **Origin**: [hub-config.md](hub-config.md) OQ-5
- **Status**: open
- **Priority**: low
- **Question**: `BaseConfig.$schema` is optional. `alkhub-config init` should generate it. Implementation detail — doesn't block anything but supports forward compatibility and editor validation.

### OQ-48: Cross-doc terminology migration

- **Origin**: [storage/README.md](storage/README.md) OQ-5
- **Status**: open
- **Priority**: low
- **Question**: The ADR-005 rename from "runner" to "spoke" is done in primary specs but "runner/runnerId" references still exist in other architecture docs. Need updating for consistency.

### OQ-49: ADR-012 migration

- **Origin**: [ADR-012](../decisions/ADR-012-agent-vs-role-vs-account.md) — Proposed, not Accepted
- **Status**: open
- **Priority**: medium
- **Question**: ADR-012 proposes terminology changes (`sessions.agentName` → `roleName`, etc.). The ADR is in "Proposed" status, not "Accepted". The storage docs already use "role" terminology, but the rename needs a migration plan and the ADR needs to be accepted or rejected.

### OQ-50: Key rotation background sweep implementation

- **Origin**: [storage-spec-phase1-resolutions.md](../decisions/storage-spec-phase1-resolutions.md) D4
- **Status**: open
- **Priority**: high
- **Question**: Task `specify-key-rotation-protocol` addresses key rotation, and the protocol is described in storage/services.md, but the background sweep implementation (cron job that re-encrypts `client_secrets` rows with the current key version) is not yet implemented.
- **Cross-references**: OQ-24 (tokenEnv deprecation — more secrets flow through `client_secrets` after this), OQ-25 (secret reference resolution — sweep depends on correct secret ref wiring)

---

## Theme 10: Future / Low Priority

### OQ-51: Role database-authoritative phase (Phase 3)

- **Origin**: [agent-roles.md](agent-roles.md) Phase 3, [storage/roles.md](storage/roles.md)
- **Status**: open
- **Priority**: low
- **Question**: Eventually, role definitions should be primarily in the database with files only for version control. This is Phase 3 and not blocking for v1.

### OQ-52: Memory across sessions

- **Origin**: [agent-roles.md](agent-roles.md) OQ-7
- **Status**: open
- **Priority**: low
- **Question**: Should LLM accounts have persistent memory across sessions? This is separate from session message history. Could be a `memories` table or vector store. Deferred — separate feature.

### OQ-53: Task versioning

- **Origin**: [storage/tasks.md](storage/tasks.md) OQ-4
- **Status**: open
- **Priority**: low
- **Question**: Should previous versions of task body be kept? Decision for v1: no versioning, just update in place.

### OQ-54: High-contention task notes

- **Origin**: [storage/tasks.md](storage/tasks.md)
- **Status**: open
- **Priority**: low
- **Question**: DB-level concatenation for append is specified, but consider separating into `task_notes` table for high-contention scenarios.

### OQ-55: Anthropic conversation import

- **Origin**: [storage/README.md](storage/README.md) OQ-6
- **Status**: open
- **Priority**: low
- **Question**: Import script for Anthropic conversations is deferred. Export format is documented.

### OQ-56: ADR-013 out-of-scope items

- **Origin**: [ADR-013](../decisions/ADR-013-schema-system-integration.md)
- **Status**: open
- **Priority**: low
- **Question**: Several items explicitly out of scope for ADR-013: bidirectional Zod ↔ TypeBox sync, runtime schema migration, auto-generation of TypeScript types from wire schemas, converting Zod `.transform()` / `.pipe()` output types. May revisit if needed.

### OQ-57: Call graph visualization

- **Origin**: [call-graph.md](call-graph.md) What We Defer #2
- **Status**: open
- **Priority**: low
- **Question**: API only, no Sigma.js UI for v1.

### OQ-58: Stream deduplication

- **Origin**: [call-graph.md](call-graph.md) What We Defer #3
- **Status**: open
- **Priority**: medium
- **Question**: Should multiple subscribers to the same stream be deduplicated by `Value.Hash({operationId, input})`? This may be needed for subscription scalability.
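
The deduplication idea can be sketched as a map from a structural key to one shared upstream stream. This is a rough illustration under stated assumptions, not the hub's design: `stableKey` is a local stand-in for the `Value.Hash({operationId, input})` call named in the question, and `StreamDeduplicator`, `StartFn`, and the `metrics.watch` operation ID are hypothetical names.

```typescript
type Listener<T> = (event: T) => void;
// Starts the upstream stream and returns a function that stops it.
type StartFn<T> = (operationId: string, input: unknown, emit: Listener<T>) => () => void;

// Stand-in for a structural hash: serializes with sorted object keys so that
// structurally equal inputs produce the same key.
function stableKey(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map(stableKey).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj).sort().map((k) => JSON.stringify(k) + ":" + stableKey(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  return String(JSON.stringify(value));
}

class StreamDeduplicator<T> {
  upstreamStarts = 0;
  private readonly streams = new Map<string, { listeners: Set<Listener<T>>; stop: () => void }>();
  private readonly start: StartFn<T>;

  constructor(start: StartFn<T>) {
    this.start = start;
  }

  // Subscribers with the same operationId and input share one upstream stream.
  subscribe(operationId: string, input: unknown, listener: Listener<T>): () => void {
    const key = stableKey({ operationId, input });
    let shared = this.streams.get(key);
    if (!shared) {
      const listeners = new Set<Listener<T>>();
      this.upstreamStarts++;
      const stop = this.start(operationId, input, (event) => listeners.forEach((l) => l(event)));
      shared = { listeners, stop };
      this.streams.set(key, shared);
    }
    shared.listeners.add(listener);
    const entry = shared;
    return () => {
      entry.listeners.delete(listener);
      // Stop the upstream stream once the last subscriber leaves.
      if (entry.listeners.size === 0 && this.streams.get(key) === entry) {
        entry.stop();
        this.streams.delete(key);
      }
    };
  }
}

const emitters: Listener<number>[] = [];
const stopLog: string[] = [];
const dedup = new StreamDeduplicator<number>((operationId, _input, emit) => {
  emitters.push(emit);
  return () => { stopLog.push(operationId); };
});
const received: string[] = [];
const unsubA = dedup.subscribe("metrics.watch", { spoke: "a", window: 5 }, (n) => received.push("A:" + n));
const unsubB = dedup.subscribe("metrics.watch", { window: 5, spoke: "a" }, (n) => received.push("B:" + n));
emitters.forEach((emit) => emit(1));
console.log(dedup.upstreamStarts, received);
```

The two subscriptions differ only in key order, so they resolve to the same key and the upstream stream is started once. A production version would also need to decide what late subscribers see (replay or live-only), which the sketch ignores.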

### OQ-59: `requested_by` edge in flowgraph

- **Origin**: [call-graph.md](call-graph.md) What We Defer #4
- **Status**: open
- **Priority**: low
- **Question**: The `requested_by` edge type is a storage-layer concept for identity tracing. It's persisted in `call_graph_edges` but not modeled in `@alkdev/flowgraph`'s `CallEdgeAttrs`. May be added to flowgraph in the future.

### OQ-60: Full ujsx call templates

- **Origin**: [call-graph.md](call-graph.md) What We Defer #1
- **Status**: open
- **Priority**: low
- **Question**: Workflows currently use hardcoded sequences. `@alkdev/flowgraph/component` provides `Operation`, `Sequential`, `Parallel`, `Conditional`, and `Map` components for declarative template definition. Adoption is planned once workflow complexity justifies it.

---

## Cross-Cutting Dependencies

These questions block each other or share resolution paths:

1. **API Auth Chain**: OQ-01 → OQ-02 → OQ-43 → OQ-46. The API auth model determines WebSocket auth, which determines MCP auth and spoke config format. Resolve top-down.

2. **Spoke Connectivity Chain**: OQ-10 → OQ-06. Spoke provisioning can't work without the hub-side WebSocket handler, so resolve OQ-10 first.

3. **Session Schema Chain**: OQ-16 → OQ-17 → OQ-18 → OQ-19. Schema finalization blocks compaction and versioning design. Resolve OQ-16 first.

4. **Implementation Bootstrap**: OQ-38 → OQ-39 → OQ-40. Hub startup implementation needs hub config types and proper logger config. These are the minimum viable path to a running hub.

5. **Config Security Chain**: OQ-24 → OQ-25 → OQ-50. Token env deprecation and secret reference resolution are intertwined. OQ-24 must be resolved (remove tokenEnv) before OQ-25 can be validated. After OQ-25, the key rotation background sweep (OQ-50) becomes more important because more secrets flow through `client_secrets`.

6. **Data Lifecycle Chain**: OQ-12 → OQ-13 → OQ-14. Operation deletion strategy, call graph retention, and payload truncation interact. OQ-12 determines whether operations can be removed at all.
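
The chains above can be treated as a small dependency graph and ordered mechanically. The sketch below is one way to do that with a topological sort (Kahn's algorithm); it is not part of the tracker's process. It assumes each chain reads "resolve left before right", and for the spoke connectivity chain it follows the prose (OQ-10 before OQ-06). The function names are hypothetical.

```typescript
type Edge = [string, string]; // [resolve first, resolve after]

// Each inner array is one chain, listed in assumed resolution order.
const chains: string[][] = [
  ["OQ-01", "OQ-02", "OQ-43", "OQ-46"], // API auth
  ["OQ-10", "OQ-06"],                   // spoke connectivity (per the prose)
  ["OQ-16", "OQ-17", "OQ-18", "OQ-19"], // session schema
  ["OQ-38", "OQ-39", "OQ-40"],          // implementation bootstrap
  ["OQ-24", "OQ-25", "OQ-50"],          // config security
  ["OQ-12", "OQ-13", "OQ-14"],          // data lifecycle
];

// Turn each chain into consecutive [before, after] edges.
function toEdges(chainList: string[][]): Edge[] {
  const edges: Edge[] = [];
  for (const chain of chainList) {
    chain.slice(1).forEach((after, i) => edges.push([chain[i] as string, after]));
  }
  return edges;
}

// Kahn's algorithm; ties are broken by ID so the output is deterministic.
function resolutionOrder(edges: Edge[]): string[] {
  const indegree = new Map<string, number>();
  const successors = new Map<string, string[]>();
  edges.forEach(([before, after]) => {
    if (!indegree.has(before)) indegree.set(before, 0);
    indegree.set(after, (indegree.get(after) ?? 0) + 1);
    successors.set(before, [...(successors.get(before) ?? []), after]);
  });
  const ready = Array.from(indegree.keys()).filter((id) => indegree.get(id) === 0).sort();
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of successors.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) {
        ready.push(next);
        ready.sort();
      }
    }
  }
  if (order.length !== indegree.size) throw new Error("dependency cycle detected");
  return order;
}

const order = resolutionOrder(toEdges(chains));
console.log(order.join(" -> "));
```

The sort only respects the listed edges. It knows nothing about priority, so it complements the hand-curated resolution order rather than replacing it. The cycle check is useful if chains are later edited and start to contradict each other.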

---

## Summary Table

| ID | Question | Origin | Priority | Status |
|----|----------|--------|----------|--------|
| OQ-01 | API authentication model | hub-architecture | high | open |
| OQ-02 | WebSocket auth for spokes | spoke-runner | high | open |
| OQ-03 | Permission enforcement at call protocol | agent-roles | high | resolved |
| OQ-04 | LLM account provisioning | agent-roles | medium | open |
| OQ-05 | SSO with Gitea | hub-architecture | medium | open |
| OQ-06 | Spoke project context | spoke-runner | high | open |
| OQ-07 | Source sync for external compute | spoke-runner | medium | open |
| OQ-08 | Concurrent spoke operations | spoke-runner | medium | open |
| OQ-09 | Spoke operation list freshness | spoke-runner | medium | open |
| OQ-10 | Hub-side WebSocket handler design | spoke-runner | high | open |
| OQ-11 | Container spoke lifecycle | spoke-runner, hub-architecture | low | open |
| OQ-12 | Operation deletion vs. call graph FK | call-graph, storage/spokes | high | open |
| OQ-13 | Call graph retention policy | storage/call-graph, storage/README | medium | open |
| OQ-14 | Call graph payload truncation config | storage/call-graph | medium | open |
| OQ-15 | Polymorphic FK for `providerId` | storage/spokes | medium | open |
| OQ-16 | Session/message schema finalization | agent-sessions, storage/sessions | high | open |
| OQ-17 | Session message compaction | agent-sessions, storage/README | medium | open |
| OQ-18 | Message data versioning | storage/README | medium | open |
| OQ-19 | Part nesting | storage/sessions | low | open |
| OQ-20 | Config reload without restart | hub-config, hub-startup | medium | open |
| OQ-21 | CI/CD config generation | hub-config | high | open |
| OQ-22 | Multiple config file layers | hub-config | low | open |
| OQ-23 | PostgresConfig SSL details | hub-config | medium | open |
| OQ-24 | HTTPServiceConfig.auth.tokenEnv deprecation | hub-config, operations | high | open |
| OQ-25 | Secret reference resolution ordering | hub-config | medium | open |
| OQ-26 | Role import/sync operation | agent-roles, storage/README, storage/roles | medium | open |
| OQ-27 | Role inheritance with permission resolution | agent-roles | medium | open |
| OQ-28 | Dynamic role creation | agent-roles | low | open |
| OQ-29 | Per-session role switching | agent-roles | medium | open |
| OQ-30 | Task storage and sync implementation | storage/README | high | open |
| OQ-31 | Bulk task status updates | storage/tasks | medium | open |
| OQ-32 | Cross-project task dependencies | storage/tasks | low | open |
| OQ-33 | Task embeddings | storage/tasks | low | open |
| OQ-34 | Background vs. startup migration | hub-startup | medium | open |
| OQ-35 | Hot spare / zero-downtime restart | hub-startup | low | open |
| OQ-36 | Startup observability | hub-startup | low | open |
| OQ-37 | Redis deployment topology | hub-architecture | medium | open |
| OQ-38 | Hub startup implementation | hub-startup | high | open |
| OQ-39 | Hub-specific config in operations package | operations | high | open |
| OQ-40 | Logger configuration | operations | medium | open |
| OQ-41 | Gitea operations at startup | storage/README | medium | open |
| OQ-42 | Keypal adapter testing | storage/README | medium | open |
| OQ-43 | MCP endpoint authentication detail | mcp-server | medium | open |
| OQ-44 | Reactive vs. call graph requested semantics | call-graph | medium | open |
| OQ-45 | Client config schema evolution | storage/README | medium | open |
| OQ-46 | Spoke auth field format in config | hub-config | high | open |
| OQ-47 | Config schema version | hub-config | low | open |
| OQ-48 | Cross-doc terminology migration | storage/README | low | open |
| OQ-49 | ADR-012 migration | decisions/ADR-012 | medium | open |
| OQ-50 | Key rotation background sweep | decisions/storage-spec-phase1 | high | open |
| OQ-51 | Role database-authoritative (Phase 3) | agent-roles, storage/roles | low | open |
| OQ-52 | Memory across sessions | agent-roles | low | open |
| OQ-53 | Task versioning | storage/tasks | low | open |
| OQ-54 | High-contention task notes | storage/tasks | low | open |
| OQ-55 | Anthropic conversation import | storage/README | low | open |
| OQ-56 | ADR-013 out-of-scope items | decisions/ADR-013 | low | open |
| OQ-57 | Call graph visualization | call-graph | low | open |
| OQ-58 | Stream deduplication | call-graph | medium | open |
| OQ-59 | `requested_by` edge in flowgraph | call-graph | low | open |
| OQ-60 | Full ujsx call templates | call-graph | low | open |

### High Priority Open Questions (Blocking)

These 12 questions block core functionality and should be resolved first:

| ID | Question | Blocks |
|----|----------|--------|
| OQ-01 | API authentication model | All authenticated endpoints |
| OQ-02 | WebSocket auth for spokes | All spoke connections |
| OQ-06 | Spoke project context | Spoke provisioning |
| OQ-10 | Hub-side WebSocket handler design | All spoke functionality |
| OQ-12 | Operation deletion vs. call graph FK | Operation lifecycle |
| OQ-16 | Session/message schema finalization | Session storage implementation |
| OQ-21 | CI/CD config generation | Deployment |
| OQ-24 | HTTPServiceConfig.auth.tokenEnv deprecation | Security (env var leak) |
| OQ-38 | Hub startup implementation | All functionality |
| OQ-39 | Hub-specific config in operations package | Hub startup |
| OQ-46 | Spoke auth field format in config | Spoke config |
| OQ-50 | Key rotation background sweep | Production secret management |

### Resolution Priority Order

Suggested order for resolving the high-priority questions, based on dependency chains:

1. **OQ-38 + OQ-39**: Hub startup implementation + config types (enables everything)
2. **OQ-01**: API auth model (unblocks OQ-02, OQ-43, OQ-46)
3. **OQ-02**: WebSocket auth (unblocks OQ-10, OQ-46)
4. **OQ-10**: Hub-side WebSocket handler (enables spokes)
5. **OQ-24**: tokenEnv deprecation (security fix)
6. **OQ-16**: Session/message schema (enables storage)
7. **OQ-12**: Operation deletion strategy (data integrity)
8. **OQ-21**: CI/CD config generation (deployment)
9. **OQ-06**: Spoke project context (spoke provisioning)
10. **OQ-50**: Key rotation sweep (production secret management)