import research docs from prior conversation and scattered sources
675
docs/research/agent-hud-architecture.md
Normal file
@@ -0,0 +1,675 @@
# Agent HUD: A Context Management Architecture for LLM Agents
|
||||
|
||||
## Research → Design Synthesis
|
||||
|
||||
This document synthesizes findings from six research areas into a concrete architecture for an agent HUD system that replaces ad-hoc compaction with structured, cache-aware context management.
|
||||
|
||||
**See also**: [`ujsx-v2-typebox-rewrite.md`](./ujsx-v2-typebox-rewrite.md) — The UJSX v2 rewrite plan, replacing the existing POC at `/workspace/aui` with a TypeBox-schema-driven universal JSX IR that supports actual JSX syntax, bi-directional transforms, and component schemas as tool schemas.
|
||||
|
||||
---
|
||||
|
||||
## The Problem
|
||||
|
||||
Current LLM agent interfaces have a fundamental flaw: **context is managed as a monolithic conversation log that grows until it must be violently compacted**. This creates several pathologies:
|
||||
|
||||
1. **Information cliff**: Compaction replaces rich conversation with a lossy summary. Everything before the compaction boundary is gone — the agent can't even know what it lost.
|
||||
|
||||
2. **Reactive, not proactive**: OpenCode's compaction fires at ~92% context usage. There's no budgeting — system prompt, tool definitions, and conversation history compete for the same fixed space with no allocation strategy.
|
||||
|
||||
3. **Cognitive waste**: The agent sees the full conversation log every turn, including messages that are no longer relevant. Early messages about setup, resolved errors, and abandoned approaches consume tokens without providing value.
|
||||
|
||||
4. **Cognitive strain**: The agent has no ambient awareness of its own state. It must explicitly call tools to check context usage, search history, or understand where it is in a task. Each tool call costs a turn and consumes context.
|
||||
|
||||
5. **No provider-cache alignment**: OpenCode's 2-part system prompt split (`llm.ts:115-126`) is the only cache-aware structure. The rest of the context — messages, tool results, the full conversation — has no cache segmentation.
|
||||
|
||||
The key insight: **the "agent frame" — what the LLM sees in a single turn — should be treated as a composition problem, not an append-only log problem.**
|
||||
|
||||
---
|
||||
|
||||
## The Leverage Point
|
||||
|
||||
OpenCode's `llm.ts:115-126` reveals the critical pattern:
|
||||
|
||||
```typescript
// rejoin to maintain 2-part structure for caching if header unchanged
if (system.length > 2 && system[0] === header) {
  const rest = system.slice(1)
  system.length = 0
  system.push(header, rest.join("\n"))
}
```
|
||||
|
||||
This splits the system prompt into:
|
||||
- **Part 0** (cached, stable): Agent prompt, provider prompt — changes rarely, benefits from prompt caching
|
||||
- **Part 1** (dynamic, re-cached each call): Environment, skills, instructions — changes often
|
||||
|
||||
The plugin system pushes to `output.system[]` which gets merged into Part 1. The open-memory plugin injects context status (~50 tokens) into this dynamic part.
|
||||
|
||||
**This 2-part pattern generalizes.** We can extend it to a multi-part context composition with different cache characteristics:
|
||||
|
||||
| Part | Content | Changes | Cache Value |
|------|---------|---------|-------------|
| 0 | Agent identity, core instructions | Rarely | Highest (static across turns) |
| 1 | HUD static layer (task, key decisions) | On agent action | High (stable within a task phase) |
| 2 | HUD dynamic layer (notes, recent events) | Every turn | Medium (re-cached each call) |
| 3 | Conversation messages | Every turn | Low (grows monotonically) |
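
As a rough illustration of the table above, the sketch below orders context parts from most to least stable, so that a change in a volatile part leaves every earlier part byte-identical. This assumes provider prompt caches match on an unchanged prefix. The `ContextPart` type and the `composeParts` and `firstChangedPart` helpers are hypothetical names for illustration, not OpenCode APIs.

```typescript
// Hypothetical sketch: none of these names come from OpenCode.
type Volatility = "rarely" | "on-action" | "every-turn";

interface ContextPart {
  label: string;
  volatility: Volatility;
  text: string;
}

// Lower rank = more stable = earlier in the prompt.
const STABILITY_RANK: Record<Volatility, number> = {
  "rarely": 0,
  "on-action": 1,
  "every-turn": 2,
};

// Sort a copy so the most stable content forms the longest shared prefix.
function composeParts(parts: ContextPart[]): ContextPart[] {
  return [...parts].sort(
    (a, b) => STABILITY_RANK[a.volatility] - STABILITY_RANK[b.volatility],
  );
}

// Index of the first part whose text differs between two turns.
// Everything before that index is a candidate for a cache hit.
function firstChangedPart(prev: ContextPart[], next: ContextPart[]): number {
  const n = Math.min(prev.length, next.length);
  for (let i = 0; i < n; i++) {
    if (prev[i]?.text !== next[i]?.text) return i;
  }
  return n;
}

const turn1 = composeParts([
  { label: "hud-dynamic", volatility: "every-turn", text: "Context: 45%" },
  { label: "identity", volatility: "rarely", text: "You are a coding agent." },
  { label: "hud-static", volatility: "on-action", text: "# Task\nImplement auth" },
]);

// Next turn: only the dynamic layer changes.
const turn2 = turn1.map((p) =>
  p.label === "hud-dynamic" ? { ...p, text: "Context: 47%" } : p,
);

console.log(turn1.map((p) => p.label).join(" > "));
console.log(firstChangedPart(turn1, turn2));
```

With this ordering, a context-percentage update only invalidates the last part, while the identity and static HUD parts stay unchanged across the two turns.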
|
||||
|
||||
---
|
||||
|
||||
## Architecture: The Agent HUD
|
||||
|
||||
### Core Concept
|
||||
|
||||
The HUD is a **structured markdown document** that the agent sees at the top of every turn, injected via the `experimental.chat.system.transform` plugin hook. Unlike compaction (which discards information), the HUD is:
|
||||
- **Composed**: Built from components, not a flat log
|
||||
- **Bidirectional**: The agent can update it via tools
|
||||
- **Cache-aware**: Structured so static parts benefit from provider caching
|
||||
- **Adaptive**: Automatically adjusts density based on context pressure
|
||||
|
||||
### The "Agent Frame" Model
|
||||
|
||||
Instead of treating the conversation as an append-only log, think of each agent turn as a **frame** composed of:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ System Prompt Part 0 (cached) │ ← Agent identity, core instructions
|
||||
│ [never changes within a session] │
|
||||
├─────────────────────────────────────┤
|
||||
│ System Prompt Part 1 (re-cached) │ ← HUD rendered to markdown
|
||||
│ ┌─────────────────────────────────┐ │
|
||||
│ │ # Task │ │ ← Static HUD (changes on action)
|
||||
│ │ Implement auth module │ │
|
||||
│ │ │ │
|
||||
│ │ ## Context │ │ ← Adaptive density
|
||||
│ │ 67% used (134k/200k tokens) │ │ ← Context status
|
||||
│ │ Trend: growing rapidly │ │
|
||||
│ │ │ │
|
||||
│ │ ## Key Decisions │ │ ← Agent-maintained state
|
||||
│ │ • Using JWT over sessions │ │
|
||||
│ │ • bcrypt for password hashing │ │
|
||||
│ │ │ │
|
||||
│ │ ## Active Files │ │ ← Derived from tool calls
|
||||
│ │ • src/auth/mod.ts (editing) │ │
|
||||
│ │ • src/auth/jwt.ts (referenced) │ │
|
||||
│ │ │ │
|
||||
│ │ ## Next Steps │ │ ← Agent's own plan
|
||||
│ │ 1. Add refresh token rotation │ │
|
||||
│ │ 2. Write auth middleware │ │
|
||||
│ │ 3. Add tests │ │
|
||||
│ │ │ │
|
||||
│ │ ## Notes │ │ ← Agent's scratchpad
|
||||
│ │ • DB schema: users, sessions │ │
|
||||
│ │ • Rate limit: 100/min default │ │
|
||||
│ └─────────────────────────────────┘ │
|
||||
├─────────────────────────────────────┤
|
||||
│ Conversation Messages │ ← Standard message history
|
||||
│ [filtered by compaction boundary] │
|
||||
└─────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### The Append-Only Event Log
|
||||
|
||||
Underlying the HUD is an **append-only event log** — similar to Yjs/CRDT event streams but simpler because we're single-author (one agent per session). This is the "source of truth" that the HUD renders from.
|
||||
|
||||
```typescript
interface HudEvent {
  id: string          // UUID
  type: HudEventType  // discriminated union
  timestamp: number   // Unix ms
  sessionId: string   // opencode session ID
  turn: number        // which agent turn
  payload: unknown    // type-specific data
}

type HudEventType =
  | "task.set"           // Agent sets the current task description
  | "task.update"        // Agent refines the task
  | "decision.record"    // Agent records a key decision
  | "note.add"           // Agent adds a note
  | "note.update"        // Agent updates a note
  | "note.remove"        // Agent removes a note
  | "file.open"          // Agent reads a file (derived from tool calls)
  | "file.edit"          // Agent edits a file (derived from tool calls)
  | "file.close"         // File falls out of recent context
  | "step.complete"      // Agent marks a step complete
  | "step.add"           // Agent adds a next step
  | "step.reorder"       // Agent reorders steps
  | "error.encountered"  // Agent encounters an error
  | "error.resolved"     // Agent resolves an error
  | "blocker.add"        // Agent identifies a blocker
  | "blocker.remove"     // Agent removes a blocker
  | "context.snapshot"   // Context window usage snapshot (auto-generated)
  | "compact.before"     // Pre-compaction state snapshot
  | "compact.after"      // Post-compaction state + summary
```
|
||||
|
||||
The event log is persisted and append-only. It replaces the "conversation log as only state" model with "conversation log as one input to the HUD, alongside structured state."
|
||||
|
||||
### Rendering Pipeline: UJSX → MDAST → Markdown
|
||||
|
||||
The HUD uses the **UJSX v2** universal JSX IR (see [`ujsx-v2-typebox-rewrite.md`](./ujsx-v2-typebox-rewrite.md)) built on TypeBox schemas. The existing POC at `/workspace/aui` already implements UJSX → mdast → markdown with a `TransformRegistry` and `mdast-util-to-markdown`. The v2 rewrite adds:
|
||||
|
||||
- Actual JSX syntax (via `jsxImportSource: "@ade/ujsx"`)
|
||||
- TypeBox schema-driven node types (schemas ARE types, schemas ARE tool parameter schemas)
|
||||
- Bi-directional transforms (UJSX ↔ mdast, UJSX ↔ hast, etc.)
|
||||
- HTML-agnostic core (no `onClick`, `className` in universal props)
|
||||
|
||||
```
|
||||
HUD Components (.tsx with JSX syntax)
|
||||
│
|
||||
▼
|
||||
h() factory / JSX transform → UElement tree (TypeBox-schema-validated)
|
||||
│
|
||||
▼ TransformRegistry (direction: 'ujsx->mdast')
|
||||
│ Rules match on TypeBox schemas, not string tags
|
||||
│
|
||||
mdast tree
|
||||
│
|
||||
▼ mdast-util-to-markdown + mdast-util-gfm
|
||||
│
|
||||
Markdown String
|
||||
│
|
||||
▼ Injected via experimental.chat.system.transform
|
||||
│
|
||||
System Prompt Part 1
|
||||
```
|
||||
|
||||
**Why UJSX (not Hono JSXNode, not hast→mdast)?**
|
||||
|
||||
1. **Schema-driven**: TypeBox schemas serve triple duty — TypeScript types, runtime validation, and tool parameter schemas. Component props = tool input schemas. Zero duplication.
|
||||
2. **Bi-directional**: Rules convert both UJSX→mdast AND mdast→UJSX. Parse existing markdown (notes, AGENTS.md) back into the IR.
|
||||
3. **HTML-agnostic**: No `onClick`, `className`, `aria-*` in the core. The IR isn't pretending to be HTML.
|
||||
4. **HostConfig preserved**: The same component tree renders to graphology, markdown, or future targets.
|
||||
5. **Actual JSX syntax**: With `jsxImportSource`, write `<TaskSection task={state.task} density="compact" />` not `h('TaskSection', { task: state.task, density: 'compact' })`.
|
||||
|
||||
### HUD Component Architecture
|
||||
|
||||
```tsx
// The root HUD component
function HUD({ state, contextInfo }: { state: HudState, contextInfo: ContextInfo }) {
  const density = getDensity(contextInfo.percentage)

  return (
    <Container>
      <TaskSection task={state.task} density={density} />
      <ContextBar info={contextInfo} density={density} />
      {density !== 'minimal' && <DecisionsList decisions={state.decisions} density={density} />}
      {density !== 'minimal' && <ActiveFiles files={state.activeFiles} density={density} />}
      <NextSteps steps={state.nextSteps} density={density} />
      {density === 'full' && <NotesSection notes={state.notes} />}
      {/* Warn at the same threshold at which density becomes 'minimal' */}
      {contextInfo.percentage >= 85 && <WarningBanner info={contextInfo} />}
    </Container>
  )
}

// Adaptive density: renders differently based on context pressure
type Density = 'full' | 'compact' | 'minimal'

function getDensity(percentage: number): Density {
  if (percentage < 70) return 'full'
  if (percentage < 85) return 'compact'
  return 'minimal'
}

// Example adaptive component
function DecisionsList({ decisions, density }: { decisions: Decision[], density: Density }) {
  if (density === 'compact') {
    // Compact: one heading plus one-line summaries
    return <text>{`## Decisions (${decisions.length})\n` + decisions.map(d => `- ${d.summary}`).join('\n')}</text>
  }
  // Full: one DecisionItem per decision
  return (
    <section title="Decisions">
      {decisions.map(d => <DecisionItem decision={d} />)}
    </section>
  )
}
```
|
||||
|
||||
### Cache-Aware System Prompt Structure
|
||||
|
||||
The key extension of OpenCode's 2-part pattern:
|
||||
|
||||
```
|
||||
System Message 0 (cached, stable):
|
||||
- Agent identity prompt
|
||||
- Provider-specific prompt
|
||||
— Never changes within a session
|
||||
|
||||
System Message 1 (re-cached each call):
|
||||
- HUD static layer (task, decisions, next steps)
|
||||
- HUD dynamic layer (context bar, notes, active files)
|
||||
- Environment info, skills, instructions
|
||||
— Changes based on agent actions and context pressure
|
||||
```
|
||||
|
||||
The **static layer** of the HUD should change only when the agent explicitly updates it (task change, decision recorded, step completed). The **dynamic layer** changes every turn (context percentage, recent files, notes).
|
||||
|
||||
For Anthropic's `cache_control`, we mark:
|
||||
- System message 0 → `cacheControl: { type: "ephemeral" }` (already done by opencode)
|
||||
- System message 1 → `cacheControl: { type: "ephemeral" }` (already done by opencode)
|
||||
|
||||
The savings come from keeping message 0 stable (0.1x cost on cache hits) and making message 1 as small as possible while still providing full situational awareness.
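
To make the static/dynamic split concrete, here is a minimal sketch that renders the two layers separately and reports whether the static layer changed since the previous turn. `LayeredHud`, `RenderedLayers`, and `createLayerRenderer` are hypothetical names, and the choice of which fields belong to which layer is an assumption made for illustration.

```typescript
// Hypothetical sketch: illustrative names, not part of opencode.
interface LayeredHud {
  task: string;
  decisions: string[];
  contextPercentage: number;
  notes: string[];
}

interface RenderedLayers {
  staticLayer: string;    // changes only on explicit agent updates
  dynamicLayer: string;   // expected to change every turn
  staticChanged: boolean; // true when the static text differs from last turn
}

// Returns a render function that remembers the previous static layer.
function createLayerRenderer(): (hud: LayeredHud) => RenderedLayers {
  let lastStatic: string | null = null;
  return (hud) => {
    const staticLayer = [
      "# Task",
      hud.task,
      "## Key Decisions",
      ...hud.decisions.map((d) => `- ${d}`),
    ].join("\n");
    const dynamicLayer = [
      `## Context: ${hud.contextPercentage}%`,
      ...hud.notes.map((n) => `- ${n}`),
    ].join("\n");
    const staticChanged = staticLayer !== lastStatic;
    lastStatic = staticLayer;
    return { staticLayer, dynamicLayer, staticChanged };
  };
}

const renderTurn = createLayerRenderer();
const baseHud: LayeredHud = {
  task: "Implement auth module",
  decisions: ["Using JWT over sessions"],
  contextPercentage: 45,
  notes: [],
};

const turnA = renderTurn(baseHud);
const turnB = renderTurn({ ...baseHud, contextPercentage: 47 });
console.log(turnA.staticChanged, turnB.staticChanged);
```

A plain string comparison is enough here because the static layer is small. A real implementation might compare hashes instead, but that is a design choice this sketch does not depend on.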
|
||||
|
||||
### HUD State and Event Log Storage
|
||||
|
||||
The event log uses the same SQLite database opencode already has, with a new table:
|
||||
|
||||
```sql
CREATE TABLE hud_event (
  id TEXT PRIMARY KEY,
  session_id TEXT NOT NULL REFERENCES session(id),
  type TEXT NOT NULL,          -- HudEventType
  turn INTEGER NOT NULL,       -- agent turn number
  timestamp INTEGER NOT NULL,  -- Unix ms
  payload TEXT NOT NULL,       -- JSON
  created_at INTEGER NOT NULL DEFAULT (unixepoch())
);

CREATE INDEX idx_hud_event_session ON hud_event(session_id, timestamp);
CREATE INDEX idx_hud_event_type ON hud_event(session_id, type);
```
|
||||
|
||||
The **HUD state** is derived from the event log by projectors (similar to opencode's existing event sourcing pattern):
|
||||
|
||||
```typescript
interface HudState {
  task: { description: string; updatedAt: number } | null
  decisions: Array<{ id: string; summary: string; details: string; recordedAt: number }>
  notes: Array<{ id: string; content: string; updatedAt: number }>
  activeFiles: Array<{ path: string; lastAccessed: number; status: 'reading' | 'editing' | 'referenced' }>
  nextSteps: Array<{ id: string; description: string; completed: boolean; order: number }>
  blockers: Array<{ id: string; description: string; resolved: boolean }>
  errors: Array<{ id: string; message: string; resolved: boolean; resolvedAt?: number }>
}
```
|
||||
|
||||
Projectors are pure functions: `(state, event) => newState`. They are deterministic, so the state can always be rebuilt by replaying the event log in order.
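
A minimal sketch of such a projector, assuming a reduced state and four event types. The payload shapes (`description`, `summary`, `stepId`) and the names `SketchEvent`, `SketchState`, `project`, and `replay` are assumptions for illustration; the document does not fix a payload schema.

```typescript
// Hypothetical sketch: payload shapes are assumptions, not a fixed schema.
type SketchEvent =
  | { id: string; type: "task.set"; timestamp: number; payload: { description: string } }
  | { id: string; type: "decision.record"; timestamp: number; payload: { summary: string } }
  | { id: string; type: "step.add"; timestamp: number; payload: { description: string } }
  | { id: string; type: "step.complete"; timestamp: number; payload: { stepId: string } };

interface SketchState {
  task: { description: string; updatedAt: number } | null;
  decisions: Array<{ id: string; summary: string; recordedAt: number }>;
  nextSteps: Array<{ id: string; description: string; completed: boolean }>;
}

const emptyState: SketchState = { task: null, decisions: [], nextSteps: [] };

// Pure projector: returns a new state and never mutates its input.
function project(state: SketchState, ev: SketchEvent): SketchState {
  switch (ev.type) {
    case "task.set":
      return { ...state, task: { description: ev.payload.description, updatedAt: ev.timestamp } };
    case "decision.record":
      return {
        ...state,
        decisions: [...state.decisions, { id: ev.id, summary: ev.payload.summary, recordedAt: ev.timestamp }],
      };
    case "step.add":
      return {
        ...state,
        nextSteps: [...state.nextSteps, { id: ev.id, description: ev.payload.description, completed: false }],
      };
    case "step.complete":
      return {
        ...state,
        nextSteps: state.nextSteps.map((s) =>
          s.id === ev.payload.stepId ? { ...s, completed: true } : s,
        ),
      };
  }
}

// Rebuilding state is a left fold over the ordered log.
function replay(log: SketchEvent[]): SketchState {
  return log.reduce(project, emptyState);
}

const sampleLog: SketchEvent[] = [
  { id: "e1", type: "task.set", timestamp: 1, payload: { description: "Implement auth module" } },
  { id: "e2", type: "decision.record", timestamp: 2, payload: { summary: "Using JWT over sessions" } },
  { id: "e3", type: "step.add", timestamp: 3, payload: { description: "Write auth middleware" } },
  { id: "e4", type: "step.complete", timestamp: 4, payload: { stepId: "e3" } },
];

const rebuilt = replay(sampleLog);
console.log(JSON.stringify(rebuilt.nextSteps));
```

Because `project` never mutates its input, replaying the same log twice yields structurally equal states, which is the property that makes rebuilding after compaction safe.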
|
||||
|
||||
Some events are **derived** rather than agent-authored:
|
||||
- `file.open`, `file.edit`, `file.close`: Derived from opencode's `tool.execute.before` and `tool.execute.after` hooks by watching for the Read, Write, and Edit tools
|
||||
- `context.snapshot`: Auto-generated from `SessionProcessor` events tracking token usage
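
The file-event derivation could look roughly like the sketch below. The lowercase tool names and the `filePath` argument key are assumptions about the tool-call shape, and `deriveFileEvent` is a hypothetical helper rather than an opencode API.

```typescript
// Hypothetical sketch: tool names and the `filePath` key are assumptions.
type FileEventType = "file.open" | "file.edit";

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Which file event (if any) each tool implies.
const TOOL_TO_EVENT = new Map<string, FileEventType>([
  ["read", "file.open"],
  ["write", "file.edit"],
  ["edit", "file.edit"],
]);

// Returns a derived event, or null when the call does not touch a file.
function deriveFileEvent(call: ToolCall): { type: FileEventType; path: string } | null {
  const kind = TOOL_TO_EVENT.get(call.tool.toLowerCase());
  const path = call.args["filePath"];
  if (kind === undefined || typeof path !== "string") return null;
  return { type: kind, path };
}

const editEvent = deriveFileEvent({ tool: "Edit", args: { filePath: "src/auth/mod.ts" } });
const shellEvent = deriveFileEvent({ tool: "bash", args: { command: "ls" } });
console.log(editEvent, shellEvent);
```

Returning `null` for unrelated tools keeps the caller simple: it records an event only when one is derived.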
|
||||
|
||||
### HUD Tools (Agent-Facing)
|
||||
|
||||
The agent interacts with the HUD through two tools (following the router pattern from open-memory):
|
||||
|
||||
**`hud`** (router tool — reduces context bloat from tool definitions):
|
||||
|
||||
```typescript
|
||||
hud(input: {
|
||||
tool: string, // operation name
|
||||
args?: object // operation arguments
|
||||
})
|
||||
```
|
||||
|
||||
Operations:
|
||||
- `task.get` / `task.set` / `task.update` — Manage current task
|
||||
- `decisions.list` / `decisions.record` / `decisions.remove` — Key decisions log
|
||||
- `notes.list` / `notes.add` / `notes.update` / `notes.remove` — Scratchpad
|
||||
- `steps.list` / `steps.add` / `steps.complete` / `steps.reorder` — Next steps / plan
|
||||
- `blockers.list` / `blockers.add` / `blockers.remove` — Blockers
|
||||
- `snapshot` — Full HUD state
|
||||
- `history` — Recent event log (for understanding what changed)
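
The router pattern above can be sketched as a lookup table from operation name to handler. Only two operations are shown; the `HudResult` shape, the in-memory `notesStore`, and the `content` argument are assumptions for illustration, not the plugin's actual interface.

```typescript
// Hypothetical sketch: handler names and result shape are assumptions.
type HudArgs = Record<string, unknown>;
type HudResult = { ok: true; data: unknown } | { ok: false; error: string };
type Handler = (args: HudArgs) => HudResult;

// Stand-in for the event-log-backed state.
const notesStore: string[] = [];

const handlers = new Map<string, Handler>([
  ["notes.add", (args) => {
    const content = args["content"];
    if (typeof content !== "string") {
      return { ok: false, error: "notes.add requires a string content argument" };
    }
    notesStore.push(content);
    return { ok: true, data: { count: notesStore.length } };
  }],
  ["notes.list", () => ({ ok: true, data: notesStore.slice() })],
]);

// Single entry point: one tool definition, many operations.
function hudRouter(input: { tool: string; args?: HudArgs }): HudResult {
  const handler = handlers.get(input.tool);
  if (!handler) {
    const known = Array.from(handlers.keys()).join(", ");
    return { ok: false, error: `Unknown operation "${input.tool}". Available: ${known}` };
  }
  return handler(input.args ?? {});
}

console.log(hudRouter({ tool: "notes.add", args: { content: "DB schema: users, sessions" } }));
console.log(hudRouter({ tool: "notes.list" }));
console.log(hudRouter({ tool: "notes.purge" }));
```

Listing the known operations in the error message gives the agent a way to recover from a mistyped operation name without a separate documentation call.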
|
||||
|
||||
**`hud_compact`** (mutation tool — separate to prevent accidental use):
|
||||
|
||||
```typescript
|
||||
hud_compact() // Triggers both HUD compaction AND opencode compaction
|
||||
```
|
||||
|
||||
This is distinct from `memory_compact` because:
|
||||
1. It snapshots HUD state to the event log before compaction
|
||||
2. After compaction, the stable HUD layer carries forward — the agent doesn't lose its task, decisions, or notes
|
||||
3. It can trigger compaction at a natural breakpoint (when the agent says "next steps updated, good time to compact")
|
||||
|
||||
### Plugin Integration (opencode)
|
||||
|
||||
The HUD is implemented as an opencode plugin, using these hooks:
|
||||
|
||||
```typescript
const HUDPlugin: Plugin = async (ctx) => {
  const stateManager = new HudStateManager(ctx)  // Manages event log + projectors
  const renderer = new HudRenderer()             // JSX component → markdown

  return {
    tool: { hud: createHudTool(ctx, stateManager), hud_compact: createHudCompactTool(ctx, stateManager) },

    "experimental.chat.system.transform": async (input, output) => {
      // Render HUD and inject as system prompt
      const state = stateManager.getState(input.sessionID)
      const contextInfo = getContextInfo(input.sessionID)
      const markdown = renderer.render(HUD, { state, contextInfo })
      output.system.push(markdown)
    },

    "experimental.session.compacting": async (input, output) => {
      // Before compaction: snapshot HUD state to event log
      stateManager.recordEvent(input.sessionID, { type: "compact.before", payload: stateManager.getState(input.sessionID) })
      // Replace compaction prompt with self-continuity + HUD-aware prompt
      output.prompt = getCompactionPrompt(stateManager.getState(input.sessionID))
    },

    "event": async ({ event }) => {
      // Derive file events from tool calls
      if (event.type === "tool.execute.after") {
        stateManager.maybeRecordFileEvent(event)
      }
      // Track context snapshots
      if (event.type === "message.updated") {
        stateManager.maybeRecordContextSnapshot(event)
      }
    },

    "tool.execute.before": async (input, output) => {
      // Track file access from tool calls
      stateManager.trackToolCall(input)
    },
  }
}
```
|
||||
|
||||
### Rendering Pipeline Implementation
|
||||
|
||||
```typescript
|
||||
// core/renderer.ts
|
||||
|
||||
import type { FC } from "hono/jsx"
|
||||
|
||||
interface RenderContext {
|
||||
density: Density
|
||||
state: HudState
|
||||
contextInfo: ContextInfo
|
||||
}
|
||||
|
||||
class HudRenderer {
|
||||
// Custom walker over Hono JSXNode tree
|
||||
render(component: FC<any>, props: Record<string, any>): string {
|
||||
const node = component(props)
|
||||
return this.walkNode(node)
|
||||
}
|
||||
|
||||
private walkNode(node: any): string {
|
||||
if (typeof node === "string" || typeof node === "number") {
|
||||
return String(node)
|
||||
}
|
||||
if (node === null || node === undefined || node === false) {
|
||||
return ""
|
||||
}
|
||||
if (node instanceof Promise) {
|
||||
throw new Error("Async components not supported in HUD rendering")
|
||||
}
|
||||
|
||||
// JSXFragmentNode — just concatenate children
|
||||
if (node.tag === null || node.tag === Symbol.for("hono.fragment")) {
|
||||
return (node.children as any[]).map(c => this.walkNode(c)).join("\n")
|
||||
}
|
||||
|
||||
// Function component — call it and walk the result
|
||||
if (typeof node.tag === "function") {
|
||||
const result = node.tag({ ...node.props, children: node.children })
|
||||
return this.walkNode(result)
|
||||
}
|
||||
|
||||
// Intrinsic element — map HTML tag to markdown
|
||||
return this.renderElement(node.tag, node.props, node.children)
|
||||
}
|
||||
|
||||
private renderElement(tag: string, props: any, children: any[]): string {
|
||||
const childContent = (children || []).map(c => this.walkNode(c)).join("\n")
|
||||
|
||||
switch (tag) {
|
||||
case "h1": return `# ${childContent}`
|
||||
case "h2": return `## ${childContent}`
|
||||
case "h3": return `### ${childContent}`
|
||||
case "strong": case "b": return `**${childContent}**`
|
||||
case "em": case "i": return `*${childContent}*`
|
||||
case "code": return props.lang
|
||||
? `\`\`\`${props.lang}\n${childContent}\n\`\`\``
|
||||
: `\`${childContent}\``
|
||||
case "ul": return childContent
|
||||
case "li": return `- ${childContent}`
|
||||
case "ol": return childContent // handled by parent
|
||||
case "p": return childContent
|
||||
case "div": case "section": case "article": case "main": case "header":
|
||||
case "footer": case "nav": case "aside": case "span":
|
||||
return childContent // container — just pass through content
|
||||
case "a": return `[${childContent}](${props.href})`
|
||||
case "blockquote": return childContent.split("\n").map(l => `> ${l}`).join("\n")
|
||||
case "hr": return "---"
|
||||
case "br": return "\n"
|
||||
case "pre": return childContent // content already formatted by <code>
|
||||
case "table": return this.renderTable(props, children)
|
||||
default:
|
||||
// Unknown/custom tags: use data-md attribute or just render content
|
||||
if (props?.["data-md"]) {
|
||||
// Custom markdown rendering hint
|
||||
return this.renderCustomMd(props["data-md"], props, childContent)
|
||||
}
|
||||
return childContent
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Adaptive Density in Practice
|
||||
|
||||
The key insight from open-memory's context status thresholding: **information density should be proportional to context pressure.**
|
||||
|
||||
```
|
||||
Context < 70% (GREEN / "full"):
|
||||
┌────────────────────────────────────┐
|
||||
│ # Task │
|
||||
│ Implement user authentication │
|
||||
│ │
|
||||
│ ## Context: 45% (90k/200k) stable │
|
||||
│ │
|
||||
│ ## Decisions (3) │
|
||||
│ • Using JWT over sessions │
|
||||
│ • bcrypt for password hashing │
|
||||
│ • Rate limiting: 100/min default │
|
||||
│ │
|
||||
│ ## Active Files │
|
||||
│ • src/auth/mod.ts (editing) │
|
||||
│ • src/auth/jwt.ts (referenced) │
|
||||
│ • src/db/schema.ts (referenced) │
|
||||
│ │
|
||||
│ ## Next Steps │
|
||||
│ 1. ~~Add refresh token rotation~~ │
|
||||
│ 2. Write auth middleware │
|
||||
│ 3. Add tests │
|
||||
│ │
|
||||
│ ## Notes │
|
||||
│ • DB schema: users, sessions │
|
||||
│ • Env vars: JWT_SECRET, DB_URL │
|
||||
└────────────────────────────────────┘
|
||||
~300-500 tokens
|
||||
|
||||
Context 70-85% (YELLOW / "compact"):
|
||||
┌────────────────────────────────────┐
|
||||
│ Task: Implement auth │ 72% (144k) ↑ │
|
||||
│ Decisions: JWT, bcrypt, 100/min │
|
||||
│ Steps: ~~rotation~~, middleware, │
|
||||
│ tests │
|
||||
│ Files: auth/mod.ts, auth/jwt.ts │
|
||||
│ Notes: DB schema users/sessions │
|
||||
└────────────────────────────────────┘
|
||||
~100-150 tokens
|
||||
|
||||
Context 85-92% (RED / "minimal"):
|
||||
┌────────────────────────────────────┐
|
||||
│ Auth impl │ 89% │ ⚠ compact soon │
|
||||
│ JWT+bcrypt, step: middleware │
|
||||
└────────────────────────────────────┘
|
||||
~30-50 tokens
|
||||
```
|
||||
|
||||
This adaptive compression means the agent always has situational awareness, while the HUD's own token cost shrinks as context pressure rises.
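
The three densities can be approximated with plain string templates, in the spirit of the Phase 1 template-based renderer. In this sketch `MiniHud`, `renderHud`, and `approxTokens` are hypothetical names, and the four-characters-per-token estimate is a rough rule of thumb rather than a real tokenizer.

```typescript
type Density = "full" | "compact" | "minimal";

// Hypothetical reduced HUD state, for illustration only.
interface MiniHud {
  task: string;
  percentage: number;
  decisions: string[];
  nextStep: string;
}

// Thresholds copied from the text: <70 full, <85 compact, else minimal.
function getDensity(percentage: number): Density {
  if (percentage < 70) return "full";
  if (percentage < 85) return "compact";
  return "minimal";
}

function renderHud(hud: MiniHud): string {
  switch (getDensity(hud.percentage)) {
    case "full":
      return [
        "# Task",
        hud.task,
        "",
        `## Context: ${hud.percentage}%`,
        "",
        `## Decisions (${hud.decisions.length})`,
        ...hud.decisions.map((d) => `- ${d}`),
        "",
        "## Next Steps",
        `1. ${hud.nextStep}`,
      ].join("\n");
    case "compact":
      return [
        `Task: ${hud.task} | ${hud.percentage}%`,
        `Decisions: ${hud.decisions.join(", ")}`,
        `Next: ${hud.nextStep}`,
      ].join("\n");
    case "minimal":
      return `${hud.task} | ${hud.percentage}% | compact soon\nnext: ${hud.nextStep}`;
  }
}

// Crude size proxy: about 4 characters per token (an assumption).
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

const sampleHud = {
  task: "Implement auth",
  decisions: ["JWT", "bcrypt"],
  nextStep: "Write auth middleware",
};

for (const percentage of [45, 72, 89]) {
  const out = renderHud({ ...sampleHud, percentage });
  console.log(getDensity(percentage), approxTokens(out));
}
```

The same state produces a shorter rendering at each threshold, so the HUD gives back tokens exactly when the context window is tightest.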
|
||||
|
||||
### Compaction That Preserves State
|
||||
|
||||
The critical difference from current compaction: **the HUD state survives compaction.**
|
||||
|
||||
Current flow:
|
||||
```
|
||||
[conversation] → COMPACT → [summary replaces everything] → agent loses context
|
||||
```
|
||||
|
||||
HUD-aware flow:
|
||||
```
|
||||
[event log] → project to HUD state → render HUD → [inject into system prompt]
|
||||
[conversation] → COMPACT → [summary replaces conversation, but HUD persisted state carries forward]
|
||||
```
|
||||
|
||||
Before compaction:
|
||||
1. The plugin records a `compact.before` event with the full HUD state (in the `experimental.session.compacting` hook)
|
||||
2. HUD state is persisted to the event log (already append-only)
|
||||
3. Compaction prompt includes: "Your HUD state: {rendered HUD}. This will survive compaction."
|
||||
|
||||
After compaction:
|
||||
1. `filterCompacted` drops pre-compaction messages
|
||||
2. But the system prompt still contains the fully rendered HUD
|
||||
3. The agent sees its task, decisions, notes, and next steps — not just a narrative summary
|
||||
|
||||
The compaction summary becomes a **complement** to the HUD, not a **replacement** for lost context. The summary handles conversational continuity ("we were discussing..."), while the HUD provides structured persistent state.
|
||||
|
||||
### Comparison: Current vs HUD Architecture
|
||||
|
||||
| Aspect | Current (Compaction) | HUD Architecture |
|--------|---------------------|------------------|
| State management | Monolithic conversation log | Structured event log + HUD projection |
| Context loss | All-or-nothing compaction cliff | Adaptive density that preserves key state |
| Agent awareness | Must call memory tool | Ambient via system prompt injection |
| Cache optimization | 2-part system prompt only | Multi-part with stable HUD layer |
| Recovery from compaction | LLM-generated summary (lossy) | Structured HUD state (deterministic) |
| Cognitive load | Full conversation always visible | Relevant state always visible, details on demand |
| Token cost at 50% | Full conversation (~100k tokens of history) | Conversation + full HUD (~100k + ~300-500 tokens) |
| Token cost at 90% | Conversation (truncated by compaction, then grows again) | Conversation + minimal HUD (truncated + ~30-50 tokens) |
|
||||
|
||||
---
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Phase 1: Minimal Viable HUD (Plugin)
|
||||
|
||||
1. **Event log table**: Add `hud_event` table to opencode's SQLite database (via plugin)
|
||||
2. **Basic event types**: `task.set`, `task.update`, `decision.record`, `note.add`, `step.add`, `step.complete`
|
||||
3. **State projector**: Pure functions that fold events into `HudState`
|
||||
4. **Simple renderer**: Template-based markdown rendering (no JSX yet)
|
||||
5. **Plugin hooks**: `system.transform` for injection, `event` for derivation, `session.compacting` for pre-compaction snapshot
|
||||
6. **Two tools**: `hud` (router) and `hud_compact`
|
||||
|
||||
### Phase 2: JSX Rendering Pipeline
|
||||
|
||||
1. **Hono JSXNode walking**: Custom `HudRenderer` that walks JSXNode trees directly to markdown
|
||||
2. **HUD components**: `<HUD>`, `<TaskSection>`, `<ContextBar>`, `<DecisionsList>`, `<NotesSection>`, `<NextSteps>`
|
||||
3. **Adaptive density**: Components that render differently based on `density` prop
|
||||
4. **Test rendering**: Snapshot tests comparing component output to expected markdown
|
||||
|
||||
### Phase 3: Cache Optimization
|
||||
|
||||
1. **HUD layer splitting**: Separate HUD into static layer (task, decisions, next steps) and dynamic layer (context bar, notes, active files)
|
||||
2. **Static layer diffing**: Only push to `output.system[]` when static content changes, reducing cache misses
|
||||
3. **Dynamic layer minimal updates**: Track what changed since last render, include only deltas
|
||||
4. **Measure cache hit rates**: Instrument provider token usage to verify caching improvements
|
||||
|
||||
### Phase 4: Advanced Features
|
||||
|
||||
1. **Derived events**: File tracking from tool calls, context snapshots from message events
|
||||
2. **Search integration**: `hud.search` operation using FTS5 instead of LIKE
|
||||
3. **Cross-session HUD**: Persist HUD state across sessions, enable resuming tasks
|
||||
4. **QuickJS scripting**: Use toolEnv's envProxy pattern to let agents customize their HUD components
|
||||
5. **Multi-agent HUD**: When `coord.spawn` creates sub-agents, share HUD state via parent session events
|
||||
|
||||
---
|
||||
|
||||
## Key Design Decisions
|
||||
|
||||
### Why event log over direct state mutation?
|
||||
|
||||
1. **Auditability**: Every state change has a trace. You can reconstruct the HUD at any point in time.
|
||||
2. **Compaction resilience**: Events survive compaction because they're in a separate table, not the conversation message stream.
|
||||
3. **No conflict resolution needed**: Unlike CRDTs, we're single-author (one agent per session). The event log is append-only with no merge conflicts.
|
||||
4. **Projection flexibility**: Different projectors can derive different views from the same events. The HUD is one projection; a task progress tracker could be another.
|
||||
|
||||
### Why JSX over Handlebars/templates?
|
||||
|
||||
1. **Component composition**: JSX supports arbitrary nesting and composition. `<CompactView>` can wrap `<DecisionsList>` with different rendering logic.
|
||||
2. **Conditional rendering**: `{density === 'full' && <NotesSection />}` is cleaner than `{{#if density.full}}...{{/if}}`.
|
||||
3. **Type safety**: Components are typed functions. Props, state, density — all compile-time checked.
|
||||
4. **Developer familiarity**: React-like patterns are widely understood.
|
||||
5. **Direct JSXNode walking avoids HTML roundtrip**: No `renderToStaticMarkup → parse → hast → mdast → markdown` needed.
|
||||
|
||||
### Why system prompt injection instead of a dedicated message role?
|
||||
|
||||
1. **Cache alignment**: OpenCode already manages the system prompt as a cache-aware structure. Injecting into `output.system[]` gives us immediate cache optimization.
|
||||
2. **No protocol change**: We don't need to change opencode's core messaging protocol. The plugin hook is sufficient.
|
||||
3. **Survives compaction**: System prompt is always included (it's never compacted). The HUD is always visible.
|
||||
4. **Provider compatibility**: System prompts work with every LLM provider. A custom message role might not.
|
||||
|
||||
### Why router pattern (1 tool) over separate tools?
|
||||
|
||||
From open-memory's research: each tool definition adds ~1-2k tokens to the system prompt. With 10+ HUD operations, that's 10-20k tokens of overhead. A single `hud` router tool costs ~2k tokens regardless of how many operations it supports. The `help` operation provides inline documentation.
|
||||
|
||||
---
|
||||
|
||||
## File Structure (Proposed)
|
||||
|
||||
```
|
||||
packages/hud-plugin/
|
||||
src/
|
||||
index.ts # Plugin entry point
|
||||
state/
|
||||
types.ts # HudState, HudEvent types
|
||||
projector.ts # Event → State projectors
|
||||
store.ts # SQLite read/write for hud_event
|
||||
renderer/
|
||||
components/
|
||||
HUD.tsx # Root HUD component
|
||||
TaskSection.tsx # Task display
|
||||
ContextBar.tsx # Context percentage bar
|
||||
DecisionsList.tsx # Key decisions
|
||||
NotesSection.tsx # Scratchpad
|
||||
NextSteps.tsx # Plan/next steps
|
||||
ActiveFiles.tsx # Currently active files
|
||||
WarningBanner.tsx # Context pressure warning
|
||||
renderer.ts # JSXNode → markdown walker
|
||||
density.ts # Adaptive density logic
|
||||
tools/
|
||||
hud.ts # hud router tool definition
|
||||
hud_compact.ts # hud_compact tool definition
|
||||
operations/ # Individual operations
|
||||
task.ts
|
||||
decisions.ts
|
||||
notes.ts
|
||||
steps.ts
|
||||
files.ts
|
||||
snapshot.ts
|
||||
history.ts
|
||||
hooks/
|
||||
system-transform.ts # experimental.chat.system.transform
|
||||
compacting.ts # experimental.session.compacting
|
||||
event.ts # event derivation (file tracking, context snapshots)
|
||||
context/
|
||||
tracker.ts # Context window tracking (from open-memory)
|
||||
thresholds.ts # Density thresholds
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Risks and Open Questions
|
||||
|
||||
1. **Token overhead at low context usage**: At low usage the HUD renders at full density and costs ~300-500 tokens, roughly 0.15-0.25% of a 200k context; even the "minimal" HUD adds ~30-50 tokens. Is that overhead worth paying near 0% usage? Probably yes: the ROI is in the 70%+ range, where the HUD prevents catastrophic compaction. The payoff is avoiding a compaction event that loses the entire conversation.
|
||||
|
||||
2. **Agent compliance**: Will agents consistently use `hud` tools to update their state? The current approach relies on the agent choosing to call `hud` tools. Alternatives:
|
||||
- **Auto-derivation**: More events derived automatically from tool calls (file tracking is automatic, but decisions/notes require agent action)
|
||||
- **AGENTS.md prompt**: Include HUD tool usage in instructions
|
||||
- **Conventional prompting**: "Always update your HUD after completing a step or making a decision" in the compaction/system prompt
|
||||
|
||||
3. **Stale state**: If the agent doesn't update the HUD before compaction, the HUD might be stale. Mitigation: auto-snapshot HUD state before compaction in the `session.compacting` hook.
|
||||
|
||||
4. **JSX dependency weight**: Hono's JSX runtime adds a dependency. Alternative: use a lightweight custom JSX transform that doesn't need Hono. The renderer only needs `JSXNode` types and the walker — not the full Hono framework.
|
||||
|
||||
5. **Multi-agent sessions**: When sub-agents are spawned, should they share HUD state? The event log is per-session, but parent-child relationships in opencode could enable HUD state inheritance.
|
||||
|
||||
6. **Search vs. structured state**: Should agents search conversation history (like `memory({tool: "search"})`) or maintain structured state (like `hud({tool: "decisions.record"})`)? Both. The HUD is for state the agent actively maintains. Search is for recovering information the agent didn't think to record. They complement each other.
|
||||
|
||||
7. **Event log growth**: The `hud_event` table will grow. Needs a cleanup strategy — perhaps tied to compaction events (archive events older than the last compaction point).
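
One possible shape for the cleanup strategy in item 7, sketched under two assumptions: events arrive ordered by timestamp, and the `compact.before` payload holds the full HUD state, so replay can start from that snapshot. `LoggedEvent` and `splitAtLastCompaction` are hypothetical names.

```typescript
// Hypothetical sketch: assumes events are already ordered by timestamp.
interface LoggedEvent {
  id: string;
  type: string;
  timestamp: number;
}

// Splits the log at the most recent "compact.before" snapshot.
// Events before the snapshot can be archived; the snapshot and
// everything after it stay in the live table.
function splitAtLastCompaction(events: LoggedEvent[]): {
  archive: LoggedEvent[];
  live: LoggedEvent[];
} {
  let cut = -1;
  for (let i = 0; i < events.length; i++) {
    if (events[i]?.type === "compact.before") cut = i;
  }
  if (cut === -1) return { archive: [], live: events.slice() };
  return { archive: events.slice(0, cut), live: events.slice(cut) };
}

const fullLog: LoggedEvent[] = [
  { id: "e1", type: "task.set", timestamp: 1 },
  { id: "e2", type: "compact.before", timestamp: 2 },
  { id: "e3", type: "note.add", timestamp: 3 },
  { id: "e4", type: "compact.before", timestamp: 4 },
  { id: "e5", type: "step.add", timestamp: 5 },
];

const split = splitAtLastCompaction(fullLog);
console.log(split.archive.map((e) => e.id), split.live.map((e) => e.id));
```

Keeping the snapshot event itself in the live set matters: without it, replaying the live events alone would lose every decision recorded before the cut.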
|
||||