Roadmap
Implementation plan and feature backlog. Click status icons to cycle: planned → in progress → done.
Add tables:
- session_logs: id, session_id, message_type, role, content jsonb, tool_name, file_paths text[], created_at
- session_chunks: id, session_id, chunk_type (fix/implementation/decision/discovery), summary text, content text, file_paths text[], tool_names text[], embedding vector(1536), metadata jsonb, created_at
Enable pgvector extension if not already enabled.
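A minimal migration sketch for the schema above, assuming a Bun script with the postgres.js client and a DATABASE_URL pointing at Supabase (column types follow the list; everything else is illustrative):

```ts
// migrate.ts — one-off migration sketch (assumes DATABASE_URL points at Supabase)
import postgres from "postgres";

const sql = postgres(process.env.DATABASE_URL!);

await sql.unsafe(`
  CREATE EXTENSION IF NOT EXISTS vector;

  CREATE TABLE IF NOT EXISTS session_logs (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    session_id   text NOT NULL,
    message_type text,
    role         text,
    content      jsonb,
    tool_name    text,
    file_paths   text[],
    created_at   timestamptz NOT NULL DEFAULT now()
  );

  CREATE TABLE IF NOT EXISTS session_chunks (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    session_id text NOT NULL,
    chunk_type text CHECK (chunk_type IN ('fix','implementation','decision','discovery')),
    summary    text,
    content    text,
    file_paths text[],
    tool_names text[],
    embedding  vector(1536),
    metadata   jsonb,
    created_at timestamptz NOT NULL DEFAULT now()
  );
`);

await sql.end();
```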
CONFIRMED: LiteLLM already stores everything we need.
- proxy_server_request::jsonb->'messages' = full prompt messages (user + assistant turns)
- response::jsonb = full response with choices, content, tool_use
- 10K+ pass-through requests with Claude models
- prompt_tokens, completion_tokens, spend, model all tracked
Query pattern (Prisma creates the table with a mixed-case name, so it must be double-quoted in Postgres):
SELECT proxy_server_request::jsonb->'messages', response::jsonb, model, spend FROM "LiteLLM_SpendLogs" WHERE call_type='pass_through_endpoint' AND status='success'
Connection: SSH tunnel to REDIRECTOR or sidecar on FORBIDDEN server. Read-only user: litellm_readonly.
NO NEED to capture in SDK bridge for persistence: LiteLLM is source of truth.
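A sketch of the read path, assuming the SSH-tunnel option (e.g. `ssh -N -L 6432:localhost:5432` to REDIRECTOR, opened out of band) and the litellm_readonly user; the database name and the request_id column are assumptions from LiteLLM's schema:

```ts
// litellm-reader.ts — read-only mining of LiteLLM's spend logs over the tunnel
import postgres from "postgres";

const sql = postgres({
  host: "localhost",
  port: 6432,                       // local end of the SSH tunnel
  database: "litellm",              // placeholder database name
  username: "litellm_readonly",
  password: process.env.LITELLM_RO_PASSWORD,
});

// Mixed-case Prisma table names require double quotes.
const rows = await sql`
  SELECT request_id,
         proxy_server_request::jsonb->'messages' AS messages,
         response::jsonb AS response,
         model, spend
  FROM "LiteLLM_SpendLogs"
  WHERE call_type = 'pass_through_endpoint' AND status = 'success'
  LIMIT 100
`;
await sql.end();
```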
Process session_logs into session_chunks. Chunk types:
- "fix": error → investigation → solution cycle
- "implementation": file create/edit tool calls with context
- "decision": when agent chose between approaches
- "discovery": exploration that led to useful findings
Extract file_paths and tool_names from tool_use blocks. Generate a human-readable summary per chunk.
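A sketch of the chunk shape and the tool_use extraction; the block shape assumes Anthropic-style content blocks with Edit/Write inputs carrying file_path (all names hypothetical):

```ts
// chunker.ts — shapes and tool_use metadata extraction
type ChunkType = "fix" | "implementation" | "decision" | "discovery";

interface SessionChunk {
  sessionId: string;
  chunkType: ChunkType;
  summary: string;     // human-readable, generated per chunk
  content: string;     // concatenated messages in the chunk
  filePaths: string[];
  toolNames: string[];
}

// Assumed message shape: Anthropic-style tool_use content blocks.
interface ToolUseBlock {
  type: "tool_use";
  name: string;                    // e.g. "Edit", "Write", "Read"
  input: Record<string, unknown>;  // Edit/Write inputs carry file_path
}

function extractToolMeta(blocks: ToolUseBlock[]) {
  const toolNames = new Set<string>();
  const filePaths = new Set<string>();
  for (const b of blocks) {
    toolNames.add(b.name);
    const p = b.input["file_path"];
    if (typeof p === "string") filePaths.add(p);
  }
  return { toolNames: [...toolNames], filePaths: [...filePaths] };
}
```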
Options (choose one):
1. LiteLLM proxy → embedding model (uses existing infrastructure)
2. Local Ollama with nomic-embed-text (zero API cost, works offline)
3. Supabase Edge Function triggered on insert
Embed the summary + content of each chunk → store in embedding column. Batch process on session completion, not per-message. Note: nomic-embed-text outputs 768-dimensional vectors, so option 2 needs vector(768) rather than the vector(1536) in the schema above (1536 matches OpenAI-style embedding models).
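A sketch of option 2 via Ollama's /api/embeddings endpoint, batching per completed session:

```ts
// embed.ts — option 2 sketch: local Ollama with nomic-embed-text
// Reminder: nomic-embed-text returns 768 dims; the column would be vector(768).
async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama embeddings failed: ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding;
}

// Batch on session completion: embed summary + content per chunk.
async function embedChunks(chunks: { summary: string; content: string }[]) {
  return Promise.all(chunks.map((c) => embed(`${c.summary}\n\n${c.content}`)));
}
```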
GET /api/knowledge/search?q=...&limit=10&chunk_type=fix
1. Embed the query text
2. Vector similarity search on session_chunks (cosine distance)
3. Combine with full-text search (PostgreSQL tsvector) for keyword matching
4. Return ranked results with session context, file paths, summaries
5. Optional filters: chunk_type, date range, file path pattern
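A hybrid-search sketch with postgres.js: pgvector's <=> cosine-distance operator OR-combined with a tsvector match. The 0.5 distance cutoff is an arbitrary placeholder, and the simple OR-combination could later be replaced by reciprocal rank fusion:

```ts
// search.ts — hybrid vector + keyword search over session_chunks
import postgres from "postgres";
const sql = postgres(process.env.DATABASE_URL!);

async function searchChunks(
  queryEmbedding: number[],        // from the same model used at index time
  q: string,                       // raw query text for the keyword leg
  chunkType: string | null = null, // optional chunk_type filter
  limit = 10,
) {
  const vec = JSON.stringify(queryEmbedding); // pgvector accepts '[0.1,0.2,...]' literals
  return sql`
    SELECT id, session_id, chunk_type, summary, file_paths,
           embedding <=> ${vec}::vector AS distance
    FROM session_chunks
    WHERE (${chunkType}::text IS NULL OR chunk_type = ${chunkType})
      AND (embedding <=> ${vec}::vector < 0.5
           OR to_tsvector('english', coalesce(summary,'') || ' ' || coalesce(content,''))
              @@ plainto_tsquery('english', ${q}))
    ORDER BY distance
    LIMIT ${limit}
  `;
}
```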
Replace cm CLI backend in /api/cass/* routes with our session_chunks search. Same search UI (query, filters, results cards) but powered by our own pgvector. Add chunk_type filter, file path filter, date range. Show session context (what was the agent working on).
In singleton.ts onResult callback:
1. Collect all messages for the completed session
2. Run chunker
3. Generate embeddings
4. Store in session_chunks
Background job: don't block the result callback.
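A sketch of the fire-and-forget pattern; every helper and the callback shape here are hypothetical stand-ins for the real singleton.ts code:

```ts
// index-session.ts — background indexing sketch; all helpers are hypothetical.
type Msg = { role: string; content: unknown };
type Chunk = { summary: string; content: string };

declare function collectMessages(sessionId: string): Promise<Msg[]>;  // step 1
declare function chunkSession(messages: Msg[]): Chunk[];              // step 2
declare function embedChunks(chunks: Chunk[]): Promise<number[][]>;   // step 3
declare function storeChunks(
  sessionId: string, chunks: Chunk[], embeddings: number[][],
): Promise<void>;                                                     // step 4

async function indexSession(sessionId: string): Promise<void> {
  const messages = await collectMessages(sessionId);
  const chunks = chunkSession(messages);
  const embeddings = await embedChunks(chunks);
  await storeChunks(sessionId, chunks, embeddings);
}

// In the onResult callback: fire and forget so the callback returns immediately.
function onSessionResult(sessionId: string): void {
  void indexSession(sessionId).catch((err) =>
    console.error(`[knowledge] indexing failed for ${sessionId}:`, err),
  );
}
```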
LiteLLM at https://litellm.301.ovh, DB on REDIRECTOR (162.19.221.34). Options:
1. LiteLLM REST API: GET /spend/logs (if exposed)
2. Direct PostgreSQL read (via sidecar or connection string)
3. Replicate relevant tables to our Supabase
Decision: which approach fits the zero-trust architecture? The sidecar pattern from CLAUDE.md may apply here.
Replace in-memory cost tracking with LiteLLM data for the Usage page. LiteLLM_DailyUserSpend has: date, model, tokens, spend — aggregated daily. This gives us historical data we don't have in-memory. The costs/usage page reads from LiteLLM instead of our cost-sink.
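A read sketch for the Usage page; the column set of LiteLLM_DailyUserSpend is an assumption from LiteLLM's schema and should be verified against the live table, and LITELLM_DATABASE_URL is a placeholder:

```ts
// usage.ts — daily aggregates for the Usage page (columns assumed, verify live)
import postgres from "postgres";
const sql = postgres(process.env.LITELLM_DATABASE_URL!);

const daily = await sql`
  SELECT date, model, prompt_tokens, completion_tokens, spend
  FROM "LiteLLM_DailyUserSpend"
  ORDER BY date DESC
  LIMIT 90
`;
```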
LiteLLM_SpendLogs stores full messages and response JSON. If "Store Prompts in Spend Logs" is enabled in LiteLLM:
→ We can read session data from LiteLLM instead of capturing ourselves
→ This avoids duplicate storage
→ Knowledge base chunker reads from LiteLLM as source
If disabled: we capture in SDK bridge and LiteLLM only has metadata. Check current LiteLLM config to decide.
DECISION MADE: LiteLLM is source of truth for ALL persistent data:
- Full prompts (proxy_server_request->'messages')
- Full responses (response jsonb)
- Tokens, cost, model, timestamps
- 30K+ historical rows ready to mine
SDK bridge handles ONLY real-time:
- WebSocket session management (browser ↔ CLI)
- Permission approvals
- Live streaming
- In-memory session state (not persisted)
Knowledge base reads from LiteLLM → chunks → embeds → stores in Supabase. Cost-sink in-memory data is for current-session display only. No double data. No duplicate storage.
New page or mode on the Sessions page. Grid of session panels (2x2, 3x2, etc.) — auto-layout based on session count. Each panel: streaming output, status bar, context %, message input. Use CSS Grid or a resizable panel library.
useSdkSession hook already handles one session. Create useMultiSession hook that manages N WebSocket connections. Each panel gets its own connection to /ws/browser/:sessionId. Shared state for cross-session features (batch permissions, conflict detection).
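A sketch of the hook, assuming React and the /ws/browser/:sessionId endpoint above; message shapes and everything else are assumptions:

```ts
// useMultiSession.ts — manage N WebSocket connections, one per session panel
import { useEffect, useRef, useState } from "react";

export function useMultiSession(sessionIds: string[]) {
  const sockets = useRef<Map<string, WebSocket>>(new Map());
  const [status, setStatus] = useState<Record<string, "open" | "closed">>({});

  useEffect(() => {
    const proto = location.protocol === "https:" ? "wss:" : "ws:";
    for (const id of sessionIds) {
      if (sockets.current.has(id)) continue; // already connected
      const ws = new WebSocket(`${proto}//${location.host}/ws/browser/${id}`);
      ws.onopen = () => setStatus((s) => ({ ...s, [id]: "open" }));
      ws.onclose = () => setStatus((s) => ({ ...s, [id]: "closed" }));
      sockets.current.set(id, ws);
    }
    for (const [id, ws] of sockets.current) {
      if (!sessionIds.includes(id)) { // panel removed from the grid
        ws.close();
        sockets.current.delete(id);
      }
    }
  }, [sessionIds]);

  const send = (id: string, msg: unknown) =>
    sockets.current.get(id)?.send(JSON.stringify(msg));
  const broadcast = (msg: unknown) => sessionIds.forEach((id) => send(id, msg));

  return { status, send, broadcast };
}
```

The broadcast helper doubles as the transport for the broadcast-to-all-sessions card further down.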
When multiple sessions request permissions:
- "Allow all Read tool" button
- "Allow all for session X" button
- Per-tool-type batch approval
- Permission queue showing all pending across sessions
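A sketch of the cross-session queue; the PendingPermission shape and the approve transport are assumptions:

```ts
// permission-queue.ts — cross-session pending-approvals sketch
interface PendingPermission {
  sessionId: string;
  requestId: string;
  toolName: string; // e.g. "Read", "Edit", "Bash"
}

const queue: PendingPermission[] = [];

declare function approve(p: PendingPermission): void; // sends approval over that session's WS

// "Allow all Read tool" across every session:
function approveAllForTool(toolName: string) {
  for (const p of queue.filter((q) => q.toolName === toolName)) approve(p);
}

// "Allow all for session X":
function approveAllForSession(sessionId: string) {
  for (const p of queue.filter((q) => q.sessionId === sessionId)) approve(p);
}
```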
Track which files each session is editing (from tool_use Edit/Write blocks). If two sessions touch the same file → flash warning in both panels. Optional: auto-pause second session, notify user.
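A sketch of the tracking map; recordFileTouch would be called from the tool_use stream handler whenever an Edit/Write block arrives:

```ts
// conflict-detect.ts — track files per session from Edit/Write tool_use blocks
const filesBySession = new Map<string, Set<string>>();

// Returns session ids already editing this file: the targets for the warning flash.
function recordFileTouch(sessionId: string, filePath: string): string[] {
  const conflicts: string[] = [];
  for (const [otherId, files] of filesBySession) {
    if (otherId !== sessionId && files.has(filePath)) conflicts.push(otherId);
  }
  const mine = filesBySession.get(sessionId) ?? new Set<string>();
  mine.add(filePath);
  filesBySession.set(sessionId, mine);
  return conflicts;
}
```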
Send a message to all running sessions at once. Use case: "stop, we're changing approach" or "pull latest from main". UI: text input at top of multi-pane view, sends to all connected sessions.
On the PRD page, after uploading/pasting PRD:
- Send PRD content to Claude via SDK session
- Parse response into epics + stories
- Store in Supabase epics/stories tables
- Show on kanban board automatically
From kanban or command centre:
- Select stories to work on
- Configure: model, permission mode, working directory
- "Launch" → spawns one SDK session per story
- Story description becomes the initial prompt
- Multi-pane view opens automatically
When SDK session completes successfully → story moves to "Review"
When session errors → story stays "In Progress" with error note
When user manually approves → story moves to "Done"
Links session_id to story for traceability.
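The transition rules as a small sketch (status strings taken from the card):

```ts
// story-status.ts — session outcome → story status mapping
type StoryStatus = "In Progress" | "Review" | "Done";
type SessionOutcome = "success" | "error";

function nextStatus(current: StoryStatus, outcome: SessionOutcome): StoryStatus {
  if (outcome === "success" && current === "In Progress") return "Review";
  return current; // errors keep the story "In Progress"; "Done" is manual-approve only
}
```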
MAX subscription = fixed monthly fee, so cost per token is irrelevant. Replace dollar amounts with volume metrics:
- Token volume (input/output/total)
- Session count and duration
- Model distribution (Opus vs Sonnet vs Haiku)
- Tool call frequency breakdown
- Context window usage (% per session, warnings for >80%)
- Lines changed (productivity metric)
Read aggregated daily metrics from LiteLLM instead of in-memory cost-sink. Gives us historical data across sessions and server restarts. Falls back to SDK bridge data for current session.
Per CLAUDE.md rules, submit deployment request:
- App: fccf-dashboard
- Subdomain: fccf.402.ovh (or similar)
- Services needed: PostgreSQL (Supabase external), LiteLLM access
- Environment: staging (PRE-MODULATOR first)
Admin generates TAR with docker-compose + Vault sidecar.
The SDK bridge spawns claude CLI processes. Server needs:
- Claude CLI installed and in PATH
- Authenticated (API key via Vault sidecar)
- ANTHROPIC_API_KEY provided by Vault at runtime
- Or route through LiteLLM proxy (claude --base-url)
Two options:
1. Install cm binary on server for CASS integration
2. Skip CASS entirely; our knowledge base replaces it
If knowledge base is built first, CASS becomes optional.
Separate from the SDK bridge. Watches for completed sessions. On session complete:
1. Fetch all messages (from session_logs or LiteLLM)
2. Run chunker
3. Generate embeddings
4. Store in session_chunks
Could be a Bun script, cron job, or Supabase Edge Function trigger.
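A Bun polling sketch; it reuses the hypothetical indexSession helper from the earlier indexing card and treats "has logs but no chunks" as "not yet indexed":

```ts
// auto-index.ts — Bun script: poll for completed, un-indexed sessions
import postgres from "postgres";
const sql = postgres(process.env.DATABASE_URL!);

declare function indexSession(sessionId: string): Promise<void>; // hypothetical helper

while (true) {
  // Sessions with logs but no chunks yet.
  const pending = await sql`
    SELECT DISTINCT l.session_id
    FROM session_logs l
    LEFT JOIN session_chunks c ON c.session_id = l.session_id
    WHERE c.id IS NULL
  `;
  for (const { session_id } of pending) await indexSession(session_id);
  await Bun.sleep(60_000); // poll every minute
}
```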
LiteLLM_SpendLogs has historical sessions. Run chunker + embeddings on past data to bootstrap the knowledge base. One-time batch job, then ongoing auto-index handles new sessions.
Store timestamped message stream per session. Replay UI: play/pause, speed control, jump to tool calls. Useful for reviewing what an agent did, learning, auditing.
Save named configurations:
- "Frontend agent": cwd, model, permission mode, allowed tools
- "Test runner": different config
- "Researcher": plan mode only
One-click spawn from template. Store in Supabase settings.
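A sketch of the template shape; the permission mode names mirror Claude Code's modes but should be verified, and the model ids are placeholders:

```ts
// agent-template.ts — named configuration shape (field names assumed from the card)
interface AgentTemplate {
  name: string;                 // "Frontend agent", "Test runner", "Researcher"
  cwd: string;
  model: string;                // placeholder model id
  permissionMode: "default" | "acceptEdits" | "plan"; // verify against SDK modes
  allowedTools?: string[];      // e.g. ["Read", "Edit", "Bash"]
}

const researcher: AgentTemplate = {
  name: "Researcher",
  cwd: "/workspace",            // placeholder
  model: "claude-opus-4",       // placeholder
  permissionMode: "plan",       // plan mode only
};
```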
SDK bridge WebSocket supports multiple browser connections per session. Multiple people can watch the same agent working simultaneously. Each can send messages / approve permissions. Shared cost tracking in real-time. Already architecturally supported — just needs UI indicator showing connected viewers.
Keep cost metric but reframe: "API equivalent cost" not "your bill" (MAX = fixed fee). Show "You saved $X this month with MAX vs API pricing" as insight card. Read spend data from LiteLLM_DailyUserSpend (already aggregated per model per day). Also show volume metrics: tokens, sessions, model distribution, tool call frequency. Route LiteLLM spend data to the existing costs page cards.
Current sidebar has legacy items:
- "NTM" → rename to "Sessions"
- "Cost" → rename to "Usage" or merge with new costs page
- "CASS" → rename to "Knowledge" when own KB is ready
- "Beads" → keep for now but de-emphasize (kanban uses stories)
- "RAM" → clarify what this is (currently links to /mail)
- Add "Roadmap" link