DevTools

A transport-agnostic debugging interface for agent orchestration with 12 specialized views.

The DevTools server (@directive-run/ai) bridges your orchestrator's timeline, health, breakpoints, and state into a visual debugging interface via WebSocket, SSE, or any custom transport.

Setup

Server

import { connectDevTools, createMultiAgentOrchestrator } from '@directive-run/ai';

const port = 4040;

// `runner` is your agent runner, defined elsewhere in your setup
const orchestrator = createMultiAgentOrchestrator({
  runner,
  agents: { /* ... */ },
  debug: true,
});

// One-liner – creates WebSocket server and wires everything up
const server = await connectDevTools(orchestrator, { port });

console.log(`DevTools server on ws://localhost:${port}`);

Or wire up manually for full control:

import { createDevToolsServer, createWsTransport } from '@directive-run/ai';

const transport = await createWsTransport({ port: 4040 });

const server = createDevToolsServer({
  transport,
  timeline: orchestrator.timeline!,
  healthMonitor: orchestrator.healthMonitor,
  getSnapshot: () => buildSnapshot(orchestrator),
  getBreakpointState: () => orchestrator.getPendingBreakpoints(),
  onResumeBreakpoint: (id, mods) => orchestrator.resumeBreakpoint(id, mods),
  onCancelBreakpoint: (id, reason) => orchestrator.cancelBreakpoint(id, reason),
  getScratchpadState: () => orchestrator.scratchpad?.getAll() ?? {},
  getDerivedState: () => orchestrator.derived ?? {},
  maxClients: 50,
  batchSize: 1,
  batchIntervalMs: 50,
});

Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| port | number | 4040 | WebSocket server port |
| maxClients | number | 50 | Maximum concurrent DevTools clients |
| batchSize | number | 1 | Events per batch message |
| batchIntervalMs | number | 50 | Batch flush interval (ms) |
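
To illustrate how batchSize and batchIntervalMs could work together, here is a minimal sketch of a size-triggered batcher. It is not the library's implementation: `createBatcher`, `push`, and `flush` are hypothetical names, and the sketch assumes the server would also call `flush()` from a timer every batchIntervalMs.

```typescript
// Hypothetical sketch, not the @directive-run/ai implementation.
type Send<T> = (batch: T[]) => void;

function createBatcher<T>(batchSize: number, send: Send<T>) {
  let pending: T[] = [];

  // Send whatever is buffered. A real server would presumably also
  // call this from a timer every batchIntervalMs.
  function flush(): void {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    send(batch);
  }

  // Buffer one event and flush as soon as the batch is full.
  function push(event: T): void {
    pending.push(event);
    if (pending.length >= batchSize) flush();
  }

  return { push, flush };
}
```

With the default batchSize of 1, every event would be sent immediately; larger values trade latency for fewer messages.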

Views

The DevTools UI has 12 specialized views, accessible as tabs. A time format selector (ms / elapsed / clock) applies across all views.

1. Timeline

Horizontal lanes per agent with bar-per-event rendering and row packing to prevent overlap.

Filtering:

  • Agent filter chips – show/hide specific agents
  • Event type filter chips – filter by event type
  • Regex search across all event properties (150ms debounce, ReDoS-safe)
  • Error-only quick filter – show only error events
  • AND/OR filter mode toggle – combine filters with intersection or union
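
The AND/OR toggle can be thought of as choosing between intersection and union of the active filters. The sketch below shows that idea only; the `TimelineEvent` shape, `Predicate`, and `applyFilters` are assumptions made for illustration, not the UI's actual code.

```typescript
// Hypothetical event shape; real timeline events carry more fields.
interface TimelineEvent {
  agentId: string;
  type: string;
}

type Predicate = (event: TimelineEvent) => boolean;
type FilterMode = "AND" | "OR";

// AND keeps events that match every active filter (intersection),
// OR keeps events that match at least one (union).
function applyFilters(
  events: TimelineEvent[],
  predicates: Predicate[],
  mode: FilterMode,
): TimelineEvent[] {
  if (predicates.length === 0) return events; // no active filters
  return events.filter((event) =>
    mode === "AND"
      ? predicates.every((matches) => matches(event))
      : predicates.some((matches) => matches(event)),
  );
}
```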

Navigation:

  • Zoom (1x–20x) with Ctrl+Scroll
  • Pan with click-and-drag (grab cursor when zoomed)
  • Canvas minimap for navigation (high-DPI, click-to-pan)
  • Time axis labels with configurable format

Live features:

  • Replay cursor line (red vertical) for stepping through events
  • Anomaly highlighting with red rings on anomalous events
  • Live token streaming panel – per-agent token preview (up to 500 chars) with count
  • Pause/resume button with pending event count badge

2. Cost

Token usage and estimated cost breakdown.

  • Total tokens and estimated cost ($0.01/1K tokens)
  • Stacked bar chart per agent with hover tooltips (golden-angle hue for unlimited agents)
  • Cost breakdown table: Agent, Runs, Total Tokens, Avg Tokens, Duration, % of Total
  • Sorted by highest token usage
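
As a rough illustration of the numbers in this view, the sketch below computes the estimated cost at the stated rate and assigns a golden-angle hue per agent. The function names and the rounded golden-angle constant are assumptions; the UI may compute these differently.

```typescript
const COST_PER_1K_TOKENS = 0.01; // rate stated in the Cost view
const GOLDEN_ANGLE_DEG = 137.508; // approximate golden angle in degrees

// Estimated cost in dollars for a token total.
function estimateCost(totalTokens: number): number {
  return (totalTokens / 1000) * COST_PER_1K_TOKENS;
}

// Hue for the nth agent (0-based). Stepping by the golden angle keeps
// neighbouring agents visually distinct for any number of agents.
function agentHue(agentIndex: number): number {
  return (agentIndex * GOLDEN_ANGLE_DEG) % 360;
}
```

A hue like this would typically be dropped into an `hsl(...)` colour with fixed saturation and lightness.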

3. State

Two sub-tabs with key count badges: Scratchpad and Derived.

  • Key-value display with syntax highlighting and search/filter
  • Live updates as values change
  • Refresh button with 600ms debounce feedback
  • "Edit & Fork" button – modify state values and fork the timeline from that point

Roadmap

These views are planned for future releases. The server protocol and data structures are in place — the rendering is not yet built.

Flamechart

Hierarchical flame graph visualization. Pairs start/end events into nested bars at three depth levels: Patterns → Agents → Resolvers.

  • Performance summary – total duration, critical path, parallelism ratio, slowest agent
  • Zoom and pan (shared 1x–20x zoom with Ctrl+Scroll)
  • Canvas minimap with viewport rectangle
  • Hover for tooltips with type, agent, and duration
  • Click to select and view detail panel (type, agent, duration, tokens, depth)
  • Point events (0ms) shown as thin vertical lines
  • Unclosed spans marked "(running)"
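
One way to picture the start/end pairing is sketched below. It uses the documented event type names, but the `key` field used to match a start with its end, the `Span` shape, and `pairSpans` itself are hypothetical. Treating `_error` events as span ends is also an assumption.

```typescript
type Kind = "pattern" | "agent" | "resolver";

// Depth levels as described above: Patterns, then Agents, then Resolvers.
const DEPTH: Record<Kind, number> = { pattern: 0, agent: 1, resolver: 2 };

interface RawEvent {
  type: string;      // e.g. "agent_start", "agent_complete"
  key: string;       // hypothetical id that ties a start to its end
  timestamp: number; // ms
}

interface Span {
  kind: Kind;
  key: string;
  start: number;
  end: number | null; // null means the span is still running
  depth: number;
}

function parseType(type: string): { kind: Kind; isStart: boolean } | null {
  const match = /^(pattern|agent|resolver)_(start|complete|error)$/.exec(type);
  if (!match) return null;
  return { kind: match[1] as Kind, isStart: match[2] === "start" };
}

function pairSpans(events: RawEvent[]): Span[] {
  const open = new Map<string, Span>();
  const spans: Span[] = [];
  for (const event of events) {
    const parsed = parseType(event.type);
    if (!parsed) continue; // not a span boundary
    const mapKey = `${parsed.kind}:${event.key}`;
    if (parsed.isStart) {
      const span: Span = {
        kind: parsed.kind,
        key: event.key,
        start: event.timestamp,
        end: null,
        depth: DEPTH[parsed.kind],
      };
      open.set(mapKey, span);
      spans.push(span);
    } else {
      const span = open.get(mapKey);
      if (span) {
        span.end = event.timestamp;
        open.delete(mapKey);
      }
    }
  }
  return spans;
}
```

Spans whose `end` is still `null` correspond to the "(running)" bars.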

DAG

Directed acyclic graph using React Flow.

  • Topological layout with animated edges during execution
  • Click nodes for detail panel (status, tokens, run count)
  • Cycle detection with visual indication
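
Topological layout and cycle detection are commonly done together with Kahn's algorithm. The sketch below shows that general technique on a plain adjacency map; it is not taken from the DevTools source, and `Graph` and `topoSort` are illustrative names.

```typescript
// node -> list of downstream nodes
type Graph = Record<string, string[]>;

function topoSort(graph: Graph): { order: string[]; hasCycle: boolean } {
  // Count incoming edges, including nodes that only appear as targets.
  const inDegree = new Map<string, number>();
  for (const node of Object.keys(graph)) {
    if (!inDegree.has(node)) inDegree.set(node, 0);
    for (const target of graph[node] ?? []) {
      inDegree.set(target, (inDegree.get(target) ?? 0) + 1);
    }
  }

  // Start from nodes with no incoming edges.
  const queue: string[] = [];
  inDegree.forEach((degree, node) => {
    if (degree === 0) queue.push(node);
  });

  const order: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    order.push(node);
    for (const target of graph[node] ?? []) {
      const remaining = (inDegree.get(target) ?? 0) - 1;
      inDegree.set(target, remaining);
      if (remaining === 0) queue.push(target);
    }
  }

  // Any node left unvisited is part of (or downstream of) a cycle.
  return { order, hasCycle: order.length !== inDegree.size };
}
```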

Health

Agent health monitoring cards.

  • Circuit state indicator (closed/open/half-open)
  • Success rate percentage
  • Average latency
  • Health score (color-coded: green ≥ 70, amber ≥ 40, red < 40)
  • Summary stats across all agents
  • Token usage chart
  • Reroute event log
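
The colour thresholds listed above map to a small lookup. `healthColor` is a hypothetical helper name used only for this sketch.

```typescript
type HealthColor = "green" | "amber" | "red";

// Thresholds from the Health view: green >= 70, amber >= 40, red < 40.
function healthColor(score: number): HealthColor {
  if (score >= 70) return "green";
  if (score >= 40) return "amber";
  return "red";
}
```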

Breakpoints

Interactive breakpoint management.

  • Pending breakpoints list
  • Per-breakpoint cards with input modification and skip toggle
  • "Resume All" button
  • Resolved/cancelled history

Compare

Side-by-side comparison of saved session runs.

  • Run selectors (dropdown) with stale-selection cleanup when runs are deleted
  • Summary stats comparison (events, tokens, duration, agents)
  • Agent comparison table with color-coded deltas (red = increase, green = decrease)
  • Event type breakdown chart (stacked bars)
  • Mini timeline bars
  • Diff summary with delta calculations

Event Detail Panel

Clicking any event in the Timeline or Flamechart opens a detail panel (right sidebar, 320px). Press Escape to close.

Features:

  • Prompt/Completion viewer – Tabbed input/output display with token counts (inputTokens, outputTokens, totalTokens)
  • Copy to clipboard – Copy event ID or full event JSON
  • Replay from here – Start replay from the selected event
  • Fork from snapshot – Fork the timeline at this event's snapshot (with confirmation dialog)
  • Property rendering – Syntax-highlighted values (booleans, numbers, strings, objects) with depth-limiting
  • String expansion – "Show more/less" toggle for truncated content (>200 chars)

Replay Mode

Step through recorded events with playback controls. Uses frame-skipping to maintain real-time accuracy at faster speeds.

Controls:

  • Play/Pause (Space)
  • Step forward/backward (Arrow keys)
  • Seek to any position (cursor slider)
  • Jump to start/end (Home/End)
  • Exit replay (Escape)
  • Speed: 1x, 2x, 5x, 10x
  • Replay from event – right-click or use "Replay from here" in the detail panel
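
Frame-skipping can be sketched as follows: on each frame, advance a replay clock by the elapsed wall time multiplied by the speed, then jump directly to the last event that is due. This is an illustration under assumptions (timestamps sorted ascending, in ms); `replayIndex` is a hypothetical helper, not the UI's actual function.

```typescript
// Returns the index of the last event that should be visible after
// `elapsedWallMs` of real time at the given speed multiplier.
function replayIndex(
  timestamps: number[], // event timestamps in ms, sorted ascending
  startIndex: number,   // event where replay began
  elapsedWallMs: number,
  speed: number,        // 1, 2, 5, or 10
): number {
  const startTime = timestamps[startIndex] ?? 0;
  const replayClock = startTime + elapsedWallMs * speed;
  let index = startIndex;
  for (let next = startIndex + 1; next < timestamps.length; next++) {
    const t = timestamps[next];
    if (t === undefined || t > replayClock) break;
    index = next; // skip straight past events that are already due
  }
  return index;
}
```

Because the index is derived from the clock rather than incremented once per frame, several events can be skipped in a single frame at 10x without the replay drifting behind.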

Anomaly Detection

The DevTools automatically detects anomalies in agent execution:

| Severity | Examples |
| --- | --- |
| Critical | Agent errors, guardrail rejections |
| Warning | Retries, duration outliers (>2x mean), token spikes (>2x mean) |
| Info | Reroutes, circuit breaker state changes |

Anomalies are highlighted in the timeline view and can be filtered.
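
The ">2x mean" rule for duration outliers and token spikes can be sketched as a simple threshold check. `findOutliers` is a hypothetical helper; whether DevTools computes the mean per agent or across all agents is not stated here, so this sketch takes whatever list of values it is given.

```typescript
// Returns the indices of values greater than `factor` times the mean.
function findOutliers(values: number[], factor: number = 2): number[] {
  if (values.length === 0) return [];
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const outliers: number[] = [];
  values.forEach((value, index) => {
    if (value > factor * mean) outliers.push(index);
  });
  return outliers;
}
```

Note that an outlier raises the mean it is compared against, so with very few samples a spike may not cross the threshold.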


Session Management

  • Export JSON – Save a session to JSON with version and timestamp metadata
  • Export HTML – Generate a standalone HTML trace viewer (no dependencies, no WebSocket – share with anyone)
  • Import – Load a saved session for replay (validates event types and structure, 50MB limit)
  • Auto-save – Toggle automatic saving to localStorage (5-second debounce, up to 5 runs, 10MB limit)
  • Compare – Save multiple runs and compare them side-by-side in the Compare view
  • Fork – Truncate timeline to a past point, optionally edit state, and replay from there
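
A minimal sketch of the export and import steps is shown below. The field names (`version`, `exportedAt`, `events`) and both function names are assumptions; the actual session file format is not documented on this page. The size check approximates the 50MB limit by string length rather than bytes.

```typescript
// Approximates the 50MB import limit using string length.
const MAX_IMPORT_CHARS = 50 * 1024 * 1024;

// Hypothetical file shape: version and timestamp metadata plus events.
interface SessionFile {
  version: number;
  exportedAt: string; // ISO timestamp
  events: { type: string }[];
}

function exportSession(events: { type: string }[], now: Date): string {
  const file: SessionFile = { version: 1, exportedAt: now.toISOString(), events };
  return JSON.stringify(file);
}

function importSession(json: string): SessionFile {
  if (json.length > MAX_IMPORT_CHARS) throw new Error("Session file too large");
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) throw new Error("Invalid session");
  const { version, exportedAt, events } = data as Record<string, unknown>;
  if (typeof version !== "number" || typeof exportedAt !== "string" || !Array.isArray(events)) {
    throw new Error("Invalid session structure");
  }
  // Every event needs at least a string `type`.
  for (const event of events) {
    if (typeof event !== "object" || event === null ||
        typeof (event as { type?: unknown }).type !== "string") {
      throw new Error("Invalid event");
    }
  }
  return { version, exportedAt, events: events as { type: string }[] };
}
```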

WebSocket Protocol

Server → Client

| Message | Description |
| --- | --- |
| welcome | Connection established |
| event / event_batch | Timeline events |
| snapshot | Full orchestrator state snapshot |
| health | Agent health data |
| breakpoints | Pending breakpoint state |
| scratchpad_state / scratchpad_update | Scratchpad data |
| derived_state / derived_update | Derived values |
| token_stream / stream_done | Live token streaming |
| fork_complete | Fork operation completed |
| error | Server error |
| pong | Keepalive response |

Client → Server

| Message | Description |
| --- | --- |
| request_snapshot | Request current state |
| request_health | Request health data |
| request_events | Request event history |
| request_breakpoints | Request breakpoint state |
| request_scratchpad | Request scratchpad state |
| request_derived | Request derived values |
| resume_breakpoint | Resume a paused breakpoint |
| cancel_breakpoint | Cancel a paused breakpoint |
| fork_from_snapshot | Fork timeline at a snapshot |
| export_session / import_session | Session persistence |
| ping | Keepalive ping |
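
A custom client would typically check each inbound message against the server message names before acting on it. The sketch below assumes messages are JSON objects with a `type` discriminator field. That field name, the `ServerMessage` shape, and `parseServerMessage` are assumptions, since payload formats are not specified on this page.

```typescript
const SERVER_MESSAGE_TYPES = [
  "welcome", "event", "event_batch", "snapshot", "health", "breakpoints",
  "scratchpad_state", "scratchpad_update", "derived_state", "derived_update",
  "token_stream", "stream_done", "fork_complete", "error", "pong",
] as const;

type ServerMessageType = (typeof SERVER_MESSAGE_TYPES)[number];

// Assumed envelope: a `type` discriminator plus message-specific fields.
interface ServerMessage {
  type: ServerMessageType;
  [field: string]: unknown;
}

// Returns the parsed message, or null if it is not valid JSON or its
// type is not one of the known server message names.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const type = (data as { type?: unknown }).type;
  if (typeof type !== "string") return null;
  if ((SERVER_MESSAGE_TYPES as readonly string[]).indexOf(type) === -1) return null;
  return data as ServerMessage;
}
```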

Supported Event Types

The DevTools UI recognizes 28 event types grouped by category:

| Category | Event Types |
| --- | --- |
| Agent lifecycle | agent_start, agent_complete, agent_error, agent_retry |
| Constraints | constraint_evaluate, resolver_start, resolver_complete, resolver_error |
| Governance | guardrail_check, approval_request, approval_response, breakpoint_hit, breakpoint_resumed |
| Patterns | pattern_start, pattern_complete, race_start, race_winner, race_cancelled, debate_round, reflection_iteration |
| State | derivation_update, scratchpad_update |
| Checkpoints | checkpoint_save, checkpoint_restore |
| Infrastructure | handoff_start, handoff_complete, reroute, dag_node_update |
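
If you consume events in your own tooling, the categories above can be expressed as a lookup table. The constant and function names below are illustrative only and are not exported by the package.

```typescript
// Event types copied from the table above, keyed by category.
const EVENT_CATEGORIES: Record<string, readonly string[]> = {
  "Agent lifecycle": ["agent_start", "agent_complete", "agent_error", "agent_retry"],
  Constraints: ["constraint_evaluate", "resolver_start", "resolver_complete", "resolver_error"],
  Governance: ["guardrail_check", "approval_request", "approval_response", "breakpoint_hit", "breakpoint_resumed"],
  Patterns: ["pattern_start", "pattern_complete", "race_start", "race_winner", "race_cancelled", "debate_round", "reflection_iteration"],
  State: ["derivation_update", "scratchpad_update"],
  Checkpoints: ["checkpoint_save", "checkpoint_restore"],
  Infrastructure: ["handoff_start", "handoff_complete", "reroute", "dag_node_update"],
};

// Returns the category for a known event type, or null otherwise.
function categoryOf(eventType: string): string | null {
  for (const category of Object.keys(EVENT_CATEGORIES)) {
    if ((EVENT_CATEGORIES[category] ?? []).indexOf(eventType) !== -1) return category;
  }
  return null;
}
```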

Connection Details

  • Auto-reconnect: Exponential backoff up to 30s, max 20 attempts
  • Keepalive: Ping every 30 seconds
  • Event buffer: Max 5,000 events in memory, requestAnimationFrame flushing
  • Token streaming: Buffers up to 10KB per agent, 50 concurrent agents max, 5-minute inactivity timeout
  • Prototype pollution defense: __proto__, constructor, prototype blocked on all inbound messages
  • Input validation: All server messages validated against typed discriminator union before processing
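
Two of these behaviours are easy to sketch for a custom client: capped exponential backoff and dropping dangerous keys from inbound data. The 1-second base delay is an assumption (only the 30s cap and 20-attempt limit are stated above), both function names are hypothetical, and this sketch strips blocked keys, whereas the real client may reject the whole message instead.

```typescript
const MAX_DELAY_MS = 30_000; // stated cap
const MAX_ATTEMPTS = 20;     // stated limit
const BASE_DELAY_MS = 1_000; // assumption: not specified in the docs

// Delay before reconnect attempt `attempt` (0-based), or null to give up.
function reconnectDelay(attempt: number): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

const BLOCKED_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Recursively copy a parsed JSON value, dropping blocked keys.
function stripBlockedKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => stripBlockedKeys(item));
  if (typeof value !== "object" || value === null) return value;
  const source = value as Record<string, unknown>;
  const clean: Record<string, unknown> = {};
  for (const key of Object.keys(source)) {
    if (BLOCKED_KEYS.has(key)) continue;
    clean[key] = stripBlockedKeys(source[key]);
  }
  return clean;
}
```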

DevToolsServer API

interface DevToolsServer {
  clientCount: number;
  broadcast(message: DevToolsServerMessage): void;
  pushHealth(): void;
  pushBreakpoints(): void;
  pushScratchpadUpdate(key: string, value: unknown): void;
  pushDerivedUpdate(id: string, value: unknown): void;
  pushTokenStream(agentId: string, tokens: string, tokenCount: number): void;
  pushStreamDone(agentId: string, totalTokens: number): void;
  close(): void;
}
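
As a usage illustration, the sketch below forwards an agent's output chunks through `pushTokenStream` and finishes with `pushStreamDone`. It declares only the two methods it needs so that it is self-contained. `forwardStream` is a hypothetical helper, and two details are assumptions: each chunk is counted as one token, and the `tokenCount` argument is passed as a running total (the interface above does not say whether it is cumulative or per chunk).

```typescript
// Subset of the DevToolsServer interface shown above.
interface TokenStreamSink {
  pushTokenStream(agentId: string, tokens: string, tokenCount: number): void;
  pushStreamDone(agentId: string, totalTokens: number): void;
}

// Forward each chunk to DevTools, then signal that the stream finished.
function forwardStream(sink: TokenStreamSink, agentId: string, chunks: string[]): number {
  let total = 0;
  for (const chunk of chunks) {
    total += 1; // assumption: one chunk equals one token
    sink.pushTokenStream(agentId, chunk, total);
  }
  sink.pushStreamDone(agentId, total);
  return total;
}
```

Any object with these two methods can be passed in, including the server returned by `createDevToolsServer` or a test double.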

Next Steps

  • DevTools Plugin – Console API and floating panel for debugging any Directive system's facts, derivations, and events.
  • DevTools Live – Try the visual debugger in your browser.