The sandbox executes a Directive snippet inside a resource-limited worker thread (worker_threads.Worker) and returns what happened: captured console output, the facts snapshot taken after settle(), derivation values, and any errors. It's the engine behind two surfaces, the run_in_sandbox MCP tool and the playground's live transcript panel, and it ships as a standalone npm package for anyone building their own playground, CI gate, or teaching tool.

Why a sandbox

LLMs and humans both generate Directive code. The natural next question is "does it actually work?" Without an execution layer, the answer is "probably; run it locally and see." The sandbox closes that loop:

Agent-side – the MCP tool returns the transcript so the AI can self-correct based on observed behavior rather than just static lint output.
Web-side – the playground's Run button hits the sandbox and renders the same transcript inline, so a user can edit in the StackBlitz embed and re-run without leaving the page.
CI-side – a docs site that wants to verify every example actually settles can run them all through the sandbox in a single pass.

What the sandbox isn't: a general-purpose code runner. It's tuned for Directive: it knows how to read facts via $store.toObject(), how to walk system.derive, and how to wait for settle(). By design, it refuses any import outside the @directive-run/* allowlist, apart from relative paths within the payload.

API

Install:

```sh
pnpm add @directive-run/sandbox
```

The package has esbuild and ts-morph as optional dependencies; install them too if your runtime doesn't already vendor them.

The single entry point is runInSandbox:

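```ts
import { readFile } from "node:fs/promises";
import { runInSandbox } from "@directive-run/sandbox";

// Source strings for the module and its runner; reading them from disk is just one option.
const moduleSource = await readFile("src/counter.ts", "utf8");
const runnerSource = await readFile("src/main.ts", "utf8");

const result = await runInSandbox({
  files: [
    { path: "src/counter.ts", source: moduleSource },
    { path: "src/main.ts", source: runnerSource },
  ],
  timeoutMs: 5000, // clamped to [100, 10000]; defaults to 5000
});

console.log(result.logs);     // ["[log] [start] count= 0", "[log] [settled] count= 2"]
console.log(result.facts);    // { count: 2 }
console.log(result.derived);  // { isPositive: true }
console.log(result.errors);   // []
```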

The response shape:

```ts
interface SandboxResult {
  logs: string[];                       // captured console.log/warn/error lines
  facts: Record<string, unknown>;       // system.facts.$store.toObject() snapshot
  derived: Record<string, unknown>;     // system.derive[key] snapshot per declared key
  errors: string[];                     // structured error messages
  durationMs: number;
  timedOut: boolean;
}
```

For already-runnable snippets (the kind get_example or fix_code returns), pass { source: "..." } instead and the package maps it onto src/main.ts internally.
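A minimal sketch of how the two input shapes collapse into one files array, assuming only the behavior described above (a lone source becomes src/main.ts). SandboxFile, SandboxInput, and toSandboxFiles are hypothetical names for illustration, not exports of @directive-run/sandbox:

```typescript
// Hypothetical types mirroring the two documented input shapes.
interface SandboxFile {
  path: string;
  source: string;
}

type SandboxInput =
  | { files: SandboxFile[]; timeoutMs?: number }
  | { source: string; timeoutMs?: number };

// Normalize either shape into a files array. A single source string is
// mapped onto src/main.ts, matching the documented behavior.
function toSandboxFiles(input: SandboxInput): SandboxFile[] {
  if ("files" in input) {
    return input.files;
  }
  return [{ path: "src/main.ts", source: input.source }];
}
```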

Sandbox boundary

Four defensive layers, in order:

AST allowlist validator (powered by ts-morph). Pre-flights every file before the bundler:
- Imports must match the @directive-run/* curated set – core, ai, query, react, vue, svelte, solid, lit, el, optimistic, timeline, mutator, knowledge, scaffold, claude-plugin, lint, sources – or a relative ./*.js path inside the payload. Anything else (node:fs, express, @sizls/*) is rejected. The canonical, drift-proof list is ALLOWED_DIRECTIVE_PACKAGES in packages/sandbox/src/validator.ts.
- @directive-run/{cli,mcp,sandbox,vite-plugin-api-proxy} are explicitly denied. They're build / CLI / sandbox-meta tooling with no legitimate use inside a sandboxed demo.
- Free identifier references to process, require, fetch, Buffer, eval, setTimeout, etc. are denied.
- Property-access bypass chains are rejected too (v0.3.0+): globalThis.process, globalThis.fetch, globalThis["X"] bracket syntax, .constructor access on any value, Function(...) calls, and Reflect.get(globalThis, …) smuggling. An earlier rule that skipped property-key positions had unintentionally opened these holes; an adversarial security review traced the chains, and v0.3.0 closes them.
esbuild bundler with absolute-path rewriting. The payload is virtualized into a single ESM string with @directive-run/* imports rewritten to absolute file:// URLs of the host's resolved node_modules paths. The worker can then import the bundle from /tmp without needing a node_modules chain above it – which means Vercel, AWS Lambda, Cloud Run, and similar read-only-FS deploy targets all work.
worker_threads.Worker with resourceLimits. The worker runs with a 32 MB heap and a 16 MB code limit, under a wall-clock budget clamped to [100 ms, 10 s] (default 5 s). On overrun the host hard-terminates it via worker.terminate(), so no cooperative cancellation is needed. The host writes each bundle to a fresh tmp directory and removes it in a finally block, so repeated calls can't accumulate files on disk.
Outbound fetch wrapper. The worker patches globalThis.fetch before importing the bundle. The wrapper blocks loopback (127.0.0.0/8, ::1, localhost), link-local (169.254.0.0/16, which includes the AWS/GCP/Azure instance-metadata endpoint at 169.254.169.254), RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), multicast, IPv4-mapped IPv6 in both dotted and hex form, and any non-HTTP(S) protocol. The user's snippet can't call fetch itself (the validator blocks it both as a free identifier and as globalThis.fetch), but @directive-run/query's internal fetch calls run inside the package's module body, where the validator never looks, so the wrapper is the only place that can intercept them.
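The import rule in the validator layer can be sketched as a predicate over import specifiers. This is an illustration, not the package's validator: it assumes the specifiers have already been extracted (the package does that with ts-morph), treats only exact package names as allowed (no subpath imports), and requires relative imports to be ./*.js paths with no ".." segments. isImportAllowed is a hypothetical name, and the list is copied from this page rather than from ALLOWED_DIRECTIVE_PACKAGES:

```typescript
// Copied from the prose on this page; the canonical, drift-proof list is
// ALLOWED_DIRECTIVE_PACKAGES in packages/sandbox/src/validator.ts.
const ALLOWED_PACKAGES = new Set([
  "core", "ai", "query", "react", "vue", "svelte", "solid", "lit", "el",
  "optimistic", "timeline", "mutator", "knowledge", "scaffold",
  "claude-plugin", "lint", "sources",
]);

// Same scope, but build / CLI / sandbox-meta tooling: always denied.
const DENIED_PACKAGES = new Set(["cli", "mcp", "sandbox", "vite-plugin-api-proxy"]);

const SCOPE = "@directive-run/";

// Hypothetical helper: true when an import specifier passes the allowlist.
function isImportAllowed(specifier: string): boolean {
  // Relative imports: assumed to be ./*.js paths that stay inside the payload.
  if (specifier.startsWith("./")) {
    return specifier.endsWith(".js") && specifier.split("/").indexOf("..") === -1;
  }
  // Everything outside the scope (node:fs, express, @sizls/*, ...) is rejected.
  if (!specifier.startsWith(SCOPE)) {
    return false;
  }
  const name = specifier.slice(SCOPE.length);
  // Explicit deny wins; otherwise the name must match the allowlist exactly.
  return !DENIED_PACKAGES.has(name) && ALLOWED_PACKAGES.has(name);
}
```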
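The absolute-path rewrite in the bundler layer comes down to turning a resolved host path into a file:// URL. A sketch, assuming the host's resolver is passed in as a function (in a real build it might be require.resolve from the host app); rewriteSpecifier is a hypothetical name:

```typescript
import { pathToFileURL } from "node:url";

const DIRECTIVE_SCOPE = "@directive-run/";

// Rewrite an @directive-run/* bare specifier to an absolute file:// URL so a
// bundle written to a temp directory can import it with no node_modules
// chain above it. `resolveFromHost` stands in for the host's own module
// resolution and is injected here as an assumption.
function rewriteSpecifier(
  specifier: string,
  resolveFromHost: (specifier: string) => string,
): string {
  if (!specifier.startsWith(DIRECTIVE_SCOPE)) {
    // Relative imports are bundled inline; anything else was rejected earlier.
    return specifier;
  }
  return pathToFileURL(resolveFromHost(specifier)).href;
}
```

In an esbuild plugin, a call like this would typically sit inside an onResolve callback that returns the URL as the path with external: true, so esbuild leaves it in the output import verbatim.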
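A sketch of the host check behind the fetch wrapper, covering the ranges listed above. Assumptions: it inspects only the literal host in the URL (no DNS resolution), relies on the WHATWG URL parser to normalize forms such as http://2130706433/ to 127.0.0.1 and [::ffff:127.0.0.1] to [::ffff:7f00:1], and fails closed on unparseable input. isBlockedUrl and its helpers are hypothetical; a wrapper would call it before delegating to the original fetch:

```typescript
import { URL } from "node:url";

// Parse a dotted-quad IPv4 string into four octets, or return null.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((n) => n >= 0 && n <= 255) ? octets : null;
}

// Loopback, link-local, RFC 1918, and multicast IPv4 ranges.
function isBlockedIPv4([a, b]: number[]): boolean {
  return (
    a === 127 ||                         // loopback 127.0.0.0/8
    (a === 169 && b === 254) ||          // link-local, incl. 169.254.169.254
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a >= 224 && a <= 239)               // multicast 224.0.0.0/4
  );
}

// IPv6 loopback, multicast, and IPv4-mapped addresses (dotted or hex form).
function isBlockedIPv6(addr: string): boolean {
  if (addr === "::1" || addr.startsWith("ff")) return true; // ff00::/8 multicast
  const dotted = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(addr);
  if (dotted) {
    const v4 = parseIPv4(dotted[1]);
    return v4 === null || isBlockedIPv4(v4); // malformed: fail closed
  }
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(addr);
  if (hex) {
    const hi = parseInt(hex[1], 16);
    const lo = parseInt(hex[2], 16);
    return isBlockedIPv4([hi >> 8, hi & 0xff, lo >> 8, lo & 0xff]);
  }
  return false;
}

// True when a fetch to `input` should be refused.
function isBlockedUrl(input: string): boolean {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return true; // unparseable: fail closed
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  // Lowercase and drop a trailing dot so "LOCALHOST." matches "localhost".
  const host = url.hostname.toLowerCase().replace(/\.$/, "");
  if (host === "localhost") return true;
  if (host.startsWith("[")) return isBlockedIPv6(host.slice(1, -1));
  const v4 = parseIPv4(host);
  return v4 !== null && isBlockedIPv4(v4);
}
```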

What the boundary doesn't cover

Network access to public hosts. A snippet that imports @directive-run/query can still hit any public URL it can spell. The wrapper blocks the private ranges, not the public internet.
CPU and memory pressure outside the V8 heap. resourceLimits only caps the V8 heap. A snippet that allocates a giant Buffer (blocked at the validator), spins microtasks, or builds deeply nested structures can still burn CPU or exhaust memory until the wall-clock budget terminates the worker.
Trust boundary inversion. The sandbox protects the host from the snippet, not the snippet from the host. If you embed the sandbox in a server, the server's process.env, node_modules layout, and child-process state are visible to your own code paths even though the snippet can't touch them.
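The worker layer's limits, and the point that resourceLimits covers only the V8 heap, can be seen with plain node:worker_threads. A sketch, not the package's code: it assumes the "32 MB heap" and "16 MB code" figures map to the maxOldGenerationSizeMb and codeRangeSizeMb resource limits, and runWithBudget and clampTimeout are hypothetical names:

```typescript
import { Worker } from "node:worker_threads";

// Clamp a requested budget into [100 ms, 10 s], defaulting to 5 s.
function clampTimeout(ms?: number): number {
  return Math.min(10_000, Math.max(100, ms ?? 5_000));
}

interface WorkerOutcome {
  timedOut: boolean;
  exitCode: number;
}

// Run a JavaScript string in a fresh worker with capped V8 memory and a
// wall-clock budget; the worker is hard-terminated if it overruns.
function runWithBudget(code: string, timeoutMs?: number): Promise<WorkerOutcome> {
  const budget = clampTimeout(timeoutMs);
  const worker = new Worker(code, {
    eval: true,
    resourceLimits: {
      maxOldGenerationSizeMb: 32, // assumed mapping for "32 MB heap"
      codeRangeSizeMb: 16,        // assumed mapping for "16 MB code"
    },
  });
  return new Promise((resolve) => {
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      void worker.terminate(); // no cooperation needed from the snippet
    }, budget);
    // Uncaught errors end the worker; the "exit" event still follows.
    worker.on("error", () => {});
    worker.on("exit", (exitCode) => {
      clearTimeout(timer);
      resolve({ timedOut, exitCode });
    });
  });
}
```

For something like while (true) {}, the timer, not resourceLimits, is what ends the run: a busy loop allocates nothing, so the heap cap never trips.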

The full threat-model coverage map – defended-vs-not-defended class by class – lives in the June 2026 sandbox security audit.

Two ways to use it

As an MCP tool

The @directive-run/mcp server exposes run_in_sandbox as a tool. AI clients (Claude Desktop, Cursor, Cline) call it directly after generating code:

"Generate a Directive counter module, then run it and tell me what facts I see."

The LLM calls generate_module, pipes the paired output through run_in_sandbox, and reads the transcript back. The response includes a playgroundUrl for click-through editing if the user wants to iterate further.

As a docs-site API route

The directive.run/playground page calls an internal /api/sandbox Next.js route that wraps runInSandbox. The DevTools panel's Run button POSTs the source and runner files and renders the returned transcript inline. Same boundary, same execution stack, different UI.

To see it in action, open the playground, paste a small Directive module, click Run, and watch the Facts, Logs, and Errors tabs populate.

When to use what

You want to demo a module to a user – use playground_link. The user gets a clickable URL that boots a real running project in StackBlitz; they can edit live.
You want the AI to see what the module did and self-correct – use run_in_sandbox. The transcript comes back in-chat so the next reasoning step has the observed behavior to work with.
You're building a CI gate that asserts every example settles cleanly – use the standalone @directive-run/sandbox package and check result.errors.length === 0 && !result.timedOut.
You're shipping a teaching tool – wrap runInSandbox behind your own UI, surface result.logs and result.facts next to the user's editor.
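For the CI case, here is a sketch of the check from the list above, run over a batch of results. The SandboxResult interface is redeclared locally from the shape documented earlier; settledCleanly and ciFailures are hypothetical helpers. In a real gate each result would come from awaiting runInSandbox per example; passing the results in keeps the check independent of the runner:

```typescript
// Local mirror of the documented SandboxResult shape.
interface SandboxResult {
  logs: string[];
  facts: Record<string, unknown>;
  derived: Record<string, unknown>;
  errors: string[];
  durationMs: number;
  timedOut: boolean;
}

// The gate condition: no errors and no timeout.
function settledCleanly(result: SandboxResult): boolean {
  return result.errors.length === 0 && !result.timedOut;
}

// Turn a batch of [exampleName, result] pairs into readable failure lines.
function ciFailures(results: Array<[string, SandboxResult]>): string[] {
  const failures: string[] = [];
  for (const [name, result] of results) {
    if (result.timedOut) {
      failures.push(`${name}: timed out after ${result.durationMs} ms`);
    } else if (result.errors.length > 0) {
      failures.push(`${name}: ${result.errors.join("; ")}`);
    }
  }
  return failures;
}
```

A CI script could then print the failure lines and exit non-zero whenever the list is non-empty.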

Cost model

Cold start (first call after a process boot) – ts-morph and esbuild load lazily, so first-call latency is in the 300–800 ms range.
Warm path – about 50–70 ms per call for a typical Directive snippet, plus whatever the snippet itself spends in settle().
Memory – workers are not pooled. Each call spins up a fresh worker and tears it down afterward, so no memory state carries over between calls.

If you're driving high-QPS traffic, the package isn't the right fit: it's tuned for interactive use and CI gates, not for hot-path code execution.
