CLI Reference

calibra <command> [options]

calibra validate

Validate a campaign configuration file without running anything.

calibra validate <config>

The config argument is a path to a campaign TOML file.

Validation checks TOML syntax and required fields; task directory structure (task.md exists, env/ is a directory, verify.sh is executable); uniqueness of matrix dimension labels; validity of constraint references; session option keys and types (harness-managed keys, unknown keys, and type mismatches are rejected); and price coverage, when require_price_coverage = true.
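The three task directory checks are simple enough to reproduce if you want to lint tasks on their own. This is a minimal Python sketch, not calibra's own validator; the function name check_task_dir is hypothetical.

```python
import os
from pathlib import Path


def check_task_dir(task_dir: Path) -> list[str]:
    """Return the problems found in one task directory (empty list means OK).

    Illustrative only: mirrors the three structural checks listed above.
    """
    problems = []
    if not (task_dir / "task.md").is_file():
        problems.append("task.md is missing")
    if not (task_dir / "env").is_dir():
        problems.append("env/ is not a directory")
    verify = task_dir / "verify.sh"
    # os.access with X_OK asks whether the current user may execute the file.
    if not (verify.is_file() and os.access(verify, os.X_OK)):
        problems.append("verify.sh is missing or not executable")
    return problems
```

Call it on each task directory your campaign uses; an empty list means the directory passes these three checks.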

uv run calibra validate experiments/model-shootout.toml

Output on success:

Config valid. 10 variants x 5 tasks x 5 repeats = 250 trials.
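The count in the success message is the product of variants, tasks, and repeats. The sketch below assumes each variant is one combination of labels, one from every matrix dimension; constraints would presumably remove some combinations, and that step is omitted. The dimension names, labels, and underscore-joined variant names are hypothetical, chosen to resemble the report file names shown under calibra show.

```python
from itertools import product

# Hypothetical matrix: dimension name -> labels. A real campaign defines
# these in its TOML file, and constraints may prune some combinations.
matrix = {
    "model": ["sonnet", "haiku"],
    "skills": ["none", "full"],
}
tasks = ["hello-world", "fix-typo"]
repeats = 3

# Every combination of one label per dimension is a variant.
variants = ["_".join(combo) for combo in product(*matrix.values())]

# Each variant runs every task, `repeats` times.
trials = [(v, t, r) for v in variants for t in tasks for r in range(repeats)]

# prints "4 variants x 2 tasks x 3 repeats = 24 trials."
print(f"{len(variants)} variants x {len(tasks)} tasks x {repeats} repeats = {len(trials)} trials.")
```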

calibra run

Execute a campaign.

calibra run <config> [--workers N] [--dry-run] [--filter EXPR] [--task NAME] [--resume] [--output DIR] [--keep-workdirs] [-v]

The config argument is a path to a campaign TOML file.

Option            Default   Description
--workers N       1         Number of parallel worker threads
--dry-run         off       Print trial plan without executing
--filter EXPR     none      Filter variants (e.g., "model=sonnet,skills=full")
--task NAME       all       Run only specified task(s); repeatable
--resume          off       Skip trials with existing valid results
--output DIR      results   Output directory for trial reports
--keep-workdirs   off       Preserve temporary workspace directories
-v, --verbose     off       Show detailed trial output (counters, timing, event timeline)

# Basic run
uv run calibra run experiments/config.toml

# Parallel with filtering
uv run calibra run experiments/config.toml --workers 4 --filter "model=sonnet"

# Run a single task
uv run calibra run experiments/config.toml --task hello-world

# Run two specific tasks
uv run calibra run experiments/config.toml --task hello-world --task fix-typo

# Resume an interrupted run
uv run calibra run experiments/config.toml --resume --workers 4

# Dry run to preview
uv run calibra run experiments/config.toml --dry-run

# Debug a failing trial
uv run calibra run experiments/config.toml --keep-workdirs --filter "model=haiku"
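A filter expression such as "model=sonnet,skills=full" names matrix dimensions and labels. The sketch below assumes, without confirmation from this page, that comma-separated terms are combined with AND and that each term must match a dimension label exactly; parse_filter and matches are hypothetical helpers, not calibra APIs.

```python
def parse_filter(expr: str) -> dict[str, str]:
    """Split "model=sonnet,skills=full" into {"model": "sonnet", "skills": "full"}."""
    wanted = {}
    for term in expr.split(","):
        key, sep, value = term.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed filter term: {term!r}")
        wanted[key] = value
    return wanted


def matches(variant: dict[str, str], wanted: dict[str, str]) -> bool:
    """Assumed semantics: every key=value term must match (logical AND)."""
    return all(variant.get(key) == value for key, value in wanted.items())


# Hypothetical variants, each described by its dimension labels.
variants = [
    {"model": "sonnet", "skills": "full"},
    {"model": "sonnet", "skills": "none"},
    {"model": "haiku", "skills": "full"},
]
selected = [v for v in variants if matches(v, parse_filter("model=sonnet,skills=full"))]
print(selected)
```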

calibra analyze

Aggregate trial results into statistical summaries.

calibra analyze <results_dir> [--output DIR]

The results_dir argument is a path to a campaign's results directory.

Option         Default                Description
--output DIR   same as results_dir    Where to write summary files

Produces three files: summary.json (full machine-readable aggregate data), summary.md (human-readable Markdown report), and summary.csv (spreadsheet format).

uv run calibra analyze results/model-shootout
uv run calibra analyze results/model-shootout --output reports/
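As a rough picture of what aggregation involves, this sketch groups trial records by variant and computes a pass rate and mean wall time for each. The record fields (variant, passed, wall_time) are hypothetical, and calibra's actual summaries may include other statistics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-trial records; real trial reports carry more fields.
trials = [
    {"variant": "sonnet", "passed": True, "wall_time": 41.0},
    {"variant": "sonnet", "passed": False, "wall_time": 63.5},
    {"variant": "haiku", "passed": True, "wall_time": 22.0},
    {"variant": "haiku", "passed": True, "wall_time": 30.0},
]

# Group trials by variant label.
by_variant = defaultdict(list)
for t in trials:
    by_variant[t["variant"]].append(t)

# One summary row per variant.
summary = {}
for variant, rows in by_variant.items():
    summary[variant] = {
        "trials": len(rows),
        "pass_rate": sum(r["passed"] for r in rows) / len(rows),
        "mean_wall_time": mean(r["wall_time"] for r in rows),
    }

for variant, stats in sorted(summary.items()):
    print(variant, stats)
```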

calibra show

Pretty-print a single trial report.

calibra show <report.json>

The argument is a path to a trial JSON file. Output includes the task name, variant label, outcome, verification status, wall time, turns, LLM calls, total tool calls, tool failures, LLM time, tool time, compactions, and a per-tool usage breakdown.

uv run calibra show results/model-shootout/hello-world/sonnet_default_none_none_base_0.json

calibra compare

Compare two campaign result directories.

calibra compare <dir_a> <dir_b> [--output DIR]

Option         Default           Description
--output DIR   parent of dir_a   Where to write comparison output

Finds variants common to both campaigns and computes the pass rate delta (B minus A), Cliff's delta effect size and magnitude, and a token usage comparison.

uv run calibra compare results/run-v1 results/run-v2
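Pass rate delta and Cliff's delta are standard quantities, sketched here with the B minus A orientation described above. Cliff's delta compares every value from B against every value from A: d = (#{b > a} - #{b < a}) / (n_A * n_B), which lies in [-1, 1]. The magnitude labels use the commonly cited thresholds of Romano et al. (2006); calibra's own cut-offs, and the per-trial metric it applies Cliff's delta to, are not stated on this page, so treat both as assumptions. All data below is hypothetical.

```python
def common_variants(variants_a: list[str], variants_b: list[str]) -> list[str]:
    """Variant labels present in both campaigns, sorted for stable output."""
    return sorted(set(variants_a) & set(variants_b))


def pass_rate_delta(passed_a: list[bool], passed_b: list[bool]) -> float:
    """Pass rate of B minus pass rate of A."""
    return sum(passed_b) / len(passed_b) - sum(passed_a) / len(passed_a)


def cliffs_delta(a: list[float], b: list[float]) -> float:
    """Cliff's delta of B versus A, in [-1, 1].

    Positive values mean values from B tend to be larger than values from A.
    The pairwise loop is O(n_A * n_B), fine for per-variant trial counts.
    """
    greater = sum(1 for x in b for y in a if x > y)
    less = sum(1 for x in b for y in a if x < y)
    return (greater - less) / (len(a) * len(b))


def magnitude(d: float) -> str:
    """Label |d| with the Romano et al. (2006) thresholds (assumed here)."""
    ad = abs(d)
    if ad < 0.147:
        return "negligible"
    if ad < 0.33:
        return "small"
    if ad < 0.474:
        return "medium"
    return "large"


# Hypothetical per-trial data for one variant present in both campaigns.
passed_a = [True, False, False, True]
passed_b = [True, True, True, False]
metric_a = [10.0, 12.0, 14.0, 16.0]
metric_b = [13.0, 15.0, 17.0, 19.0]

d = cliffs_delta(metric_a, metric_b)
print(f"pass rate delta: {pass_rate_delta(passed_a, passed_b):+.2f}")
print(f"Cliff's delta: {d:+.3f} ({magnitude(d)})")
```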

calibra diff

Diff two trial report JSON files side by side in the browser.

calibra diff <file_a> <file_b> [--port N] [--export FILE]

Both arguments are paths to trial report JSON files (as produced by swival --report or found in results/<campaign>/<task>/).

Option          Default   Description
--port N        8118      Port to bind
--export FILE   none      Export diff as a self-contained HTML file

By default, calibra diff starts a local web server and opens the diff in your browser. With --export, it writes a self-contained HTML file instead of launching a server. The server binds to 127.0.0.1 only (not configurable) since it reads arbitrary local files.

The diff view shows KPI deltas (wall time, turns, tokens, LLM time, tool time, LLM calls, tool calls, compactions), outcome and verification status, settings differences, per-tool usage comparison, side-by-side event timelines, and raw JSON.

uv run calibra diff /tmp/report-a.json /tmp/report-b.json
uv run calibra diff results/run-a/task/variant_0.json results/run-b/task/variant_0.json --port 9000
uv run calibra diff /tmp/report-a.json /tmp/report-b.json --export diff.html
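A KPI delta is a per-field difference between the two reports. This sketch assumes the delta is reported as B minus A, matching calibra compare (the direction used by calibra diff is not stated on this page), and adds a relative change. The KPI key names and values are illustrative, not calibra's report schema.

```python
# Hypothetical KPI values from two trial reports.
kpis_a = {"wall_time": 48.0, "turns": 12, "llm_calls": 14, "tool_calls": 30}
kpis_b = {"wall_time": 36.0, "turns": 9, "llm_calls": 10, "tool_calls": 31}


def kpi_deltas(a: dict, b: dict) -> dict:
    """Absolute change (B minus A) and relative change for KPIs in both reports."""
    out = {}
    for key in a.keys() & b.keys():
        delta = b[key] - a[key]
        # Relative change is undefined when the baseline is zero.
        rel = delta / a[key] if a[key] else None
        out[key] = (delta, rel)
    return out


for key, (delta, rel) in sorted(kpi_deltas(kpis_a, kpis_b).items()):
    pct = f"{rel:+.0%}" if rel is not None else "n/a"
    print(f"{key:12} {delta:+g} ({pct})")
```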

calibra web serve

Launch the interactive web dashboard.

calibra web serve <results_dir> [--port N] [--host ADDR] [--open]

The results_dir argument is the directory containing campaign result folders.

Option        Default     Description
--port N      8118        Port to bind
--host ADDR   127.0.0.1   Host address to bind
--open        off         Open browser automatically

uv run calibra web serve results/ --open
uv run calibra web serve results/ --host 0.0.0.0 --port 9000
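The --host option decides which network interface the dashboard listens on. The sketch uses Python's standard http.server, not calibra's own server, to show the difference: 127.0.0.1 accepts connections only from the local machine, while 0.0.0.0 listens on all interfaces and makes the dashboard reachable by anything that can reach the host. make_server is a hypothetical helper.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Bind an HTTP server to the given address without serving yet.

    127.0.0.1 is loopback only; 0.0.0.0 would listen on every interface.
    Port 0 asks the OS for any free port, which is convenient for a demo.
    """
    return HTTPServer((host, port), SimpleHTTPRequestHandler)


server = make_server()
# server_address holds the (host, port) actually bound, including the OS-chosen port.
print("bound to", server.server_address)
server.server_close()
```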

calibra web build

Export a static HTML dashboard.

calibra web build <results_dir> [--output DIR]

Option         Default             Description
--output DIR   <results_dir>/web   Output directory for static HTML

uv run calibra web build results/ --output docs/dashboard/

Exit codes

Calibra exits with 0 on success and 1 on error, whether that's a configuration problem (invalid TOML, missing files, bad constraints) or a runtime failure (all trials failed, budget exceeded).
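Because every failure maps to exit code 1, a script or CI job can gate on the exit status alone. A minimal Python sketch; run_ok is a hypothetical helper name.

```python
import subprocess


def run_ok(cmd: list[str]) -> bool:
    """Run a command and report success based only on its exit status."""
    result = subprocess.run(cmd)
    # Per the exit-code contract above: 0 is success, anything else is an error.
    return result.returncode == 0
```

For example, run_ok(["uv", "run", "calibra", "validate", "experiments/model-shootout.toml"]) returns False if validation fails. A missing uv executable raises FileNotFoundError instead of returning False.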

Environment variables

Calibra inherits environment variables for provider authentication. The specific variables depend on which providers appear in your matrix, for example ANTHROPIC_API_KEY for Anthropic models.
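A pre-flight check can catch missing credentials before a long run starts. Only ANTHROPIC_API_KEY comes from this page; the provider names and the REQUIRED_VARS table are hypothetical and need to be filled in for the providers in your own matrix.

```python
import os
from collections.abc import Mapping

# Hypothetical provider -> required variables map. Only ANTHROPIC_API_KEY
# is taken from this page; extend it for the providers you actually use.
REQUIRED_VARS = {
    "anthropic": ["ANTHROPIC_API_KEY"],
}


def missing_vars(providers: list[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variables that are unset or empty in `environ`."""
    missing = []
    for provider in providers:
        for name in REQUIRED_VARS.get(provider, []):
            if not environ.get(name):
                missing.append(name)
    return missing


missing = missing_vars(["anthropic"])
if missing:
    print("missing:", ", ".join(missing))
```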