Quick Start
This guide walks you through creating a task, writing a campaign config, running it, and viewing the results.
1. Create a task
Tasks live in a directory (typically tasks/). Each task needs at minimum a prompt file and a workspace directory.
```sh
mkdir -p tasks/hello-world/env
```
Write the prompt:
```sh
cat > tasks/hello-world/task.md << 'EOF'
Write a Python script called `hello.py` that prints "Hello, World!" to stdout.
EOF
```
Add a verification script:
```sh
cat > tasks/hello-world/verify.sh << 'EOF'
#!/bin/sh
# Pass only if hello.py prints exactly "Hello, World!" (-q: quiet, -x: match the whole line).
python3 hello.py | grep -qx "Hello, World!"
EOF
chmod +x tasks/hello-world/verify.sh
```
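Before involving any agent, you can sanity-check the verifier by hand. The sketch below writes a known-good hello.py into a throwaway directory and runs verify.sh against it; the temporary directory is only there to keep the task directory itself clean:

```sh
# Hand-written reference solution, used only to confirm the verifier itself works.
tmpdir=$(mktemp -d)
cp tasks/hello-world/verify.sh "$tmpdir/"
printf 'print("Hello, World!")\n' > "$tmpdir/hello.py"
(cd "$tmpdir" && sh verify.sh && echo "verifier passes")
rm -rf "$tmpdir"
```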
The env/ directory is empty here because the task starts from a blank workspace. For tasks that need starter files, you'd put them in env/.
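At this point the task directory should look roughly like this:

```text
tasks/hello-world/
├── task.md      # the prompt handed to the agent
├── verify.sh    # verification script (here it checks hello.py's output)
└── env/         # starter files for the agent's workspace (empty for this task)
```

A bug-fix task, for example, might ship the broken file as tasks/fix-bug/env/app.py (a purely hypothetical path) and describe the bug in its task.md.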
2. Write a campaign config
Campaign configs are TOML files. Create experiments/my-first-campaign.toml:
```toml
[campaign]
name = "my-first-campaign"
description = "Testing a single model on hello-world"
tasks_dir = "tasks"
repeat = 3
timeout_s = 120

[[matrix.model]]
provider = "anthropic"
model = "claude-sonnet-4.6"
label = "sonnet"

[[matrix.agent_instructions]]
label = "default"
agents_md = "AGENTS.md"
```
This is the simplest possible campaign: one model, one set of instructions, no skills, no MCP, no environment overlay. The three optional dimensions (skills, mcp, environment) default to `none`, `none`, and `base` respectively. Setting `repeat = 3` runs each variant and task combination three times so you can measure consistency.
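To compare models, you would add a second [[matrix.model]] table. Assuming the matrix expands as a cross product of its dimensions, each extra entry multiplies the number of variants; the entry below is only a sketch, and its provider, model, and label values are placeholders:

```toml
# Hypothetical second model entry; substitute a provider and model you actually use.
[[matrix.model]]
provider = "anthropic"
model = "claude-haiku-4.5"
label = "haiku"
```

With two models, the same campaign would plan 2 variants x 1 task x 3 repeats = 6 trials.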
3. Validate the config
Before running, check that everything is wired up correctly:
```sh
uv run calibra validate experiments/my-first-campaign.toml
```
This checks the config structure, discovers tasks, expands the matrix, and reports the trial plan:
```text
Config valid. 1 variants x 1 tasks x 3 repeats = 3 trials.
```
4. Dry run
See exactly what would execute without running anything:
```sh
uv run calibra run experiments/my-first-campaign.toml --dry-run
```
This prints each variant label along with summary counts (tasks, repeats, total trials).
5. Run the campaign
```sh
uv run calibra run experiments/my-first-campaign.toml --workers 2
```
Calibra sets up an isolated workspace for each trial, runs the Swival agent with the configured model, executes verify.sh to check the result, and writes a JSON report. Results land in results/my-first-campaign/hello-world/.
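With repeat = 3 and the file naming convention described in the next step, the task's results directory should end up with one JSON report per repeat, something like:

```sh
ls results/my-first-campaign/hello-world/
# sonnet_default_none_none_base_0.json
# sonnet_default_none_none_base_1.json
# sonnet_default_none_none_base_2.json
```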
6. Inspect a trial
Look at a single trial result:
```sh
uv run calibra show results/my-first-campaign/hello-world/sonnet_default_none_none_base_0.json
```
This shows a formatted summary with the task name, variant, outcome, verification status, wall time, turns, tool calls, and more. The file naming convention is {variant_label}_{repeat_index}.json, where the variant label joins dimension labels with underscores: model_agent_skills_mcp_environment.
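The report itself is plain JSON, so if you want the raw data rather than the formatted summary, any JSON tool works; for example:

```sh
python3 -m json.tool results/my-first-campaign/hello-world/sonnet_default_none_none_base_0.json | head -n 20
```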
7. Analyze the campaign
Generate aggregate reports:
```sh
uv run calibra analyze results/my-first-campaign
```
This produces three files in results/my-first-campaign/: summary.json (machine-readable aggregate metrics), summary.md (a human-readable Markdown report with rankings), and summary.csv (spreadsheet-friendly format).
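summary.csv opens directly in a spreadsheet; for a quick look in the terminal, something like this works (the columns are whatever Calibra writes, so treat this as a peek rather than a schema):

```sh
column -s, -t < results/my-first-campaign/summary.csv | less -S
```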
8. View in the web dashboard
For a richer experience, launch the interactive dashboard:
```sh
uv run calibra web serve results/ --open
```
This starts a local server at http://127.0.0.1:8118 and opens your browser. You'll see your campaign with charts, heatmaps, and drill-down views.
Next steps
From here, read Writing Tasks to learn how to build more complex tasks, Campaign Configuration to explore all config options, Running Campaigns for parallelism, filtering, and resuming, or Advanced Topics for constraints, sampling modes, and budgets.