All evaluation commands live under the `eval` group:

```
valiqor eval <run|status|result|list|metrics>
```
## Common Flags

Every `eval` subcommand accepts:

| Flag | Type | Default | Description |
|---|---|---|---|
| `--api-key` | string | — | API key override |
| `--project` | string | — | Project name override |
| `--base-url` | string | — | Backend URL override |
| `--openai-api-key` | string | — | OpenAI key for LLM-based metrics |
## eval run

Run an evaluation on a dataset or trace file.

```
valiqor eval run --metrics <metrics> [--dataset <path> | --trace-file <path>] [flags]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--dataset` | string | — | Path to a dataset JSON file |
| `--trace-file` | string | — | Path to a trace JSON file |
| `--metrics` | string | required | Comma-separated metric keys |
| `--run-name` | string | — | Name for this evaluation run |
| `--model` | string | — | Model name metadata |
| `--async` | flag | false | Submit asynchronously and return a job ID |
`--dataset` and `--trace-file` are mutually exclusive; exactly one is required.

Each item in the dataset JSON array must include `input` and `output` fields:

```json
[
  {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris."
  }
]
```
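Dataset files in this shape are easy to generate programmatically. A minimal Python sketch — the file name `data.json` and the items themselves are illustrative; only the `input`/`output` field names come from the format above:

```python
import json

# Illustrative items in the format `eval run` expects: each entry
# must carry an "input" and an "output" field.
items = [
    {
        "input": "What is the capital of France?",
        "output": "The capital of France is Paris.",
    },
    {
        "input": "What is 2 + 2?",
        "output": "2 + 2 is 4.",
    },
]

# Write the dataset as a JSON array, ready to pass via --dataset.
with open("data.json", "w") as f:
    json.dump(items, f, indent=2)
```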
### Examples

```shell
# Synchronous evaluation
valiqor eval run --dataset data.json --metrics factual_accuracy,coherence

# Evaluate a trace file
valiqor eval run --trace-file trace.json --metrics hallucination

# Async evaluation (returns job ID)
valiqor eval run --dataset data.json --metrics coherence --async
```
## eval status

Check the status of an evaluation run.

```
valiqor eval status --run-id <id> [flags]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--run-id` | string | required | Evaluation run or job ID |
| `--wait` / `-w` | flag | false | Poll until completion with a live progress bar |
### Examples

```shell
# Check status once
valiqor eval status --run-id abc123

# Wait for completion with progress bar
valiqor eval status --run-id abc123 --wait
```

In wait mode, the CLI polls every 3 seconds until the run completes, fails, or is cancelled.
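If you need the same behavior in your own script (for example, in a CI job), the wait loop is straightforward to reproduce. A minimal Python sketch — `get_status` is a hypothetical callable standing in for however you fetch the run's current status string; the terminal states and 3-second interval mirror the behavior described above:

```python
import time

# Terminal states, matching the --wait behavior described above.
TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_run(get_status, poll_seconds=3, timeout=600):
    """Poll get_status() until the run reaches a terminal state.

    `get_status` is a stand-in callable returning a status string;
    supply your own implementation (e.g. one that shells out to the
    CLI and parses its output).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within the timeout")
```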
## eval result

Fetch and display evaluation results.

```
valiqor eval result --run-id <id> [flags]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--run-id` | string | required | Evaluation run ID |
| `--json` | flag | false | Output raw JSON instead of formatted display |
### Output

The formatted display includes:

- Run ID, project, status, items evaluated
- Overall score with indicator (≥ 0.8 ✅, ≥ 0.5 ⚠️, < 0.5 ❌)
- Per-metric aggregate scores
- Per-metric detail scores for each item

```shell
# Formatted output
valiqor eval result --run-id abc123

# Raw JSON for scripting
valiqor eval result --run-id abc123 --json
```
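When scripting against the `--json` output, the score thresholds above can be reapplied in code. A minimal Python sketch of the indicator mapping — the thresholds come from the formatted-display description; the function name is illustrative, and the exact JSON schema of the result payload is not documented here:

```python
def score_indicator(score):
    """Map an overall score to the indicator used by the
    formatted display: >= 0.8 pass, >= 0.5 warn, else fail."""
    if score >= 0.8:
        return "✅"
    if score >= 0.5:
        return "⚠️"
    return "❌"
```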
## eval list

List evaluation runs for the configured project. Displays each run's ID, status, score, item count, name, and creation date.
## eval metrics

List all available evaluation metric templates. Shows each metric's key, display name, value type, and category. Use these keys with `--metrics` in `eval run`.