All evaluation commands live under the eval group.
valiqor eval <run|status|result|list|metrics>

Common Flags

Every eval subcommand accepts:
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --api-key | string | | API key override |
| --project | string | | Project name override |
| --base-url | string | | Backend URL override |
| --openai-api-key | string | | OpenAI key for LLM-based metrics |

eval run

Run an evaluation on a dataset or trace file.
valiqor eval run --metrics <metrics> [--dataset <path> | --trace-file <path>] [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --dataset | string | | Path to a dataset JSON file |
| --trace-file | string | | Path to a trace JSON file |
| --metrics | string | required | Comma-separated metric keys |
| --run-name | string | | Name for this evaluation run |
| --model | string | | Model name metadata |
| --async | flag | false | Submit asynchronously and return a job ID |
--dataset and --trace-file are mutually exclusive; exactly one is required.
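If you script invocations of eval run, the exactly-one-of rule above is worth enforcing before you shell out. A minimal sketch of such an argument builder (the helper name is hypothetical; it only assembles the command line, it does not run it):

```python
def build_eval_run_args(metrics, dataset=None, trace_file=None):
    """Assemble an argv for `valiqor eval run`, enforcing that exactly one
    of --dataset / --trace-file is supplied (they are mutually exclusive)."""
    if (dataset is None) == (trace_file is None):
        raise ValueError("provide exactly one of --dataset or --trace-file")
    args = ["valiqor", "eval", "run", "--metrics", ",".join(metrics)]
    if dataset is not None:
        args += ["--dataset", dataset]
    else:
        args += ["--trace-file", trace_file]
    return args
```

The --metrics value is joined with commas to match the comma-separated metric keys the flag expects.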

Dataset Format

Each item in the dataset JSON array must include input and output fields:
[
  {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris."
  }
]
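One way to produce a conforming dataset file is to validate the required input/output fields before writing the JSON array. A minimal sketch (the validate_dataset helper is illustrative, not part of the CLI):

```python
import json

# Each dataset item must carry "input" and "output" fields.
items = [
    {
        "input": "What is the capital of France?",
        "output": "The capital of France is Paris.",
    },
]

def validate_dataset(items):
    """Raise if any item is missing the required fields."""
    for i, item in enumerate(items):
        missing = {"input", "output"} - item.keys()
        if missing:
            raise ValueError(f"item {i} is missing fields: {sorted(missing)}")
    return items

with open("data.json", "w") as f:
    json.dump(validate_dataset(items), f, indent=2)
```

The resulting data.json can be passed directly to eval run via --dataset.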

Examples

# Synchronous evaluation
valiqor eval run --dataset data.json --metrics factual_accuracy,coherence

# Evaluate a trace file
valiqor eval run --trace-file trace.json --metrics hallucination

# Async evaluation (returns job ID)
valiqor eval run --dataset data.json --metrics coherence --async

eval status

Check the status of an evaluation run.
valiqor eval status --run-id <id> [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --run-id | string | required | Evaluation run or job ID |
| --wait / -w | flag | false | Poll until completion with a live progress bar |

Examples

# Check status once
valiqor eval status --run-id abc123

# Wait for completion with progress bar
valiqor eval status --run-id abc123 --wait
In wait mode, the CLI polls every 3 seconds until the run completes, fails, or is cancelled.
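The wait-mode behavior above (poll every 3 seconds until a terminal state) can be approximated in your own scripts. A minimal sketch, where get_status stands in for a call to eval status and the terminal state names are assumptions:

```python
import time

# Assumed terminal state names; the CLI documents completion, failure,
# and cancellation as the states that end polling.
TERMINAL_STATES = {"completed", "failed", "cancelled"}

def wait_for_run(get_status, interval=3.0, sleep=time.sleep):
    """Poll get_status() every `interval` seconds until the run reaches a
    terminal state, mirroring `valiqor eval status --wait`."""
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
```

Injecting the sleep function keeps the loop easy to test without real delays.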

eval result

Fetch and display evaluation results.
valiqor eval result --run-id <id> [flags]
| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --run-id | string | required | Evaluation run ID |
| --json | flag | false | Output raw JSON instead of formatted display |

Output

The formatted display includes:
  • Run ID, project, status, items evaluated
  • Overall score with indicator (≥0.8 ✅, ≥0.5 ⚠️, <0.5 ❌)
  • Per-metric aggregate scores
  • Per-metric detail scores for each item
# Formatted output
valiqor eval result --run-id abc123

# Raw JSON for scripting
valiqor eval result --run-id abc123 --json
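When post-processing raw JSON results, you can reproduce the indicator thresholds used by the formatted display (≥0.8 ✅, ≥0.5 ⚠️, <0.5 ❌). A minimal sketch:

```python
def score_indicator(score):
    """Map an overall score to the indicator used in the formatted display:
    >= 0.8 -> ✅, >= 0.5 -> ⚠️, otherwise ❌."""
    if score >= 0.8:
        return "✅"
    if score >= 0.5:
        return "⚠️"
    return "❌"
```

This is handy for flagging runs in dashboards or CI summaries built on the --json output.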

eval list

List evaluation runs for the configured project.
valiqor eval list
Displays run ID, status, score, item count, name, and created date.

eval metrics

List all available evaluation metric templates.
valiqor eval metrics
Shows each metric’s key, display name, value type, and category. Use metric keys with --metrics in eval run.