All evaluation commands live under the `eval` group:

```
valiqor eval <run|status|result|list|metrics>
```
## Common Flags

Every `eval` subcommand accepts:

| Flag | Type | Default | Description |
|---|---|---|---|
| `--api-key` | string | — | API key override |
| `--project` | string | — | Project name override |
| `--base-url` | string | — | Backend URL override |
| `--openai-api-key` | string | — | OpenAI key for LLM-based metrics |
## eval run

Run an evaluation on a dataset or trace file.

```
valiqor eval run --metrics <metrics> [--dataset <path> | --trace-file <path>] [flags]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--dataset` | string | — | Path to a dataset JSON file |
| `--trace-file` | string | — | Path to a trace JSON file |
| `--metrics` | string | required | Comma-separated metric keys |
| `--run-name` | string | — | Name for this evaluation run |
| `--model` | string | — | Model name metadata |
| `--async` | flag | false | Submit asynchronously and return a job ID |
`--dataset` and `--trace-file` are mutually exclusive; exactly one is required.

Each item in the dataset JSON array must include `input` and `output` fields:

```json
[
  {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris."
  }
]
```
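Dataset files in this shape are easy to generate programmatically. A minimal Python sketch — the file name `data.json` and the items themselves are illustrative; only the `input`/`output` field names come from the format above:

```python
import json

# Illustrative items in the format `eval run` expects: each entry
# must carry an "input" and an "output" field.
items = [
    {
        "input": "What is the capital of France?",
        "output": "The capital of France is Paris.",
    },
    {
        "input": "What is 2 + 2?",
        "output": "2 + 2 is 4.",
    },
]

# Write the dataset as a JSON array, ready to pass via --dataset.
with open("data.json", "w") as f:
    json.dump(items, f, indent=2)
```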
### Examples

```shell
# Synchronous evaluation
valiqor eval run --dataset data.json --metrics factual_accuracy,coherence

# Evaluate a trace file
valiqor eval run --trace-file trace.json --metrics hallucination

# Async evaluation (returns job ID)
valiqor eval run --dataset data.json --metrics coherence --async
```
## eval status

Check the status of an evaluation run.

```
valiqor eval status --run-id <id> [flags]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--run-id` | string | required | Evaluation run or job ID |
| `--wait` / `-w` | flag | false | Poll until completion with a live progress bar |
### Examples

```shell
# Check status once
valiqor eval status --run-id abc123

# Wait for completion with progress bar
valiqor eval status --run-id abc123 --wait
```

In wait mode, the CLI polls every 3 seconds until the run completes, fails, or is cancelled.
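If you need the same behavior in your own script (for example, in a CI job), the wait loop is straightforward to reproduce. A minimal Python sketch — `get_status` is a hypothetical callable standing in for however you fetch the run's current status string; the terminal states and 3-second interval mirror the behavior described above:

```python
import time

# Terminal states, matching the --wait behavior described above.
TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_run(get_status, poll_seconds=3, timeout=600):
    """Poll get_status() until the run reaches a terminal state.

    `get_status` is a stand-in callable returning a status string;
    supply your own implementation (e.g. one that shells out to the
    CLI and parses its output).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within the timeout")
```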
## eval result

Fetch and display evaluation results.

```
valiqor eval result --run-id <id> [flags]
```

| Flag | Type | Default | Description |
|---|---|---|---|
| `--run-id` | string | required | Evaluation run ID |
| `--json` | flag | false | Output raw JSON instead of formatted display |
### Output

The formatted display includes:

- Run ID, project, status, items evaluated
- Overall score with indicator (≥ 0.8 ✅, ≥ 0.5 ⚠️, < 0.5 ❌)
- Per-metric aggregate scores
- Per-metric detail scores for each item

```shell
# Formatted output
valiqor eval result --run-id abc123

# Raw JSON for scripting
valiqor eval result --run-id abc123 --json
```
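When scripting against the `--json` output, the score thresholds above can be reapplied in code. A minimal Python sketch of the indicator mapping — the thresholds come from the formatted-display description; the function name is illustrative, and the exact JSON schema of the result payload is not documented here:

```python
def score_indicator(score):
    """Map an overall score to the indicator used by the
    formatted display: >= 0.8 pass, >= 0.5 warn, else fail."""
    if score >= 0.8:
        return "✅"
    if score >= 0.5:
        return "⚠️"
    return "❌"
```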
## eval list

List evaluation runs for the configured project. Displays each run's ID, status, score, item count, name, and creation date.
## eval metrics

List all available evaluation metric templates. Shows each metric's key, display name, value type, and category. Use these keys with `--metrics` in `eval run`.