Evaluation Models

All Valiqor data models are Python dataclasses that provide a to_dict() method and a from_dict(data) classmethod for serialization. Paginated models also support len() and iteration.
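To illustrate the serialization contract, here is a minimal sketch of a round trip through to_dict() and from_dict(). The class `CancelResponseSketch` is a hypothetical stand-in that copies the documented CancelResponse fields; it is not the SDK class, and the handling of missing keys is an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional


@dataclass
class CancelResponseSketch:
    """Hypothetical stand-in mirroring the documented CancelResponse fields."""

    status: str
    message: str = ""
    job_id: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Convert the dataclass into a plain dict that json.dumps can handle.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "CancelResponseSketch":
        # Assumption: missing optional keys fall back to the field defaults.
        return cls(
            status=data["status"],
            message=data.get("message", ""),
            job_id=data.get("job_id"),
        )


original = CancelResponseSketch(status="cancelled", job_id="job_123")
payload = original.to_dict()
restored = CancelResponseSketch.from_dict(payload)

print(payload)
print(restored == original)
```

With the real models, the same pattern should apply: call to_dict() before storing or logging a model, and from_dict() to rebuild it.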

`EvaluationResult`

Returned by evaluate(), evaluate_trace(), and get_run().

Field	Type	Description
`run_id`	`str`	Unique evaluation run identifier.
`project_id`	`str`	Project this run belongs to.
`status`	`str`	Run status: `"completed"`, `"running"`, `"failed"`.
`overall_score`	`Optional[float]`	Weighted overall score (0.0–1.0).
`aggregate_scores`	`Dict[str, float]`	Per-metric aggregate scores.
`total_items`	`int`	Total items in the dataset.
`items_evaluated`	`int`	Items successfully evaluated.
`metadata`	`Dict[str, Any]`	Custom metadata attached to the run.
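The fields above can be combined into a short run summary. The sketch below uses a hypothetical `EvaluationResultSketch` dataclass with the same field names so that it runs without the SDK; the helper names `summarize` and `lowest_metric` and all sample values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class EvaluationResultSketch:
    """Hypothetical stand-in with the fields documented for EvaluationResult."""

    run_id: str
    project_id: str
    status: str
    overall_score: Optional[float] = None
    aggregate_scores: Dict[str, float] = field(default_factory=dict)
    total_items: int = 0
    items_evaluated: int = 0
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize(result: EvaluationResultSketch) -> str:
    """Build a one-line summary; overall_score may be None, so guard it."""
    if result.status != "completed":
        return f"run {result.run_id} is {result.status}"
    score = "n/a" if result.overall_score is None else f"{result.overall_score:.2f}"
    coverage = f"{result.items_evaluated}/{result.total_items}"
    return f"run {result.run_id}: overall={score}, evaluated={coverage}"


def lowest_metric(result: EvaluationResultSketch) -> Optional[Tuple[str, float]]:
    """Return the (metric, score) pair with the smallest aggregate score."""
    if not result.aggregate_scores:
        return None
    return min(result.aggregate_scores.items(), key=lambda kv: kv[1])


sample = EvaluationResultSketch(
    run_id="run_1",
    project_id="proj_1",
    status="completed",
    overall_score=0.82,
    aggregate_scores={"hallucination": 0.71, "relevance": 0.93},
    total_items=50,
    items_evaluated=48,
)

print(summarize(sample))
print(lowest_metric(sample))
```

Comparing items_evaluated with total_items is a quick way to spot runs where some items were not evaluated.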

`JobHandle`

Returned by evaluate_async(). Wraps an async evaluation job with status polling and result retrieval.

Property/Method	Return Type	Description
`job_id`	`str`	The async job identifier.
`job_type`	`str`	Always `"evaluation"`.
`status()`	`JobStatus`	Poll current status.
`is_running()`	`bool`	Whether the job is still running.
`is_completed()`	`bool`	Whether the job completed successfully.
`cancel()`	`Dict[str, Any]`	Cancel the job.
`result()`	`EvaluationResult`	Block until complete and return result.
`wait(poll_interval, timeout, on_progress)`	`JobStatus`	Poll with optional progress callback.

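# Start an async evaluation; evaluate_async() returns a JobHandle.
handle = client.eval.evaluate_async(dataset=data, metrics=["hallucination"])
# Wait for the job, printing progress from each JobStatus passed to on_progress.
handle.wait(poll_interval=3.0, on_progress=lambda s: print(f"{s.progress_percent}%"))
# Block until the job completes and fetch the EvaluationResult.
result = handle.result()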

`JobStatus`

Status of an async evaluation job.

Field	Type	Description
`job_id`	`str`	Job identifier.
`job_type`	`str`	`"evaluation"`.
`status`	`str`	`"queued"`, `"running"`, `"completed"`, `"failed"`, `"cancelled"`.
`progress_percent`	`float`	Progress (0.0–100.0).
`current_item`	`int`	Current item being processed.
`total_items`	`int`	Total items.
`started_at`	`Optional[str]`	ISO timestamp for when the job started.
`finished_at`	`Optional[str]`	ISO timestamp for when the job finished.
`error`	`Optional[str]`	Error message if failed.
`result`	`Optional[EvaluationResult]`	Result if completed.

Properties: is_running, is_completed, is_failed (all bool).
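The status values above suggest a simple polling pattern: keep fetching the status until it reaches a terminal state. The following is a sketch of that idea, not the SDK's implementation of wait(). `JobStatusSketch`, `wait_for`, and the set of terminal states are assumptions made for illustration, and the status source is a canned iterator rather than a real API call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: these three documented values mean the job will not change again.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


@dataclass
class JobStatusSketch:
    """Hypothetical stand-in with a subset of the documented JobStatus fields."""

    job_id: str
    status: str
    progress_percent: float = 0.0
    error: Optional[str] = None


def wait_for(
    fetch_status: Callable[[], JobStatusSketch],
    poll_interval: float = 0.0,
    timeout: Optional[float] = None,
    on_progress: Optional[Callable[[JobStatusSketch], None]] = None,
) -> JobStatusSketch:
    """Poll fetch_status until a terminal state is seen or the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        current = fetch_status()
        if on_progress is not None:
            on_progress(current)
        if current.status in TERMINAL_STATES:
            return current
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"job {current.job_id} still {current.status}")
        time.sleep(poll_interval)


# Canned status updates standing in for repeated calls to a status endpoint.
updates = iter(
    [
        JobStatusSketch("job_1", "queued", 0.0),
        JobStatusSketch("job_1", "running", 50.0),
        JobStatusSketch("job_1", "completed", 100.0),
    ]
)
seen = []
final = wait_for(lambda: next(updates), on_progress=lambda s: seen.append(s.progress_percent))

print(final.status)
print(seen)
```

In real use, JobHandle.wait() already provides this behavior; the sketch only shows why checking for "failed" and "cancelled" matters as much as checking for "completed".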

`RunMetric`

Per-metric score for an evaluation run.

Field	Type	Description
`key`	`str`	Metric key (e.g. `"hallucination"`).
`display_name`	`str`	Human-readable name.
`score`	`float`	Aggregate score.
`value_type`	`str`	`"numeric"` (default).

`EvalItemDetail`

Per-item evaluation detail with scores and explanations.

Field	Type	Description
`id`	`str`	Item identifier.
`run_id`	`str`	Parent run ID.
`input`	`str`	Input text.
`output`	`str`	Output text.
`context`	`Optional[str]`	Context text.
`expected`	`Optional[str]`	Expected output.
`overall_score`	`Optional[float]`	Item-level overall score.
`metric_scores`	`Dict[str, float]`	Per-metric scores for this item.
`explanations`	`Dict[str, str]`	Per-metric explanations.
`metadata`	`Dict[str, Any]`	Item metadata.
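Since metric_scores and explanations are keyed by the same metric names, they can be joined to produce a review list of weak items. This is a minimal sketch under that assumption; `EvalItemDetailSketch` is a hypothetical stand-in with a subset of the documented fields, and `low_scoring`, the threshold, and the sample data are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EvalItemDetailSketch:
    """Hypothetical stand-in with a subset of the documented EvalItemDetail fields."""

    id: str
    run_id: str
    input: str
    output: str
    overall_score: Optional[float] = None
    metric_scores: Dict[str, float] = field(default_factory=dict)
    explanations: Dict[str, str] = field(default_factory=dict)


def low_scoring(items: List[EvalItemDetailSketch], metric: str, threshold: float) -> List[str]:
    """List items whose score for `metric` is below `threshold`, with explanations."""
    report = []
    for item in items:
        score = item.metric_scores.get(metric)
        if score is not None and score < threshold:
            # Not every item is guaranteed to carry an explanation for the metric.
            note = item.explanations.get(metric, "(no explanation)")
            report.append(f"{item.id}: {metric}={score:.2f} - {note}")
    return report


items = [
    EvalItemDetailSketch(
        id="item_1", run_id="run_1", input="q1", output="a1",
        metric_scores={"hallucination": 0.40},
        explanations={"hallucination": "Claim not supported by context."},
    ),
    EvalItemDetailSketch(
        id="item_2", run_id="run_1", input="q2", output="a2",
        metric_scores={"hallucination": 0.95},
    ),
]

for line in low_scoring(items, "hallucination", threshold=0.5):
    print(line)
```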

`EvalItemsPage`

Paginated list of EvalItemDetail. Supports len() and iteration.

Field	Type
`items`	`List[EvalItemDetail]`
`total`	`int`
`page`	`int`
`page_size`	`int`
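Support for len() and iteration typically means the page object forwards both to its items list. The sketch below shows one way that could look; `ItemsPageSketch` is a hypothetical stand-in (with plain strings in place of EvalItemDetail objects), and `page_count` is an illustrative helper, not a documented method.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ItemsPageSketch:
    """Hypothetical stand-in with the documented EvalItemsPage fields."""

    items: List[str] = field(default_factory=list)
    total: int = 0
    page: int = 1
    page_size: int = 20

    def __len__(self) -> int:
        # len(page) reports how many items are on this page, not the overall total.
        return len(self.items)

    def __iter__(self) -> Iterator[str]:
        # Iterating the page yields its items directly.
        return iter(self.items)

    def page_count(self) -> int:
        # Number of pages needed to hold `total` items at this page size.
        return math.ceil(self.total / self.page_size) if self.page_size else 0


page = ItemsPageSketch(items=["a", "b", "c"], total=8, page=1, page_size=3)

print(len(page))
print(list(page))
print(page.page_count())
```

Note the difference between len(page), which counts the items on the current page, and total, which counts all items across pages.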

`EvalTrendPoint`

A single data point in an evaluation trend.

Field	Type
`date`	`str`
`metric`	`str`
`score`	`float`
`run_count`	`int`
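Trend points are flat records, so plotting usually starts by grouping them into one series per metric. A minimal sketch follows, assuming `date` holds an ISO-formatted string that sorts chronologically as text; `EvalTrendPointSketch` and `series_by_metric` are hypothetical names, and the sample values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EvalTrendPointSketch:
    """Hypothetical stand-in with the documented EvalTrendPoint fields."""

    date: str
    metric: str
    score: float
    run_count: int


def series_by_metric(
    points: List[EvalTrendPointSketch],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group points into per-metric (date, score) series ordered by date."""
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    # Assumption: ISO date strings sort chronologically when compared as text.
    for point in sorted(points, key=lambda p: p.date):
        grouped[point.metric].append((point.date, point.score))
    return dict(grouped)


points = [
    EvalTrendPointSketch("2024-01-02", "hallucination", 0.75, 3),
    EvalTrendPointSketch("2024-01-01", "hallucination", 0.70, 2),
    EvalTrendPointSketch("2024-01-01", "relevance", 0.90, 2),
]

print(series_by_metric(points))
```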

`EvalRunComparison`

Result of comparing multiple evaluation runs.

Field	Type
`runs`	`List[Dict[str, Any]]`
`metrics`	`List[Dict[str, Any]]`
`overall_scores`	`List[float]`

`MetricInfo`

Metric template or project metric configuration.

Field	Type	Default
`key`	`str`	—
`display_name`	`str`	—
`definition`	`Optional[str]`	`None`
`value_type`	`str`	`"numeric"`
`category`	`Optional[str]`	`None`

`ProjectInfo`

Field	Type	Default
`id`	`str`	—
`name`	`str`	—
`key`	`Optional[str]`	`None`
`model_name`	`Optional[str]`	`None`
`created_at`	`Optional[str]`	`None`

`AuthInfo`

Field	Type	Default
`valid`	`bool`	—
`user_id`	`Optional[str]`	`None`
`org_id`	`Optional[str]`	`None`
`org_name`	`Optional[str]`	`None`
`plan`	`Optional[str]`	`None`

`CancelResponse`

Field	Type	Default
`status`	`str`	—
`message`	`str`	`""`
`job_id`	`Optional[str]`	`None`

Related

EvalClient: Methods returning these models
Failure Analysis Models: FA data models
Security Models: Security data models
