All Valiqor data models are Python dataclasses with `to_dict()` and `from_dict(cls, data)` methods for serialization. Paginated models support `len()` and iteration.
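The serialization contract can be illustrated with a minimal stand-in dataclass. The field set below is illustrative only, not a real Valiqor model; the real classes follow the same `to_dict()` / `from_dict()` round-trip shape:

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict

@dataclass
class ExampleModel:
    # Illustrative stand-in for a Valiqor data model.
    run_id: str
    status: str

    def to_dict(self) -> Dict[str, Any]:
        # Dataclasses serialize naturally via asdict().
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ExampleModel":
        return cls(**data)

m = ExampleModel(run_id="run_123", status="completed")
assert ExampleModel.from_dict(m.to_dict()) == m  # lossless round trip
```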

EvaluationResult

Returned by `evaluate()`, `evaluate_trace()`, and `get_run()`.

| Field | Type | Description |
| --- | --- | --- |
| `run_id` | `str` | Unique evaluation run identifier. |
| `project_id` | `str` | Project this run belongs to. |
| `status` | `str` | Run status: `"completed"`, `"running"`, `"failed"`. |
| `overall_score` | `Optional[float]` | Weighted overall score (0.0–1.0). |
| `aggregate_scores` | `Dict[str, float]` | Per-metric aggregate scores. |
| `total_items` | `int` | Total items in the dataset. |
| `items_evaluated` | `int` | Items successfully evaluated. |
| `metadata` | `Dict[str, Any]` | Custom metadata attached to the run. |
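A typical post-run check, sketched against the dictionary form of a result (the payload below is illustrative, matching the field names above rather than coming from a live run):

```python
# Illustrative to_dict() payload of an EvaluationResult.
result_data = {
    "run_id": "run_abc",
    "status": "completed",
    "overall_score": 0.87,
    "aggregate_scores": {"hallucination": 0.91, "relevance": 0.83},
    "total_items": 200,
    "items_evaluated": 198,
}

if result_data["status"] == "completed":
    # Fraction of dataset items that were actually scored.
    coverage = result_data["items_evaluated"] / result_data["total_items"]
    print(f"overall={result_data['overall_score']:.2f} coverage={coverage:.1%}")
    for key, score in result_data["aggregate_scores"].items():
        print(f"  {key}: {score:.2f}")
```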

JobHandle

Returned by `evaluate_async()`. Wraps an async evaluation job with status polling and result retrieval.

| Property/Method | Return Type | Description |
| --- | --- | --- |
| `job_id` | `str` | The async job identifier. |
| `job_type` | `str` | Always `"evaluation"`. |
| `status()` | `JobStatus` | Poll current status. |
| `is_running()` | `bool` | Whether the job is still running. |
| `is_completed()` | `bool` | Whether the job completed successfully. |
| `cancel()` | `Dict[str, Any]` | Cancel the job. |
| `result()` | `EvaluationResult` | Block until complete and return result. |
| `wait(poll_interval, timeout, on_progress)` | `JobStatus` | Poll with optional progress callback. |
```python
handle = client.eval.evaluate_async(dataset=data, metrics=["hallucination"])
handle.wait(poll_interval=3.0, on_progress=lambda s: print(f"{s.progress_percent}%"))
result = handle.result()
```

JobStatus

Status of an async evaluation job.
| Field | Type | Description |
| --- | --- | --- |
| `job_id` | `str` | Job identifier. |
| `job_type` | `str` | `"evaluation"`. |
| `status` | `str` | `"queued"`, `"running"`, `"completed"`, `"failed"`, `"cancelled"`. |
| `progress_percent` | `float` | Progress (0.0–100.0). |
| `current_item` | `int` | Current item being processed. |
| `total_items` | `int` | Total items. |
| `started_at` | `Optional[str]` | ISO timestamp. |
| `finished_at` | `Optional[str]` | ISO timestamp. |
| `error` | `Optional[str]` | Error message if failed. |
| `result` | `Optional[EvaluationResult]` | Result if completed. |

Properties: `is_running`, `is_completed`, `is_failed` (all `bool`).
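The behavior of `wait()` can be approximated by polling `status()` directly. The stub handle below simulates three polls so the sketch runs standalone; it is not the SDK implementation, just a sketch of the polling contract:

```python
import time
from dataclasses import dataclass

@dataclass
class FakeStatus:
    # Mirrors the JobStatus fields used by this sketch.
    status: str
    progress_percent: float

    @property
    def is_running(self) -> bool:
        return self.status in ("queued", "running")

class FakeHandle:
    """Stand-in for JobHandle: completes on the third poll."""
    def __init__(self):
        self._polls = iter([
            FakeStatus("running", 33.0),
            FakeStatus("running", 66.0),
            FakeStatus("completed", 100.0),
        ])

    def status(self) -> FakeStatus:
        return next(self._polls)

def wait_for(handle, poll_interval=0.0, on_progress=None):
    # Equivalent in spirit to JobHandle.wait(): poll until a terminal state.
    while True:
        s = handle.status()
        if on_progress:
            on_progress(s)
        if not s.is_running:
            return s
        time.sleep(poll_interval)

final = wait_for(FakeHandle(), on_progress=lambda s: print(f"{s.progress_percent}%"))
assert final.status == "completed"
```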

RunMetric

Per-metric score for an evaluation run.
| Field | Type | Description |
| --- | --- | --- |
| `key` | `str` | Metric key (e.g. `"hallucination"`). |
| `display_name` | `str` | Human-readable name. |
| `score` | `float` | Aggregate score. |
| `value_type` | `str` | `"numeric"` (default). |

EvalItemDetail

Per-item evaluation detail with scores and explanations.
| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | Item identifier. |
| `run_id` | `str` | Parent run ID. |
| `input` | `str` | Input text. |
| `output` | `str` | Output text. |
| `context` | `Optional[str]` | Context text. |
| `expected` | `Optional[str]` | Expected output. |
| `overall_score` | `Optional[float]` | Item-level overall score. |
| `metric_scores` | `Dict[str, float]` | Per-metric scores for this item. |
| `explanations` | `Dict[str, str]` | Per-metric explanations. |
| `metadata` | `Dict[str, Any]` | Item metadata. |
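A common triage pattern is flagging items whose per-metric scores fall below a threshold. The item dicts here mirror EvalItemDetail's `metric_scores` and `explanations` fields but are illustrative data, not output from a real run:

```python
# Illustrative item payloads shaped like EvalItemDetail.to_dict().
items = [
    {"id": "it_1", "metric_scores": {"hallucination": 0.95}, "explanations": {}},
    {"id": "it_2", "metric_scores": {"hallucination": 0.42},
     "explanations": {"hallucination": "Output cites a source not in context."}},
]

THRESHOLD = 0.7
flagged = [
    (item["id"], key, score)
    for item in items
    for key, score in item["metric_scores"].items()
    if score < THRESHOLD
]
for item_id, key, score in flagged:
    print(f"{item_id}: {key}={score:.2f}")  # prints the failing items
```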

EvalItemsPage

Paginated list of EvalItemDetail. Supports `len()` and iteration.

| Field | Type |
| --- | --- |
| `items` | `List[EvalItemDetail]` |
| `total` | `int` |
| `page` | `int` |
| `page_size` | `int` |
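Walking every page is a short loop over `total` and `page_size`. The `Page` class and `fetch_page` below are hypothetical stand-ins (the real page object is EvalItemsPage, returned by whichever client call you use to list items), kept local so the sketch runs standalone:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Page:
    # Mirrors EvalItemsPage's shape for this sketch.
    items: List[str]
    total: int
    page: int
    page_size: int

    def __len__(self):
        return len(self.items)

    def __iter__(self):
        return iter(self.items)

def fetch_page(page: int, page_size: int = 2) -> Page:
    # Hypothetical stand-in: serves 5 items, 2 per page.
    data = ["it_1", "it_2", "it_3", "it_4", "it_5"]
    start = (page - 1) * page_size
    return Page(items=data[start:start + page_size], total=len(data),
                page=page, page_size=page_size)

all_items = []
page_num = 1
while True:
    page = fetch_page(page_num)
    all_items.extend(page)  # iteration yields the page's items
    if page_num * page.page_size >= page.total:
        break
    page_num += 1
assert len(all_items) == 5
```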

EvalTrendPoint

A single data point in an evaluation trend.
| Field | Type |
| --- | --- |
| `date` | `str` |
| `metric` | `str` |
| `score` | `float` |
| `run_count` | `int` |
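Trend points group naturally by metric; a quick aggregation over illustrative points (the dicts mirror EvalTrendPoint's fields but are sample data):

```python
from collections import defaultdict

# Illustrative trend points shaped like EvalTrendPoint.to_dict().
points = [
    {"date": "2024-06-01", "metric": "hallucination", "score": 0.80, "run_count": 3},
    {"date": "2024-06-02", "metric": "hallucination", "score": 0.86, "run_count": 2},
    {"date": "2024-06-01", "metric": "relevance", "score": 0.91, "run_count": 3},
]

by_metric = defaultdict(list)
for p in points:
    by_metric[p["metric"]].append(p["score"])

# Unweighted mean score per metric across the window.
averages = {m: sum(s) / len(s) for m, s in by_metric.items()}
print(averages)
```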

EvalRunComparison

Result of comparing multiple evaluation runs.
| Field | Type |
| --- | --- |
| `runs` | `List[Dict[str, Any]]` |
| `metrics` | `List[Dict[str, Any]]` |
| `overall_scores` | `List[float]` |

MetricInfo

Metric template or project metric configuration.
| Field | Type | Default |
| --- | --- | --- |
| `key` | `str` | |
| `display_name` | `str` | |
| `definition` | `Optional[str]` | `None` |
| `value_type` | `str` | `"numeric"` |
| `category` | `Optional[str]` | `None` |

ProjectInfo

| Field | Type | Default |
| --- | --- | --- |
| `id` | `str` | |
| `name` | `str` | |
| `key` | `Optional[str]` | `None` |
| `model_name` | `Optional[str]` | `None` |
| `created_at` | `Optional[str]` | `None` |

AuthInfo

| Field | Type | Default |
| --- | --- | --- |
| `valid` | `bool` | |
| `user_id` | `Optional[str]` | `None` |
| `org_id` | `Optional[str]` | `None` |
| `org_name` | `Optional[str]` | `None` |
| `plan` | `Optional[str]` | `None` |

CancelResponse

| Field | Type | Default |
| --- | --- | --- |
| `status` | `str` | |
| `message` | `str` | `""` |
| `job_id` | `Optional[str]` | `None` |