
FARunResult

The full result of a failure analysis run.
| Field | Type | Description |
| --- | --- | --- |
| run_id | str | Run identifier. |
| status | str | "completed", "processing", "failed". |
| mode | str | "minimal" or "full". |
| input_type | str | "trace", "json", or "csv". |
| feature_kind | str | "rag", "agent", "agentic_rag", "generic_llm". |
| summary | FASummary | Aggregated summary statistics. |
| failure_tags | List[FATag] | All detected failure tags. |
| eval_metrics | Optional[Dict[str, float]] | Evaluation metric values (if run_eval=True). |
| eval_run_id | Optional[str] | Linked evaluation run ID. |
| security_flags | Optional[Dict[str, str]] | Per-category security flags (if run_security=True). Values are "fail" or "pass" keyed by S-category code (e.g. {"S1": "pass", "S9": "fail"}). |
| security_batch_id | Optional[str] | Linked security batch ID. |
| detectors_run | List[str] | Which detectors were executed. |
| detectors_skipped | List[str] | Which detectors were skipped. |
| duration_ms | int | Total processing time in milliseconds. |
| tokens_used | int | Total tokens consumed. |
| created_at | Optional[str] | ISO timestamp. |
| inputs | List[FARunInput] | Input items with per-item stats. |
Property: tags (alias for failure_tags).
Methods:
  • get_item_tags(item_index: int) -> List[FATag]: tags for a specific input item
  • get_failed_items() -> List[FARunInput]: items with at least one failure
  • get_clean_items() -> List[FARunInput]: items with no failures
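
The helper methods above can be pictured with a minimal sketch. The classes below are hypothetical stand-ins that mirror only a few of the documented fields; they are not the SDK's own classes, and the filtering logic is an assumption about how such helpers would plausibly behave.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in for FATag (subset of fields)."""
    subcategory_id: str
    decision: str  # "pass", "fail", or "unsure"
    item_index: Optional[int] = None


@dataclass
class Item:
    """Hypothetical stand-in for FARunInput (subset of fields)."""
    item_index: int
    failure_count: int = 0


@dataclass
class RunResult:
    """Hypothetical stand-in for FARunResult (subset of fields)."""
    failure_tags: List[Tag] = field(default_factory=list)
    inputs: List[Item] = field(default_factory=list)

    @property
    def tags(self) -> List[Tag]:
        # Alias for failure_tags, as the reference describes.
        return self.failure_tags

    def get_item_tags(self, item_index: int) -> List[Tag]:
        # Assumed behavior: keep tags whose item_index matches.
        return [t for t in self.failure_tags if t.item_index == item_index]

    def get_failed_items(self) -> List[Item]:
        # Assumed behavior: an item has failed if failure_count > 0.
        return [i for i in self.inputs if i.failure_count > 0]

    def get_clean_items(self) -> List[Item]:
        return [i for i in self.inputs if i.failure_count == 0]


# Illustrative data only; the IDs are made up.
result = RunResult(
    failure_tags=[
        Tag("sub_a", "fail", item_index=0),
        Tag("sub_b", "pass", item_index=1),
    ],
    inputs=[Item(0, failure_count=1), Item(1)],
)
failed_indices = [i.item_index for i in result.get_failed_items()]
clean_indices = [i.item_index for i in result.get_clean_items()]
item0_subcategories = [t.subcategory_id for t in result.get_item_tags(0)]
```

With a real FARunResult, the same three calls would be made directly on the object returned by the SDK.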

FASummary

Aggregated failure summary.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| total_failures_detected | int | 0 | Number of failure tags with decision="fail". |
| total_passes | int | 0 | Checks that passed. |
| total_uncertain | int | 0 | Checks with decision="unsure". |
| overall_severity | float | 0.0 | Aggregate severity (0.0–5.0). |
| overall_confidence | float | 0.0 | Aggregate confidence (0.0–1.0). |
| primary_failure | Optional[str] | None | ID of the most severe failure subcategory. |
| primary_failure_name | Optional[str] | None | Name of the primary failure. |
| buckets_affected | List[str] | [] | Bucket IDs with failures. |
| should_alert | bool | False | Whether this warrants an alert. |
| should_gate_ci | bool | False | Whether this should block CI. |
| needs_human_review | bool | False | Whether human review is recommended. |
| total_items | int | 0 | Total input items analyzed. |
| items_with_failures | int | 0 | Items with at least one failure. |
| items_all_passed | int | 0 | Items with all checks passed. |
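
The three boolean flags lend themselves to a simple CI gate. The following is a rough sketch, assuming a stand-in `Summary` dataclass that copies a few FASummary fields and their documented defaults; `ci_exit_code` and `recommended_actions` are hypothetical helpers, not part of the SDK.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Summary:
    """Hypothetical stand-in for FASummary (subset of fields, documented defaults)."""
    total_failures_detected: int = 0
    overall_severity: float = 0.0
    should_alert: bool = False
    should_gate_ci: bool = False
    needs_human_review: bool = False


def ci_exit_code(summary: Summary) -> int:
    """Return 1 (block the pipeline) when should_gate_ci is set, else 0."""
    return 1 if summary.should_gate_ci else 0


def recommended_actions(summary: Summary) -> List[str]:
    """Translate the boolean flags into a list of follow-up actions."""
    actions = []
    if summary.should_gate_ci:
        actions.append("block CI")
    if summary.should_alert:
        actions.append("send alert")
    if summary.needs_human_review:
        actions.append("request human review")
    return actions


# Illustrative values only.
risky = Summary(total_failures_detected=3, overall_severity=4.2,
                should_alert=True, should_gate_ci=True)
clean = Summary()

risky_code = ci_exit_code(risky)
clean_code = ci_exit_code(clean)
risky_actions = recommended_actions(risky)
```

In a CI script, the exit code would typically be passed to `sys.exit()` after the run completes.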

FATag

A single failure detection result.
| Field | Type | Description |
| --- | --- | --- |
| tag_id | str | Unique tag identifier. |
| bucket_id | str | Parent failure bucket ID. |
| bucket_name | str | Parent failure bucket name. |
| subcategory_id | str | Failure subcategory ID. |
| subcategory_name | str | Failure subcategory name. |
| decision | str | "pass", "fail", or "unsure". |
| severity | float | Severity score (0.0–5.0). |
| confidence | float | Confidence score (0.0–1.0). |
| detector_type_used | str | "deterministic", "llm_judge", or "hybrid". |
| judge_rationale | Optional[str] | LLM judge’s explanation (if applicable). |
| scoring_breakdown | Optional[FAScoringBreakdown] | Detailed scoring components. |
| eval_metric_values | Dict[str, float] | Related eval metric values. |
| evidence_items | List[FAEvidenceItem] | Supporting evidence. |
| item_index | Optional[int] | Which input item this tag applies to. |
| is_reviewed | bool | Whether this tag has been marked as reviewed. Default False. |
| reviewed_at | Optional[str] | ISO timestamp of when the tag was marked reviewed. |
| issue_url | Optional[str] | URL linking this tag to an external issue tracker. |
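
Because every tag carries a decision, a severity, and a confidence, a list of tags can be triaged with ordinary list operations. The sketch below uses a hypothetical `Tag` stand-in with a subset of the FATag fields; the `triage` helper, its confidence threshold, and the sample subcategory names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    """Hypothetical stand-in for FATag (subset of fields)."""
    subcategory_name: str
    decision: str        # "pass", "fail", or "unsure"
    severity: float      # 0.0 to 5.0
    confidence: float    # 0.0 to 1.0
    item_index: Optional[int] = None


def triage(tags: List[Tag], min_confidence: float = 0.5) -> List[Tag]:
    """Keep confident failures and order them from most to least severe."""
    failures = [
        t for t in tags
        if t.decision == "fail" and t.confidence >= min_confidence
    ]
    # Sort by severity first, then by confidence, both descending.
    return sorted(failures, key=lambda t: (-t.severity, -t.confidence))


# Made-up example tags; the names are not taken from the real taxonomy.
tags = [
    Tag("Hallucinated citation", "fail", 4.0, 0.9, 0),
    Tag("Missing tool call", "fail", 2.5, 0.8, 1),
    Tag("Off-topic answer", "unsure", 3.0, 0.4, 1),
    Tag("Format drift", "fail", 4.5, 0.3, 2),
    Tag("Grounded answer", "pass", 0.0, 0.95, 0),
]
ranked = triage(tags)
ranked_names = [t.subcategory_name for t in ranked]
```

The low-confidence "Format drift" failure is dropped by the threshold even though it has the highest severity, which is the kind of trade-off the threshold is meant to expose.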

FARunInput

An input item with per-item failure statistics.
| Field | Type | Description |
| --- | --- | --- |
| item_index | int | Position in the dataset. |
| input_text | str | The input text. |
| output_text | str | The output text. |
| input_preview | str | Truncated input preview. |
| output_preview | str | Truncated output preview. |
| context | Optional[List[str]] | Context documents. |
| tool_calls | Optional[List[Dict]] | Tool call data. |
| trace_id | Optional[str] | Linked trace ID. |
| failure_count | int | Number of failures for this item. |
| pass_count | int | Number of passed checks. |
| unsure_count | int | Number of uncertain checks. |
| max_severity | float | Highest severity among failures. |
Properties: total_checks -> int, has_failures -> bool.
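
The reference does not spell out how the two properties are computed. A plausible reading, sketched here with a hypothetical `RunInput` stand-in, is that total_checks is the sum of the three counters and has_failures is true when failure_count is positive; both definitions are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RunInput:
    """Hypothetical stand-in for FARunInput (counter fields only)."""
    item_index: int
    failure_count: int = 0
    pass_count: int = 0
    unsure_count: int = 0
    max_severity: float = 0.0

    @property
    def total_checks(self) -> int:
        # Assumption: every check ends as a failure, a pass, or an unsure.
        return self.failure_count + self.pass_count + self.unsure_count

    @property
    def has_failures(self) -> bool:
        # Assumption: any positive failure_count marks the item as failed.
        return self.failure_count > 0


# Illustrative values only.
items = [
    RunInput(0, failure_count=2, pass_count=5, unsure_count=1, max_severity=4.0),
    RunInput(1, pass_count=8),
    RunInput(2, failure_count=1, pass_count=6, unsure_count=1, max_severity=2.5),
]

# List failing items, worst first.
worst_first = sorted(
    (i for i in items if i.has_failures),
    key=lambda i: i.max_severity,
    reverse=True,
)
worst_indices = [i.item_index for i in worst_first]
checks_per_item = [i.total_checks for i in items]
```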

FABucket

A failure bucket in the taxonomy.
| Field | Type | Description |
| --- | --- | --- |
| bucket_id | str | Bucket identifier. |
| bucket_name | str | Display name. |
| description | str | What this bucket covers. |
| subcategories | List[FASubcategory] | Subcategories within this bucket. |

FASubcategory

A failure subcategory within a bucket.
| Field | Type | Description |
| --- | --- | --- |
| subcategory_id | str | Subcategory identifier. |
| subcategory_name | str | Display name. |
| description | str | What this subcategory detects. |
| detection_approach | str | "deterministic", "llm_judge", or "hybrid". |
| applies_to | List[str] | Feature kinds this applies to. |
| requires_retrieval | bool | Whether retrieval context is needed. |
| requires_tools | bool | Whether tool call data is needed. |
| minimal_mode_compatible | bool | Whether this works in minimal mode. |
| required_inputs | List[str] | Required input fields. |
| related_eval_metrics | List[str] | Related evaluation metrics. |
| related_security_categories | List[str] | Related security categories. |
| impact_score | int | Impact severity (1–5). |
| risk_category | str | "low", "medium", "high", "critical". |
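
The applicability fields can be combined to predict which subcategories would be checked for a given run. This is only a sketch: `Subcategory` is a hypothetical stand-in, the IDs are invented, and the selection rule in `runnable_subcategories` is an assumption about how these flags interact, not the documented detector-selection logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subcategory:
    """Hypothetical stand-in for FASubcategory (applicability fields only)."""
    subcategory_id: str
    applies_to: List[str]
    minimal_mode_compatible: bool
    requires_retrieval: bool = False
    requires_tools: bool = False


def runnable_subcategories(
    subcategories: List[Subcategory],
    feature_kind: str,
    minimal: bool,
    has_context: bool,
    has_tool_calls: bool,
) -> List[Subcategory]:
    """Select subcategories whose requirements are met by the available data."""
    selected = []
    for sub in subcategories:
        if feature_kind not in sub.applies_to:
            continue  # not relevant for this feature kind
        if minimal and not sub.minimal_mode_compatible:
            continue  # needs full mode
        if sub.requires_retrieval and not has_context:
            continue  # no retrieval context supplied
        if sub.requires_tools and not has_tool_calls:
            continue  # no tool call data supplied
        selected.append(sub)
    return selected


# Invented catalog for illustration.
catalog = [
    Subcategory("sub_retrieval", ["rag", "agentic_rag"], True, requires_retrieval=True),
    Subcategory("sub_tools", ["agent", "agentic_rag"], False, requires_tools=True),
    Subcategory("sub_generic", ["rag", "agent", "agentic_rag", "generic_llm"], True),
]
selected_ids = [
    s.subcategory_id
    for s in runnable_subcategories(
        catalog, "rag", minimal=True, has_context=True, has_tool_calls=False
    )
]
```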

FARunListPage

Paginated list of FA runs. Supports len() and iteration.
| Field | Type |
| --- | --- |
| items | List[FARunListItem] |
| total | int |
| page | int |
| page_size | int |
| has_more | bool |
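
Support for len() and iteration suggests the page object behaves like a sequence of its items. The sketch below shows one way this could work and how has_more can drive a loop over all pages. `RunListPage`, `RunListItem`, `fake_fetch`, and `iter_all_runs` are hypothetical stand-ins; that len() counts the items on the current page and that pages are numbered from 1 are both assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class RunListItem:
    """Hypothetical stand-in for FARunListItem (subset of fields)."""
    run_id: str
    status: str


@dataclass
class RunListPage:
    """Hypothetical stand-in for FARunListPage."""
    items: List[RunListItem]
    total: int
    page: int
    page_size: int
    has_more: bool

    def __len__(self) -> int:
        # Assumption: len() counts items on this page, not the overall total.
        return len(self.items)

    def __iter__(self) -> Iterator[RunListItem]:
        return iter(self.items)


def iter_all_runs(fetch_page: Callable[[int], RunListPage]) -> Iterator[RunListItem]:
    """Yield every run by requesting pages until has_more is False."""
    page_number = 1  # assumption: pages are 1-indexed
    while True:
        page = fetch_page(page_number)
        yield from page
        if not page.has_more:
            break
        page_number += 1


# In-memory fake standing in for a real list call.
_ALL_RUNS = [RunListItem(f"run-{n}", "completed") for n in range(1, 4)]


def fake_fetch(page_number: int, page_size: int = 2) -> RunListPage:
    start = (page_number - 1) * page_size
    chunk = _ALL_RUNS[start:start + page_size]
    has_more = start + page_size < len(_ALL_RUNS)
    return RunListPage(chunk, len(_ALL_RUNS), page_number, page_size, has_more)


first_page = fake_fetch(1)
all_ids = [run.run_id for run in iter_all_runs(fake_fetch)]
```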

FARunListItem

Summary of an FA run in a list.
| Field | Type | Description |
| --- | --- | --- |
| run_id | str | Run identifier. |
| status | str | Run status. |
| input_type | str | Input type. |
| minimal_mode | bool | Whether minimal mode was used. |
| dataset_item_count | int | Number of items. |
| created_at | str | ISO timestamp. |
| feature_kind | Optional[str] | Feature kind. |
| total_failures_detected | Optional[int] | Failure count. |
| overall_severity | Optional[float] | Overall severity. |

FAInsightsSummary

Aggregated failure insights over a time period.
| Field | Type | Description |
| --- | --- | --- |
| total_runs | int | Total FA runs. |
| total_items_analyzed | int | Total items analyzed. |
| overall_failure_rate | float | Failure rate (0.0–1.0). |
| average_severity | float | Average severity. |
| period_days | int | Analysis period in days. |
| top_recurring_failures | List[FARecurringFailure] | Most common failures. |
| bucket_distribution | List[FABucketDistribution] | Failure distribution by bucket. |

FATrends

Failure trend data over time.
| Field | Type |
| --- | --- |
| period_days | int |
| project_name | Optional[str] |
| trend_data | List[FATrendDataPoint] |

FAPlaygroundResult

Result from the playground endpoint.
| Field | Type | Description |
| --- | --- | --- |
| run_id | str | Run identifier. |
| summary | FASummary | Failure summary. |
| failure_tags | List[FATag] | Detected failures. |
| checks_passed | int | Number of passed checks. |
| security_flags | Dict[str, str] | Per-category security flags. Values are "fail" or "pass" keyed by S-category code. |
| duration_ms | int | Processing time. |
| playground_runs_today | int | Runs used today. |
| playground_limit_per_day | int | Daily limit (10). |
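
Two small derived values are often useful here: the security categories that failed and the playground runs still available today. The helpers below are hypothetical and operate on plain values shaped like the fields above; the sample flags reuse the S1/S9 example given for FARunResult.

```python
from typing import Dict, List


def failing_categories(security_flags: Dict[str, str]) -> List[str]:
    """Return the S-category codes whose flag value is "fail", sorted by code."""
    return sorted(code for code, value in security_flags.items() if value == "fail")


def playground_runs_remaining(runs_today: int, limit_per_day: int) -> int:
    """Runs left before the daily limit is reached, never below zero."""
    return max(limit_per_day - runs_today, 0)


# Values shaped like security_flags, playground_runs_today,
# and playground_limit_per_day on a playground result.
flags = {"S1": "pass", "S9": "fail"}
failed = failing_categories(flags)
remaining = playground_runs_remaining(runs_today=7, limit_per_day=10)
exhausted = playground_runs_remaining(runs_today=10, limit_per_day=10)
```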

FAJobStatus

Status of an async FA job.
| Field | Type | Description |
| --- | --- | --- |
| job_id | str | Job identifier. |
| job_type | str | "failure_analysis". |
| status | str | "queued", "running", "completed", "failed", "cancelled". |
| progress_percent | float | Progress (0–100). |
| current_item | int | Current item being processed. |
| total_items | int | Total items. |
| estimated_remaining_seconds | Optional[int] | Estimated time remaining. |
| error | Optional[str] | Error message if failed. |
Properties: is_running, is_completed, is_failed (all bool).
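
A typical use of these properties is a polling loop that waits for an async job to finish. The sketch below is not the SDK's implementation: `JobStatus` is a hypothetical stand-in, `wait_for_job` and its `fetch` callable are invented for illustration, and the mapping from status strings to the three properties (in particular, treating "queued" as still running) is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobStatus:
    """Hypothetical stand-in for FAJobStatus (subset of fields)."""
    job_id: str
    status: str  # "queued", "running", "completed", "failed", "cancelled"
    progress_percent: float = 0.0
    error: Optional[str] = None

    @property
    def is_running(self) -> bool:
        # Assumption: a queued job is treated as still in progress.
        return self.status in ("queued", "running")

    @property
    def is_completed(self) -> bool:
        return self.status == "completed"

    @property
    def is_failed(self) -> bool:
        return self.status == "failed"


def wait_for_job(
    fetch: Callable[[], JobStatus],
    interval_s: float = 2.0,
    max_polls: int = 100,
) -> JobStatus:
    """Poll until the job leaves the running state or max_polls is reached."""
    for _ in range(max_polls):
        job = fetch()
        if not job.is_running:
            return job  # completed, failed, or cancelled
        time.sleep(interval_s)
    raise TimeoutError("job did not finish within max_polls")


# Simulated status sequence standing in for repeated status requests.
_states = iter([
    JobStatus("job-1", "queued"),
    JobStatus("job-1", "running", progress_percent=50.0),
    JobStatus("job-1", "completed", progress_percent=100.0),
])
final = wait_for_job(lambda: next(_states), interval_s=0.0)
```

Callers would then branch on `final.is_completed` or `final.is_failed`, reading `final.error` in the failure case.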