FARunResult
The full result of a failure analysis run.
| Field | Type | Description |
|---|---|---|
| run_id | str | Run identifier. |
| status | str | "completed", "processing", "failed". |
| mode | str | "minimal" or "full". |
| input_type | str | "trace", "json", or "csv". |
| feature_kind | str | "rag", "agent", "agentic_rag", "generic_llm". |
| summary | FASummary | Aggregated summary statistics. |
| failure_tags | List[FATag] | All detected failure tags. |
| eval_metrics | Optional[Dict[str, float]] | Evaluation metric values (if run_eval=True). |
| eval_run_id | Optional[str] | Linked evaluation run ID. |
| security_flags | Optional[Dict[str, str]] | Per-category security flags (if run_security=True). Values are "fail" or "pass" keyed by S-category code (e.g. {"S1": "pass", "S9": "fail"}). |
| security_batch_id | Optional[str] | Linked security batch ID. |
| detectors_run | List[str] | Which detectors were executed. |
| detectors_skipped | List[str] | Which detectors were skipped. |
| duration_ms | int | Total processing time in milliseconds. |
| tokens_used | int | Total tokens consumed. |
| created_at | Optional[str] | ISO timestamp. |
| inputs | List[FARunInput] | Input items with per-item stats. |
Properties: tags (alias for failure_tags).
Methods:
- get_item_tags(item_index: int) -> List[FATag]: tags for a specific input item
- get_failed_items() -> List[FARunInput]: items with at least one failure
- get_clean_items() -> List[FARunInput]: items with no failures
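
The helper methods above amount to simple groupings over failure_tags and inputs. The sketch below is an illustration only: `Tag`, `Item`, and `RunResult` are hypothetical stand-ins that mirror a few documented field names, not the SDK's classes, and the subcategory IDs are placeholders. It assumes an item counts as failed when its failure_count is greater than zero.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:  # hypothetical stand-in for FATag
    subcategory_id: str
    decision: str  # "pass", "fail", or "unsure"
    item_index: Optional[int] = None


@dataclass
class Item:  # hypothetical stand-in for FARunInput
    item_index: int
    failure_count: int = 0


@dataclass
class RunResult:  # hypothetical stand-in for FARunResult
    failure_tags: List[Tag] = field(default_factory=list)
    inputs: List[Item] = field(default_factory=list)

    @property
    def tags(self) -> List[Tag]:
        # Alias for failure_tags, as documented.
        return self.failure_tags

    def get_item_tags(self, item_index: int) -> List[Tag]:
        # All tags attached to one input item.
        return [t for t in self.failure_tags if t.item_index == item_index]

    def get_failed_items(self) -> List[Item]:
        # Assumption: "failed" means failure_count > 0.
        return [i for i in self.inputs if i.failure_count > 0]

    def get_clean_items(self) -> List[Item]:
        return [i for i in self.inputs if i.failure_count == 0]


result = RunResult(
    failure_tags=[
        Tag("placeholder_a", "fail", 0),
        Tag("placeholder_b", "pass", 0),
        Tag("placeholder_a", "pass", 1),
    ],
    inputs=[Item(0, failure_count=1), Item(1, failure_count=0)],
)
failed_indices = [i.item_index for i in result.get_failed_items()]
clean_indices = [i.item_index for i in result.get_clean_items()]
```

With a real FARunResult the same calls would be made on the object returned by the client; only the method names and return types are taken from this reference.
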
FASummary
Aggregated failure summary.
| Field | Type | Default | Description |
|---|---|---|---|
| total_failures_detected | int | 0 | Number of failure tags with decision="fail". |
| total_passes | int | 0 | Checks that passed. |
| total_uncertain | int | 0 | Checks with decision="unsure". |
| overall_severity | float | 0.0 | Aggregate severity (0.0–5.0). |
| overall_confidence | float | 0.0 | Aggregate confidence (0.0–1.0). |
| primary_failure | Optional[str] | None | ID of the most severe failure subcategory. |
| primary_failure_name | Optional[str] | None | Name of the primary failure. |
| buckets_affected | List[str] | [] | Bucket IDs with failures. |
| should_alert | bool | False | Whether this warrants an alert. |
| should_gate_ci | bool | False | Whether this should block CI. |
| needs_human_review | bool | False | Whether human review is recommended. |
| total_items | int | 0 | Total input items analyzed. |
| items_with_failures | int | 0 | Items with at least one failure. |
| items_all_passed | int | 0 | Items with all checks passed. |
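
The three boolean flags in FASummary lend themselves to a small decision step in a pipeline. The following is a minimal sketch, assuming a stand-in `Summary` dataclass (hypothetical, carrying only a subset of the documented fields and defaults) and two hypothetical helpers, `ci_exit_code` and `follow_ups`. The action names are illustrative, not part of the API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Summary:
    """Hypothetical stand-in for FASummary (subset of documented fields)."""
    total_failures_detected: int = 0
    overall_severity: float = 0.0
    primary_failure_name: Optional[str] = None
    should_alert: bool = False
    should_gate_ci: bool = False
    needs_human_review: bool = False


def ci_exit_code(summary: Summary) -> int:
    # 1 blocks the pipeline, 0 lets it continue.
    return 1 if summary.should_gate_ci else 0


def follow_ups(summary: Summary) -> List[str]:
    # Translate the flags into illustrative action names.
    actions = []
    if summary.should_gate_ci:
        actions.append("block_ci")
    if summary.should_alert:
        actions.append("alert")
    if summary.needs_human_review:
        actions.append("human_review")
    return actions


clean = Summary()  # all defaults: nothing detected
risky = Summary(
    total_failures_detected=3,
    overall_severity=4.2,
    should_alert=True,
    should_gate_ci=True,
)
```

In a CI script, the value of `ci_exit_code(...)` could be passed to `sys.exit` so that a gated run fails the job.
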
FATag
A single failure detection result.
| Field | Type | Description |
|---|---|---|
| tag_id | str | Unique tag identifier. |
| bucket_id | str | Parent failure bucket ID. |
| bucket_name | str | Parent failure bucket name. |
| subcategory_id | str | Failure subcategory ID. |
| subcategory_name | str | Failure subcategory name. |
| decision | str | "pass", "fail", or "unsure". |
| severity | float | Severity score (0.0–5.0). |
| confidence | float | Confidence score (0.0–1.0). |
| detector_type_used | str | "deterministic", "llm_judge", or "hybrid". |
| judge_rationale | Optional[str] | LLM judge’s explanation (if applicable). |
| scoring_breakdown | Optional[FAScoringBreakdown] | Detailed scoring components. |
| eval_metric_values | Dict[str, float] | Related eval metric values. |
| evidence_items | List[FAEvidenceItem] | Supporting evidence. |
| item_index | Optional[int] | Which input item this tag applies to. |
| is_reviewed | bool | Whether this tag has been marked as reviewed. Default False. |
| reviewed_at | Optional[str] | ISO timestamp of when the tag was marked reviewed. |
| issue_url | Optional[str] | URL linking this tag to an external issue tracker. |
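
Because every FATag carries a decision, a severity, and a confidence, a list of tags can be triaged with ordinary filtering and sorting. The sketch below assumes a hypothetical `Tag` stand-in with a subset of the documented fields. The `triage` helper and its 0.7 confidence threshold are illustrative choices, not SDK behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tag:  # hypothetical stand-in for FATag
    tag_id: str
    decision: str      # "pass", "fail", or "unsure"
    severity: float    # 0.0 to 5.0
    confidence: float  # 0.0 to 1.0
    is_reviewed: bool = False


def triage(tags: List[Tag], min_confidence: float = 0.7) -> List[Tag]:
    """Unreviewed, confident failures, most severe first."""
    candidates = [
        t for t in tags
        if t.decision == "fail"
        and not t.is_reviewed
        and t.confidence >= min_confidence
    ]
    return sorted(candidates, key=lambda t: t.severity, reverse=True)


tags = [
    Tag("t1", "fail", severity=2.0, confidence=0.9),
    Tag("t2", "fail", severity=4.5, confidence=0.8),
    Tag("t3", "fail", severity=5.0, confidence=0.4),  # below the threshold
    Tag("t4", "pass", severity=0.0, confidence=0.95),
    Tag("t5", "fail", severity=3.0, confidence=0.9, is_reviewed=True),
    Tag("t6", "unsure", severity=1.0, confidence=0.5),
]
queue = [t.tag_id for t in triage(tags)]
```

Tags with decision "unsure" are left out here; a review workflow might route them separately.
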
FARunInput
An input item with per-item failure statistics.
| Field | Type | Description |
|---|---|---|
| item_index | int | Position in the dataset. |
| input_text | str | The input text. |
| output_text | str | The output text. |
| input_preview | str | Truncated input preview. |
| output_preview | str | Truncated output preview. |
| context | Optional[List[str]] | Context documents. |
| tool_calls | Optional[List[Dict]] | Tool call data. |
| trace_id | Optional[str] | Linked trace ID. |
| failure_count | int | Number of failures for this item. |
| pass_count | int | Number of passed checks. |
| unsure_count | int | Number of uncertain checks. |
| max_severity | float | Highest severity among failures. |
Properties: total_checks -> int, has_failures -> bool.
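
One plausible reading of total_checks and has_failures is that they are derived from the three per-item counters. The sketch below makes that assumption explicit: `RunInput` is a hypothetical stand-in, and the definitions `total_checks = failure_count + pass_count + unsure_count` and `has_failures = failure_count > 0` are inferred from the field descriptions, not confirmed by this reference.

```python
from dataclasses import dataclass


@dataclass
class RunInput:  # hypothetical stand-in for FARunInput
    item_index: int
    failure_count: int = 0
    pass_count: int = 0
    unsure_count: int = 0
    max_severity: float = 0.0

    @property
    def total_checks(self) -> int:
        # Assumption: every check ends as exactly one of fail, pass, unsure.
        return self.failure_count + self.pass_count + self.unsure_count

    @property
    def has_failures(self) -> bool:
        # Assumption: any failure makes the item a failed item.
        return self.failure_count > 0


item = RunInput(item_index=0, failure_count=2, pass_count=5,
                unsure_count=1, max_severity=3.5)
clean_item = RunInput(item_index=1, pass_count=4)
```
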
FABucket
A failure bucket in the taxonomy.
| Field | Type | Description |
|---|---|---|
| bucket_id | str | Bucket identifier. |
| bucket_name | str | Display name. |
| description | str | What this bucket covers. |
| subcategories | List[FASubcategory] | Subcategories within this bucket. |
FASubcategory
A failure subcategory within a bucket.
| Field | Type | Description |
|---|---|---|
| subcategory_id | str | Subcategory identifier. |
| subcategory_name | str | Display name. |
| description | str | What this subcategory detects. |
| detection_approach | str | "deterministic", "llm_judge", or "hybrid". |
| applies_to | List[str] | Feature kinds this applies to. |
| requires_retrieval | bool | Whether retrieval context is needed. |
| requires_tools | bool | Whether tool call data is needed. |
| minimal_mode_compatible | bool | Whether this works in minimal mode. |
| required_inputs | List[str] | Required input fields. |
| related_eval_metrics | List[str] | Related evaluation metrics. |
| related_security_categories | List[str] | Related security categories. |
| impact_score | int | Impact severity (1–5). |
| risk_category | str | "low", "medium", "high", "critical". |
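
The applicability flags on FASubcategory can be read client-side to predict which checks are relevant for a given dataset. The sketch below is a hedged illustration of that reading, not the service's actual detector-selection logic: `Subcategory` is a hypothetical stand-in, `runnable_subcategories` is a hypothetical helper, and the subcategory IDs are made-up placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subcategory:  # hypothetical stand-in for FASubcategory
    subcategory_id: str
    applies_to: List[str]
    requires_retrieval: bool = False
    requires_tools: bool = False
    minimal_mode_compatible: bool = True


def runnable_subcategories(
    subcats: List[Subcategory],
    feature_kind: str,
    minimal_mode: bool,
    has_context: bool,
    has_tool_calls: bool,
) -> List[str]:
    """IDs of subcategories whose documented requirements are all met."""
    selected = []
    for s in subcats:
        if feature_kind not in s.applies_to:
            continue
        if minimal_mode and not s.minimal_mode_compatible:
            continue
        if s.requires_retrieval and not has_context:
            continue
        if s.requires_tools and not has_tool_calls:
            continue
        selected.append(s.subcategory_id)
    return selected


catalog = [
    Subcategory("example_retrieval_check", ["rag", "agentic_rag"],
                requires_retrieval=True, minimal_mode_compatible=False),
    Subcategory("example_tool_check", ["agent", "agentic_rag"],
                requires_tools=True),
    Subcategory("example_format_check",
                ["rag", "agent", "agentic_rag", "generic_llm"]),
]
minimal_rag = runnable_subcategories(
    catalog, "rag", minimal_mode=True, has_context=True, has_tool_calls=False)
full_agentic = runnable_subcategories(
    catalog, "agentic_rag", minimal_mode=False, has_context=True, has_tool_calls=True)
```
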
FARunListPage
Paginated list of FA runs. Supports len() and iteration.
| Field | Type |
|---|---|
| items | List[FARunListItem] |
| total | int |
| page | int |
| page_size | int |
| has_more | bool |
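
A page object that exposes has_more, supports len(), and is iterable can be drained with a short loop. The sketch below simulates this with an in-memory list: `Page` is a hypothetical stand-in for FARunListPage, `fetch_page` is a hypothetical replacement for whatever client call returns a page, and 1-based page numbering is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Page:  # hypothetical stand-in for FARunListPage
    items: List[str]
    total: int
    page: int
    page_size: int
    has_more: bool

    def __len__(self) -> int:
        return len(self.items)

    def __iter__(self) -> Iterator[str]:
        return iter(self.items)


ALL_RUN_IDS = [f"run-{n}" for n in range(5)]  # simulated server-side data


def fetch_page(page: int, page_size: int) -> Page:
    # Hypothetical fetcher; assumes page numbers start at 1.
    start = (page - 1) * page_size
    chunk = ALL_RUN_IDS[start:start + page_size]
    more = start + page_size < len(ALL_RUN_IDS)
    return Page(chunk, len(ALL_RUN_IDS), page, page_size, more)


def iter_all_runs(page_size: int = 2) -> Iterator[str]:
    """Yield every item, following has_more until the last page."""
    page = 1
    while True:
        current = fetch_page(page, page_size)
        yield from current
        if not current.has_more:
            break
        page += 1


collected = list(iter_all_runs())
```

Relying on has_more instead of comparing against total keeps the loop correct even if runs are added while paging.
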
FARunListItem
Summary of an FA run in a list.
| Field | Type | Description |
|---|---|---|
| run_id | str | Run identifier. |
| status | str | Run status. |
| input_type | str | Input type. |
| minimal_mode | bool | Whether minimal mode was used. |
| dataset_item_count | int | Number of items. |
| created_at | str | ISO timestamp. |
| feature_kind | Optional[str] | Feature kind. |
| total_failures_detected | Optional[int] | Failure count. |
| overall_severity | Optional[float] | Overall severity. |
FAInsightsSummary
Aggregated failure insights over a time period.
| Field | Type | Description |
|---|---|---|
| total_runs | int | Total FA runs. |
| total_items_analyzed | int | Total items analyzed. |
| overall_failure_rate | float | Failure rate (0.0–1.0). |
| average_severity | float | Average severity. |
| period_days | int | Analysis period in days. |
| top_recurring_failures | List[FARecurringFailure] | Most common failures. |
| bucket_distribution | List[FABucketDistribution] | Failure distribution by bucket. |
FATrends
Failure trend data over time.
| Field | Type |
|---|---|
| period_days | int |
| project_name | Optional[str] |
| trend_data | List[FATrendDataPoint] |
FAPlaygroundResult
Result from the playground endpoint.
| Field | Type | Description |
|---|---|---|
| run_id | str | Run identifier. |
| summary | FASummary | Failure summary. |
| failure_tags | List[FATag] | Detected failures. |
| checks_passed | int | Number of passed checks. |
| security_flags | Dict[str, str] | Per-category security flags. Values are "fail" or "pass" keyed by S-category code. |
| duration_ms | int | Processing time. |
| playground_runs_today | int | Runs used today. |
| playground_limit_per_day | int | Daily limit (10). |
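
Two fields of the playground result are easy to post-process: security_flags, a plain mapping from S-category code to "pass" or "fail", and the pair of daily quota counters. The sketch below works on plain values, not SDK objects; `failing_categories` and `remaining_playground_runs` are hypothetical helper names. The sample mapping is the one given in the FARunResult table.

```python
from typing import Dict, List


def failing_categories(security_flags: Dict[str, str]) -> List[str]:
    """S-category codes whose flag is "fail", in sorted order."""
    return sorted(code for code, flag in security_flags.items() if flag == "fail")


def remaining_playground_runs(runs_today: int, limit_per_day: int) -> int:
    # Clamp at zero in case the counter ever exceeds the limit.
    return max(limit_per_day - runs_today, 0)


flags = {"S1": "pass", "S9": "fail"}
failed = failing_categories(flags)
left = remaining_playground_runs(runs_today=3, limit_per_day=10)
```

Note that security_flags is Optional on FARunResult, so a caller handling full runs would check for None before passing it to a helper like this.
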
FAJobStatus
Status of an async FA job.
| Field | Type | Description |
|---|---|---|
| job_id | str | Job identifier. |
| job_type | str | "failure_analysis". |
| status | str | "queued", "running", "completed", "failed", "cancelled". |
| progress_percent | float | Progress (0–100). |
| current_item | int | Current item being processed. |
| total_items | int | Total items. |
| estimated_remaining_seconds | Optional[int] | Estimated time remaining. |
| error | Optional[str] | Error message if failed. |
Properties: is_running, is_completed, is_failed (all bool).
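
An async job is typically polled until its status reaches a terminal value. The sketch below shows one way to do that, assuming that "completed", "failed", and "cancelled" are the terminal statuses. `JobStatus` is a hypothetical stand-in for FAJobStatus, `wait_for_job` is a hypothetical helper, and the `get_status` callable stands in for whatever client method returns the job status; here it is simulated with a fixed sequence.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: these three statuses mean the job will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


@dataclass
class JobStatus:  # hypothetical stand-in for FAJobStatus
    job_id: str
    status: str
    progress_percent: float = 0.0
    error: Optional[str] = None


def wait_for_job(
    get_status: Callable[[], JobStatus],
    poll_seconds: float = 2.0,
    max_polls: int = 100,
) -> JobStatus:
    """Poll until the job reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        current = get_status()
        if current.status in TERMINAL_STATUSES:
            return current
        time.sleep(poll_seconds)
    raise TimeoutError(f"job still not finished after {max_polls} polls")


# Simulated status sequence in place of real client calls.
_sequence = iter([
    JobStatus("job-1", "queued"),
    JobStatus("job-1", "running", progress_percent=50.0),
    JobStatus("job-1", "completed", progress_percent=100.0),
])
final = wait_for_job(lambda: next(_sequence), poll_seconds=0.0)
```

Bounding the loop with `max_polls` avoids waiting forever on a stuck job; when the status is "failed", the error field would carry the message to surface.
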
Related
- FailureAnalysisClient — Methods returning these models
- Evaluation Models — Eval data models
- Security Models — Security data models