How Valiqor categorizes failures: buckets, subcategories, detector types, and scope.
Valiqor’s Failure Analysis engine classifies every failure into a
two-level taxonomy: top-level buckets (L1) group related failure types,
and subcategories (L2) identify the specific failure mode.
This taxonomy is the foundation that powers severity scoring, root cause
detection, and actionable remediation suggestions.
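To make the two levels concrete, here is a minimal Python sketch of the taxonomy as a mapping from bucket IDs (L1) to subcategory IDs (L2). The bucket IDs hallucination_grounding and tool_failures come from the SDK example on this page. Placing unsupported_factual_claim under hallucination_grounding is an assumption, and tool_argument_error is a hypothetical name. This illustrates the structure only; it is not the SDK's data model.

```python
# Hypothetical, hand-written slice of a two-level taxonomy (L1 bucket -> L2 subcategories).
# Bucket IDs appear in the SDK example on this page; the subcategory placement
# and the "tool_argument_error" name are illustrative assumptions.
TAXONOMY = {
    "hallucination_grounding": ["unsupported_factual_claim"],
    "tool_failures": ["tool_argument_error"],  # hypothetical subcategory
}


def bucket_for(subcategory: str) -> str:
    """Return the L1 bucket that contains the given L2 subcategory."""
    for bucket, subcategories in TAXONOMY.items():
        if subcategory in subcategories:
            return bucket
    raise KeyError(f"unknown subcategory: {subcategory}")


print(bucket_for("unsupported_factual_claim"))  # hallucination_grounding
```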
The applies_to field controls which subcategories are active for your
application type. Set your application type via the feature_kind parameter of run():
Type          Value          Description
All           all            Universal; applies to every app type
RAG           rag            Retrieval-augmented generation
Agent         agent          Autonomous agent with tool use
Agentic RAG   agentic_rag    Agent + retrieval
Tool Agent    tool_agent     Agent focused on tool orchestration
Multimodal    multimodal     Multi-modal applications
When you specify feature_kind="rag", only subcategories that apply to RAG
apps (plus universal ones) are evaluated. This keeps results relevant and
reduces LLM judge calls.
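The scoping rule above amounts to a simple filter: keep a subcategory if its applies_to includes the requested feature_kind or the universal value all. This Python sketch assumes a hypothetical Subcategory record. Apart from unsupported_factual_claim, the subcategory names are invented, and every applies_to set is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subcategory:
    """Hypothetical record: a subcategory ID plus its applies_to values."""
    id: str
    applies_to: frozenset


# Illustrative entries only. "format_violation" and "tool_argument_error" are
# invented names, and all applies_to sets are assumptions.
SUBCATEGORIES = [
    Subcategory("format_violation", frozenset({"all"})),
    Subcategory("unsupported_factual_claim", frozenset({"rag", "agentic_rag"})),
    Subcategory("tool_argument_error", frozenset({"agent", "tool_agent", "agentic_rag"})),
]


def active_subcategories(feature_kind: str) -> list:
    """Keep subcategories scoped to feature_kind, plus universal ("all") ones."""
    return [
        s.id
        for s in SUBCATEGORIES
        if feature_kind in s.applies_to or "all" in s.applies_to
    ]


print(active_subcategories("rag"))    # ['format_violation', 'unsupported_factual_claim']
print(active_subcategories("agent"))  # ['format_violation', 'tool_argument_error']
```

Every subcategory filtered out here is one fewer check to run, which is where the reduction in LLM judge calls comes from.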
Each subcategory uses one of three detection approaches:
Deterministic
Rule-based checks that don’t require an LLM. Fast, cheap, and
highly reproducible. Used for format validation, tool argument
checking, and pattern matching.
LLM Judge
An LLM evaluates the input/output against the subcategory definition.
Used for semantic analysis like hallucination detection, intent
resolution, and relevance assessment.
Hybrid
Deterministic pre-filter followed by LLM judge confirmation.
Combines the speed of rules with the accuracy of LLM judgment. Used for
fabrication detection and citation verification.
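The three approaches differ mainly in when an LLM is called. To illustrate the hybrid pattern on citation verification, the sketch below runs a rule-based pass first and sends only the citations that survive it to a judge. The [n] citation format, the sentence splitting, the function names, and the judge callable (a stub standing in for a real LLM call) are all assumptions. Valiqor's actual detectors may work differently.

```python
import re
from typing import Callable, List

# Matches citation markers such as "[2]" (an assumed citation format).
CITATION = re.compile(r"\[(\d+)\]")

# (claim, source_text) -> True if the judge confirms the source supports the claim.
Judge = Callable[[str, str], bool]


def hybrid_citation_check(answer: str, sources: List[str], judge: Judge) -> List[str]:
    """Hypothetical hybrid detector: deterministic pre-filter, then judge confirmation."""
    failures = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        for match in CITATION.finditer(sentence):
            idx = int(match.group(1))
            # Deterministic step: a citation to a missing source fails with no LLM call.
            if not 1 <= idx <= len(sources):
                failures.append(f"missing source [{idx}]: {sentence}")
            # Judge step: only well-formed citations reach the (expensive) judge.
            elif not judge(sentence, sources[idx - 1]):
                failures.append(f"unsupported by [{idx}]: {sentence}")
    return failures


calls = []


def stub_judge(claim: str, source: str) -> bool:
    calls.append(claim)  # record each judge call
    return True          # pretend the LLM judge confirmed support


sources = ["Paris is the capital of France."]
answer = "Paris is the capital of France [1]. It has 90 million residents [2]."
print(hybrid_citation_check(answer, sources, stub_judge))
# ['missing source [2]: It has 90 million residents [2].']
print(len(calls))  # 1
```

The dangling [2] citation is caught by the rule alone, so the judge is called only once.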
Each subcategory is linked to evaluation metrics that provide supporting
evidence. For example, unsupported_factual_claim correlates with metrics
like hallucination and factual_accuracy.
When Failure Analysis detects a failure, it cross-references evaluation
metric scores to strengthen or weaken its confidence in the classification.
Running evaluations alongside Failure Analysis therefore gives each
classification more evidence to weigh and produces higher-quality results.
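One plausible way to fold metric evidence into a classification is a weighted blend of the detector's confidence and the mean of the correlated metric scores. The sketch below assumes that approach, a hypothetical weight of 0.3, and metric scores normalized so that higher means stronger evidence of the failure. Valiqor's actual weighting is not documented on this page. The subcategory-to-metric pairing is the example given above.

```python
from statistics import mean

# Subcategory -> supporting metrics (pairing taken from the example on this page).
SUPPORTING_METRICS = {
    "unsupported_factual_claim": ["hallucination", "factual_accuracy"],
}


def adjusted_confidence(subcategory, base_confidence, metric_evidence, weight=0.3):
    """Blend detector confidence with metric evidence (hypothetical scheme).

    metric_evidence maps metric name -> score in [0, 1], assumed normalized so
    that higher means stronger evidence that the failure occurred.
    """
    names = [m for m in SUPPORTING_METRICS.get(subcategory, []) if m in metric_evidence]
    if not names:
        return base_confidence  # no supporting evaluations ran: leave confidence unchanged
    evidence = mean(metric_evidence[m] for m in names)
    return (1 - weight) * base_confidence + weight * evidence


strong = {"hallucination": 0.9, "factual_accuracy": 0.7}
weak = {"hallucination": 0.1, "factual_accuracy": 0.1}
print(round(adjusted_confidence("unsupported_factual_claim", 0.6, strong), 2))  # 0.66
print(round(adjusted_confidence("unsupported_factual_claim", 0.6, weak), 2))    # 0.45
print(adjusted_confidence("unsupported_factual_claim", 0.6, {}))               # 0.6
```

Strong supporting evidence raises the confidence, weak evidence lowers it, and with no evaluations the classification keeps its original confidence.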
The taxonomy is versioned, and each version is frozen to ensure reproducibility:
- v0: 4 buckets, 15 subcategories (current)
- Each FATag result includes the taxonomy version used to produce it
- The taxonomy is designed to be extensible: future versions may add new buckets (e.g. multi-modal failures) or subcategories without changing existing classifications
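Because every FATag carries its taxonomy version, you can check results from different runs for compatibility before comparing them. A minimal sketch, assuming a hypothetical TagResult record with a taxonomy_version string (the real FATag field names may differ):

```python
from dataclasses import dataclass


@dataclass
class TagResult:
    """Hypothetical stand-in for an FATag; field names are assumptions."""
    subcategory: str
    taxonomy_version: str


def same_taxonomy(run_a: list, run_b: list) -> bool:
    """True if every tag in both runs was produced with a single taxonomy version."""
    versions = {tag.taxonomy_version for tag in run_a + run_b}
    return len(versions) <= 1


baseline = [TagResult("unsupported_factual_claim", "v0")]
candidate = [TagResult("unsupported_factual_claim", "v0")]
print(same_taxonomy(baseline, candidate))  # True
```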
from valiqor import ValiqorClient

client = ValiqorClient()
fa = client.failure_analysis

# Get full taxonomy
taxonomy = fa.get_taxonomy()

# Get subcategories for a specific bucket
subs = fa.get_subcategories(bucket_id="hallucination_grounding")

# Get details for a specific bucket
details = fa.get_bucket_details(bucket_id="tool_failures")