Run failure analysis on your AI app’s outputs to identify root causes and failure patterns, and to get actionable fixes.
Failure Analysis (FA) classifies your AI outputs into failure buckets and subcategories, scores severity, and explains root causes — so you can fix the actual problem, not just observe metrics.
New to Valiqor? Start with the 5-minute quickstart to see FA in action before diving into this full guide.
Pass your existing AI inputs/outputs directly — no tracing required.
from valiqor import ValiqorClient

client = ValiqorClient(api_key="vq_...", project_name="my-app")

result = client.failure_analysis.run(
    dataset=[
        {
            "input": "What is the capital of France?",
            "output": "The capital of France is Berlin.",
            "context": ["The capital of France is Paris."],
        },
        {
            "input": "Summarize the climate report.",
            "output": "The report discusses economic trends in Asia.",
            "context": ["Global temperatures rose 1.1°C since pre-industrial times."],
        },
    ]
)
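Before sending a dataset, it can help to check its shape locally. The sketch below is one possible pre-flight check, not part of the Valiqor SDK: `validate_dataset` is a hypothetical helper, and treating `context` as optional is an assumption based only on the example above.

```python
from typing import Any

# Assumption: "input" and "output" are required; "context" is treated as optional here.
REQUIRED_KEYS = ("input", "output")


def validate_dataset(items: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the items look well-formed."""
    problems = []
    for i, item in enumerate(items):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"item {i}: '{key}' must be a non-empty string")
        context = item.get("context", [])
        if not isinstance(context, list) or not all(isinstance(c, str) for c in context):
            problems.append(f"item {i}: 'context' must be a list of strings")
    return problems


dataset = [
    {
        "input": "What is the capital of France?",
        "output": "The capital of France is Berlin.",
        "context": ["The capital of France is Paris."],
    },
]

problems = validate_dataset(dataset)
print(problems or "dataset looks well-formed")
```

If the list of problems is empty, the same `dataset` can be passed to `client.failure_analysis.run(dataset=dataset)`.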
result = client.failure_analysis.run(
    # Source: provide one of
    trace_id=None,                      # Trace ID for full trace analysis
    dataset=None,                       # List of {input, output, context} dicts

    # Configuration
    project_name=None,                  # Overrides client-level project_name
    feature_kind=None,                  # "rag", "agent", "agentic_rag", "generic_llm"
    run_eval=True,                      # Run evaluation metrics alongside FA
    run_security=True,                  # Run security audit alongside FA
    run_scan=True,                      # Attach AST scan data if available

    # Filtering
    mandatory_eval_metrics=None,        # Eval metrics that must run
    mandatory_security_categories=None, # Security categories that must run
    subcategories=None,                 # Filter to specific failure subcategories
    buckets=None,                       # Filter to specific failure buckets

    # BYOK
    openai_api_key=None,                # Your OpenAI key for LLM judges
)
run() is transparent-async — if the backend returns HTTP 202, the SDK auto-polls until complete. For explicit async control, use run_async() instead.
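To make the auto-polling behaviour concrete, here is a generic poll-until-done loop of the kind an SDK might run after receiving HTTP 202. It is only an illustration under stated assumptions: `poll_until_complete`, the status strings, and the timeout values are hypothetical and do not describe Valiqor's actual implementation or the return value of `run_async()`.

```python
import time
from typing import Callable

# Assumption: these terminal status names are illustrative, not Valiqor's real values.
TERMINAL_STATUSES = ("completed", "failed")


def poll_until_complete(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() repeatedly until it reports a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_s} seconds")
        sleep(interval_s)


# Simulated backend: two non-terminal responses, then completion.
fake_statuses = iter(["pending", "running", "completed"])
final_status = poll_until_complete(lambda: next(fake_statuses), sleep=lambda _s: None)
print(final_status)
```

With `run()` this loop is hidden from you; a loop like it is only relevant if you take explicit control through `run_async()`.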
# Eval metrics from the same run
print(f"Eval run ID: {result.eval_run_id}")
print(f"Eval metrics: {result.eval_metrics}")

# Security flags: values are "fail" or "pass" per S-category
print(f"Security batch: {result.security_batch_id}")
print(f"Security flags: {result.security_flags}")
# Example output: {'S1': 'pass', 'S9': 'fail', 'S10': 'pass', ...}
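Since the security flags are a mapping from S-category to "pass" or "fail", pulling out the failing categories takes one comprehension. A minimal sketch, assuming `result.security_flags` is a plain dict shaped like the example output above; `failed_categories` is a hypothetical helper and the sample values are illustrative.

```python
def failed_categories(flags: dict[str, str]) -> list[str]:
    """Return the category IDs whose verdict is "fail", in their original order."""
    return [category for category, verdict in flags.items() if verdict == "fail"]


# Sample data mirroring the example output above (illustrative values only).
security_flags = {"S1": "pass", "S9": "fail", "S10": "pass"}

failed = failed_categories(security_flags)
print(f"Failed security categories: {failed}")
```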
# Get full taxonomy (buckets + subcategories)
taxonomy = client.failure_analysis.get_taxonomy()
for bucket in taxonomy:
    print(f"\n{bucket['name']} ({bucket['id']})")
    for sub in bucket.get("subcategories", []):
        print(f"  - {sub['name']} ({sub['scope']}, {sub['detector_type']})")
# Get subcategories for a specific bucket
subs = client.failure_analysis.get_subcategories(
    bucket_id="hallucination",
    scope="rag",                # Filter by scope
    detector_type="llm_judge",  # Filter by detector
)
# Deep-dive into a bucket
details = client.failure_analysis.get_bucket_details(bucket_id="hallucination")
# List past runs
runs = client.failure_analysis.list_runs(
    project_name="my-app",
    limit=10,
    offset=0,
)

# Count runs
count = client.failure_analysis.count_runs(project_name="my-app")

# Get original inputs for a run
inputs = client.failure_analysis.get_run_inputs(run_id="run_abc123")

# Get tags for a run
tags = client.failure_analysis.get_tags(run_id="run_abc123")
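The `limit` and `offset` parameters suggest offset-based pagination. The sketch below shows one way to walk every page. It assumes that `list_runs` returns a list and that a page shorter than `limit` marks the end; neither is confirmed here. `iter_all_runs` and `fake_fetch` are hypothetical names, and the fake data stands in for the backend so the example runs on its own.

```python
from typing import Any, Callable, Iterator


def iter_all_runs(
    fetch_page: Callable[[int, int], list[Any]],
    page_size: int = 10,
) -> Iterator[Any]:
    """Yield items page by page, stopping at the first page shorter than page_size."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in for the backend: 25 fake run IDs served in slices.
fake_runs = [f"run_{i}" for i in range(25)]


def fake_fetch(limit: int, offset: int) -> list[str]:
    return fake_runs[offset:offset + limit]


all_runs = list(iter_all_runs(fake_fetch))
print(len(all_runs))

# With the SDK, the fetch function might look like this (assuming a list is returned):
# fetch = lambda limit, offset: client.failure_analysis.list_runs(
#     project_name="my-app", limit=limit, offset=offset)
```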
Quick single-item analysis for testing, rate-limited to 10 requests per day and 2 per minute:
result = client.failure_analysis.playground(
    input_text="What medications interact with warfarin?",
    output_text="Warfarin has no known drug interactions.",
    context=["Warfarin interacts with aspirin, ibuprofen, and many antibiotics."],
)
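Because the playground is rate-limited, a small client-side budget can keep a test script from hitting the limit. The following is a rough sketch, not part of the SDK: `PlaygroundBudget` is a hypothetical class, and modelling the limits as sliding windows is an assumption, since the server's exact window semantics are not described here.

```python
import time
from collections import deque
from typing import Callable


class PlaygroundBudget:
    """Client-side guard that allows a call only if every (max_calls, window_seconds) limit permits it."""

    def __init__(
        self,
        limits: tuple[tuple[int, float], ...] = ((2, 60.0), (10, 86_400.0)),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = limits
        self.clock = clock
        self.calls: deque[float] = deque()  # timestamps of allowed calls, oldest first

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps older than the longest window; they no longer affect any limit.
        longest = max(window for _, window in self.limits)
        while self.calls and now - self.calls[0] >= longest:
            self.calls.popleft()
        # Refuse the call if any window is already full.
        for max_calls, window in self.limits:
            recent = sum(1 for t in self.calls if now - t < window)
            if recent >= max_calls:
                return False
        self.calls.append(now)
        return True


# Demonstration with a fake clock so the example runs instantly.
fake_now = [0.0]
budget = PlaygroundBudget(clock=lambda: fake_now[0])

first_three = [budget.try_acquire() for _ in range(3)]
fake_now[0] = 61.0  # just over a minute later
after_a_minute = budget.try_acquire()
print(first_three, after_a_minute)
```

A script would call `budget.try_acquire()` before each `playground()` call and skip or delay the call when it returns `False`.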
# Run FA from a dataset file
valiqor fa run --dataset test_data.json --project-name my-app

# Run FA from a trace
valiqor fa run --trace-id tr_abc123

# Check status
valiqor fa status --run-id run_xyz

# Get results
valiqor fa result --run-id run_xyz --output results.json

# List past runs
valiqor fa list --project-name my-app --limit 10
Fix the Failure →
Learn how to interpret results and fix root causes.
Failure Taxonomy →
Understand buckets, subcategories, and classification.