Failure Analysis (FA) classifies your AI outputs into failure buckets and subcategories, scores severity, and explains root causes — so you can fix the actual problem, not just observe metrics.
New to Valiqor? Start with the 5-minute quickstart to see FA in action before diving into this full guide.
Pass your existing AI inputs/outputs directly — no tracing required.
```python
from valiqor import ValiqorClient

client = ValiqorClient(api_key="vq_...", project_name="my-app")

result = client.failure_analysis.run(
    dataset=[
        {
            "input": "What is the capital of France?",
            "output": "The capital of France is Berlin.",
            "context": ["The capital of France is Paris."],
        },
        {
            "input": "Summarize the climate report.",
            "output": "The report discusses economic trends in Asia.",
            "context": ["Global temperatures rose 1.1°C since pre-industrial times."],
        },
    ],
)
```
```python
result = client.failure_analysis.run(
    # Source — provide one of:
    trace_id=None,               # Trace ID for full trace analysis
    dataset=None,                # List of {input, output, context} dicts

    # Configuration
    project_name=None,           # Overrides client-level project_name
    feature_kind=None,           # "rag", "agent", "agentic_rag", "generic_llm"
    run_eval=True,               # Run evaluation metrics alongside FA
    run_security=True,           # Run security audit alongside FA
    run_scan=True,               # Attach AST scan data if available

    # Filtering
    mandatory_eval_metrics=None,         # Eval metrics that must run
    mandatory_security_categories=None,  # Security categories that must run
    subcategories=None,          # Filter to specific failure subcategories
    buckets=None,                # Filter to specific failure buckets

    # BYOK
    openai_api_key=None,         # Your OpenAI key for LLM judges
)
```
run() is transparent-async — if the backend returns HTTP 202, the SDK auto-polls until complete. For explicit async control, use run_async() instead.
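The auto-polling behavior can be sketched in plain Python. Everything below is illustrative: `fake_backend` stands in for the Valiqor API, and `run_with_auto_poll` mimics what the SDK does internally when it sees HTTP 202; none of these names are part of the SDK.

```python
import time

def fake_backend(state={"calls": 0}):
    """Stand-in for the API: answers 202 twice, then 200 with a result."""
    state["calls"] += 1
    if state["calls"] < 3:
        return 202, {"status": "running"}
    return 200, {"status": "complete", "run_id": "run_xyz"}

def run_with_auto_poll(request, interval=0.0, max_polls=10):
    """Keep polling while the backend answers 202; return the final body."""
    for _ in range(max_polls):
        code, body = request()
        if code != 202:          # any non-202 response is final
            return body
        time.sleep(interval)     # back off before polling again
    raise TimeoutError("run did not complete in time")

result = run_with_auto_poll(fake_backend)
print(result["status"])  # -> complete
```

With `run_async()`, you take over this loop yourself instead of letting the SDK drive it.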
```python
# Eval metrics from the same run
print(f"Eval run ID: {result.eval_run_id}")
print(f"Eval metrics: {result.eval_metrics}")

# Security flags — values are "fail" or "pass" per S-category
print(f"Security batch: {result.security_batch_id}")
print(f"Security flags: {result.security_flags}")
# Example output: {'S1': 'pass', 'S9': 'fail', 'S10': 'pass', ...}
```
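Since the flags are a plain dict of per-category verdicts, you can summarize them without any SDK helpers. A sketch using the example values shown above:

```python
# Example flags dict, shaped like the output above
flags = {"S1": "pass", "S9": "fail", "S10": "pass"}

# Collect only the failing S-categories
failed = [cat for cat, verdict in flags.items() if verdict == "fail"]
print(failed)  # -> ['S9']
```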
```python
# Get full taxonomy (buckets + subcategories)
taxonomy = client.failure_analysis.get_taxonomy()
for bucket in taxonomy:
    print(f"\n{bucket['name']} ({bucket['id']})")
    for sub in bucket.get("subcategories", []):
        print(f"  - {sub['name']} ({sub['scope']}, {sub['detector_type']})")
```
```python
# Get subcategories for a specific bucket
subs = client.failure_analysis.get_subcategories(
    bucket_id="hallucination",
    scope="rag",                # Filter by scope
    detector_type="llm_judge",  # Filter by detector
)
```
```python
# Deep-dive into a bucket
details = client.failure_analysis.get_bucket_details(bucket_id="hallucination")
```
```python
# List past runs
runs = client.failure_analysis.list_runs(
    project_name="my-app",
    limit=10,
    offset=0,
)

# Count runs
count = client.failure_analysis.count_runs(project_name="my-app")

# Get original inputs for a run
inputs = client.failure_analysis.get_run_inputs(run_id="run_abc123")

# Get tags for a run
tags = client.failure_analysis.get_tags(run_id="run_abc123")
```
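The `limit`/`offset` pair supports standard offset paging. A self-contained sketch of walking every page, with a hypothetical `fetch_page` standing in for `client.failure_analysis.list_runs` (not SDK code):

```python
def fetch_page(limit, offset, _all=list(range(23))):
    """Stand-in for list_runs: returns a slice of 23 fake runs."""
    return _all[offset:offset + limit]

def iter_all_runs(limit=10):
    """Yield every run by advancing offset until a short or empty page."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        if not page:
            break
        yield from page
        if len(page) < limit:   # a short page means we reached the end
            break
        offset += limit

print(len(list(iter_all_runs())))  # -> 23
```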
Quick single-item analysis for testing — rate-limited to 10 requests per day and 2 per minute:
```python
result = client.failure_analysis.playground(
    input_text="What medications interact with warfarin?",
    output_text="Warfarin has no known drug interactions.",
    context=["Warfarin interacts with aspirin, ibuprofen, and many antibiotics."],
)
```
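To stay under the documented 2-per-minute limit, a small client-side throttle can gate playground calls before they hit the API. This helper is illustrative, not part of the SDK:

```python
import time
from collections import deque

class MinuteThrottle:
    """Tracks call timestamps and reports how long to wait before the next."""

    def __init__(self, max_per_minute=2):
        self.max = max_per_minute
        self.calls = deque()

    def wait_time(self, now=None):
        """Seconds to wait before the next call is allowed (0.0 if free)."""
        now = time.monotonic() if now is None else now
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()          # drop calls older than a minute
        if len(self.calls) < self.max:
            return 0.0
        return 60 - (now - self.calls[0])

    def record(self, now=None):
        """Register that a call was just made."""
        self.calls.append(time.monotonic() if now is None else now)

t = MinuteThrottle()
t.record(now=0.0)
t.record(now=1.0)
print(t.wait_time(now=2.0))   # third call inside the minute must wait 58.0s
```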
```bash
# Run FA from a dataset file
valiqor fa run --dataset test_data.json --project-name my-app

# Run FA from a trace
valiqor fa run --trace-id tr_abc123

# Check status
valiqor fa status --run-id run_xyz

# Get results
valiqor fa result --run-id run_xyz --output results.json

# List past runs
valiqor fa list --project-name my-app --limit 10
```
Fix the Failure →
Learn how to interpret results and fix root causes.
Failure Taxonomy →
Understand buckets, subcategories, and classification.