Go from zero to your first failure analysis result in under five minutes.
Run Failure Analysis on your existing AI data in three steps. No tracing, no instrumentation, and no OpenAI key needed.
This quickstart uses dataset mode — you pass your AI inputs and outputs directly. If you already have tracing set up, see Failure Analysis from traces instead.
```python
from valiqor import ValiqorClient

client = ValiqorClient(
    api_key="vq_your_api_key_here",
    project_name="quickstart",
)

# Run FA on existing data — no tracing required
result = client.failure_analysis.run(
    dataset=[
        {
            "input": "What is the capital of France?",
            "output": "The capital of France is Berlin.",
            "context": [
                "France is a country in Western Europe.",
                "The capital of France is Paris.",
                "Paris is known as the City of Light.",
            ],
        }
    ]
)

# Print the summary
print(f"Status: {result.status}")
print(f"Failures detected: {result.summary.total_failures_detected}")
print(f"Overall severity: {result.summary.overall_severity}/5")
print(f"Should alert: {result.summary.should_alert}")
print()

# Print each failure
for tag in result.failure_tags:
    if tag.decision == "fail":
        print(f"❌ [{tag.bucket_name}] {tag.subcategory_name}")
        print(f"   Severity: {tag.severity}/5  Confidence: {tag.confidence}")
        if tag.judge_rationale:
            print(f"   Rationale: {tag.judge_rationale}")
        print()
```
The example dataset intentionally includes a wrong answer (“Berlin” instead of “Paris”) so you can see a real failure detected:
```
Status: completed
Failures detected: 1
Overall severity: 4/5
Should alert: True

❌ [Hallucination] Entity Fabrication
   Severity: 4/5  Confidence: 0.95
   Rationale: The output states Berlin is the capital of France, which directly
   contradicts the provided context that identifies Paris as the capital. This
   is a factual fabrication.
```
The exact bucket names, severity scores, and rationale text may vary slightly depending on your backend configuration and LLM judge version.
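Because severities and confidence scores can drift between judge versions, it is safer to gate any downstream alerting on thresholds rather than on exact values. Here is a minimal, self-contained sketch of that idea; the helper name and threshold values are illustrative assumptions, and the dict fields simply mirror the tag attributes shown above:

```python
# Hypothetical helper: keep only failures worth alerting on.
# Thresholds are illustrative; tune them for your own backend.
def alertable_failures(tags, min_severity=3, min_confidence=0.8):
    """Filter failure tags to those meeting both thresholds."""
    return [
        t for t in tags
        if t["decision"] == "fail"
        and t["severity"] >= min_severity
        and t["confidence"] >= min_confidence
    ]

tags = [
    {"decision": "fail", "severity": 4, "confidence": 0.95, "bucket_name": "Hallucination"},
    {"decision": "fail", "severity": 2, "confidence": 0.99, "bucket_name": "Formatting"},
    {"decision": "pass", "severity": 0, "confidence": 0.90, "bucket_name": "Toxicity"},
]
print([t["bucket_name"] for t in alertable_failures(tags)])  # ['Hallucination']
```

Only the high-severity, high-confidence failure survives the filter, so low-stakes or uncertain tags never page anyone.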
You can pass multiple input/output pairs in a single call:
```python
result = client.failure_analysis.run(
    dataset=[
        {
            "input": "What is the capital of France?",
            "output": "The capital of France is Paris.",
            "context": ["The capital of France is Paris."],
        },
        {
            "input": "Summarize the article about climate change.",
            "output": "The article discusses economic policy in the 1990s.",
            "context": [
                "Climate change is causing rising sea levels worldwide.",
                "The Paris Agreement aims to limit warming to 1.5°C.",
            ],
        },
        {
            "input": "What medications interact with warfarin?",
            "output": "Warfarin interacts with aspirin and ibuprofen.",
            "context": [
                "Warfarin interacts with aspirin, ibuprofen, and many antibiotics.",
                "Always consult a healthcare provider for drug interactions.",
            ],
        },
    ]
)

print(f"Items analyzed: {result.summary.total_items}")
print(f"Items with failures: {result.summary.items_with_failures}")
print(f"Items all passed: {result.summary.items_all_passed}")
```
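For batch runs it can also help to see, per dataset item, which failure buckets were triggered. The sketch below groups tags by item; note that the `item_index` field is a hypothetical name for whatever per-item identifier your result schema actually exposes, so confirm the real field before relying on this:

```python
from collections import defaultdict

# Group failing tags by the dataset item they belong to.
# NOTE: "item_index" is a hypothetical field name — check your
# SDK's actual result schema before relying on it.
def failures_by_item(tags):
    grouped = defaultdict(list)
    for t in tags:
        if t["decision"] == "fail":
            grouped[t["item_index"]].append(t["bucket_name"])
    return dict(grouped)

tags = [
    {"decision": "pass", "item_index": 0, "bucket_name": "Hallucination"},
    {"decision": "fail", "item_index": 1, "bucket_name": "Irrelevant Answer"},
    {"decision": "fail", "item_index": 2, "bucket_name": "Incomplete Answer"},
]
print(failures_by_item(tags))  # {1: ['Irrelevant Answer'], 2: ['Incomplete Answer']}
```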
Once you’re comfortable with dataset mode, you can add auto-tracing to capture LLM calls in production and run Failure Analysis on traces:
```python
import valiqor.auto  # Auto-instruments OpenAI, Anthropic, LangChain

# Your normal LLM calls are now traced automatically.

# Then run FA on the captured trace:
result = client.failure_analysis.run(trace_id="tr_abc123")