Run Failure Analysis on your existing AI data in three steps. No tracing, no instrumentation, no OpenAI key needed.
This quickstart uses dataset mode — you pass your AI inputs and outputs directly. If you already have tracing set up, see Failure Analysis from traces instead.

Step 1: Install the SDK

pip install valiqor

Step 2: Set your API key

export VALIQOR_API_KEY="vq_your_api_key_here"
Get your API key from the Valiqor Dashboard. You can also pass it directly via ValiqorClient(api_key="vq_...").

Step 3: Run Failure Analysis

Create a file called quickstart.py and paste:
quickstart.py
from valiqor import ValiqorClient

client = ValiqorClient(project_name="quickstart")  # picks up VALIQOR_API_KEY from the environment

# Run FA on existing data — no tracing required
result = client.failure_analysis.run(
    dataset=[
        {
            "input": "What is the capital of France?",
            "output": "The capital of France is Berlin.",
            "context": [
                "France is a country in Western Europe.",
                "The capital of France is Paris.",
                "Paris is known as the City of Light.",
            ],
        }
    ]
)

# Print the summary
print(f"Status: {result.status}")
print(f"Failures detected: {result.summary.total_failures_detected}")
print(f"Overall severity: {result.summary.overall_severity}/5")
print(f"Should alert: {result.summary.should_alert}")
print()

# Print each failure
for tag in result.failure_tags:
    if tag.decision == "fail":
        print(f"❌ [{tag.bucket_name}] {tag.subcategory_name}")
        print(f"   Severity: {tag.severity}/5  Confidence: {tag.confidence}")
        if tag.judge_rationale:
            print(f"   Rationale: {tag.judge_rationale}")
        print()
Run it:
python quickstart.py

What you’ll see

The example dataset intentionally includes a wrong answer (“Berlin” instead of “Paris”) so you can see a real failure detected:
Status: completed
Failures detected: 1
Overall severity: 4/5
Should alert: True

❌ [Hallucination] Entity Fabrication
   Severity: 4/5  Confidence: 0.95
   Rationale: The output states Berlin is the capital of France,
   which directly contradicts the provided context that identifies
   Paris as the capital. This is a factual fabrication.
The exact bucket names, severity scores, and rationale text may vary slightly depending on your backend configuration and LLM judge version.

Understanding the result

The result object is an FARunResult with these key fields:
| Field | Type | What it means |
| --- | --- | --- |
| result.summary.total_failures_detected | int | Total number of failure tags with decision == "fail" |
| result.summary.overall_severity | float | Worst-case severity across all failures (0–5) |
| result.summary.should_alert | bool | Whether this result warrants an alert |
| result.failure_tags | list[FailureTag] | Individual failure classifications |
| result.failure_tags[i].decision | str | "pass", "fail", or "unsure" |
| result.failure_tags[i].judge_rationale | Optional[str] | Human-readable explanation of the failure (may be None for pass tags) |
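To make the relationship between the summary fields and the individual tags concrete, here is a simplified sketch. This is not the backend's actual aggregation logic; a plain dataclass stands in for FailureTag, and the alert_threshold heuristic is an assumption (the real should_alert depends on your backend configuration):

```python
from dataclasses import dataclass

@dataclass
class FailureTag:
    bucket_name: str
    decision: str  # "pass", "fail", or "unsure"
    severity: int  # 0-5

def summarize(tags: list[FailureTag], alert_threshold: int = 3) -> dict:
    # total_failures_detected counts only tags with decision == "fail"
    failures = [t for t in tags if t.decision == "fail"]
    # overall_severity is the worst case across failing tags
    overall = max((t.severity for t in failures), default=0)
    return {
        "total_failures_detected": len(failures),
        "overall_severity": overall,
        "should_alert": overall >= alert_threshold,  # assumed heuristic
    }

tags = [
    FailureTag("Hallucination", "fail", 4),
    FailureTag("Relevance", "pass", 0),
]
print(summarize(tags))
```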

Next: Fix the failure →

Learn how to interpret the result, understand root causes, and fix your prompt.

Analyze multiple items at once

You can pass multiple input/output pairs in a single call:
result = client.failure_analysis.run(
    dataset=[
        {
            "input": "What is the capital of France?",
            "output": "The capital of France is Paris.",
            "context": ["The capital of France is Paris."],
        },
        {
            "input": "Summarize the article about climate change.",
            "output": "The article discusses economic policy in the 1990s.",
            "context": [
                "Climate change is causing rising sea levels worldwide.",
                "The Paris Agreement aims to limit warming to 1.5°C.",
            ],
        },
        {
            "input": "What medications interact with warfarin?",
            "output": "Warfarin interacts with aspirin and ibuprofen.",
            "context": [
                "Warfarin interacts with aspirin, ibuprofen, and many antibiotics.",
                "Always consult a healthcare provider for drug interactions.",
            ],
        },
    ]
)

print(f"Items analyzed: {result.summary.total_items}")
print(f"Items with failures: {result.summary.items_with_failures}")
print(f"Items all passed: {result.summary.items_all_passed}")
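When analyzing a batch, it often helps to see which failure buckets dominate. A sketch using plain dicts shaped like result.failure_tags entries (the real SDK returns objects with attributes rather than keys, and the sample tag data here is hypothetical):

```python
from collections import Counter

# Hypothetical tag data shaped like result.failure_tags entries.
tags = [
    {"bucket_name": "Hallucination", "decision": "fail"},
    {"bucket_name": "Omission", "decision": "fail"},
    {"bucket_name": "Hallucination", "decision": "fail"},
    {"bucket_name": "Relevance", "decision": "pass"},
]

# Count only failing tags, grouped by bucket.
by_bucket = Counter(t["bucket_name"] for t in tags if t["decision"] == "fail")
for bucket, count in by_bucket.most_common():
    print(f"{bucket}: {count}")
```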

Optional: Add tracing later

Once you’re comfortable with dataset mode, you can add auto-tracing to capture LLM calls in production and run Failure Analysis on traces:
import valiqor.auto  # Auto-instruments OpenAI, Anthropic, LangChain

# Your normal LLM calls are now traced automatically
# Then run FA on the captured trace:
result = client.failure_analysis.run(trace_id="tr_abc123")
See the Tracing guide for full setup instructions.