Valiqor finds why your AI app fails

Most evaluation tools tell you something is wrong. Valiqor tells you what went wrong, why it happened, and how to fix it — with severity scoring, root-cause analysis, and actionable remediation.
You can get started in 5 minutes — no tracing or instrumentation needed. Just pass your existing AI inputs and outputs to Failure Analysis and see results immediately.
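For concreteness, "existing AI inputs and outputs" can be as simple as a list of prompt/response pairs collected from your app. This is an illustrative sketch; the field names below are assumptions, not Valiqor's documented schema:

```python
# Illustrative only: "input"/"output" are assumed field names, not a
# documented Valiqor schema.
records = [
    {
        "input": "What year was the Eiffel Tower completed?",
        "output": "The Eiffel Tower was completed in 1989.",
    },
    {
        "input": "Summarize the attached contract.",
        "output": "The contract runs for 12 months and auto-renews.",
    },
]

# Each record pairs a prompt your app received with the response it
# produced; pairs like these are what Failure Analysis consumes.
print(f"{len(records)} record(s) ready for analysis")
```

No tracing setup is involved: the records come straight from whatever logs or datasets you already have.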

Five modules, one SDK

Valiqor’s Python SDK exposes five modules through a single client:
from valiqor import ValiqorClient

client = ValiqorClient(api_key="vq_...")

client.failure_analysis   # Root-cause failure detection
client.eval               # Metric-based evaluation
client.security           # Red-team audits & vulnerability scanning
client.trace              # Auto-instrumentation for OpenAI, Anthropic, LangChain
client.scanner            # Repository & prompt scanning
| Module | What it does | When to use it |
| --- | --- | --- |
| Failure Analysis | Classifies failures into buckets & subcategories, scores severity (0–5), explains root cause | First thing to run; works on any input/output data |
| Evaluation | Runs metric-based checks (hallucination, relevance, coherence, etc.) | When you need granular metric scores |
| Security | Runs red-team attacks; checks for prompt injection, data leakage, jailbreaks | Before deploying to production |
| Tracing | Auto-captures LLM calls, tool use, and retrieval steps as structured traces | When you want continuous monitoring in production |
| Scanner | Scans repositories and prompts for best-practice violations | During code review or CI |

Who is Valiqor for?

Valiqor is built for AI engineers and teams who build applications on top of:
  • LLM providers — OpenAI, Anthropic, Google, Mistral, Cohere, and any OpenAI-compatible API
  • Orchestration frameworks — LangChain, CrewAI, LlamaIndex, Haystack
  • Custom pipelines — RAG systems, agent loops, multi-step chains
Whether you’re prototyping or running in production, Valiqor helps you catch and fix failures before your users do.

What you get

1. Failure taxonomy

Every failure is classified into buckets (e.g., Hallucination, Context Ignorance) and subcategories (e.g., Entity Fabrication, Contradicts Source).
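The bucket/subcategory pairing above can be pictured as a simple mapping. This sketch uses only the example names given in this overview; the full taxonomy is not shown here, so treat anything beyond those names as an assumption:

```python
# Illustrative taxonomy fragment built from the examples named in the text;
# not the complete Valiqor bucket list.
TAXONOMY = {
    "Hallucination": ["Entity Fabrication", "Contradicts Source"],
    "Context Ignorance": [],  # subcategories not listed in this overview
}

def bucket_of(subcategory):
    """Return the bucket an example subcategory belongs to, if known."""
    for bucket, subs in TAXONOMY.items():
        if subcategory in subs:
            return bucket
    return None
```

A two-level scheme like this is what lets you aggregate failures at the bucket level while still drilling into the specific subcategory for a fix.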
2. Root-cause analysis

Each failure includes a judge_rationale explaining exactly what went wrong, plus a scoring_breakdown with evidence.
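Putting the pieces together, a single failure result might look like the record below. Only judge_rationale and scoring_breakdown are field names taken from the text; every other key and all values are hypothetical:

```python
# Hypothetical failure record. judge_rationale and scoring_breakdown appear
# in the text; the remaining keys and all values are assumptions.
failure = {
    "bucket": "Hallucination",
    "subcategory": "Entity Fabrication",
    "severity": 4,
    "confidence": 0.92,
    "judge_rationale": (
        "The answer cites a source that does not appear anywhere "
        "in the provided context."
    ),
    "scoring_breakdown": {
        "evidence": ["fabricated citation in sentence 2"],
    },
}
```

Having the rationale and evidence attached to each failure is what makes the result actionable rather than a bare pass/fail flag.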
3. Severity & confidence scoring

Severity runs from 0 (no issue) to 5 (critical); confidence runs from 0.0 to 1.0. Filter and prioritize failures automatically.
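Automatic filtering and prioritization might be sketched as below. The thresholds and field names are illustrative assumptions, not Valiqor defaults:

```python
def prioritize(failures, min_severity=3, min_confidence=0.8):
    """Keep failures at or above the given severity and confidence,
    most severe first. Thresholds here are illustrative, not defaults
    documented by Valiqor."""
    kept = [
        f for f in failures
        if f["severity"] >= min_severity and f["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda f: (-f["severity"], -f["confidence"]))

failures = [
    {"severity": 5, "confidence": 0.95},
    {"severity": 2, "confidence": 0.99},  # low severity: filtered out
    {"severity": 4, "confidence": 0.40},  # low confidence: filtered out
]
triaged = prioritize(failures)  # only the severity-5 failure remains
```

Sorting by severity first and confidence second means the most damaging, best-supported failures land at the top of the queue.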
4. Actionable remediation

Fix your prompts, retrieval pipeline, or guardrails based on concrete failure evidence, not guesswork.

Ready? See your first failure →

Run Failure Analysis on your existing data in under 5 minutes.