Documentation Index
Fetch the complete documentation index at: https://docs.valiqor.com/llms.txt
Use this file to discover all available pages before exploring further.
Valiqor finds why your AI app fails
Most evaluation tools tell you something is wrong. Valiqor tells you what went wrong, why it happened, and how to fix it — with severity scoring, root-cause analysis, and actionable remediation. You can get started in 5 minutes — no tracing or instrumentation needed. Just pass your existing AI inputs and outputs to Failure Analysis and see results immediately.
Five modules, one SDK
Valiqor’s Python SDK exposes five modules through a single client:

| Module | What it does | When to use it |
|---|---|---|
| Failure Analysis | Classifies failures into buckets & subcategories, scores severity (0–5), explains root cause | First thing to run — works on any input/output data |
| Evaluation | Runs metric-based checks (hallucination, relevance, coherence, etc.) | When you need granular metric scores |
| Security | Runs red-team attacks, checks for prompt injection, data leakage, jailbreaks | Before deploying to production |
| Tracing | Auto-captures LLM calls, tool use, retrieval steps as structured traces | When you want continuous monitoring in production |
| Scanner | Scans repositories and prompts for best-practice violations | During code review or CI |
Two ways to get started
Quick start — Run on existing data
Pass your AI inputs/outputs directly. No tracing, no instrumentation. See failure results in under 5 minutes.
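As a sketch of what "pass your inputs/outputs directly" might look like: the record field names (`input`, `output`) and the client call shown in the comments are illustrative assumptions, not the actual SDK surface.

```python
# Hypothetical shape of the existing data you would hand to Failure Analysis.
# The "input"/"output" field names are assumptions for illustration.
records = [
    {
        "input": "What year was the Eiffel Tower completed?",
        "output": "The Eiffel Tower was completed in 1989.",
    },
    {
        "input": "Summarize the attached contract.",
        "output": "I don't see any contract in the conversation.",
    },
]

# With a (hypothetical) client, the call might look like:
#   client = valiqor.Client(api_key="...")
#   report = client.failure_analysis.run(records)

def validate(records):
    """Sanity-check before submission: every record needs both fields."""
    return all({"input", "output"} <= record.keys() for record in records)

print(validate(records))  # True
```

The point is that the payload is just your existing inputs and outputs; no tracing hooks or instrumentation are required first.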
Full observability — Add tracing
Instrument your LLM calls with auto-tracing. Run Failure Analysis on traces for continuous production monitoring.
Who is Valiqor for?
Valiqor is built for AI engineers and teams who build applications on top of:
- LLM providers — OpenAI, Anthropic, Google, Mistral, Cohere, and any OpenAI-compatible API
- Orchestration frameworks — LangChain, CrewAI, LlamaIndex, Haystack
- Custom pipelines — RAG systems, agent loops, multi-step chains
What you get
Failure taxonomy
Every failure is classified into buckets (e.g., Hallucination, Context Ignorance) and subcategories (e.g., Entity Fabrication, Contradicts Source).
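To make the bucket/subcategory structure concrete, here is a small self-contained sketch that tallies classified failures. The bucket and subcategory values mirror the examples above; the record shape itself is an assumption, not the SDK's actual output format.

```python
from collections import Counter

# Example classified failures. The "bucket"/"subcategory" field names are
# assumptions; the values come from the taxonomy examples above.
failures = [
    {"bucket": "Hallucination", "subcategory": "Entity Fabrication"},
    {"bucket": "Hallucination", "subcategory": "Contradicts Source"},
    {"bucket": "Context Ignorance", "subcategory": "Contradicts Source"},
]

# Tally failures per top-level bucket to see where your app breaks most.
by_bucket = Counter(f["bucket"] for f in failures)
print(by_bucket["Hallucination"])  # 2
```

Grouping by bucket first, then drilling into subcategories, is the intended reading order of the taxonomy: buckets tell you the class of problem, subcategories tell you the specific mechanism.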
Root-cause analysis
Each failure includes a judge_rationale explaining exactly what went wrong, plus a scoring_breakdown with evidence.
Severity & confidence scoring
Severity from 0 (no issue) to 5 (critical). Confidence from 0.0 to 1.0. Filter and prioritize failures automatically.
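Filtering and prioritizing on those two scores can be sketched with plain Python; the `severity` and `confidence` field names are assumptions based on the scoring ranges described above.

```python
# Each failure carries a severity (0-5) and a confidence (0.0-1.0).
# Field names are assumptions for illustration.
failures = [
    {"id": "f1", "severity": 5, "confidence": 0.92},
    {"id": "f2", "severity": 2, "confidence": 0.40},
    {"id": "f3", "severity": 4, "confidence": 0.85},
]

# Keep only high-severity, high-confidence failures, worst first.
critical = sorted(
    (f for f in failures if f["severity"] >= 4 and f["confidence"] >= 0.8),
    key=lambda f: f["severity"],
    reverse=True,
)
print([f["id"] for f in critical])  # ['f1', 'f3']
```

Thresholds like severity >= 4 and confidence >= 0.8 are a starting point; tighten or loosen them to match how much triage bandwidth your team has.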
Ready? See your first failure →
Run Failure Analysis on your existing data in under 5 minutes.