Evaluate your AI’s existing outputs for safety violations:
```python
from valiqor import ValiqorClient

client = ValiqorClient(api_key="vq_...", project_name="my-app")

result = client.security.audit(
    dataset=[
        {
            "user_input": "How do I hack a website?",
            "assistant_response": "I cannot help with that request.",
        },
        {
            "user_input": "Write a story with violence",
            "assistant_response": "The warrior drew his sword and charged into battle...",
        },
    ],
)

print(f"Safety score: {result.safety_score:.1%}")
print(f"Safe: {result.safe_count}, Unsafe: {result.unsafe_count}")
print(f"Top risk: {result.top_risk_category}")
```
To audit a full trace instead of a hand-built dataset, use audit_trace(). It takes a trace dict (the full JSON object), not a trace ID string, and automatically parses messages, spans, and execution trees to extract conversation pairs.
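A minimal sketch, assuming a trace already fetched from your tracing backend. The trace's exact shape below is illustrative, and the result is assumed to mirror audit()'s shape:

```python
# Illustrative trace shape; audit_trace() wants the full dict, not an ID.
trace = {
    "trace_id": "tr_123",
    "messages": [
        {"role": "user", "content": "How do I bypass a login form?"},
        {"role": "assistant", "content": "I can't help with that."},
    ],
    "spans": [],
}

result = client.security.audit_trace(trace)
print(f"Safety score: {result.safety_score:.1%}")  # result shape assumed to match audit()
```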
Generate adversarial attacks to test your AI’s safety guardrails. You specify a target — a live endpoint, a system prompt, or a local function — and Valiqor generates attacks against it.
At least one of target_url, target_prompt, or target_function is required.
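For example, pointing a run at a live endpoint (the URL and auth header are placeholders):

```python
result = client.security.red_team(
    target_url="https://api.example.com/chat",
    target_headers={"Authorization": "Bearer <token>"},
    attack_vectors=["jailbreak", "rot13"],
    attacks_per_vector=5,
)
```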
If your endpoint expects a custom JSON body, use target_request_template with {{attack}} as a placeholder and target_response_key to extract the response:
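A sketch against an OpenAI-style endpoint. The URL and body shape are illustrative, and passing the template as a dict is an assumption about the accepted format:

```python
result = client.security.red_team(
    target_url="https://api.example.com/v1/chat/completions",
    target_headers={"Authorization": "Bearer <token>"},
    target_request_template={
        "model": "my-model",
        "messages": [{"role": "user", "content": "{{attack}}"}],  # {{attack}} is replaced per attack
    },
    target_response_key="choices.0.message.content",
    attacks_per_vector=5,
)
```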
Test a system prompt without deploying an endpoint. Valiqor calls an LLM (gpt-4o-mini by default) with your system prompt plus each generated attack:
```python
result = client.security.red_team(
    target_prompt="You are a helpful medical assistant.",
    target_model="gpt-4o-mini",
    attack_vectors=["jailbreak", "rot13", "few_shot"],
    attacks_per_vector=5,
)
```
Pass a Python callable that takes an attack prompt and returns a response. The SDK generates attacks server-side, calls your function locally, then submits responses for evaluation:
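For example, with a trivial placeholder standing in for your model or agent:

```python
def call_my_model(attack: str) -> str:
    # Stand-in for your local model or agent call,
    # e.g. return my_agent.run(attack)
    return "I can't help with that request."

result = client.security.red_team(
    target_function=call_my_model,
    attack_vectors=["jailbreak"],
    attacks_per_vector=5,
)
```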
```python
result = client.security.red_team(
    run_name=None,                 # Auto-generated if not provided
    attack_vectors=None,           # Attack strategies (e.g. ["jailbreak", "rot13"])
    attacks_per_vector=5,          # Number of attacks per vector
    target_vulnerabilities=None,   # Vulnerability codes: ["S1", "S2", ...]
    target_url=None,               # HTTP endpoint to POST attacks to
    target_prompt=None,            # System prompt for simulated LLM target
    target_model=None,             # Model for target_prompt (default: gpt-4o-mini)
    target_headers=None,           # Custom HTTP headers (e.g. auth)
    target_request_template=None,  # Custom JSON body; use {{attack}} placeholder
    target_response_key=None,      # Dot-path to extract response (e.g. choices.0.message.content)
    target_function=None,          # Local callable (SDK-only)
    openai_api_key=None,           # Your OpenAI key for attack generation (BYOK)
)
```
Red teaming always runs asynchronously (backend returns 202). The SDK auto-polls until complete, so your code looks synchronous. Use red_team_async() for explicit async control.
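A sketch of the explicit-async pattern, assuming red_team_async() returns immediately with a run reference and that the run detail exposes a status field (both attribute names here are assumptions):

```python
import time

run = client.security.red_team_async(
    target_prompt="You are a helpful medical assistant.",
    attacks_per_vector=5,
)

# Poll on your own schedule instead of the SDK's auto-polling.
while True:
    detail = client.security.get_run_detail(run_id=run.run_id)  # attribute names assumed
    if detail.status in ("completed", "failed"):
        break
    time.sleep(10)
```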
```python
# List past audit batches
batches = client.security.list_audit_history(project_name="my-app")

# Get batch details
batch = client.security.get_batch(batch_id="batch_xyz")

# Get paginated items
items = client.security.get_batch_items(batch_id="batch_xyz", limit=10)

# Deep-dive into an item
detail = client.security.get_item_detail(
    batch_id="batch_xyz",
    item_id="item_abc",
)
```
```python
# List past red team runs
runs = client.security.list_red_team_history(project_name="my-app")

# Get run details
run = client.security.get_run_detail(run_id="run_xyz")

# Get attacks for a run
attacks = client.security.get_attacks(run_id="run_xyz", limit=20)

# Get attack details
attack = client.security.get_attack_detail(
    run_id="run_xyz",
    attack_id="atk_abc",
)

# Compare red team runs
comparison = client.security.compare_red_team_runs(
    run_ids=["run_1", "run_2"],
)
```