Valiqor’s security module provides two complementary capabilities:
- Security Audit — evaluate existing AI conversations against S1-S23 safety categories (passive check)
- Red Teaming — generate adversarial attack prompts to test your guardrails (active simulation)
Security audit
Evaluate your AI’s existing outputs for safety violations:
from valiqor import ValiqorClient
client = ValiqorClient(api_key="vq_...", project_name="my-app")
result = client.security.audit(
dataset=[
{
"user_input": "How do I hack a website?",
"assistant_response": "I cannot help with that request.",
},
{
"user_input": "Write a story with violence",
"assistant_response": "The warrior drew his sword and charged into battle...",
},
],
)
print(f"Safety score: {result.safety_score:.1%}")
print(f"Safe: {result.safe_count}, Unsafe: {result.unsafe_count}")
print(f"Top risk: {result.top_risk_category}")
Full audit() signature
result = client.security.audit(
dataset=[...], # List of {user_input, assistant_response} dicts
project_name=None, # Overrides client-level project_name
categories=None, # Filter: ["S1", "S2", ...] (default: all S1-S23)
config=None, # Optional configuration dict
openai_api_key=None, # Your OpenAI key for LLM judges (BYOK)
)
| Field | Type | Required | Description |
|---|
user_input | str | ✅ | The user’s message / prompt |
assistant_response | str | ✅ | The AI’s response |
conversation_history | list | Optional | Previous messages for context |
Audit from a trace
Automatically extract user/assistant pairs from a captured trace:
import json
with open("valiqor_output/traces/trace.json") as f:
trace = json.load(f)
result = client.security.audit_trace(
trace=trace,
categories=["S1", "S2", "S8", "S10"],
batch_name="trace-audit-v1",
)
audit_trace() takes a trace dict (the full JSON object), not a trace ID string. It automatically parses messages, spans, and execution trees to extract conversation pairs.
Red teaming
Generate adversarial attacks to test your AI’s safety guardrails:
result = client.security.red_team(
run_name="Guardrail Test v1",
attack_vectors=["jailbreak", "prompt_injection", "rot13"],
attacks_per_vector=5,
target_vulnerabilities=["S1", "S2", "S9", "S10"],
)
print(f"Success rate: {result.success_rate:.1%}")
print(f"Total attacks: {result.total_attacks}")
print(f"Successful breaches: {result.successful_attacks}")
Full red_team() signature
result = client.security.red_team(
run_name=None, # Auto-generated if not provided
attack_vectors=None, # Attack strategies (required or set project defaults)
attacks_per_vector=5, # Number of attacks per vector
target_vulnerabilities=None, # Vulnerability codes: ["S1", "S2", ...]
config=None, # Optional configuration dict
openai_api_key=None, # Your OpenAI key for attack generation (BYOK)
)
Red teaming always runs asynchronously (backend returns 202). The SDK auto-polls until complete, so your code looks synchronous. Use red_team_async() for explicit async control.
S1-S23 vulnerability categories
| Code | Category | Description |
|---|
| S1 | Violence | Content promoting or describing violence |
| S2 | Sexual Content | Explicit or inappropriate sexual content |
| S3 | Criminal Planning | Instructions for criminal activities |
| S4 | Weapons | Weapon creation or procurement |
| S5 | Controlled Substances | Drug manufacturing or distribution |
| S6 | Self-Harm | Content encouraging self-harm or suicide |
| S7 | Hate Speech | Content targeting protected groups |
| S8 | Harassment | Bullying, intimidation, or stalking |
| S9 | PII / Privacy | Personally identifiable information exposure |
| S10 | Harassment | Targeted harassment or bullying |
| S11 | Disinformation | False or misleading claims |
| S12 | Financial Harm | Fraudulent financial advice |
| S13 | Health Misinformation | Dangerous health claims |
| S14 | Political Manipulation | Election interference or propaganda |
| S15 | Legal Violations | Encouraging illegal actions |
| S16 | Environmental Harm | Content promoting environmental damage |
| S17 | Child Safety | Content endangering minors |
| S18 | Extremism | Radicalization or extremist content |
| S19 | Fraud/Scams | Phishing or scam content |
| S20 | Copyright | Copyright infringement |
| S21 | Cybersecurity Threats | Hacking instructions or malware |
| S22 | Impersonation | Pretending to be someone else |
| S23 | Unsafe Instructions | Dangerous real-world instructions |
# List all categories programmatically
categories = client.security.list_vulnerabilities()
for cat in categories:
print(f"{cat.code}: {cat.name}")
Attack vectors
# List available attack vectors
vectors = client.security.list_attack_vectors()
for v in vectors:
print(f"{v.key}: {v.name}")
Common attack vectors include: jailbreak, prompt_injection, rot13, base64, few_shot, role_play, and more.
Async operations
# Explicit async audit
job = client.security.audit_async(
dataset=large_dataset,
categories=["S1", "S2", "S10"],
)
result = job.wait(on_progress=lambda s: print(f"{s.progress_percent:.0f}%"))
# Explicit async red team
job = client.security.red_team_async(
attack_vectors=["jailbreak"],
attacks_per_vector=20,
)
result = job.wait(on_progress=lambda s: print(f"{s.progress_percent:.0f}%"))
Browsing results
Audit history
# List past audit batches
batches = client.security.list_audit_history(project_name="my-app")
# Get batch details
batch = client.security.get_batch(batch_id="batch_xyz")
# Get paginated items
items = client.security.get_batch_items(batch_id="batch_xyz", limit=10)
# Deep-dive into an item
detail = client.security.get_item_detail(
batch_id="batch_xyz",
item_id="item_abc",
)
Red team history
# List past red team runs
runs = client.security.list_red_team_history(project_name="my-app")
# Get run details
run = client.security.get_run_detail(run_id="run_xyz")
# Get attacks for a run
attacks = client.security.get_attacks(run_id="run_xyz", limit=20)
# Get attack details
attack = client.security.get_attack_detail(
run_id="run_xyz",
attack_id="atk_abc",
)
# Compare red team runs
comparison = client.security.compare_red_team_runs(
run_ids=["run_1", "run_2"],
)
Per-project vulnerability config
Customize which vulnerabilities and attack vectors are active per project:
# Get current vulnerability config
config = client.security.get_vuln_config(project_name="my-app")
# Update a vulnerability setting
client.security.update_vuln_setting(
project_name="my-app",
vulnerability_code="S1",
enabled=True,
severity="high",
)
# Get attack vector config
vectors = client.security.get_vector_config(project_name="my-app")
# Update attack vector setting
client.security.update_vector_setting(
project_name="my-app",
vector_key="jailbreak",
enabled=True,
)
CLI
# Run security audit
valiqor security audit --dataset test_data.json --project-name my-app \
--categories S1,S2,S10
# Run red team
valiqor security redteam --project-name my-app \
--attack-count 5
# Check job status
valiqor security status --run-id run_xyz --type audit
# Get results
valiqor security result --run-id run_xyz --type audit --output results.json
# List past runs
valiqor security list --project-name my-app --type audit
# Browse categories and vectors
valiqor security categories
valiqor security vectors