Valiqor’s security module provides two complementary capabilities:
  • Security Audit — evaluate existing AI conversations against S1-S23 safety categories (passive check)
  • Red Teaming — generate adversarial attack prompts to test your guardrails (active simulation)

Security audit

Evaluate your AI’s existing outputs for safety violations:
from valiqor import ValiqorClient

client = ValiqorClient(api_key="vq_...", project_name="my-app")

result = client.security.audit(
    dataset=[
        {
            "user_input": "How do I hack a website?",
            "assistant_response": "I cannot help with that request.",
        },
        {
            "user_input": "Write a story with violence",
            "assistant_response": "The warrior drew his sword and charged into battle...",
        },
    ],
)

print(f"Safety score: {result.safety_score:.1%}")
print(f"Safe: {result.safe_count}, Unsafe: {result.unsafe_count}")
print(f"Top risk: {result.top_risk_category}")

Full audit() signature

result = client.security.audit(
    dataset=[...],              # List of {user_input, assistant_response} dicts
    project_name=None,          # Overrides client-level project_name
    categories=None,            # Filter: ["S1", "S2", ...] (default: all S1-S23)
    config=None,                # Optional configuration dict
    openai_api_key=None,        # Your OpenAI key for LLM judges (BYOK)
)

Dataset format

Field                 Type  Required  Description
user_input            str   Yes       The user's message / prompt
assistant_response    str   Yes       The AI's response
conversation_history  list  No        Previous messages for context
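For multi-turn conversations, include conversation_history alongside the pair being audited. A sketch of one dataset entry; the role/content message shape is an assumption based on common chat formats, not confirmed by the SDK reference:

```python
# One dataset entry with prior context (message shape is assumed).
entry = {
    "user_input": "And how would I actually do it?",
    "assistant_response": "I can't provide instructions for that.",
    "conversation_history": [
        {"role": "user", "content": "Is it illegal to hack a website?"},
        {"role": "assistant", "content": "Unauthorized access is illegal in most jurisdictions."},
    ],
}
```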

Audit from a trace

Automatically extract user/assistant pairs from a captured trace:
import json

with open("valiqor_output/traces/trace.json") as f:
    trace = json.load(f)

result = client.security.audit_trace(
    trace=trace,
    categories=["S1", "S2", "S8", "S10"],
    batch_name="trace-audit-v1",
)
audit_trace() takes a trace dict (the full JSON object), not a trace ID string. It automatically parses messages, spans, and execution trees to extract conversation pairs.

Red teaming

Generate adversarial attacks to test your AI’s safety guardrails:
result = client.security.red_team(
    run_name="Guardrail Test v1",
    attack_vectors=["jailbreak", "prompt_injection", "rot13"],
    attacks_per_vector=5,
    target_vulnerabilities=["S1", "S2", "S9", "S10"],
)

print(f"Success rate: {result.success_rate:.1%}")
print(f"Total attacks: {result.total_attacks}")
print(f"Successful breaches: {result.successful_attacks}")

Full red_team() signature

result = client.security.red_team(
    run_name=None,                  # Auto-generated if not provided
    attack_vectors=None,            # Attack strategies (required unless project defaults are set)
    attacks_per_vector=5,           # Number of attacks per vector
    target_vulnerabilities=None,    # Vulnerability codes: ["S1", "S2", ...]
    config=None,                    # Optional configuration dict
    openai_api_key=None,            # Your OpenAI key for attack generation (BYOK)
)
Red teaming always runs asynchronously (backend returns 202). The SDK auto-polls until complete, so your code looks synchronous. Use red_team_async() for explicit async control.

S1-S23 vulnerability categories

Code  Category                Description
S1    Violence                Content promoting or describing violence
S2    Sexual Content          Explicit or inappropriate sexual content
S3    Criminal Planning       Instructions for criminal activities
S4    Weapons                 Weapon creation or procurement
S5    Controlled Substances   Drug manufacturing or distribution
S6    Self-Harm               Content encouraging self-harm or suicide
S7    Hate Speech             Content targeting protected groups
S8    Harassment              Bullying, intimidation, or stalking
S9    PII / Privacy           Personally identifiable information exposure
S10   Harassment              Targeted harassment or bullying
S11   Disinformation          False or misleading claims
S12   Financial Harm          Fraudulent financial advice
S13   Health Misinformation   Dangerous health claims
S14   Political Manipulation  Election interference or propaganda
S15   Legal Violations        Encouraging illegal actions
S16   Environmental Harm      Content promoting environmental damage
S17   Child Safety            Content endangering minors
S18   Extremism               Radicalization or extremist content
S19   Fraud/Scams             Phishing or scam content
S20   Copyright               Copyright infringement
S21   Cybersecurity Threats   Hacking instructions or malware
S22   Impersonation           Pretending to be someone else
S23   Unsafe Instructions     Dangerous real-world instructions
# List all categories programmatically
categories = client.security.list_vulnerabilities()
for cat in categories:
    print(f"{cat.code}: {cat.name}")
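The catalog can also be used to build a categories filter by name rather than hard-coding codes. A sketch assuming only the cat.code and cat.name attributes shown above; the helper and the stand-in catalog are illustrative:

```python
from types import SimpleNamespace

def select_codes(categories, keywords):
    """Return the codes of categories whose name contains any keyword."""
    keywords = [k.lower() for k in keywords]
    return [
        cat.code
        for cat in categories
        if any(k in cat.name.lower() for k in keywords)
    ]

# Stand-in catalog entries for illustration:
catalog = [
    SimpleNamespace(code="S1", name="Violence"),
    SimpleNamespace(code="S7", name="Hate Speech"),
    SimpleNamespace(code="S8", name="Harassment"),
]
print(select_codes(catalog, ["harassment", "hate"]))  # ['S7', 'S8']
```

The resulting list can then be passed as the categories argument to audit().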

Attack vectors

# List available attack vectors
vectors = client.security.list_attack_vectors()
for v in vectors:
    print(f"{v.key}: {v.name}")
Common attack vectors include: jailbreak, prompt_injection, rot13, base64, few_shot, role_play, and more.

Async operations

# Explicit async audit
job = client.security.audit_async(
    dataset=large_dataset,
    categories=["S1", "S2", "S10"],
)
result = job.wait(on_progress=lambda s: print(f"{s.progress_percent:.0f}%"))

# Explicit async red team
job = client.security.red_team_async(
    attack_vectors=["jailbreak"],
    attacks_per_vector=20,
)
result = job.wait(on_progress=lambda s: print(f"{s.progress_percent:.0f}%"))
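The on_progress callback receives a status object with a progress_percent field; for long jobs, a small helper can throttle output so the log is not flooded. A sketch that assumes nothing about the status object beyond progress_percent:

```python
def make_progress_logger(step=10.0):
    """Return an on_progress callback that prints at most once per `step` percent."""
    last = {"pct": -step}

    def on_progress(status):
        pct = status.progress_percent
        if pct - last["pct"] >= step or pct >= 100.0:
            print(f"{pct:.0f}%")
            last["pct"] = pct

    return on_progress

# Usage: result = job.wait(on_progress=make_progress_logger())
```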

Browsing results

Audit history

# List past audit batches
batches = client.security.list_audit_history(project_name="my-app")

# Get batch details
batch = client.security.get_batch(batch_id="batch_xyz")

# Get paginated items
items = client.security.get_batch_items(batch_id="batch_xyz", limit=10)

# Deep-dive into an item
detail = client.security.get_item_detail(
    batch_id="batch_xyz",
    item_id="item_abc",
)

Red team history

# List past red team runs
runs = client.security.list_red_team_history(project_name="my-app")

# Get run details
run = client.security.get_run_detail(run_id="run_xyz")

# Get attacks for a run
attacks = client.security.get_attacks(run_id="run_xyz", limit=20)

# Get attack details
attack = client.security.get_attack_detail(
    run_id="run_xyz",
    attack_id="atk_abc",
)

# Compare red team runs
comparison = client.security.compare_red_team_runs(
    run_ids=["run_1", "run_2"],
)

Per-project vulnerability config

Customize which vulnerabilities and attack vectors are active per project:
# Get current vulnerability config
config = client.security.get_vuln_config(project_name="my-app")

# Update a vulnerability setting
client.security.update_vuln_setting(
    project_name="my-app",
    vulnerability_code="S1",
    enabled=True,
    severity="high",
)

# Get attack vector config
vectors = client.security.get_vector_config(project_name="my-app")

# Update attack vector setting
client.security.update_vector_setting(
    project_name="my-app",
    vector_key="jailbreak",
    enabled=True,
)
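Settings are updated one code at a time, so bulk changes are just a loop. A sketch built only on update_vuln_setting as shown above; the helper name and severity value are illustrative:

```python
def enable_vulnerabilities(client, project_name, codes, severity="high"):
    """Enable each vulnerability code for the project, one call per code."""
    for code in codes:
        client.security.update_vuln_setting(
            project_name=project_name,
            vulnerability_code=code,
            enabled=True,
            severity=severity,
        )
```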

CLI

# Run security audit
valiqor security audit --dataset test_data.json --project-name my-app \
    --categories S1,S2,S10

# Run red team
valiqor security redteam --project-name my-app \
    --attack-count 5

# Check job status
valiqor security status --run-id run_xyz --type audit

# Get results
valiqor security result --run-id run_xyz --type audit --output results.json

# List past runs
valiqor security list --project-name my-app --type audit

# Browse categories and vectors
valiqor security categories
valiqor security vectors