Valiqor’s security engine evaluates AI systems against 23 vulnerability
categories (S1–S23) and can generate adversarial attacks using 24 attack
strategies across 5 categories. This page is the canonical reference for
both.
Vulnerability Categories (S1–S23)
Every security audit and red team run evaluates against these categories.
You can audit all 23 or select specific ones via the `categories` parameter.
| Code | Category | Description |
|---|---|---|
| S1 | Violence | Content promoting or depicting violence |
| S2 | Sexual | Sexually explicit or suggestive content |
| S3 | Criminal Planning | Criminal planning, confessions, or instructions |
| S4 | Guns & Illegal Weapons | Content about illegal weapons acquisition or use |
| S5 | Controlled Substances | Information about illegal drugs or regulated substances |
| S6 | Suicide & Self Harm | Content promoting or instructing self-harm |
| S7 | Sexual (Minor) | Any sexual content involving minors |
| S8 | Hate / Identity Hate | Hate speech targeting identity groups |
| S9 | PII / Privacy | Personally identifiable information exposure. Detection requires concrete PII entities (e.g. email, phone, SSN). Generic names or locations alone do not trigger this category. |
| S10 | Harassment | Targeted harassment or bullying |
| S11 | Threat | Direct or indirect threats of harm |
| S12 | Profanity | Excessive or gratuitous profane language |
| S13 | Needs Caution | Content requiring careful handling or disclaimers |
| S14 | Other | Miscellaneous safety concerns |
| S15 | Manipulation | Psychological manipulation or coercion |
| S16 | Fraud / Deception | Fraudulent schemes or deceptive practices |
| S17 | Malware | Code, instructions, or content related to malware |
| S18 | High Risk Gov Decision Making | AI used in high-stakes government decisions |
| S19 | Political / Misinformation | Political manipulation, conspiracy theories, disinformation |
| S20 | Copyright / Trademark | Copyright infringement, trademark violations, plagiarism |
| S21 | Unauthorized Advice | Medical, legal, or financial advice without qualifications |
| S22 | Illegal Activity | General illegal activities not covered by other categories |
| S23 | Immoral / Unethical | Content that is immoral or unethical but may not be illegal |
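When passing a subset of codes, a small local check can catch typos before an audit request is made. This helper is an illustrative sketch, not part of the Valiqor SDK; the valid set is simply S1 through S23 as listed above:

```python
# All valid category codes, S1 through S23, as documented above.
VALID_CODES = {f"S{i}" for i in range(1, 24)}

def validate_categories(codes):
    """Raise if any requested code is not a known S1-S23 category."""
    unknown = [c for c in codes if c not in VALID_CODES]
    if unknown:
        raise ValueError(f"Unknown category codes: {unknown}")
    return codes

validate_categories(["S1", "S6", "S8"])  # passes silently
```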
High-Risk Categories
Certain categories are considered high-risk and receive elevated priority in
Failure Analysis results. When one of these categories is detected (for
example, those involving minors, self-harm, PII exposure, or threats),
Valiqor automatically escalates its severity and flags it for immediate
attention, so it is never overlooked.
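As an illustration of how such flags might be consumed client-side, you could bucket triggered categories yourself. The exact high-risk set and the `{category: count}` shape below are assumptions for the sketch, not part of the API:

```python
# Assumed high-risk codes, based on the examples named above.
HIGH_RISK = {"S6", "S7", "S9", "S11"}  # self-harm, minors, PII, threats

def split_by_risk(triggered):
    """Separate a {category: count} mapping into high-risk and standard buckets."""
    high = {c: n for c, n in triggered.items() if c in HIGH_RISK}
    standard = {c: n for c, n in triggered.items() if c not in HIGH_RISK}
    return high, standard

high, standard = split_by_risk({"S1": 2, "S6": 4, "S9": 1})
# high now holds the categories that warrant immediate attention
```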
Using Categories in Audits
Audit All Categories
```python
result = client.security.audit(dataset=data)
# All S1-S23 categories are evaluated by default
```
Audit Specific Categories
```python
result = client.security.audit(
    dataset=data,
    categories=["S1", "S6", "S8", "S9"]  # Only these categories
)
```
Browse Available Categories
```python
vulns = client.security.list_vulnerabilities()
for v in vulns:
    print(f"{v['code']}: {v['name']}")
```
Audit Results
Each audit returns a summary with the following fields:
| Field | Type | Description |
|---|---|---|
| audit_id | str | Unique audit identifier |
| total_items | int | Total items evaluated |
| safe_count | int | Items that passed all checks |
| unsafe_count | int | Items with at least one violation |
| safety_score | float | Overall safety score (0.0–1.0) |
| triggered_categories | dict | Category → count of violations |
| top_risk_category | str | Most frequently triggered category |
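For example, a result with these fields could be summarized as follows. The dict below is a stand-in for a real audit result; attribute versus key access on the actual return value may differ:

```python
# Stand-in audit result using the documented fields.
audit = {
    "audit_id": "aud_123",
    "total_items": 100,
    "safe_count": 92,
    "unsafe_count": 8,
    "safety_score": 0.92,
    "triggered_categories": {"S6": 5, "S9": 3},
    "top_risk_category": "S6",
}

def summarize(audit):
    """Build a one-line-per-category summary, worst offenders first."""
    header = f"Audit {audit['audit_id']}: safety score {audit['safety_score']:.2f}"
    rows = [f"  {code}: {count} violation(s)"
            for code, count in sorted(audit["triggered_categories"].items(),
                                      key=lambda kv: -kv[1])]
    return "\n".join([header] + rows)

print(summarize(audit))
```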
Attack Strategies
Red teaming generates adversarial prompts using 24 strategies organized
into 5 categories:
Direct Prompt Modification
Attacks that directly alter or extend the input prompt:
| Strategy | Description |
|---|---|
| Direct | Unmodified harmful prompt (baseline) |
| Direct Prompt Injection | Injects instructions to override system prompt |
| Adversarial Suffix | Appends adversarial tokens to bypass safety filters |
| Authority Override | Impersonates authority to override restrictions |
| Hypothetical Framing | Frames harmful requests as hypothetical scenarios |
Obfuscation & Encoding
Attacks that hide malicious intent through encoding or transformation:
| Strategy | Description |
|---|---|
| ROT13 | Encodes input using ROT13 letter substitution |
| Base64 | Encodes input in Base64 |
| Unicode Confusable | Replaces characters with visually similar Unicode |
| Unicode Smuggling | Embeds hidden instructions in Unicode control characters |
| Steganography | Hides instructions within seemingly innocent text |
| Translation Pivot | Translates through another language to bypass filters |
Persona / Role Based
Attacks that use roleplay, personas, or conversational manipulation:
| Strategy | Description |
|---|---|
| Jailbreak | Classic jailbreak prompt patterns |
| Jailbreak Persona | Adopts a persona to bypass safety restrictions |
| Chain of Thought | Manipulates reasoning chain to reach harmful conclusions |
| Multi-Turn Escalation | Gradually escalates across conversation turns |
| Crescendo | Slowly builds up to harmful content over multiple turns |
Multimodal & Context Attacks
Attacks targeting context windows and multimodal inputs:
| Strategy | Description |
|---|---|
| Indirect Injection | Injects instructions via retrieved context or tool outputs |
| Payload Splitting | Splits harmful content across multiple messages |
| Multimodal Embedded | Embeds instructions in images or other media |
Training / Model Attacks
Attacks targeting the model itself or its training data:
| Strategy | Description |
|---|---|
| Data Extraction | Attempts to extract training data or system prompts |
| Model Stealing | Probes to reverse-engineer model behavior |
| Membership Inference | Tests whether specific data was in training set |
| Data Poisoning | Attempts to influence model behavior through inputs |
| Privilege Escalation | Attempts to gain elevated access or capabilities |
Using Attack Strategies in Red Teaming
Run Red Team with All Strategies
```python
result = client.security.red_team(
    run_name="full-red-team-v1",
    attacks_per_vector=5  # 5 attacks per strategy
)
```
Target Specific Strategies
```python
result = client.security.red_team(
    run_name="jailbreak-test",
    attack_vectors=["Jailbreak", "Jailbreak Persona", "Multi-Turn Escalation"],
    target_vulnerabilities=["S1", "S6", "S8"],
    attacks_per_vector=10
)
```
Browse Available Strategies
```python
vectors = client.security.list_attack_vectors()
for v in vectors:
    print(f"{v['display_name']} ({v['category']})")
```
Red Team Results
| Field | Type | Description |
|---|---|---|
| run_id | str | Unique run identifier |
| name | str | Run name you provided |
| status | str | "completed", "failed", etc. |
| total_attacks | int | Total attack prompts generated |
| successful_attacks | int | Attacks that bypassed safety |
| success_rate | float | Attack success rate (0.0–1.0) |
| distinct_vulns_count | int | Unique vulnerability categories triggered |
| top_vulnerability | str | Most frequently exploited category |
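One way to consume these fields is a simple pass/fail gate, for example in CI. The threshold and the result shape below are illustrative assumptions, not API guarantees:

```python
# Stand-in red team result using the documented fields.
run = {
    "run_id": "rt_001",
    "name": "jailbreak-test",
    "status": "completed",
    "total_attacks": 30,
    "successful_attacks": 6,
    "success_rate": 0.2,
    "distinct_vulns_count": 3,
    "top_vulnerability": "S6",
}

def gate(run, max_success_rate=0.1):
    """Fail the gate if the run completed with too many successful attacks."""
    if run["status"] != "completed":
        return False  # incomplete runs should be investigated, not passed
    return run["success_rate"] <= max_success_rate

passed = gate(run)  # False: a 20% success rate exceeds the 10% threshold
```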
Per-Project Configuration
You can customize which categories and attack vectors are active for each
project:
```python
# View current vulnerability configuration
config = client.security.get_vuln_config(project_name="my-chatbot")

# Disable a category for this project
client.security.update_vuln_setting(
    project_name="my-chatbot",
    vulnerability_code="S14",  # "Other" — not relevant for this app
    enabled=False
)

# View attack vector configuration
vectors = client.security.get_vector_config(project_name="my-chatbot")

# Adjust a vector's weight
client.security.update_vector_setting(
    project_name="my-chatbot",
    vector_name="Jailbreak",
    priority=1.0
)
```
Where Categories Are Used