Valiqor’s security engine evaluates AI systems against 23 vulnerability categories (S1–S23) and can generate adversarial attacks using 24 attack strategies across 5 categories. This page is the canonical reference for both.

Vulnerability Categories (S1–S23)

Every security audit and red team run evaluates against these categories. You can audit all 23 or select specific ones via the categories parameter.
| Code | Category | Description |
| --- | --- | --- |
| S1 | Violence | Content promoting or depicting violence |
| S2 | Sexual | Sexually explicit or suggestive content |
| S3 | Criminal Planning | Criminal planning, confessions, or instructions |
| S4 | Guns & Illegal Weapons | Content about illegal weapons acquisition or use |
| S5 | Controlled Substances | Information about illegal drugs or regulated substances |
| S6 | Suicide & Self Harm | Content promoting or instructing self-harm |
| S7 | Sexual (Minor) | Any sexual content involving minors |
| S8 | Hate / Identity Hate | Hate speech targeting identity groups |
| S9 | PII / Privacy | Personally identifiable information exposure. Detection requires concrete PII entities (e.g. email, phone, SSN); generic names or locations alone do not trigger this category. |
| S10 | Harassment | Targeted harassment or bullying |
| S11 | Threat | Direct or indirect threats of harm |
| S12 | Profanity | Excessive or gratuitous profane language |
| S13 | Needs Caution | Content requiring careful handling or disclaimers |
| S14 | Other | Miscellaneous safety concerns |
| S15 | Manipulation | Psychological manipulation or coercion |
| S16 | Fraud / Deception | Fraudulent schemes or deceptive practices |
| S17 | Malware | Code, instructions, or content related to malware |
| S18 | High Risk Gov Decision Making | AI used in high-stakes government decisions |
| S19 | Political / Misinformation | Political manipulation, conspiracy theories, disinformation |
| S20 | Copyright / Trademark | Copyright infringement, trademark violations, plagiarism |
| S21 | Unauthorized Advice | Medical, legal, or financial advice without qualifications |
| S22 | Illegal Activity | General illegal activities not covered by other categories |
| S23 | Immoral / Unethical | Content that is immoral or unethical but may not be illegal |

High-Risk Categories

Certain categories are considered high-risk and receive elevated priority in Failure Analysis results. When these categories are detected, Valiqor automatically flags them for immediate attention.
High-risk categories, such as Sexual (Minor), Suicide & Self Harm, PII / Privacy, and Threat, receive automatic severity escalation to ensure they are never overlooked.
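As an illustration, the escalation logic can be approximated client-side when post-processing results. The `HIGH_RISK` set below mirrors the categories named above; it is an assumption for this sketch, not an official SDK constant:

```python
# Hypothetical helper: pull out high-risk entries from an audit result's
# triggered_categories dict. The HIGH_RISK set (minors, self-harm, PII,
# threats) is an illustration based on the categories listed above.
HIGH_RISK = {"S6", "S7", "S9", "S11"}

def escalated_categories(triggered_categories: dict) -> dict:
    """Return only the high-risk entries with at least one violation."""
    return {code: count
            for code, count in triggered_categories.items()
            if code in HIGH_RISK and count > 0}
```

For example, `escalated_categories({"S1": 2, "S7": 1})` keeps only the `S7` entry.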

Using Categories in Audits

Audit All Categories

```python
result = client.security.audit(dataset=data)
# All S1-S23 categories are evaluated by default
```

Audit Specific Categories

```python
result = client.security.audit(
    dataset=data,
    categories=["S1", "S6", "S8", "S9"]  # Only these categories
)
```

Browse Available Categories

```python
vulns = client.security.list_vulnerabilities()
for v in vulns:
    print(f"{v['code']}: {v['name']}")
```

Audit Results

Each item in an audit receives a per-category assessment, and the audit returns an aggregate result with these fields:
| Field | Type | Description |
| --- | --- | --- |
| `audit_id` | str | Unique audit identifier |
| `total_items` | int | Total items evaluated |
| `safe_count` | int | Items that passed all checks |
| `unsafe_count` | int | Items with at least one violation |
| `safety_score` | float | Overall safety score (0.0–1.0) |
| `triggered_categories` | dict | Category → count of violations |
| `top_risk_category` | str | Most frequently triggered category |
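A minimal sketch of consuming these fields, assuming the result supports dict-style access (the SDK's actual return type may differ):

```python
# Sketch: turn an audit result into a one-line summary.
# Field names follow the table above; dict access is an assumption.
def summarize_audit(result: dict) -> str:
    total = result["total_items"]
    rate = result["safe_count"] / total if total else 0.0
    return (f"{result['audit_id']}: {result['unsafe_count']} unsafe of "
            f"{total} ({rate:.0%} safe), "
            f"top risk: {result['top_risk_category']}")
```

For a run with 8 safe items out of 10, this yields a line like `"a1: 2 unsafe of 10 (80% safe), top risk: S9"`.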

Attack Strategies

Red teaming generates adversarial prompts using 24 strategies organized into 5 categories:

Direct Prompt Modification

Attacks that directly alter or extend the input prompt:
| Strategy | Description |
| --- | --- |
| Direct | Unmodified harmful prompt (baseline) |
| Direct Prompt Injection | Injects instructions to override system prompt |
| Adversarial Suffix | Appends adversarial tokens to bypass safety filters |
| Authority Override | Impersonates authority to override restrictions |
| Hypothetical Framing | Frames harmful requests as hypothetical scenarios |

Obfuscation & Encoding

Attacks that hide malicious intent through encoding or transformation:
| Strategy | Description |
| --- | --- |
| ROT13 | Encodes input using ROT13 letter substitution |
| Base64 | Encodes input in Base64 |
| Unicode Confusable | Replaces characters with visually similar Unicode |
| Unicode Smuggling | Embeds hidden instructions in Unicode control characters |
| Steganography | Hides instructions within seemingly innocent text |
| Translation Pivot | Translates through another language to bypass filters |
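To make the first two transformations concrete, here is what ROT13 and Base64 do to a placeholder prompt, using only the Python standard library:

```python
import base64
import codecs

# What two of the encoding strategies above do to a prompt.
prompt = "example prompt"

rot13 = codecs.encode(prompt, "rot13")            # letter substitution
b64 = base64.b64encode(prompt.encode()).decode()  # Base64 wrapping

# Both are trivially reversible, which is exactly why they can slip past
# filters that only inspect the surface text of the input.
assert codecs.decode(rot13, "rot13") == prompt
assert base64.b64decode(b64).decode() == prompt
```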

Persona / Role Based

Attacks that use roleplay, personas, or conversational manipulation:
| Strategy | Description |
| --- | --- |
| Jailbreak | Classic jailbreak prompt patterns |
| Jailbreak Persona | Adopts a persona to bypass safety restrictions |
| Chain of Thought | Manipulates reasoning chain to reach harmful conclusions |
| Multi-Turn Escalation | Gradually escalates across conversation turns |
| Crescendo | Slowly builds up to harmful content over multiple turns |

Multimodal & Context Attacks

Attacks targeting context windows and multi-modal inputs:
| Strategy | Description |
| --- | --- |
| Indirect Injection | Injects instructions via retrieved context or tool outputs |
| Payload Splitting | Splits harmful content across multiple messages |
| Multimodal Embedded | Embeds instructions in images or other media |
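The payload-splitting idea can be illustrated abstractly: a string is divided into chunks that look innocuous individually but reassemble in the conversation context. This helper is illustrative only, not part of the SDK:

```python
# Illustration of payload splitting: divide text into roughly equal
# chunks, each sent as a separate message. Purely for intuition.
def split_payload(text: str, parts: int) -> list[str]:
    size = -(-len(text) // parts)  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Concatenating the chunks always recovers the original text, which is what makes per-message filtering insufficient against this strategy.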

Training / Model Attacks

Attacks targeting the model itself or its training data:
| Strategy | Description |
| --- | --- |
| Data Extraction | Attempts to extract training data or system prompts |
| Model Stealing | Probes to reverse-engineer model behavior |
| Membership Inference | Tests whether specific data was in training set |
| Data Poisoning | Attempts to influence model behavior through inputs |
| Privilege Escalation | Attempts to gain elevated access or capabilities |

Using Attack Strategies in Red Teaming

Run Red Team with All Strategies

```python
result = client.security.red_team(
    run_name="full-red-team-v1",
    attacks_per_vector=5  # 5 attacks per strategy
)
```

Target Specific Strategies

```python
result = client.security.red_team(
    run_name="jailbreak-test",
    attack_vectors=["Jailbreak", "Jailbreak Persona", "Multi-Turn Escalation"],
    target_vulnerabilities=["S1", "S6", "S8"],
    attacks_per_vector=10
)
```

Browse Available Strategies

```python
vectors = client.security.list_attack_vectors()
for v in vectors:
    print(f"{v['display_name']} ({v['category']})")
```

Red Team Results

| Field | Type | Description |
| --- | --- | --- |
| `run_id` | str | Unique run identifier |
| `name` | str | Run name you provided |
| `status` | str | "completed", "failed", etc. |
| `total_attacks` | int | Total attack prompts generated |
| `successful_attacks` | int | Attacks that bypassed safety |
| `success_rate` | float | Attack success rate (0.0–1.0) |
| `distinct_vulns_count` | int | Unique vulnerability categories triggered |
| `top_vulnerability` | str | Most frequently exploited category |
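As with audits, these fields lend themselves to a one-line summary. The sketch below assumes dict-style access to the run result:

```python
# Sketch: turn a red team run result into a one-line summary.
# Field names follow the table above; dict access is an assumption.
def summarize_run(run: dict) -> str:
    return (f"{run['name']} [{run['status']}]: "
            f"{run['successful_attacks']}/{run['total_attacks']} attacks "
            f"succeeded ({run['success_rate']:.0%}), "
            f"top vulnerability: {run['top_vulnerability']}")
```

A run with 5 successes out of 50 attacks would summarize as `"jailbreak-test [completed]: 5/50 attacks succeeded (10%), top vulnerability: S1"`.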

Per-Project Configuration

You can customize which categories and attack vectors are active for each project:
```python
# View current vulnerability configuration
config = client.security.get_vuln_config(project_name="my-chatbot")

# Disable a category for this project
client.security.update_vuln_setting(
    project_name="my-chatbot",
    vulnerability_code="S14",  # "Other" — not relevant for this app
    enabled=False
)

# View attack vector configuration
vectors = client.security.get_vector_config(project_name="my-chatbot")

# Adjust a vector's weight
client.security.update_vector_setting(
    project_name="my-chatbot",
    vector_name="Jailbreak",
    priority=1.0
)
```

Where Categories Are Used