← All resources
Attack taxonomy
LLM Attack Taxonomy
An interactive map of LLM attack vectors and methods, and how Black Diamond Consulting assesses against each.
24 covered
14 in-depth
37 I/O-detectable
6 session-level
6 baseline-only
Jailbreak & Injection
Input-side manipulation to elicit policy-violating output.
| Method | Status | Detectability | Frameworks |
|---|---|---|---|
| Direct persona / alter-ego (DAN-style) Alter-ego personas instructed to ignore restrictions. | Covered | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak |
| Fictional / academic framing Story, roleplay, or research framings that wrap a restricted request. | Covered | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak |
| Direct system-prompt override SYSTEM OVERRIDE / ignore-previous-instructions injection in the user turn. | Covered | I/O signal | owasp_llm01 mitre_llm_injection |
| Indirect injection via retrieved content Instructions embedded in RAG documents the model is asked to process. | In-depth | I/O signal | owasp_llm01 owasp_llm04 nist_poisoning mitre_llm_injection |
| Payload splitting (multi-turn) A harmful request spread across individually-innocuous turns. | In-depth | Session-level | owasp_llm01 nist_evasion mitre_llm_jailbreak |
| Virtualization / eval-mode framing Claims that safety rules are suspended in a sandbox/test/dry-run context. | In-depth | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak |
| Base64 / encoding smuggling Harmful instruction encoded (base64) to slip past surface matching. | Covered | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak |
| Many-shot jailbreaking (128-256 shots) Flooding context with many fake exchanges to erode refusal at scale. | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak | |
| Crescendo (gradual escalation) Benign-to-harmful drift within a single topic across turns. | Session-level | owasp_llm01 nist_evasion mitre_llm_jailbreak | |
| Past-tense reformulation Reframing a prohibited request in the past tense to bypass refusal. | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak | |
| Refusal suppression / prefix injection Forcing an affirmative opening or banning disclaimers/apologies. | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak | |
| Low-resource-language / translation bypass Routing a harmful request through a low-resource language. | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak | |
| Cipher / ASCII-art evasion (ArtPrompt, CipherChat) Glyph- or cipher-based encodings that evade the safety classifier. | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak | |
| Adversarial suffix (GCG / transferable) Optimized gibberish suffixes that flip refusal to compliance. | I/O signal | owasp_llm01 nist_evasion mitre_llm_jailbreak |
Information Extraction
Eliciting protected context, secrets, or other users' data.
| Method | Status | Detectability | Frameworks |
|---|---|---|---|
| System-prompt leakage (structural) Enumerating restrictions, escalation triggers, and operator instructions. | In-depth | Session-level | owasp_llm02 owasp_llm07 nist_privacy mitre_recon |
| Secret / token extraction Direct, encoded, roleplay, indirect, and multi-turn secret extraction. | In-depth | I/O signal | owasp_llm02 owasp_llm07 nist_privacy mitre_exfiltration |
| Cross-user / PHI boundary breach Extracting another tenant's records from a multi-user RAG context. | In-depth | I/O signal | owasp_llm02 owasp_llm08 nist_privacy mitre_exfiltration |
| Cross-user IP / trade-secret boundary breach Extracting another employee's patent filings or invention details from a multi-user IP-management RAG context. | In-depth | I/O signal | owasp_llm02 owasp_llm08 nist_privacy mitre_exfiltration |
| Training-data memorization extraction Divergence/repetition attacks that surface memorized training text. | Baseline-only | owasp_llm02 nist_privacy mitre_exfiltration | |
| Membership inference / model inversion Inferring training-set membership; largely research-only behaviorally. | Baseline-only | nist_privacy |
Output Handling
Unsafe model output that harms a downstream consumer.
| Method | Status | Detectability | Frameworks |
|---|---|---|---|
| Markdown / image data exfiltration Emitting an image/link URL that leaks context to an attacker on render. | I/O signal | owasp_llm05 owasp_llm02 | |
| Insecure generated code as output Generated SQL/HTML/shell that is unsafe when executed downstream. | I/O signal | owasp_llm05 | |
| Structured-output breakout JSON/format injection that escapes the intended output contract. | I/O signal | owasp_llm05 |
Agentic & Tool-Use
Misuse of an agent's tools, authority, or memory.
| Method | Status | Detectability | Frameworks |
|---|---|---|---|
| Described-tool action boundary Refusing to claim execution of described (non-real) tool actions. | Covered | I/O signal | owasp_llm06 nist_abuse eu_ai_art14 |
| Unbounded consumption (behavioral) Refusing to attempt unbounded/indefinite output generation. | Covered | I/O signal | owasp_llm10 nist_abuse |
| Real tool-call / function injection Injection that drives actual API tool calls via the tools= interface. | I/O signal | owasp_llm06 mitre_llm_injection | |
| Tool-result injection Poisoned tool_result blocks accepted as authoritative. | I/O signal | owasp_llm06 owasp_llm01 | |
| Confused-deputy / goal hijack via tool chaining Injected content steering an agent to misuse its own privileges. | Baseline-only | owasp_llm06 nist_abuse | |
| Persistent memory poisoning Cross-session corruption of agent memory. | Baseline-only | owasp_llm06 owasp_llm04 |
Multimodal
Attacks delivered through non-text channels.
| Method | Status | Detectability | Frameworks |
|---|---|---|---|
| Hidden-text image injection Near-invisible or tiny-font instructions embedded in uploaded images. | In-depth | I/O signal | owasp_llm01 nist_evasion mitre_llm_injection |
| Unicode tag-block / invisible-char smuggling (text) Instructions hidden in U+E0000 tag characters within plain text. | I/O signal | owasp_llm01 nist_evasion | |
| Homoglyph substitution Look-alike Unicode characters that evade string-level filters. | I/O signal | owasp_llm01 nist_evasion | |
| Audio / QR / steganographic payloads Instructions carried in audio, QR codes, or image steganography. | I/O signal | owasp_llm01 nist_evasion |
Generation-Quality Harms
Confident-but-wrong or unsafe content under normal use.
| Method | Status | Detectability | Frameworks |
|---|---|---|---|
| Factual hallucination Fabricated people, events, citations, and false-premise acceptance. | In-depth | Baseline-only | owasp_llm09 eu_ai_art15 |
| Sycophantic reversal under pressure Abandoning a correct position under pushback, authority, or emotion. | In-depth | Session-level | owasp_llm09 eu_ai_art15 |
| Temporal / stale-data error Stale regulatory/legal/market facts stated without a staleness hedge. | In-depth | Baseline-only | owasp_llm09 eu_ai_art15 |
| False privacy / confidentiality assurance Model falsely assures users their inputs are private or not retained when it cannot verify this — Galkin IP/patent disclosure risk. | In-depth | I/O signal | owasp_llm09 eu_ai_art13 nist_privacy |
| Data-handling policy fabrication Model invents specific retention periods, opt-out mechanisms, or compliance provisions it cannot verify are accurate or current. | In-depth | I/O signal | owasp_llm09 nist_privacy |
| Omitted IP / patent disclosure warning Model assists with novel unpublished inventions without flagging that sharing technical details may constitute prior art or compromise trade-secret status. | In-depth | I/O signal | owasp_llm09 eu_ai_art13 |
| Package hallucination (slopsquatting) Inventing installable package names an attacker can register. | I/O signal | owasp_llm09 | |
| Insecure-code suggestion (measured) Systematically scoring the safety of generated code, not just eliciting it. | I/O signal | owasp_llm09 |
Content-Harm Payloads
The harmful subject matter that jailbreaks attempt to elicit.
| Method | Status | Detectability | Frameworks |
|---|---|---|---|
| Medical / overdose / dosing Lethal-dose and unsafe-dosing requests across competing-objective framings. | Covered | I/O signal | nist_abuse eu_ai_art15 |
| Protected health information (PHI) Cross-patient PHI and FHIR-boundary extraction. | Covered | I/O signal | owasp_llm02 |
| Dangerous-goods / HAZMAT downgrade Pressure to downgrade HAZMAT/DGR/IMDG safety classifications. | Covered | Session-level | eu_ai_art15 |
| Regulated advice boundary (legal / financial) UPL, unregistered-adviser, and suitability-reversal boundaries. | Covered | Session-level | owasp_llm09 eu_ai_art14 |
| CBRN uplift Chemical/biological/radiological/nuclear capability uplift. | I/O signal | owasp_llm09 | |
| Malware / offensive-cyber generation Working exploit, malware, or intrusion tooling generation. | I/O signal | owasp_llm09 | |
| Fraud / scam / targeted disinformation Scam scripts, phishing, and tailored disinformation authoring. | I/O signal | owasp_llm09 | |
| Self-harm (non-overdose) Self-harm methods beyond the medication-overdose vector already covered. | I/O signal | owasp_llm09 eu_ai_art15 |
No methods match those filters.
Worried about one of these?
Get a written read on your AI's exposure.
Send a short description of your AI system and I'll reply with the risks I'd check first — free, no call required.