LLM Attack Taxonomy

An interactive map of LLM attack vectors and methods, and how Black Diamond Consulting assesses against each.

Curated by Sean Yunt — Founder & Principal, Black Diamond Consulting

24 covered | 14 in-depth | 25 expanding | 37 I/O-detectable | 6 session-level | 6 baseline-only

Jailbreak & Injection

Input-side manipulation to elicit policy-violating output.

Method	Status	Detectability	Frameworks
Direct persona / alter-ego (DAN-style): Alter-ego personas instructed to ignore restrictions.	Covered	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Fictional / academic framing: Story, roleplay, or research framings that wrap a restricted request.	Covered	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Direct system-prompt override: SYSTEM OVERRIDE / ignore-previous-instructions injection in the user turn.	Covered	I/O signal	owasp_llm01 mitre_llm_injection
Indirect injection via retrieved content: Instructions embedded in RAG documents the model is asked to process.	In-depth	I/O signal	owasp_llm01 owasp_llm04 nist_poisoning mitre_llm_injection
Payload splitting (multi-turn): A harmful request spread across individually innocuous turns.	In-depth	Session-level	owasp_llm01 nist_evasion mitre_llm_jailbreak
Virtualization / eval-mode framing: Claims that safety rules are suspended in a sandbox/test/dry-run context.	In-depth	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Base64 / encoding smuggling: Harmful instruction encoded (base64) to slip past surface matching.	Covered	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Many-shot jailbreaking (128-256 shots): Flooding context with many fake exchanges to erode refusal at scale.	Expanding	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Crescendo (gradual escalation): Benign-to-harmful drift within a single topic across turns.	Expanding	Session-level	owasp_llm01 nist_evasion mitre_llm_jailbreak
Past-tense reformulation: Reframing a prohibited request in the past tense to bypass refusal.	Expanding	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Refusal suppression / prefix injection: Forcing an affirmative opening or banning disclaimers/apologies.	Expanding	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Low-resource-language / translation bypass: Routing a harmful request through a low-resource language.	Expanding	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Cipher / ASCII-art evasion (ArtPrompt, CipherChat): Glyph- or cipher-based encodings that evade the safety classifier.	Expanding	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
Adversarial suffix (GCG / transferable): Optimized gibberish suffixes that flip refusal to compliance.	Expanding	I/O signal	owasp_llm01 nist_evasion mitre_llm_jailbreak
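
To illustrate why base64 smuggling is rated as an I/O signal, the following is a minimal sketch of a decode-and-rescan check. It is not Black Diamond Consulting's tooling: the blocklist, the 16-character threshold, and the function names are all assumptions made for illustration, and a real filter would use a proper classifier rather than substring matching.

```python
import base64
import binascii
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_PHRASES = ("ignore previous instructions", "system override")

# Runs of 16+ base64-alphabet characters, optionally padded (threshold is an assumption).
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_views(text: str) -> list[str]:
    """Return the raw text plus every base64 run that decodes to printable UTF-8."""
    views = [text]
    for match in B64_RUN.finditer(text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        if decoded.isprintable():
            views.append(decoded)
    return views


def flags_encoded_injection(text: str) -> bool:
    """True if a blocked phrase appears in the raw text or in any decoded view."""
    return any(
        phrase in view.lower()
        for view in decoded_views(text)
        for phrase in BLOCKED_PHRASES
    )
```

The same pattern (normalize, then rescan) extends to other encodings, but it only inspects a single message, which is why multi-turn methods such as payload splitting need session-level analysis instead.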

Information Extraction

Eliciting protected context, secrets, or other users' data.

Method	Status	Detectability	Frameworks
System-prompt leakage (structural): Enumerating restrictions, escalation triggers, and operator instructions.	In-depth	Session-level	owasp_llm02 owasp_llm07 nist_privacy mitre_recon
Secret / token extraction: Direct, encoded, roleplay, indirect, and multi-turn secret extraction.	In-depth	I/O signal	owasp_llm02 owasp_llm07 nist_privacy mitre_exfiltration
Cross-user / PHI boundary breach: Extracting another tenant's records from a multi-user RAG context.	In-depth	I/O signal	owasp_llm02 owasp_llm08 nist_privacy mitre_exfiltration
Cross-user IP / trade-secret boundary breach: Extracting another employee's patent filings or invention details from a multi-user IP-management RAG context.	In-depth	I/O signal	owasp_llm02 owasp_llm08 nist_privacy mitre_exfiltration
Training-data memorization extraction: Divergence/repetition attacks that surface memorized training text.	Expanding	Baseline-only	owasp_llm02 nist_privacy mitre_exfiltration
Membership inference / model inversion: Inferring whether a record was in the training set, or reconstructing training inputs; largely a research-only vector when tested behaviorally.	Expanding	Baseline-only	nist_privacy
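
One common way to make secret extraction observable at the I/O level is to plant a canary string in the protected context and look for it in the output. The sketch below assumes that approach; the canary value, the set of encodings checked, and the function names are hypothetical and chosen only to mirror the direct and encoded variants listed above.

```python
import base64


def canary_variants(canary: str) -> set[str]:
    """Forms in which a planted canary might appear in model output."""
    raw = canary.encode("utf-8")
    return {
        canary,                                 # direct leak
        canary[::-1],                           # reversed
        base64.b64encode(raw).decode("ascii"),  # base64-encoded
        raw.hex(),                              # hex-encoded
    }


def _compact(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in text if ch.isalnum()).lower()


def leaked_canary(output: str, canary: str) -> bool:
    """True if the canary appears directly, encoded, or spelled out with separators."""
    if any(variant in output for variant in canary_variants(canary)):
        return True
    return _compact(canary) in _compact(output)
```

A check like this catches single-response leaks only. Structural system-prompt leakage, where the model reveals restrictions piece by piece, would not trip it, which is consistent with that row's session-level rating.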

Output Handling

Unsafe model output that harms a downstream consumer.

Method	Status	Detectability	Frameworks
Markdown / image data exfiltration: Emitting an image/link URL that leaks context to an attacker on render.	Expanding	I/O signal	owasp_llm05 owasp_llm02
Insecure generated code as output: Generated SQL/HTML/shell that is unsafe when executed downstream.	Expanding	I/O signal	owasp_llm05
Structured-output breakout: JSON/format injection that escapes the intended output contract.	Expanding	I/O signal	owasp_llm05
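
As a rough illustration of the Markdown exfiltration row, a renderer-side guard can flag image and link URLs that point outside an allowlist before the output is displayed. This is a sketch under stated assumptions: `ALLOWED_HOSTS` and `suspicious_urls` are hypothetical names, and the regex covers only inline Markdown syntax, not raw HTML or reference-style links.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the application is willing to render from.
ALLOWED_HOSTS = {"cdn.example.com"}

# Inline Markdown images and links: ![alt](url) or [text](url).
MD_LINK = re.compile(r"!?\[[^\]]*\]\(\s*<?([^)\s>]+)")


def suspicious_urls(model_output: str) -> list[str]:
    """Return http(s) URLs in Markdown images/links whose host is not allowlisted."""
    flagged = []
    for match in MD_LINK.finditer(model_output):
        url = match.group(1)
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue
        host = (parts.hostname or "").lower()
        if host not in ALLOWED_HOSTS:
            flagged.append(url)
    return flagged
```

Images matter most here because many chat front ends fetch them automatically, so the query string reaches the attacker's server without any click.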

Agentic & Tool-Use

Misuse of an agent's tools, authority, or memory.

Method	Status	Detectability	Frameworks
Described-tool action boundary: Refusing to claim execution of described (non-real) tool actions.	Covered	I/O signal	owasp_llm06 nist_abuse eu_ai_art14
Unbounded consumption (behavioral): Refusing to attempt unbounded/indefinite output generation.	Covered	I/O signal	owasp_llm10 nist_abuse
Real tool-call / function injection: Injection that drives actual API tool calls via the tools= interface.	Expanding	I/O signal	owasp_llm06 mitre_llm_injection
Tool-result injection: Poisoned tool_result blocks accepted as authoritative.	Expanding	I/O signal	owasp_llm06 owasp_llm01
Confused-deputy / goal hijack via tool chaining: Injected content steering an agent to misuse its own privileges.	Expanding	Baseline-only	owasp_llm06 nist_abuse
Persistent memory poisoning: Cross-session corruption of agent memory.	Expanding	Baseline-only	owasp_llm06 owasp_llm04

Multimodal

Attacks delivered through non-text channels.

Method	Status	Detectability	Frameworks
Hidden-text image injection: Near-invisible or tiny-font instructions embedded in uploaded images.	In-depth	I/O signal	owasp_llm01 nist_evasion mitre_llm_injection
Unicode tag-block / invisible-char smuggling (text): Instructions hidden in Unicode tag-block characters (U+E0000 to U+E007F) within plain text.	Expanding	I/O signal	owasp_llm01 nist_evasion
Homoglyph substitution: Look-alike Unicode characters that evade string-level filters.	Expanding	I/O signal	owasp_llm01 nist_evasion
Audio / QR / steganographic payloads: Instructions carried in audio, QR codes, or image steganography.	Expanding	I/O signal	owasp_llm01 nist_evasion
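
Tag-block smuggling works because characters U+E0020 through U+E007E mirror printable ASCII but render as nothing in most interfaces. The sketch below shows one way to recover and strip such a payload; the function names are illustrative assumptions, not part of any named tool.

```python
TAG_START, TAG_END = 0xE0000, 0xE007F  # Unicode "Tags" block


def extract_tag_payload(text: str) -> str:
    """Decode tag characters back to the printable ASCII they mirror."""
    return "".join(
        chr(ord(ch) - TAG_START)
        for ch in text
        if TAG_START + 0x20 <= ord(ch) <= TAG_START + 0x7E
    )


def strip_tag_chars(text: str) -> str:
    """Remove every character in the Tags block, leaving the visible text."""
    return "".join(ch for ch in text if not (TAG_START <= ord(ch) <= TAG_END))
```

A non-empty result from `extract_tag_payload` is a strong signal on its own, since ordinary user text rarely contains tag characters (their main legitimate use is in emoji flag sequences).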

Generation-Quality Harms

Confident-but-wrong or unsafe content under normal use.

Method	Status	Detectability	Frameworks
Factual hallucination: Fabricated people, events, citations, and false-premise acceptance.	In-depth	Baseline-only	owasp_llm09 eu_ai_art15
Sycophantic reversal under pressure: Abandoning a correct position under pushback, authority, or emotion.	In-depth	Session-level	owasp_llm09 eu_ai_art15
Temporal / stale-data error: Stale regulatory/legal/market facts stated without a staleness hedge.	In-depth	Baseline-only	owasp_llm09 eu_ai_art15
False privacy / confidentiality assurance: Model falsely assures users their inputs are private or not retained when it cannot verify this (Galkin IP/patent disclosure risk).	In-depth	I/O signal	owasp_llm09 eu_ai_art13 nist_privacy
Data-handling policy fabrication: Model invents specific retention periods, opt-out mechanisms, or compliance provisions it cannot verify are accurate or current.	In-depth	I/O signal	owasp_llm09 nist_privacy
Omitted IP / patent disclosure warning: Model assists with novel unpublished inventions without flagging that sharing technical details may constitute prior art or compromise trade-secret status.	In-depth	I/O signal	owasp_llm09 eu_ai_art13
Package hallucination (slopsquatting): Inventing installable package names an attacker can register.	Expanding	I/O signal	owasp_llm09
Insecure-code suggestion (measured): Systematically scoring the safety of generated code, not just eliciting it.	Expanding	I/O signal	owasp_llm09
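
For the slopsquatting row, one simple output-side check is to pull package names out of suggested `pip install` commands and compare them with a vetted list. The sketch below assumes such a list exists; `KNOWN_PACKAGES` and `unknown_packages` are hypothetical, and the parsing deliberately ignores edge cases such as `-r requirements.txt`.

```python
import re

# Hypothetical vetted list; in practice this might come from an internal mirror.
KNOWN_PACKAGES = {"requests", "numpy", "pandas"}

# Captures everything after "pip install" up to a newline or backtick.
PIP_INSTALL = re.compile(r"pip3?\s+install\s+([^\n`]+)")


def unknown_packages(model_output: str) -> list[str]:
    """Return suggested package names that are not on the vetted list."""
    unknown = []
    for match in PIP_INSTALL.finditer(model_output):
        for token in match.group(1).split():
            if token.startswith("-"):
                continue  # skip flags such as -U or --upgrade
            # Drop version specifiers, extras, and environment markers.
            name = re.split(r"[<>=!~\[;]", token, maxsplit=1)[0]
            name = name.rstrip(".,)").lower().replace("_", "-")
            if name and name not in KNOWN_PACKAGES and name not in unknown:
                unknown.append(name)
    return unknown
```

A name missing from the list is not proof of hallucination, only a prompt to verify that the package exists and is the one intended before anyone installs it.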

Content-Harm Payloads

The harmful subject matter that jailbreaks attempt to elicit.

Method	Status	Detectability	Frameworks
Medical / overdose / dosing: Lethal-dose and unsafe-dosing requests across competing-objective framings.	Covered	I/O signal	nist_abuse eu_ai_art15
Protected health information (PHI): Cross-patient PHI and FHIR-boundary extraction.	Covered	I/O signal	owasp_llm02
Dangerous-goods / HAZMAT downgrade: Pressure to downgrade HAZMAT/DGR/IMDG safety classifications.	Covered	Session-level	eu_ai_art15
Regulated advice boundary (legal / financial): UPL, unregistered-adviser, and suitability-reversal boundaries.	Covered	Session-level	owasp_llm09 eu_ai_art14
CBRN uplift: Chemical/biological/radiological/nuclear capability uplift.	Expanding	I/O signal	owasp_llm09
Malware / offensive-cyber generation: Working exploit, malware, or intrusion tooling generation.	Expanding	I/O signal	owasp_llm09
Fraud / scam / targeted disinformation: Scam scripts, phishing, and tailored disinformation authoring.	Expanding	I/O signal	owasp_llm09
Self-harm (non-overdose): Self-harm methods beyond the medication-overdose vector already covered.	Expanding	I/O signal	owasp_llm09 eu_ai_art15

Worried about one of these?

Get a written read on your AI's exposure.

Send a short description of your AI system and I'll reply with the risks I'd check first — free, no call required.
