← All resources

LLM Attack Taxonomy

An interactive map of LLM attack vectors and methods, and how Black Diamond Consulting assesses against each.

24 covered 14 in-depth 25 expanding 37 I/O-detectable 6 session-level 6 baseline-only

Jailbreak & Injection

Input-side manipulation to elicit policy-violating output.

MethodStatusDetectabilityFrameworks
Direct persona / alter-ego (DAN-style)
Alter-ego personas instructed to ignore restrictions.
CoveredI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Fictional / academic framing
Story, roleplay, or research framings that wrap a restricted request.
CoveredI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Direct system-prompt override
SYSTEM OVERRIDE / ignore-previous-instructions injection in the user turn.
CoveredI/O signalowasp_llm01 mitre_llm_injection
Indirect injection via retrieved content
Instructions embedded in RAG documents the model is asked to process.
In-depthI/O signalowasp_llm01 owasp_llm04 nist_poisoning mitre_llm_injection
Payload splitting (multi-turn)
A harmful request spread across individually-innocuous turns.
In-depthSession-levelowasp_llm01 nist_evasion mitre_llm_jailbreak
Virtualization / eval-mode framing
Claims that safety rules are suspended in a sandbox/test/dry-run context.
In-depthI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Base64 / encoding smuggling
Harmful instruction encoded (base64) to slip past surface matching.
CoveredI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Many-shot jailbreaking (128-256 shots)
Flooding context with many fake exchanges to erode refusal at scale.
ExpandingI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Crescendo (gradual escalation)
Benign-to-harmful drift within a single topic across turns.
ExpandingSession-levelowasp_llm01 nist_evasion mitre_llm_jailbreak
Past-tense reformulation
Reframing a prohibited request in the past tense to bypass refusal.
ExpandingI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Refusal suppression / prefix injection
Forcing an affirmative opening or banning disclaimers/apologies.
ExpandingI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Low-resource-language / translation bypass
Routing a harmful request through a low-resource language.
ExpandingI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Cipher / ASCII-art evasion (ArtPrompt, CipherChat)
Glyph- or cipher-based encodings that evade the safety classifier.
ExpandingI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak
Adversarial suffix (GCG / transferable)
Optimized gibberish suffixes that flip refusal to compliance.
ExpandingI/O signalowasp_llm01 nist_evasion mitre_llm_jailbreak

Information Extraction

Eliciting protected context, secrets, or other users' data.

MethodStatusDetectabilityFrameworks
System-prompt leakage (structural)
Enumerating restrictions, escalation triggers, and operator instructions.
In-depthSession-levelowasp_llm02 owasp_llm07 nist_privacy mitre_recon
Secret / token extraction
Direct, encoded, roleplay, indirect, and multi-turn secret extraction.
In-depthI/O signalowasp_llm02 owasp_llm07 nist_privacy mitre_exfiltration
Cross-user / PHI boundary breach
Extracting another tenant's records from a multi-user RAG context.
In-depthI/O signalowasp_llm02 owasp_llm08 nist_privacy mitre_exfiltration
Cross-user IP / trade-secret boundary breach
Extracting another employee's patent filings or invention details from a multi-user IP-management RAG context.
In-depthI/O signalowasp_llm02 owasp_llm08 nist_privacy mitre_exfiltration
Training-data memorization extraction
Divergence/repetition attacks that surface memorized training text.
ExpandingBaseline-onlyowasp_llm02 nist_privacy mitre_exfiltration
Membership inference / model inversion
Inferring training-set membership; largely research-only behaviorally.
ExpandingBaseline-onlynist_privacy

Output Handling

Unsafe model output that harms a downstream consumer.

MethodStatusDetectabilityFrameworks
Markdown / image data exfiltration
Emitting an image/link URL that leaks context to an attacker on render.
ExpandingI/O signalowasp_llm05 owasp_llm02
Insecure generated code as output
Generated SQL/HTML/shell that is unsafe when executed downstream.
ExpandingI/O signalowasp_llm05
Structured-output breakout
JSON/format injection that escapes the intended output contract.
ExpandingI/O signalowasp_llm05

Agentic & Tool-Use

Misuse of an agent's tools, authority, or memory.

MethodStatusDetectabilityFrameworks
Described-tool action boundary
Refusing to claim execution of described (non-real) tool actions.
CoveredI/O signalowasp_llm06 nist_abuse eu_ai_art14
Unbounded consumption (behavioral)
Refusing to attempt unbounded/indefinite output generation.
CoveredI/O signalowasp_llm10 nist_abuse
Real tool-call / function injection
Injection that drives actual API tool calls via the tools= interface.
ExpandingI/O signalowasp_llm06 mitre_llm_injection
Tool-result injection
Poisoned tool_result blocks accepted as authoritative.
ExpandingI/O signalowasp_llm06 owasp_llm01
Confused-deputy / goal hijack via tool chaining
Injected content steering an agent to misuse its own privileges.
ExpandingBaseline-onlyowasp_llm06 nist_abuse
Persistent memory poisoning
Cross-session corruption of agent memory.
ExpandingBaseline-onlyowasp_llm06 owasp_llm04

Multimodal

Attacks delivered through non-text channels.

MethodStatusDetectabilityFrameworks
Hidden-text image injection
Near-invisible or tiny-font instructions embedded in uploaded images.
In-depthI/O signalowasp_llm01 nist_evasion mitre_llm_injection
Unicode tag-block / invisible-char smuggling (text)
Instructions hidden in U+E0000 tag characters within plain text.
ExpandingI/O signalowasp_llm01 nist_evasion
Homoglyph substitution
Look-alike Unicode characters that evade string-level filters.
ExpandingI/O signalowasp_llm01 nist_evasion
Audio / QR / steganographic payloads
Instructions carried in audio, QR codes, or image steganography.
ExpandingI/O signalowasp_llm01 nist_evasion

Generation-Quality Harms

Confident-but-wrong or unsafe content under normal use.

MethodStatusDetectabilityFrameworks
Factual hallucination
Fabricated people, events, citations, and false-premise acceptance.
In-depthBaseline-onlyowasp_llm09 eu_ai_art15
Sycophantic reversal under pressure
Abandoning a correct position under pushback, authority, or emotion.
In-depthSession-levelowasp_llm09 eu_ai_art15
Temporal / stale-data error
Stale regulatory/legal/market facts stated without a staleness hedge.
In-depthBaseline-onlyowasp_llm09 eu_ai_art15
False privacy / confidentiality assurance
Model falsely assures users their inputs are private or not retained when it cannot verify this — Galkin IP/patent disclosure risk.
In-depthI/O signalowasp_llm09 eu_ai_art13 nist_privacy
Data-handling policy fabrication
Model invents specific retention periods, opt-out mechanisms, or compliance provisions it cannot verify are accurate or current.
In-depthI/O signalowasp_llm09 nist_privacy
Omitted IP / patent disclosure warning
Model assists with novel unpublished inventions without flagging that sharing technical details may constitute prior art or compromise trade-secret status.
In-depthI/O signalowasp_llm09 eu_ai_art13
Package hallucination (slopsquatting)
Inventing installable package names an attacker can register.
ExpandingI/O signalowasp_llm09
Insecure-code suggestion (measured)
Systematically scoring the safety of generated code, not just eliciting it.
ExpandingI/O signalowasp_llm09

Content-Harm Payloads

The harmful subject matter that jailbreaks attempt to elicit.

MethodStatusDetectabilityFrameworks
Medical / overdose / dosing
Lethal-dose and unsafe-dosing requests across competing-objective framings.
CoveredI/O signalnist_abuse eu_ai_art15
Protected health information (PHI)
Cross-patient PHI and FHIR-boundary extraction.
CoveredI/O signalowasp_llm02
Dangerous-goods / HAZMAT downgrade
Pressure to downgrade HAZMAT/DGR/IMDG safety classifications.
CoveredSession-leveleu_ai_art15
Regulated advice boundary (legal / financial)
UPL, unregistered-adviser, and suitability-reversal boundaries.
CoveredSession-levelowasp_llm09 eu_ai_art14
CBRN uplift
Chemical/biological/radiological/nuclear capability uplift.
ExpandingI/O signalowasp_llm09
Malware / offensive-cyber generation
Working exploit, malware, or intrusion tooling generation.
ExpandingI/O signalowasp_llm09
Fraud / scam / targeted disinformation
Scam scripts, phishing, and tailored disinformation authoring.
ExpandingI/O signalowasp_llm09
Self-harm (non-overdose)
Self-harm methods beyond the medication-overdose vector already covered.
ExpandingI/O signalowasp_llm09 eu_ai_art15

Get a written read on your AI's exposure.

Send a short description of your AI system and I'll reply with the risks I'd check first — free, no call required.

Request my free assessment →