Adversarial AI knowledge base
Published research & guides
Practical writing on LLM failure modes, attack classes, and testing methodology. New articles added as the field evolves.
Methodology
Eight ways an airline chatbot fails
A taxonomy of failure modes for customer-facing LLMs in regulated, high-stakes deployment contexts.
System prompt leakage
Five ways your system prompt leaks to users
Five extraction techniques: direct probing, encoded extraction, roleplay attacks, multi-turn escalation, and differential analysis.
Sycophancy
Sycophancy is an enterprise liability
When a model tells users what they want to hear instead of what's true, the consequences range from bad advice to legal exposure.
Prompt injection
Indirect prompt injection via RAG-retrieved documents
How attackers embed malicious instructions in the documents your model retrieves, and four specific attack patterns to test for.