Adversarial AI knowledge base

Published research & guides

Practical writing on LLM failure modes, attack classes, and testing methodology. New articles are added as the field evolves.

A taxonomy of failure modes for customer-facing LLMs in regulated, high-stakes deployment contexts.

Direct probing, encoded extraction, roleplay attacks, multi-turn escalation, and differential analysis.

When a model tells users what they want to hear instead of what's true, the consequences range from bad advice to legal exposure.

How attackers embed malicious instructions in documents your model retrieves, and four specific attack patterns to test for.