Cross-user data leakage in multi-tenant LLM deployments

A patient asks a healthcare chatbot about her medication history. The model returns, accurately, her own records. Then it mentions a detail from a different patient’s chart. No adversarial prompt was involved. The retrieval pipeline surfaced the wrong records, and the model incorporated them.

Cross-user data leakage is a structural risk in any multi-tenant LLM deployment that uses retrieval-augmented generation, persistent memory, or shared context. It doesn’t require an attacker. It can happen through ordinary use.

How it happens

Multi-tenant RAG systems store documents, conversation history, or user records in a shared vector store. Access controls at the application layer are supposed to ensure that retrieval only returns records belonging to the current user. When those controls fail, or when a similarity search that is not scoped to the current user matches nearby embeddings, records from one user can appear in another user’s context.

The model has no way to detect this. It receives a context window containing retrieved content, and it incorporates that content into its response. It cannot distinguish between authorized records and records that slipped through a misconfigured access control.
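Because the model only ever sees text, any ownership check has to happen before the prompt is assembled. The following is a minimal sketch of such a check, assuming each retrieved chunk still carries an owner identifier at that point. The names `Chunk`, `build_context`, and `CrossTenantChunkError` are hypothetical and not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    owner_id: str  # assumed metadata: the user that owns the source record
    text: str


class CrossTenantChunkError(Exception):
    """Raised when a retrieved chunk does not belong to the requesting user."""


def build_context(chunks: list[Chunk], user_id: str) -> str:
    # Fail closed: refuse to build a prompt if any chunk has a different owner.
    for chunk in chunks:
        if chunk.owner_id != user_id:
            raise CrossTenantChunkError(
                f"chunk owned by {chunk.owner_id!r} retrieved for {user_id!r}"
            )
    # Past this point the ownership metadata is gone; the model sees only text.
    return "\n\n".join(chunk.text for chunk in chunks)
```

Raising instead of silently dropping the foreign chunk is a design choice in this sketch: it turns a retrieval-layer failure into a visible error rather than a quietly degraded answer.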

The specific failure modes

Proximity-based cross-retrieval. Two users have similar queries. Their records are embedded close together in the vector space. If the similarity search is not scoped to the querying user, the retrieval step returns records for both. The model incorporates both into its response, leaking one user’s data to the other.
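A toy illustration of proximity-based cross-retrieval, using an in-memory list in place of a real vector store. The two-dimensional embeddings, the record texts, and the `top_k` helper are all made up for the example; a production store would apply the owner filter through its own metadata-filtering mechanism.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    owner_id: str
    text: str
    embedding: tuple[float, ...]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(store, query, k, owner_id=None):
    # owner_id=None reproduces the unsafe behavior: rank the whole shared store.
    candidates = [r for r in store if owner_id is None or r.owner_id == owner_id]
    ranked = sorted(candidates, key=lambda r: cosine(r.embedding, query), reverse=True)
    return ranked[:k]


# Made-up records: Alice's and Bob's medication notes embed close together.
STORE = [
    Record("alice", "Alice: medication list", (0.90, 0.10)),
    Record("bob", "Bob: medication list", (0.88, 0.12)),
    Record("alice", "Alice: billing address", (0.10, 0.90)),
]

query = (0.89, 0.11)  # Alice asks about her medication
unsafe = top_k(STORE, query, k=2)                    # includes Bob's record
safe = top_k(STORE, query, k=2, owner_id="alice")    # Alice's records only
print({r.owner_id for r in unsafe})
print({r.owner_id for r in safe})
```

The optional `owner_id` parameter exists here only to show both behaviors side by side. In a real retrieval API it would be safer to make the owner a required argument so that an unscoped search cannot happen by omission.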

Session bleed. In deployments with persistent conversation memory, a session management failure can cause one user’s prior context to be loaded into another user’s session. The model continues the conversation as if the second user had the first user’s history.
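One way to guard against session bleed is to bind each session to a user when it is created and to re-check that binding on every load. The sketch below assumes an in-memory store and hypothetical names (`SessionStore`, `SessionOwnershipError`); it is an illustration of the check, not a complete session manager.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    user_id: str
    history: list[str] = field(default_factory=list)


class SessionOwnershipError(Exception):
    """Raised when a session is requested by a user who does not own it."""


class SessionStore:
    def __init__(self):
        self._sessions: dict[str, Session] = {}

    def create(self, session_id: str, user_id: str) -> Session:
        # Never reuse an ID: a recycled ID would hand over the old history.
        if session_id in self._sessions:
            raise ValueError("session id already in use")
        session = Session(user_id=user_id)
        self._sessions[session_id] = session
        return session

    def load(self, session_id: str, user_id: str) -> Session:
        session = self._sessions[session_id]  # KeyError if the ID is unknown
        # Verify ownership on every load, not only at creation time.
        if session.user_id != user_id:
            raise SessionOwnershipError("session belongs to a different user")
        return session
```

With this shape, a mix-up of session IDs surfaces as an exception instead of a conversation that continues with someone else’s history.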

Injection-assisted exfiltration. A user embeds a prompt injection in content that gets stored in the shared corpus. When other users’ queries retrieve that document, the injected instruction directs the model to include other retrieved content verbatim in its response. This is an active attack rather than a passive failure, but the mechanism is the same retrieval pipeline.

Why this is a HIPAA and GDPR risk, not just a quality problem

In regulated contexts, unintentional disclosure of one user’s data to another is a breach, regardless of whether it was caused by an attack. A vector store misconfiguration that causes PHI to appear in another patient’s session is reportable. Testing for this failure before deployment is not optional in healthcare, financial services, or any context involving personal data.

What to test for

Cross-user leakage testing requires a multi-user test harness: create distinct user profiles, each seeded with uniquely identifiable data, run queries designed to trigger retrieval of nearby records, and verify that the model’s responses contain only data belonging to the querying user. Test both proximity-based retrieval failures (queries from different users that are semantically close) and injection-based exfiltration (malicious content stored by one user designed to surface in another’s session). Access controls should be verified at the retrieval layer, not just the application layer.
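The core of such a harness can be sketched with canary markers: each test user’s seeded records contain a unique string, and any response that contains another user’s marker counts as a leak. This is a minimal sketch under assumptions. `answer(user_id, query)` is a hypothetical stand-in for the deployment under test, and seeding the markers into the store is left to the caller.

```python
import uuid
from typing import Callable

# Hypothetical interface to the system under test:
# answer(user_id, query) -> model response text.
AnswerFn = Callable[[str, str], str]


def make_canaries(user_ids):
    # One unique marker per user, to be planted in that user's seeded records.
    return {uid: f"CANARY-{uuid.uuid4().hex[:12]}" for uid in user_ids}


def find_leaks(answer: AnswerFn, canaries, queries):
    """Return (querying_user, leaked_owner, query) for every leak found."""
    leaks = []
    for user_id in canaries:
        for query in queries:
            response = answer(user_id, query)
            for owner, marker in canaries.items():
                # A leak is another user's marker in this user's response.
                if owner != user_id and marker in response:
                    leaks.append((user_id, owner, query))
    return leaks
```

The same function covers both cases described above, depending on what is seeded and asked: semantically close queries from different users exercise proximity-based retrieval, and a planted document containing an injected instruction exercises the exfiltration path. An empty result only shows that these particular queries did not leak, so the query set should be broad.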
