
Auditing LLM Access Control in Multi-Tenant Applications
Multi-tenant LLM features fail in the same dull ways as the rest of a web app: broken identity binding, weak authorization, and too much trust in client-side context. The only twist is that the bug often hides inside prompt assembly, retrieval filters, or agent tools.
Why LLM access control fails in multi-tenant systems
I usually start from one assumption: the model is not the security boundary. The backend is.
In a multi-tenant app, the LLM may see chat history, documents, tickets, or CRM records from one workspace. If the app builds that context from the wrong tenant, or if a tool call can reach the wrong tenant, the model will happily summarize whatever it receives. That is not a model failure. It is an access-control failure that happened before inference.
The failure modes I see most often are:
- the UI passes workspaceId, but the API trusts it
- retrieval filters are applied in one code path but not another
- export or admin routes skip the same authorization checks as chat
- agent tools run with global credentials instead of tenant-scoped ones
What to audit first: tenant identity, session context, and tool boundaries
Check how the app binds user, org, and workspace IDs
Trace the request from login to LLM call. You want to know where the app decides, “this user belongs to this tenant.”
A safe audit pattern is to inspect the session object and compare it to request parameters:
// Build tenant context from the server-side session only.
function buildTenantContext(req) {
  return {
    userId: req.session.user.id,
    orgId: req.session.org.id,
    workspaceId: req.session.workspace.id,
  };
}
If the code instead does this, you have a problem:
const workspaceId = req.body.workspaceId;
That value must be validated against server-side membership, not accepted because the client sent it.
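A minimal sketch of that validation, assuming a `memberships` list that the server loads for the session user. The helper name `resolveWorkspaceId` and the shape of the membership records are hypothetical, not taken from any particular framework:

```javascript
// Hypothetical helper: decide which workspace a request may act on.
// `memberships` is assumed to come from a server-side lookup, never from
// the request body.
function resolveWorkspaceId(session, requestedWorkspaceId, memberships) {
  // Fall back to the session's workspace when the client sends nothing.
  const candidate = requestedWorkspaceId ?? session.workspace.id;

  const isMember = memberships.some(
    (m) =>
      m.userId === session.user.id &&
      m.orgId === session.org.id &&
      m.workspaceId === candidate
  );

  if (!isMember) {
    const err = new Error("Forbidden: user is not a member of this workspace");
    err.status = 403;
    throw err;
  }
  return candidate;
}
```

The client-supplied value is treated as a request to switch workspaces, and it is honored only when a server-side membership record backs it up.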
Verify whether prompt context can cross tenant boundaries
Prompt injection gets too much attention here. The more common issue is accidental prompt contamination across tenants.
Check whether the app caches:
- summaries
- embeddings
- recent messages
- system prompts
- tool results
If any of those are keyed only by userId, session token, or a shared cache entry, one tenant can leak into another. In one audit, a “helpful” conversation summary was reused across workspaces because the cache key ignored orgId. The model did nothing wrong; the app assembled the wrong context.
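One way to rule out that class of bug is to make it impossible to build a cache key without tenant identifiers. The sketch below assumes a context object shaped like the one from `buildTenantContext`; the function name `buildCacheKey` and the key layout are illustrative only:

```javascript
// Hypothetical cache-key builder. The org and workspace are always part of
// the key, so an entry written for one workspace cannot be read from another.
function buildCacheKey(ctx, kind, resourceId) {
  for (const field of ["orgId", "workspaceId"]) {
    if (!ctx[field]) {
      throw new Error(`Missing ${field}: refusing to build an unscoped cache key`);
    }
  }
  // encodeURIComponent keeps a ":" inside an ID from colliding with the separator.
  return [ctx.orgId, ctx.workspaceId, kind, resourceId]
    .map((part) => encodeURIComponent(String(part)))
    .join(":");
}
```

For data that is private to a single user, such as a personal conversation summary, the key would also need to include the user ID.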
Testing the API layer for isolation gaps
Look for missing authorization on chat, retrieval, and export routes
You need to test every route that touches tenant data, not just the main chat endpoint. I look for:
- POST /chat
- POST /retrieval/search
- GET /documents/:id
- POST /exports
- POST /agent/run
The bug class is usually inconsistent enforcement. The chat route checks membership, but the export route only checks that the caller is authenticated.
A simple review table helps:
| Route type | What to verify | Typical failure |
|---|---|---|
| Chat | workspace membership | trusts client workspace ID |
| Retrieval | document ownership | returns cross-tenant hits |
| Export | same policy as read access | bypasses document ACLs |
| Agent tools | scoped credentials | global write access |
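The same review can be run mechanically once you have a route inventory. The sketch below assumes you can list, for each route, which checks its middleware actually runs. The inventory data and the name `findUnderprotectedRoutes` are made up for illustration:

```javascript
// Hypothetical route inventory. In a real audit this would be extracted from
// the router or middleware registration rather than written by hand.
const routes = [
  { method: "POST", path: "/chat", checks: ["authenticated", "membership"] },
  { method: "POST", path: "/retrieval/search", checks: ["authenticated", "membership"] },
  { method: "GET", path: "/documents/:id", checks: ["authenticated", "membership", "ownership"] },
  { method: "POST", path: "/exports", checks: ["authenticated"] },
  { method: "POST", path: "/agent/run", checks: ["authenticated", "membership"] },
];

// Checks that every tenant-data route is expected to run.
const REQUIRED_CHECKS = ["authenticated", "membership"];

// Return the routes that skip at least one required check.
function findUnderprotectedRoutes(routeList, required = REQUIRED_CHECKS) {
  return routeList
    .map((route) => ({
      route: `${route.method} ${route.path}`,
      missing: required.filter((check) => !route.checks.includes(check)),
    }))
    .filter((entry) => entry.missing.length > 0);
}

console.log(findUnderprotectedRoutes(routes));
```

With this sample inventory, the only route flagged is `POST /exports`, which authenticates the caller but never checks membership.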
Reproduce IDOR-style failures with safe tenant fixtures
Use two test tenants with clearly separated fixtures: tenant-a and tenant-b. Do not use production data.
Then try the boring attacks first:
- Log in as a user in tenant A.
- Send a request with tenant B's workspaceId.
- Request a document ID from tenant B.
- Trigger an export for a record you should not access.
If the response changes from 403 to data, you have an isolation failure. If the app returns a generic summary built from another tenant's content, that is still a leak.
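These probes are easy to script against fixtures. The sketch below uses an in-memory document store and a hypothetical `getDocument` handler standing in for the real route; in an actual audit the probe would issue HTTP requests against a test environment instead:

```javascript
// Test fixtures only: two tenants with clearly separated documents.
const documents = new Map([
  ["doc-a1", { workspaceId: "tenant-a", text: "Tenant A roadmap" }],
  ["doc-b1", { workspaceId: "tenant-b", text: "Tenant B contract" }],
]);

// Hypothetical handler under test. The tenant comes from the session,
// never from the request.
function getDocument(session, docId) {
  const doc = documents.get(docId);
  if (!doc) return { status: 404 };
  if (doc.workspaceId !== session.workspaceId) return { status: 403 };
  return { status: 200, body: doc.text };
}

// Probe: request every fixture document as the given session and collect
// the IDs of foreign documents that came back with data.
function probeCrossTenantReads(session) {
  const leaks = [];
  for (const [docId, doc] of documents) {
    const res = getDocument(session, docId);
    const isForeign = doc.workspaceId !== session.workspaceId;
    if (isForeign && res.status === 200) leaks.push(docId);
  }
  return leaks;
}
```

An empty result from `probeCrossTenantReads` is the passing case. Any ID in the list is a document the session could read across the tenant boundary.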
Auditing RAG and vector search permissions
Confirm filters are enforced before retrieval
For RAG systems, the critical question is where filtering happens.
Bad pattern:
const hits = await vectorSearch(query);
const allowed = hits.filter((doc) => doc.workspaceId === ctx.workspaceId);
This looks fine until you realize the search backend may already have exposed unrelated text in ranking metadata, snippets, or debug logs before the filter ran. The filter should be part of the retrieval query whenever possible.
Better:
const hits = await vectorSearch(query, {
  filter: { workspaceId: ctx.workspaceId },
});
I also check whether embeddings are stored in shared indexes with metadata-only filtering. That can work, but only if the filter is enforced server-side and cannot be removed by the caller.
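A sketch of what "cannot be removed by the caller" can look like in code. The `vectorSearch` function here is an in-memory stand-in that uses substring matching instead of vector similarity, and `createScopedSearch` is a hypothetical factory; neither reflects a specific vector database's API:

```javascript
// Stand-in for a vector store client. A real backend would apply `filter`
// inside the index query; this in-memory version only mimics that contract.
function vectorSearch(index, query, { filter }) {
  return index.filter(
    (doc) => doc.workspaceId === filter.workspaceId && doc.text.includes(query)
  );
}

// Hypothetical factory: binds the tenant filter once, on the server.
// The returned function accepts a query and nothing else, so there is no
// option through which the filter could be dropped or overridden.
function createScopedSearch(index, ctx) {
  if (!ctx.workspaceId) {
    throw new Error("Refusing to search without a workspaceId");
  }
  const filter = Object.freeze({ workspaceId: ctx.workspaceId });
  return (query) => vectorSearch(index, query, { filter });
}
```

Request handlers and agent tools would receive only the scoped function, never the raw client, which keeps the unfiltered search path out of reach of application code.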
Compare application-side checks with database-side constraints
Application checks are necessary, but they are not enough by themselves. If the database can be queried directly, the database should still prevent cross-tenant reads.
Look for:
- row-level security
- tenant-scoped views
- composite indexes including workspaceId
- foreign keys that include tenant ownership
If all the protection lives in JavaScript, one missed code path is enough to break isolation.
Reviewing tool calls and agent actions
Check whether tools inherit the correct tenant scope
Tools should receive tenant context from the server, not from the model output. That means the tool runner should inject orgId, workspaceId, and user role before the call executes.
A weak pattern looks like this:
await tools.sendEmail({
  to: modelArgs.to,
  body: modelArgs.body,
  workspaceId: modelArgs.workspaceId, // scope chosen by the model
});
The model should not choose its own scope. It should only operate inside the scope already assigned by the backend.
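A safer runner overwrites any scope fields in the model's arguments with values from the server-built context. This is a minimal sketch: `runTool`, the `tools` map, and the `ctx` shape are assumptions for illustration, not an existing agent framework's interface:

```javascript
// Hypothetical tool runner. `tools` maps a tool name to a function; `ctx` is
// the server-built tenant context (orgId, workspaceId, role).
function runTool(tools, toolName, modelArgs, ctx) {
  // Own-property check so names like "toString" do not resolve to
  // functions inherited from Object.prototype.
  const isRegistered = Object.prototype.hasOwnProperty.call(tools, toolName);
  if (!isRegistered || typeof tools[toolName] !== "function") {
    throw new Error(`Unknown tool: ${toolName}`);
  }
  // Spread the model's arguments first, then overwrite the scope fields, so
  // any orgId, workspaceId, or role the model supplied is discarded.
  const args = {
    ...modelArgs,
    orgId: ctx.orgId,
    workspaceId: ctx.workspaceId,
    role: ctx.role,
  };
  return tools[toolName](args);
}
```

The order of the spread matters: placing `...modelArgs` last would let the model's values win.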
Test for overbroad write actions and cross-tenant side effects
Read leaks are bad. Write leaks are worse.
Check whether a tool can:
- update another tenant's record
- send notifications outside the workspace
- change billing or permissions
- write to a shared knowledge base
A good test is to verify that every tool call is rejected if the target object does not belong to the current tenant, even when the model generates a valid-looking action.
Concrete defense patterns that hold up
Enforce authorization in the backend, not the prompt
Prompt instructions can explain policy, but they cannot enforce it. The backend must verify:
- authenticated user identity
- tenant membership
- object ownership
- role-based permission
- action-specific policy
If a tool or route changes state, require the same checks as a normal API endpoint. Do not rely on “the assistant was told not to do that.”
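The checklist above can be sketched as a single default-deny function shared by routes and tools. The policy table, action names, and session shape below are hypothetical examples, not a recommended schema:

```javascript
// Hypothetical policy table: allowed roles per action, plus an optional
// action-specific rule.
const POLICIES = new Map([
  ["document:read", { roles: ["viewer", "editor", "admin"] }],
  [
    "document:export",
    { roles: ["editor", "admin"], extra: (session, target) => target.exportable === true },
  ],
]);

const deny = (reason) => ({ allowed: false, reason });

// Runs the checks in order; the first failure wins. Unknown actions are denied.
function authorize(session, action, target) {
  if (!session?.userId) return deny("unauthenticated");
  if (!session.memberOf.includes(session.workspaceId)) {
    return deny("not a member of the workspace");
  }
  if (target.workspaceId !== session.workspaceId) {
    return deny("target belongs to another tenant");
  }
  const policy = POLICIES.get(action);
  if (!policy) return deny("unknown action");
  if (!policy.roles.includes(session.role)) return deny("role not permitted");
  if (policy.extra && !policy.extra(session, target)) {
    return deny("action policy rejected");
  }
  return { allowed: true, reason: "ok" };
}
```

Because a tool call and an API route go through the same function, an agent action cannot end up with weaker checks than the equivalent endpoint.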
Add per-tenant test cases and regression checks
I like tests that fail loudly when isolation breaks:
- request tenant B data while authenticated as tenant A
- reuse a cached prompt summary across tenants
- call a tool with a foreign object ID
- export a record from a workspace the user cannot access
Write these as automated regression tests. Multi-tenant bugs come back fast when the codebase grows.
What a good audit report should include
A useful report should show:
- the exact tenant boundary that was crossed
- the request or tool path involved
- the server-side check that was missing or bypassed
- the impact in plain terms
- a backend fix, not just a prompt tweak
If the issue is cross-tenant access, say so directly. The strongest finding is often not “the model was tricked,” but “the application failed to bind identity to every retrieval and tool action.”
That is the real boundary to audit.


