Auditing LLM Access Control in Multi-Tenant Applications


Multi-tenant LLM features fail in the same dull ways as the rest of a web app: broken identity binding, weak authorization, and too much trust in client-supplied context. The only twist is that the bug often hides inside prompt assembly, retrieval filters, or agent tools.

Why LLM access control fails in multi-tenant systems

I usually start from one assumption: the model is not the security boundary. The backend is.

In a multi-tenant app, the LLM may see chat history, documents, tickets, or CRM records from one workspace. If the app builds that context from the wrong tenant, or if a tool call can reach the wrong tenant, the model will happily summarize whatever it receives. That is not a model failure. It is an access-control failure that happened before inference.

The failure modes I see most often are:

the UI passes workspaceId, but the API trusts it
retrieval filters are applied in one code path but not another
export or admin routes skip the authorization checks that the chat route enforces
agent tools run with global credentials instead of tenant-scoped ones

What to audit first: tenant identity, session context, and tool boundaries

Check how the app binds user, org, and workspace IDs

Trace the request from login to LLM call. You want to know where the app decides, “this user belongs to this tenant.”

A safe audit pattern is to inspect the session object and compare it to request parameters:

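// Tenant context is read only from the server-side session,
// never from the request body or query string.
function buildTenantContext(req) {
  return {
    userId: req.session.user.id,
    orgId: req.session.org.id,
    workspaceId: req.session.workspace.id,
  };
}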

If the code instead does this, you have a problem:

const workspaceId = req.body.workspaceId;

That value must be validated against server-side membership, not accepted because the client sent it.
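A minimal sketch of that validation follows. The `memberships` Map and the `resolveWorkspaceId` helper are hypothetical names for illustration; a real app would query its membership table instead of an in-memory structure.

```javascript
// Hypothetical membership store: userId -> Set of workspace IDs.
// Assumption: in a real app this is a database lookup.
const memberships = new Map([
  ["user-1", new Set(["ws-a"])],
  ["user-2", new Set(["ws-b"])],
]);

// Returns the workspace ID only if the session user is a member of it.
// The client-supplied value is treated as a request, never as proof.
function resolveWorkspaceId(session, requestedWorkspaceId) {
  const allowed = memberships.get(session.user.id) ?? new Set();
  if (!allowed.has(requestedWorkspaceId)) {
    const err = new Error("Forbidden: not a member of requested workspace");
    err.status = 403;
    throw err;
  }
  return requestedWorkspaceId;
}
```

The point of the shape is that the only way to obtain a usable workspace ID is through the membership check, so a handler cannot forget it.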

Verify whether prompt context can cross tenant boundaries

Prompt injection gets too much attention here. The more common issue is accidental prompt contamination across tenants.

Check whether the app caches:

summaries
embeddings
recent messages
system prompts
tool results

If any of those are keyed only by userId or session token, or stored under a single shared key, one tenant's data can leak into another's context. In one audit, a “helpful” conversation summary was reused across workspaces because the cache key ignored orgId. The model did nothing wrong; the app assembled the wrong context.
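One way to rule this out is to build every cache key from the full tenant scope. The sketch below assumes a hypothetical `tenantCacheKey` helper and a plain Map standing in for the real cache; whether userId belongs in the key depends on whether the cached item is meant to be shared inside a workspace.

```javascript
// Hypothetical helper: every cached artifact is keyed by the full tenant scope.
function tenantCacheKey(ctx, kind, id) {
  for (const field of ["orgId", "workspaceId", "userId"]) {
    if (!ctx[field]) throw new Error(`Missing ${field} in tenant context`);
  }
  // encodeURIComponent keeps a ":" inside an ID from colliding with the separator.
  return [ctx.orgId, ctx.workspaceId, ctx.userId, kind, id]
    .map((part) => encodeURIComponent(String(part)))
    .join(":");
}

// Same user, two different orgs: the keys must not collide.
const cache = new Map(); // stand-in for the real cache
const ctxA = { orgId: "org-a", workspaceId: "ws-a", userId: "user-1" };
const ctxB = { orgId: "org-b", workspaceId: "ws-b", userId: "user-1" };
cache.set(tenantCacheKey(ctxA, "summary", "conv-42"), "Tenant A summary");
```

Throwing on a missing field matters as much as the key format: a key silently built from `undefined` is how shared entries appear in the first place.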

Testing the API layer for isolation gaps

Look for missing authorization on chat, retrieval, and export routes

You need to test every route that touches tenant data, not just the main chat endpoint. I look for:

POST /chat
POST /retrieval/search
GET /documents/:id
POST /exports
POST /agent/run

The bug class is usually inconsistent enforcement. The chat route checks membership, but the export route only checks that the caller is authenticated.

A simple review table helps:

Route type	What to verify	Typical failure
Chat	workspace membership	trusts client workspace ID
Retrieval	document ownership	returns cross-tenant hits
Export	same policy as read access	bypasses document ACLs
Agent tools	scoped credentials	global write access
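Inconsistent enforcement is easier to avoid when every route goes through the same guard. This is a framework-free sketch: `withTenantGuard`, `isMember`, and the `routes` object are hypothetical, and handlers simply return `{ status, body }` rather than using a real HTTP framework.

```javascript
// Hypothetical membership check; a real app would query its membership table.
const isMember = (userId, workspaceId) =>
  userId === "user-1" && workspaceId === "ws-a";

// Wraps any handler so it only runs for an authenticated workspace member.
function withTenantGuard(handler) {
  return (req) => {
    if (!req.session?.user) return { status: 401, body: "unauthenticated" };
    const userId = req.session.user.id;
    const workspaceId = req.session.workspace?.id;
    if (!workspaceId || !isMember(userId, workspaceId)) {
      return { status: 403, body: "forbidden" };
    }
    // The handler receives a verified context instead of reading IDs itself.
    return handler(req, { userId, workspaceId });
  };
}

// Every tenant-data route is registered through the same guard.
const routes = {
  "POST /chat": withTenantGuard((req, ctx) => ({
    status: 200,
    body: `chat in ${ctx.workspaceId}`,
  })),
  "POST /exports": withTenantGuard((req, ctx) => ({
    status: 200,
    body: `export of ${ctx.workspaceId}`,
  })),
};
```

During an audit, a route that is registered without the shared guard is the first place to look.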

Reproduce IDOR-style failures with safe tenant fixtures

Use two test tenants with clearly separated fixtures: tenant-a and tenant-b. Do not use production data.

Then try the boring attacks first:

Log in as a user in tenant A.
Send a request with tenant B's workspaceId.
Request a document ID from tenant B.
Trigger an export for a record you should not access.

If any of these requests returns data instead of a 403, you have an isolation failure. If the app returns a generic summary built from another tenant's content, that is still a leak.
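The probes above can be scripted. In this sketch, `send` is a hypothetical function `(session, request) => Promise<{ status, body }>` that you would back with your own HTTP client against a test environment, and the canary is a marker string you plant in tenant-b's fixtures so that a leak inside a 200 response is also caught. Treating 404 as an acceptable denial is my assumption; adjust it to your API's conventions.

```javascript
// Runs each cross-tenant probe and reports any that was not denied,
// or whose response body contains the other tenant's canary string.
async function findIsolationFailures(send, session, probes, canary) {
  const failures = [];
  for (const probe of probes) {
    const res = await send(session, probe);
    const denied = res.status === 403 || res.status === 404;
    // Serialize so the check works for string and JSON bodies alike.
    const leaked = JSON.stringify(res.body ?? "").includes(canary);
    if (!denied || leaked) {
      failures.push({ probe, status: res.status, leaked });
    }
  }
  return failures;
}

// Fixture data only: a tenant-a session probing tenant-b objects.
const tenantAUser = { userId: "user-a", workspaceId: "tenant-a" };
const crossTenantProbes = [
  { method: "POST", path: "/chat", body: { workspaceId: "tenant-b", message: "hi" } },
  { method: "GET", path: "/documents/doc-b-1" },
  { method: "POST", path: "/exports", body: { recordId: "rec-b-1" } },
];
```

An empty result is the pass condition; each entry in `failures` names the probe that crossed the boundary.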

Auditing RAG and vector search permissions

Confirm filters are enforced before retrieval

For RAG systems, the critical question is where filtering happens.

Bad pattern:

const hits = await vectorSearch(query);
const allowed = hits.filter((doc) => doc.workspaceId === ctx.workspaceId);

This looks fine until you notice that the search already ran across every tenant. By the time the filter executes, the backend may have exposed unrelated text through ranking metadata, snippets, or debug logs. The filter should be part of the retrieval query whenever possible.

Better:

const hits = await vectorSearch(query, {
  filter: { workspaceId: ctx.workspaceId },
});

I also check whether embeddings are stored in shared indexes with metadata-only filtering. That can work, but only if the filter is enforced server-side and cannot be removed by the caller.
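To make the filter impossible for a caller to remove, the tenant constraint can be applied in a wrapper that owns the call to the search backend. This is a sketch only: `makeTenantSearch` is a hypothetical helper, and it assumes `vectorSearch(query, options)` accepts a `filter` object as in the snippet above.

```javascript
// Wraps a vector search function so the tenant filter always comes from the
// server-side context and overrides anything the caller tried to supply.
function makeTenantSearch(vectorSearch) {
  return (ctx, query, options = {}) => {
    if (!ctx.workspaceId) {
      throw new Error("Missing workspaceId in tenant context");
    }
    // Spread the caller's filter first so the server value wins on conflict.
    const filter = { ...(options.filter ?? {}), workspaceId: ctx.workspaceId };
    return vectorSearch(query, { ...options, filter });
  };
}
```

Application code then only ever sees the wrapped function, so there is no call site where the filter can be left out.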

Compare application-side checks with database-side constraints

Application checks are necessary, but they are not enough by themselves. If the database can be queried directly, the database should still prevent cross-tenant reads.

Look for:

row-level security
tenant-scoped views
composite primary or unique keys that include workspaceId
foreign keys that include tenant ownership

If all the protection lives in JavaScript, one missed code path is enough to break isolation.

Reviewing tool calls and agent actions

Check whether tools inherit the correct tenant scope

Tools should receive tenant context from the server, not from the model output. That means the tool runner should inject orgId, workspaceId, and user role before the call executes.

A weak pattern looks like this:

// Weak: the model's output decides which workspace the tool acts on.
await tools.sendEmail({
  to: modelArgs.to,
  body: modelArgs.body,
  workspaceId: modelArgs.workspaceId,
});

The model should not choose its own scope. It should only operate inside the scope already assigned by the backend.
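A safer runner might look like the following sketch. The `runTool` function and the stub `tools` registry are hypothetical; the stub `sendEmail` only echoes its arguments so the behavior is visible.

```javascript
// Hypothetical tool registry; sendEmail just echoes its arguments here.
const tools = {
  sendEmail: (args) => ({ sent: true, ...args }),
};

// Runs a tool with model-supplied arguments, but the scope fields always come
// from the server-side tenant context, never from the model output.
function runTool(name, modelArgs, ctx) {
  if (!Object.prototype.hasOwnProperty.call(tools, name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  // Drop any scope the model tried to choose for itself.
  const { orgId, workspaceId, role, ...safeArgs } = modelArgs;
  return tools[name]({
    ...safeArgs,
    orgId: ctx.orgId,
    workspaceId: ctx.workspaceId,
    role: ctx.role,
  });
}
```

Stripping the model-supplied scope fields, rather than merely overwriting them, also keeps them out of any logging of `safeArgs`.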

Test for overbroad write actions and cross-tenant side effects

Read leaks are bad. Write leaks are worse.

Check whether a tool can:

update another tenant's record
send notifications outside the workspace
change billing or permissions
write to a shared knowledge base

A good test is to verify that every tool call is rejected if the target object does not belong to the current tenant, even when the model generates a valid-looking action.
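As a sketch of that check, assume a hypothetical `updateRecordTool` backed by an in-memory `records` Map in which every record carries the workspace that owns it:

```javascript
// Hypothetical record store: each record carries its owning workspace.
const records = new Map([
  ["rec-a-1", { workspaceId: "tenant-a", title: "A's record" }],
  ["rec-b-1", { workspaceId: "tenant-b", title: "B's record" }],
]);

// Updates a record only if it belongs to the caller's workspace.
function updateRecordTool(ctx, { recordId, title }) {
  const record = records.get(recordId);
  // Treat "missing" and "foreign" the same so the tool does not
  // reveal which IDs exist in other tenants.
  if (!record || record.workspaceId !== ctx.workspaceId) {
    return { ok: false, error: "not_found" };
  }
  record.title = title;
  return { ok: true };
}
```

The model can emit a perfectly well-formed call with `recordId: "rec-b-1"`; the ownership check, not the argument schema, is what stops it.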

Concrete defense patterns that hold up

Enforce authorization in the backend, not the prompt

Prompt instructions can explain policy, but they cannot enforce it. The backend must verify:

authenticated user identity
tenant membership
object ownership
role-based permission
action-specific policy

If a tool or route changes state, require the same checks as a normal API endpoint. Do not rely on “the assistant was told not to do that.”
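The five checks can live in one function that both API routes and tool runners call. Everything here is an illustrative assumption: the `authorize` signature, the `rolePermissions` table, and the example rule that exports require a verified email.

```javascript
// Hypothetical role -> allowed actions table.
const rolePermissions = new Map([
  ["viewer", new Set(["read"])],
  ["editor", new Set(["read", "update"])],
  ["admin", new Set(["read", "update", "export"])],
]);

function deny(reason) {
  return { allowed: false, reason };
}

// Checks run in order and stop at the first failure.
function authorize({ user, membership, resource, action }) {
  if (!user) return deny("unauthenticated");
  if (!membership || membership.userId !== user.id) return deny("not_a_member");
  if (!resource || resource.workspaceId !== membership.workspaceId) {
    return deny("foreign_object");
  }
  if (!rolePermissions.get(membership.role)?.has(action)) {
    return deny("role_forbids_action");
  }
  // Action-specific policy (assumed example): exports need a verified email.
  if (action === "export" && !user.emailVerified) {
    return deny("export_requires_verified_email");
  }
  return { allowed: true, reason: null };
}
```

Returning a reason makes denials easy to log and to assert on in tests, while the response sent to the client can stay generic.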

Add per-tenant test cases and regression checks

I like tests that fail loudly when isolation breaks:

request tenant B data while authenticated as tenant A
reuse a cached prompt summary across tenants
call a tool with a foreign object ID
export a record from a workspace the user cannot access

Write these as automated regression tests. Multi-tenant bugs come back fast when the codebase grows.

What a good audit report should include

A useful report should show:

the exact tenant boundary that was crossed
the request or tool path involved
the server-side check that was missing or bypassed
the impact in plain terms
a backend fix, not just a prompt tweak

If the issue is cross-tenant access, say so directly. The strongest finding is often not “the model was tricked,” but “the application failed to bind identity to every retrieval and tool action.”

That is the real boundary to audit.