Zero-Trust for AI: Architecting LLM Pipelines That Distrust Every Input

AI Usage (85%)

I ran a test last week where a browser agent read a support ticket and fired a refund API call. The ticket was from an angry user. The prompt injection looked like harmless text, but the agent trusted it. That's when I stopped treating AI inputs as safe.

Why You Can't Trust LLM Inputs

LLMs turn unstructured text into high-privilege actions — tool calls, database writes, code execution. A single crafted paragraph can slip past naive filters and land directly in the reasoning path. Prompt injection isn't the only problem. Data poisoning, instruction drift, and context-window contamination all exploit the same weakness: every word that reaches the model comes from outside. None of it is safe by default.

Zero-Trust's Core Principles for AI

If you apply zero-trust ideas to an AI pipeline, you treat every chunk of data as hostile until proven otherwise — system prompt, user message, tool result, model output. You don't trust the network, and you don't trust the model's own reasoning.

Verify Every Input Explicitly

Before a prompt hits the model, or before a tool receives arguments, validate against a concrete policy. Don't lean on the LLM to reject bad instructions. Enforce rules outside the model.

Assume Breach at Each Step

Even if an input clears an early guard, later components still re-verify. An injection that slips through the first sanitizer should get caught by tool-call authorization or output validation.

Least Privilege for Tools and Models

Grant the model and each tool only the permissions they need for the immediate action. A knowledge-base search tool doesn't require write access to the database, so don't give it that key.

Architecting the Distrustful Pipeline

A distrustful pipeline is a chain of friction points, not a single magic filter.

Input Sanitization Beyond Prompt Injection

Forget just blocking “ignore previous instructions.” Enforce structure. Validate JSON schemas on structured arguments. Deny unusual Unicode sequences that bypass keyword checks. Maintain a strict allow-list of tokens when you can.

Authorizing Tool Calls with Context

A tool call that's harmless for one user can become a disaster in another session. Bind each execution to the user's auth context, current session, and explicit consent. The pipeline should ask: does this user, in this context, have the right to invoke this function with these arguments?

Output Validation as a Defense Layer

Model outputs that flow into downstream systems — HTML, SQL, another prompt — must be treated as untrusted. Apply the same scrutiny you'd give to any external input. If the LLM generates a command, check it against a known-safe list before you run it.

Implementing a Zero-Trust LLM Pipeline in Node.js

In practice, I build the pipeline as a stack of composable functions. Each layer adds friction where it matters.

Building an Input Guard with Policy Checks

A guard is a plain function that runs before every model call:

input-guard.js

const guard = (input, policy) => {
// Reject inputs that contain forbidden markers
if (policy.blockList.some(pat => input.match(pat))) {
  throw new Error('Blocked by policy');
}
// Enforce max length and character set
if (input.length > policy.maxLength || /[\\x00-\\x08]/.test(input)) {
  throw new Error('Suspicious input');
}
return input;
};

No sensitive logic trusts the LLM to decide what's safe.

Secure Tool Execution with Temporary Credentials

Instead of long-lived secrets, I generate short-lived, scoped tokens per request:

scoped-tool-exec.js

async function executeTool(toolName, args, userContext) {
const scopedToken = await issueScopedToken({
  userId: userContext.id,
  scopes: [toolName],
  expiry: 30 // seconds
});
return callToolWithToken(toolName, args, scopedToken);
}

Even if a prompt tricks the agent into calling a sensitive tool, the token limits damage to that single operation and the current user's scope.

Auditing the Pipeline with Log Hooks

Inject audit hooks at every layer — guard rejections, tool calls, output post-processing:

audit-hook.js

const withAudit = (handler, eventType) => async (data, ctx) => {
const start = Date.now();
try {
  const result = await handler(data, ctx);
  await logEvent({ eventType, data, result, duration: Date.now() - start });
  return result;
} catch (error) {
  await logEvent({ eventType, data, error: error.message, duration: Date.now() - start });
  throw error;
}
};

This turns a black-box into a forensic trail you can inspect after an incident.

Testing the Distrust Model

Run adversarial probes that simulate prompt injection, malformed tool arguments, and context-overflow attacks. Verify each guard rejects them correctly, tool authorization blocks out-of-scope calls, and output filters catch dangerous reflected content. Fuzz the pipeline like any other security boundary.

Zero-Trust for AI: Architecting LLM Pipelines That Distrust Every Input

Why You Can't Trust LLM Inputs

Zero-Trust's Core Principles for AI

Verify Every Input Explicitly

Assume Breach at Each Step

Least Privilege for Tools and Models

Architecting the Distrustful Pipeline

Input Sanitization Beyond Prompt Injection

Authorizing Tool Calls with Context

Output Validation as a Defense Layer

Implementing a Zero-Trust LLM Pipeline in Node.js

Building an Input Guard with Policy Checks

Secure Tool Execution with Temporary Credentials

Auditing the Pipeline with Log Hooks

Testing the Distrust Model

Further Reading

Share this post

More posts

Hardening LLM Agents: Preventing Tool Abuse via Prompt Injection

A Counterfeit npm Package, 450 Exposed Repos: Dissecting the TanStack Supply Chain Incident

From AI-Discovered 0-Day to Hardened Redis: Practical Defensive Fixes

Comments