Testing for vm2 Sandbox Bypasses: What to Look for in User-Run Script Features

pr0h0
Tags: vm2, sandbox-escape, javascript-security, saas-security
AI Usage (89%)

Why vm2 keeps showing up in user-run script features

I keep seeing the same pattern in product reviews: a SaaS app wants flexible user logic, and someone reaches for vm2 because it feels like the safest way to run it.

That choice shows up in:

  • custom formulas in business apps
  • workflow builders
  • plugin runtimes
  • chatbot tools
  • AI agent actions
  • online code runners
  • “safe” automation hooks

The problem is not the idea. The problem is that JavaScript sandboxing is a narrow boundary, and teams often treat it like a hard security wall. Recent vm2 disclosures, including CVE-2026-26956 and other critical flaws reported around May 7, 2026, are a reminder that this boundary can fail in ways that matter.

When that happens, the attacker stops being “a user running script” and becomes “code execution inside the host process.”

What a sandbox escape means in a SaaS app

A sandbox escape is not just a strange exception or a broken script. In a SaaS context, it can mean the attacker crosses from untrusted code into the process hosting your app logic.

That is where the impact starts.

If the host process can reach:

  • environment variables
  • API keys
  • filesystem paths
  • internal service credentials
  • cloud metadata endpoints
  • private network routes
  • queue workers or job tokens

then a sandbox escape can become a full application compromise, not just a feature bug.

⚠️ If the sandbox process can read production secrets or call internal services, treat any plausible escape as high severity until proven otherwise.

Where JavaScript sandboxing gets fragile

Exceptions and error objects crossing the boundary

A lot of sandbox bugs begin with “safe” data that is not really safe. Error objects are awkward because they cross trust boundaries and can carry surprising behavior.

If the host inspects thrown values, stringifies them, or logs them with assumptions about their shape, it may end up invoking methods or reading properties that the attacker controls. A strong review asks one question first: what exactly leaves the sandbox, and who touches it next?
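One cautious way to answer that question on the host side is to reduce whatever the script threw to a plain string before any logger or serializer touches it. The sketch below is an illustration, not vm2's API: describeThrown and runUntrusted are hypothetical names, and Node's built-in vm module stands in for the sandbox only so the example runs (Node's own documentation says vm is not a security mechanism). It is deliberately lossy: object errors are reported by type only.

```javascript
const vm = require("node:vm");

// Hypothetical helper: describe a thrown value without reading any of its
// properties (no .message, no .stack, no String(obj)), so getters, proxies,
// and custom toString methods supplied by the script never run here.
function describeThrown(thrown) {
  switch (typeof thrown) {
    case "string":
      return thrown.slice(0, 200); // cap the length of attacker-chosen text
    case "number":
    case "boolean":
    case "bigint":
    case "undefined":
      return String(thrown);
    default:
      // Objects, functions, symbols, and null: report the type only.
      return `[non-primitive thrown value: ${thrown === null ? "null" : typeof thrown}]`;
  }
}

// Hypothetical runner. The vm module is a stand-in for "the sandbox".
function runUntrusted(code) {
  try {
    vm.runInNewContext(code, {}, { timeout: 50 });
    return { ok: true };
  } catch (thrown) {
    return { ok: false, error: describeThrown(thrown) };
  }
}

console.log(runUntrusted("throw 'boom'").error);
console.log(runUntrusted("throw { get message() { return 'x'; } }").error);
```

The design choice is that the host never trusts the shape of a thrown value. If richer error detail is needed, it is safer to have the isolated worker serialize it to a string on its own side of the boundary.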

Prototypes, constructors, and host object leaks

JavaScript has a deep object model, and sandbox code will probe it.

The usual pressure points are:

  • prototype chains
  • constructors
  • constructor.constructor
  • leaked references to host objects
  • objects that look inert but still inherit powerful behavior

I usually test these paths by checking whether the sandbox can observe host-created objects, then verifying whether it can climb from a plain object to something that should have stayed unreachable. The exact trick changes by library and runtime version, but the review goal stays the same: find out whether object identity is enforced or merely assumed.
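As a minimal sketch of that kind of check, the probe below asks whether an object handed to a script lets the script climb to the host realm's Function constructor. It uses Node's built-in vm module as a stand-in, because vm passes host objects into the context by reference; leaksHostFunction is a hypothetical name, and a real review would run the equivalent probe against the actual sandbox library and version in use.

```javascript
const vm = require("node:vm");

// Hypothetical probe: can code in the context walk from `exposed` to the
// host realm's Function constructor via constructor.constructor?
function leaksHostFunction(exposed) {
  const context = vm.createContext({ exposed });
  const reached = vm.runInContext(
    "typeof exposed.constructor === 'function' ? exposed.constructor.constructor : undefined",
    context
  );
  // Identity check: is this the very same Function object the host uses?
  return reached === Function;
}

// A plain host object inherits from the host's Object.prototype.
console.log(leaksHostFunction({})); // true

// A null-prototype object has no constructor chain to climb.
console.log(leaksHostFunction(Object.create(null))); // false
```

The second result does not mean null-prototype objects are a fix. Nested values, functions, and anything else the host passes in carry their own prototype chains, so each exposed value needs the same scrutiny.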

Engine features and non-obvious execution paths

Sandbox code does not only run through obvious eval-style paths. It can also hit runtime features that are easy to miss in a product review:

  • WebAssembly
  • dynamic import behavior
  • asynchronous callbacks
  • exception serialization
  • proxy traps
  • getter/setter side effects

The mistake is thinking “we only allow JavaScript expressions” means the attack surface is small. In practice, any feature that executes attacker-controlled code or reifies host state can become a boundary crossing.
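To make the proxy-trap and getter point concrete, here is a small sketch showing that script-supplied code can run after the script itself has finished, at the moment the host touches the returned value. It assumes Node's built-in vm module as a stand-in sandbox; demoLateExecution and record are hypothetical names used only for this illustration.

```javascript
const vm = require("node:vm");

function demoLateExecution() {
  const log = [];
  // `record` is a hypothetical host helper exposed to the script.
  const context = vm.createContext({ record: (msg) => { log.push(msg); } });

  // The script only *returns* a proxy. None of its traps run yet.
  const result = vm.runInContext(
    `new Proxy({}, {
       get(target, key) { record("get:" + String(key)); return 1; },
       ownKeys() { record("ownKeys"); return []; }
     })`,
    context
  );
  const ranDuringScript = log.length;

  // Ordinary-looking host code now executes script-supplied functions.
  void result.total;   // runs the get trap
  Object.keys(result); // runs the ownKeys trap

  return { ranDuringScript, log };
}

console.log(demoLateExecution());
```

In this sketch nothing is recorded while the script runs, and both traps fire later on the host's call stack. That is why "the script already returned" is not the end of the review.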

What to check if your product uses vm2 or a similar sandbox

Inventory every user-controlled execution surface

Start with a boring inventory. It usually finds the real risk.

Look for:

  • custom scripts
  • formulas
  • workflow steps
  • plugin manifests
  • agent tool code
  • “advanced” automation rules
  • templating engines that allow expressions
  • admin-only script runners that later get exposed to tenants

If the feature accepts user input and executes it as code, put it on the list.
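A first pass at that inventory can be mechanical. The sketch below flags lines of source text that import vm2 or Node's vm module, or that call eval or the Function constructor. The pattern list and the findExecutionSurfaces name are assumptions for illustration; a regex scan will miss indirect usage and transitive dependencies, so treat it as a starting point rather than a complete audit.

```javascript
// Hypothetical pattern list: extend it for the sandbox libraries you use.
const PATTERNS = [
  { label: "vm2 import", regex: /(?:require\(\s*|from\s+)["']vm2["']/ },
  { label: "node:vm import", regex: /(?:require\(\s*|from\s+)["'](?:node:)?vm["']/ },
  { label: "eval call", regex: /\beval\s*\(/ },
  { label: "Function constructor", regex: /\bnew\s+Function\s*\(/ },
];

// Returns one entry per (line, pattern) match, with 1-based line numbers.
function findExecutionSurfaces(sourceText) {
  const hits = [];
  sourceText.split("\n").forEach((line, index) => {
    for (const { label, regex } of PATTERNS) {
      if (regex.test(line)) hits.push({ line: index + 1, label });
    }
  });
  return hits;
}

const sample = [
  'const { VM } = require("vm2");',
  "const total = 1;",
  'const fn = new Function("a", "return a");',
].join("\n");

console.log(findExecutionSurfaces(sample));
```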

Verify which host capabilities are reachable

Then ask what the script can reach if the sandbox fails in a small way.

Check for access to:

  • process
  • require
  • filesystem modules
  • outbound network calls
  • timers and background jobs
  • host globals
  • serialization helpers
  • app-specific helper functions that wrap privileged actions

A good test is to map the boundary in layers:

Layer        | Question                             | Why it matters
Sandbox API  | What objects are exposed?            | Leaked helpers often become the first pivot
Host process | Can code touch Node internals?       | This is where RCE becomes real
Runtime env  | Can secrets be read?                 | Env vars are often the fastest win
Network      | Can the code call internal services? | Internal reach can be worse than disk access
Test resource limits and process boundaries

Even if escape does not happen, resource abuse still matters.

Test whether one script can:

  • consume CPU indefinitely
  • allocate memory until the worker dies
  • keep the event loop busy with pending work
  • trigger retries that multiply load
  • block a shared execution pool

If all user scripts run in the same process as the app, denial of service is not a side issue. It is part of the trust boundary.
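A quick way to test the CPU case is to run a deliberately unbounded script against a time budget and confirm that it is actually cut off. The sketch below uses the timeout option of Node's built-in vm module; exceedsCpuBudget is a hypothetical test helper. As far as I know, that timeout only bounds synchronous execution, so it says nothing about memory growth or asynchronous work, which need process-level limits.

```javascript
const vm = require("node:vm");

// Hypothetical test helper: returns true if the script was terminated for
// exceeding the budget, false if it finished in time.
function exceedsCpuBudget(code, budgetMs) {
  try {
    vm.runInNewContext(code, {}, { timeout: budgetMs });
    return false;
  } catch (err) {
    // Test-harness shortcut: reading err.code is acceptable here because
    // the inputs are our own test scripts, not tenant code.
    if (err && err.code === "ERR_SCRIPT_EXECUTION_TIMEOUT") return true;
    throw err;
  }
}

console.log(exceedsCpuBudget("while (true) {}", 50)); // true
console.log(exceedsCpuBudget("1 + 1", 50));           // false
```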

Safer isolation patterns for untrusted code

Separate processes, containers, and microVMs

If the user can run code, prefer true isolation over in-process sandboxing.

Better patterns include:

  • separate worker processes
  • locked-down containers
  • microVMs for stronger workload separation
  • strict syscall filtering where applicable

The goal is to move from “library boundary” to “operating system boundary.”
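As a rough sketch of the first pattern, the helper below runs code in a separate Node process with an empty environment, a wall-clock timeout, and a heap cap. runInChild and its option names are hypothetical. A child process on its own is not a sandbox, since the code still has the full Node API; the point is that it gives containers, syscall filters, and network policy an operating system boundary to attach to.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: run code in a separate Node process. The limits
// here are illustrative defaults, not recommendations.
function runInChild(code, { timeoutMs = 5000, maxOldSpaceMb = 64 } = {}) {
  const child = spawnSync(
    process.execPath,
    [`--max-old-space-size=${maxOldSpaceMb}`, "-e", code],
    {
      env: {},               // inherit nothing from the parent environment
      timeout: timeoutMs,    // kill the child if it runs too long
      killSignal: "SIGKILL",
      encoding: "utf8",
    }
  );
  return {
    timedOut: child.error?.code === "ETIMEDOUT",
    status: child.status,
    stdout: child.stdout,
  };
}

// The parent's variables are not visible in the child.
process.env.DEMO_SECRET = "not-a-real-secret";
console.log(runInChild("console.log(String(process.env.DEMO_SECRET))").stdout.trim());

// A runaway script is killed when the timeout expires.
console.log(runInChild("while (true) {}", { timeoutMs: 500 }).timedOut);
```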

Egress controls, short-lived credentials, and secret hygiene

Do not put production secrets in the execution process if you can avoid it. If the sandbox needs credentials, use short-lived, narrowly scoped tokens.

Also:

  • block unnecessary egress
  • deny access to metadata services
  • isolate per-tenant credentials
  • avoid shared high-value API keys in worker env vars
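One simple way to keep high-value keys out of worker env vars is to build the worker's environment from an explicit allowlist instead of passing the parent's environment through. The sketch below assumes a hypothetical buildWorkerEnv helper and made-up variable names.

```javascript
// Hypothetical helper: copy only allowlisted variables into the worker's
// environment. Anything not named is dropped, including future secrets.
function buildWorkerEnv(parentEnv, allowedNames) {
  const env = {};
  for (const name of allowedNames) {
    if (parentEnv[name] !== undefined) env[name] = parentEnv[name];
  }
  return env;
}

// Made-up example values for illustration only.
const parentEnv = {
  TZ: "UTC",
  LANG: "C.UTF-8",
  DATABASE_URL: "postgres://example.invalid/app",
  PAYMENTS_API_KEY: "example-key",
};

console.log(buildWorkerEnv(parentEnv, ["TZ", "LANG"]));
// { TZ: 'UTC', LANG: 'C.UTF-8' }
```

An allowlist fails closed: a secret added to the main app next quarter does not reach the execution worker unless someone deliberately names it.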

Audit logs and dedicated execution workers

I like dedicated workers because they make incident response much simpler.

You get:

  • a narrower blast radius
  • cleaner logs
  • easier patching
  • simpler rotation if compromise is suspected

If the execution worker is shared with the main app, your logs and containment story are both weaker.
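To illustrate what "cleaner logs" can mean in practice, here is a sketch of a per-execution audit record that identifies the tenant, the worker, and the exact script by hash without storing the script body in the log line. The field names and the buildAuditRecord helper are assumptions, not a standard schema.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical audit record for one script execution. The hash lets
// responders find every run of the same script across tenants and workers.
function buildAuditRecord({ tenantId, workerId, source, outcome, startedAt, finishedAt }) {
  return {
    tenantId,
    workerId,
    scriptSha256: createHash("sha256").update(source, "utf8").digest("hex"),
    scriptBytes: Buffer.byteLength(source, "utf8"),
    outcome, // e.g. "ok", "error", "timeout"
    durationMs: finishedAt - startedAt,
  };
}

const record = buildAuditRecord({
  tenantId: "tenant-123",   // made-up identifiers
  workerId: "exec-worker-7",
  source: "abc",
  outcome: "ok",
  startedAt: 1000,
  finishedAt: 1042,
});
console.log(JSON.stringify(record));
```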

Practical incident response if a sandbox escape is plausible

If you think vm2 or a similar sandbox was exploitable in your environment:

  1. Patch or disable the feature.
  2. Inventory every place the library is used.
  3. Check whether the sandbox process had access to secrets.
  4. Rotate any exposed credentials.
  5. Review logs for unusual script behavior, outbound connections, or host-level file access.
  6. Re-test whether user-controlled code can reach privileged host APIs.
  7. Assume any shared execution worker may need to be rebuilt.

Do not wait for proof of full compromise before rotating secrets. If the escape path is plausible and the process had sensitive reach, the safe assumption is exposure.
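For step 2, direct dependencies are the easy part; the library can also arrive transitively. The sketch below searches a parsed package-lock.json for every installed copy of a package. It assumes the npm lockfile format with a top-level packages map (lockfileVersion 2 or 3), and findPackageInLock is a hypothetical helper name. The version numbers in the sample are made up.

```javascript
// Hypothetical helper: list every installed copy of `name` found in a
// parsed package-lock.json, including nested node_modules copies.
function findPackageInLock(lock, name) {
  const suffix = `node_modules/${name}`;
  return Object.entries(lock.packages ?? {})
    .filter(([path]) => path === suffix || path.endsWith(`/${suffix}`))
    .map(([path, meta]) => ({ path, version: meta.version }));
}

// Made-up lockfile fragment for illustration.
const sampleLock = {
  lockfileVersion: 3,
  packages: {
    "": { name: "example-app", version: "1.0.0" },
    "node_modules/vm2": { version: "0.0.1" },
    "node_modules/some-plugin/node_modules/vm2": { version: "0.0.2" },
    "node_modules/not-vm2": { version: "9.9.9" },
  },
};

console.log(findPackageInLock(sampleLock, "vm2"));
```

In a real repository you would load the lockfile with JSON.parse(fs.readFileSync("package-lock.json", "utf8")) and repeat the search for each service and worker image that ships its own dependencies.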

How to report this safely in bug bounty or internal testing

For authorized testing, keep the report focused on boundary failure and business impact.

A good report usually includes:

  • the exact user-script feature tested
  • the sandbox library and version if visible
  • the smallest safe proof that shows boundary crossing
  • what host capability became reachable
  • whether secrets, filesystem, or internal network access were in scope
  • evidence without weaponized payloads

Avoid publishing destructive payloads or a turnkey escape chain. You do not need that to prove impact. Show that user-controlled script can violate the intended isolation boundary, then stop.

Conclusion

vm2 and similar libraries are useful, but they are not a complete security story.

If your product lets users run JavaScript, treat that feature like a real execution environment, not a convenience API. The security question is not “did the script stay inside the sandbox wrapper?” It is “what happens if it gets out, and what was reachable if it did?”

That is the review that matters.
