Marimo CVE-2026-39987: How Attackers Weaponized an LLM Agent After the Exploit

What stood out in the Marimo CVE-2026-39987 report was not the initial compromise. It was what happened after it.

According to the public reporting, attackers did not stop at a single exploit against the Marimo notebook environment. They chained in an LLM agent to handle post-exploitation work. That is a useful warning for anyone running notebook tooling, because the real boundary is not the notebook UI. The real boundary is everything the notebook can reach once code execution is available: local files, environment variables, cloud credentials, git state, internal services, and whatever the operator assumed was “just dev tooling.”

I usually think about notebook incidents in two phases. The first is access. The second is leverage. An LLM agent tends to show up in phase two, where the attacker wants to turn a fragile foothold into a repeatable workflow.

What happened in Marimo CVE-2026-39987

The public report describes attackers using Marimo CVE-2026-39987 as the entry point, then bringing an LLM agent into the picture after the exploit. That sequence matters. It suggests the exploit was not the end goal. It was the opening move.

Marimo is a notebook-style development environment, so the trust model is already unusual. Notebook systems mix editor, runtime, and execution history in one place. That is convenient for developers and attractive to attackers, because one trusted session often reaches farther than it should. If a notebook can execute code, it may also be able to inspect local project files, read secrets from the process environment, and reach internal services that were never meant to be exposed to a browser-facing workflow.

The vulnerable trust boundary in the notebook workflow

When I audit notebook platforms, I look for the exact point where user-authored content crosses into execution. That is the trust boundary. In a browser app, a malicious string should stay data. In a notebook runtime, the same string may become code, a command, or a tool invocation.

That boundary gets fuzzy in notebook products for a few reasons:

cells are supposed to run code by design
state persists between executions
local files are often mounted into the runtime
users assume “local-first” means “not remotely reachable”
operators often expose the service for convenience during development

Once an attacker crosses that line, the incident is no longer about the original bug alone. It becomes about everything the notebook process can see. If the notebook server inherits cloud credentials, package registry tokens, SSH keys, or access to mounted source trees, the compromise can move quickly from code execution to data access.

Why this issue mattered beyond the initial exploit

The headline detail is the exploit. The practical detail is the aftermath.

A one-shot exploit usually gives the attacker a narrow, brittle foothold. They need to decide quickly what to collect, where to look, and how to avoid tripping alarms. If they have an LLM agent available, that workflow changes. The agent can inspect outputs, keep context across multiple steps, and decide what to try next without an operator stitching everything together by hand.

That matters because notebook environments often contain exactly the kind of high-value material an attacker wants after initial access:

.env files with service tokens
cloud SDK credentials
source code with hardcoded endpoints
notebook outputs that leak secrets
cached package credentials
internal documentation or API examples
mounted repositories and workspace artifacts

If the attacker can automate the search, they do not need a perfect exploit. They just need a foothold and a way to keep exploring.

Why attackers added an LLM agent after compromise

The real shift is from “exploit once” to “understand and adapt.”

A static script is fast, but it is brittle. It expects a known layout, predictable file names, and a fixed command sequence. Real environments are messy. The process may run as a different user, the target may be containerized, the cloud metadata service may be blocked, the shell may be missing tools, or the directory structure may be unfamiliar.

An LLM agent is useful because it can absorb those surprises. It can read command output, make a best guess, and keep going. That is exactly what makes it dangerous in post-exploitation.

Moving from one-shot exploitation to guided post-exploitation

A traditional attacker workflow often looks like this:

land a shell or code execution primitive
run a few discovery commands
search for secrets
collect data or pivot

An agent changes the pace. The attacker can give it a goal instead of a script:

map the environment
find the active user and working directory
identify interesting files
look for credentials or tokens
summarize what looks valuable
stop before doing anything noisy

That is not magic. It is just better state management. But that is enough to reduce operator effort, especially when the target environment is unfamiliar.

What an agent can do better than a static script

In practical terms, an agent is better at four things:

branching: it can choose different next steps based on command output
context retention: it remembers what it already checked
summarization: it turns noisy output into a short list of leads
task decomposition: it can break a broad instruction into smaller checks

That is a real advantage when the attacker does not know whether they are looking at a laptop, a container, a CI runner, or a production notebook host.

It is also why defenders should pay attention to the shape of the activity, not just the final payload. A human typing a few commands looks different from an agent doing repeated read-only checks, asking for summaries, and branching based on what it sees.

Reconstructing the attack chain step by step

The public reporting does not give us a full packet capture or command history, so I would avoid pretending we know every exact step. But the post-exploitation pattern is familiar enough to reconstruct the likely phases in a defensible way.

Initial access and execution path

The first phase begins with the Marimo exploit. In a notebook system, that usually means one of two things:

the attacker reaches an execution path in the notebook service itself
the attacker gains code execution in the notebook runtime and uses it to reach the host or adjacent resources

From a defender’s point of view, both are serious. The difference is where the attacker’s code ends up running. If the exploit lands inside the notebook process, the attacker may be constrained by the container or user context. If the exploit reaches the host or the server’s management path, the blast radius grows quickly.

The key question is not “did they get code execution?” It is “what identity did that code execute as, and what did that identity already trust?”

That includes:

current Unix user
mounted volumes
environment variables
network egress permissions
notebook execution context
any parent process privileges

If the notebook service runs as a powerful user or inherits secrets from the host, a small exploit can become a broad compromise.
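
To make the identity question concrete, here is a minimal self-audit sketch that you could run inside your own notebook runtime. It reports the working directory, the Unix user ID where available, and the names (never the values) of environment variables that look like credentials. The name pattern and the function names are my own assumptions for illustration, not anything taken from Marimo or the public report.

```python
import os
import re

# Hypothetical name patterns; tune them for your own environment.
SECRET_NAME_PATTERN = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|CREDENTIAL|PRIVATE_KEY)",
    re.IGNORECASE,
)


def secret_like_names(env):
    """Return sorted variable names that look like they hold credentials.

    Only names are returned, never values, so the result is safe to log.
    """
    return sorted(name for name in env if SECRET_NAME_PATTERN.search(name))


def describe_identity(env=None):
    """Summarize what the current process runs as and what it inherited."""
    env = os.environ if env is None else env
    report = {
        "cwd": os.getcwd(),
        "secret_like_env_names": secret_like_names(env),
    }
    # os.getuid exists only on Unix-like systems.
    if hasattr(os, "getuid"):
        report["uid"] = os.getuid()
        report["is_root"] = os.getuid() == 0
    return report


if __name__ == "__main__":
    for key, value in describe_identity().items():
        print(f"{key}: {value}")
```

If the list of secret-like names is long, or `is_root` is true, that is the reach an attacker inherits the moment code execution lands.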

Privilege discovery and environment mapping

Once inside, the attacker’s first job is to map the environment. In a notebook context, that means checking whether the process is constrained or whether it can see far more than it should.

The useful defensive lens here is to think in terms of questions, not commands:

Which user is running the notebook?
Is the process inside a container or on bare metal?
What directories are mounted?
What environment variables are present?
Can the process reach the internet?
Can it talk to internal services?
Are cloud metadata endpoints reachable?
Is there an attached workspace or repository?

The agent helps because it can answer those questions quickly, then adapt. If the environment looks like a container, it can inspect mount points and runtime metadata. If it looks like a developer workstation, it can inspect local project files and user artifacts. If it sees cloud SDKs or kubeconfig files, it can prioritize those instead.

This is where notebook compromise becomes much more than a single vulnerable service. The notebook often has the same reach as the developer session that launched it.

Lateral checks, persistence hints, and artifact hunting

After mapping the environment, the next phase is usually to look for artifacts that can extend access or reveal something worth stealing. That may include:

SSH configuration and keys
cloud credentials in standard config paths
package manager tokens
.git history and remote URLs
notebook checkpoints and outputs
shell history
service unit files or startup scripts
CI credentials or deployment manifests

From the attacker’s perspective, the goal is not always persistence in the classic sense. Sometimes the goal is simply to collect enough context to return later through a different path.

From the defender’s perspective, any read access to these locations is a signal. Repeated file reads across credential-bearing paths inside a notebook process should not look normal. The same is true for attempts to enumerate user home directories, hidden config folders, or mounted secrets volumes.

A useful rule of thumb: if the notebook is reading like a human developer would during normal work, that is one pattern. If it is systematically enumerating credential and startup locations, that is a different pattern.

How the LLM agent changes attacker workflow

This is the most important part of the story. The LLM agent does not create new physics. It changes the economics of time, attention, and adaptation.

Prompting the agent with environment constraints

A post-exploitation agent is usually told what it can and cannot do. That instruction set matters, because it shapes how much noise the attacker creates.

A constrained prompt might tell the agent to:

inspect the current environment
avoid destructive actions
prefer read-only checks first
summarize interesting files and processes
report back before any high-impact action

That is exactly why agent-based post-exploitation can be quieter than a blunt script. It can be instructed to stay low-profile until it has enough confidence to move.

For defenders, the presence of a “careful” attacker is not reassuring. It means the activity may stay in the reconnaissance phase longer, which makes it harder to notice if you only alert on obvious payloads.

Using tool calls for safe reconnaissance versus noisy abuse

There is a big difference between read-only reconnaissance and noisy abuse. I find it helpful to separate the two in a simple table:

Activity type	What it looks like	Why it matters
Read-only enumeration	process listing, file discovery, environment inspection	often the first sign of post-exploitation
Credential hunting	access to `.env`, cloud config, SSH material, notebook outputs	high-value data exposure
Network probing	connections to internal services or metadata endpoints	possible pivot or data access
Tool installation	package manager use, binary downloads, script fetches	may indicate persistence or tooling expansion
Archiving and transfer	compression, staging, outbound uploads	possible exfiltration

An agent is useful because it can keep these phases separate. It can inspect first, decide second, and only then escalate its own activity. That means defenders should not wait for exfiltration to notice something is wrong.

Failure modes: hallucinated commands, bad assumptions, and logging clues

Agents are not perfect. They hallucinate commands, assume a tool exists when it does not, and infer paths that are wrong for the environment. Those failures are useful for defenders because they leave traces.

Common failure modes include:

repeated attempts to use a missing binary
commands that assume one Linux distribution’s layout on a different base image
mistaken assumptions about the current user or permissions
excessive retries against blocked network locations
odd command combinations that look machine-generated
logs that show a lot of short, exploratory actions instead of one clean script

Those clues matter because they can distinguish an agent-driven workflow from a human operator. Human operators usually settle on a path quickly. Agents often test a few dead ends before converging.
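
One of those traces is easy to count. POSIX shells conventionally exit with status 127 when a command is not found, so repeated 127s for the same binary are a cheap signal. The sketch below assumes you already have command events as `(command_line, exit_code)` pairs; that event shape, the function name, and the threshold are hypothetical and will differ in your telemetry.

```python
from collections import Counter

# Conventional POSIX shell exit status for "command not found".
COMMAND_NOT_FOUND = 127


def missing_binary_attempts(events, min_attempts=3):
    """Return binaries that failed as "not found" at least min_attempts times.

    `events` is an iterable of (command_line, exit_code) pairs.
    """
    counts = Counter()
    for command_line, exit_code in events:
        if exit_code != COMMAND_NOT_FOUND or not command_line.strip():
            continue
        # The first token is the binary the caller tried to run.
        counts[command_line.split()[0]] += 1
    return {binary: n for binary, n in counts.items() if n >= min_attempts}
```

A human who hits one "command not found" usually switches tools. The same missing binary retried several times with slightly different arguments is worth a closer look.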

Where defenders can spot this kind of activity

The short version is that you should correlate notebook execution with host-level and network-level telemetry. Looking at only one layer misses the shape of the attack.

Process, shell, and notebook telemetry to review

I would start with the process tree. Notebook compromise tends to spawn unusual child processes.

Watch for:

notebook server processes spawning shells
shells spawning network tools or archive utilities
Python processes launching subprocesses unexpectedly
repeated short-lived children under the notebook runtime
commands that read many local files in a row
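
The first item on that list can be checked mechanically once you have a process snapshot. This is a rough sketch, assuming you can export processes as `(pid, ppid, name)` tuples from whatever tooling you use. The root process name is a placeholder: a notebook server may show up as `python` or something else entirely on your hosts.

```python
from collections import defaultdict

SHELL_NAMES = {"sh", "bash", "dash", "zsh", "ash", "fish"}


def shells_under(processes, root_names):
    """Find shell processes descended from any process named in root_names.

    `processes` is an iterable of (pid, ppid, name) tuples.
    """
    children = defaultdict(list)
    names = {}
    for pid, ppid, name in processes:
        children[ppid].append(pid)
        names[pid] = name

    roots = [pid for pid, name in names.items() if name in root_names]
    found, seen, stack = [], set(), list(roots)
    # Depth-first walk over every descendant of the notebook processes.
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        if pid not in roots and names.get(pid) in SHELL_NAMES:
            found.append(pid)
        stack.extend(children.get(pid, []))
    return sorted(found)
```

Walking descendants rather than direct children matters here, because the shell is often two or three levels below the notebook server.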

A useful triage bundle on a suspect host is usually this kind of volatile snapshot:

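# process tree with parent/child relationships
ps auxf
# sockets (TCP, all states) with owning processes
ss -tpna
# files and sockets the suspect process has open
lsof -p <suspect-pid>
# the suspect process's environment as of exec time, not your own shell's
cat /proc/<suspect-pid>/environ | tr '\0' '\n' | sort
find /proc/<suspect-pid>/fd -maxdepth 1 -type l -ls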

That is not a full forensic answer, but it helps you understand what the process could already see and where it was trying to reach.

Network indicators and unusual model or API usage

The public report says attackers used an LLM agent after exploitation, so network telemetry becomes especially important. If an agent is active, it may call out to a model provider, a proxy, or an internal orchestration service.

Defenders should review:

unexpected outbound API calls from notebook hosts
DNS lookups that do not match normal development tooling
model or inference endpoints that are not part of the approved stack
spikes in outbound traffic after a notebook execution event
repeated small requests that match tool-call loops

You do not need to know the exact model provider to spot the pattern. The signal is that the notebook host starts behaving like an orchestrated agent runner instead of a local development box.
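
A simple way to surface that shift is to compare observed destinations against what the host is expected to talk to. The sketch below assumes you can pull a list of contacted hostnames from DNS or proxy logs. The allowlist entries in the usage note are examples only, and the function name is hypothetical.

```python
def unexpected_destinations(hostnames, allowed_suffixes):
    """Return sorted hostnames that match none of the allowed domain suffixes.

    A host matches a suffix if it equals it or is a subdomain of it, so
    "api.github.com" matches "github.com" but "evilgithub.com" does not.
    """
    def normalize(host):
        return host.lower().rstrip(".")

    def allowed(host):
        return any(
            host == suffix or host.endswith("." + suffix)
            for suffix in allowed_suffixes
        )

    return sorted({normalize(h) for h in hostnames if not allowed(normalize(h))})
```

For example, with an allowlist of `{"pypi.org", "github.com"}`, a lookup for an inference endpoint you have never approved would be returned for review, while `api.github.com` would not.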

File-system and credential-access patterns worth alerting on

The most reliable detection often comes from file access patterns. Notebook compromise tends to touch predictable locations.

High-signal paths include:

user home directory secret stores
cloud SDK config directories
SSH key locations
.env files
notebook checkpoints and outputs
repository metadata and deployment manifests
service account tokens mounted into containers

If your logging can show file reads, tie them to the notebook process tree. A notebook server reading a project directory is normal. A notebook server reading credential stores, kubeconfig, and shell history in quick succession is not.
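
Here is one way to sketch the "quick succession" idea, assuming your telemetry yields `(timestamp_seconds, path)` read events already filtered to the notebook process tree. The path patterns, window, and threshold are illustrative guesses, not a vetted rule set.

```python
from fnmatch import fnmatchcase

# Illustrative patterns; real paths vary by platform and tooling.
SENSITIVE_PATTERNS = {
    "ssh": ["*/.ssh/*"],
    "cloud": ["*/.aws/*", "*/.config/gcloud/*", "*/.azure/*"],
    "kube": ["*/.kube/config"],
    "dotenv": ["*/.env", "*/.env.*"],
    "shell_history": ["*/.bash_history", "*/.zsh_history"],
}


def categorize(path):
    """Return the sensitive category for a path, or None if it matches nothing."""
    for category, patterns in SENSITIVE_PATTERNS.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return category
    return None


def credential_sweeps(reads, window_seconds=60, min_categories=3):
    """Return timestamps where enough distinct categories were read in one window."""
    hits = []
    for timestamp, path in reads:
        category = categorize(path)
        if category:
            hits.append((timestamp, category))
    hits.sort()

    alerts = []
    start = 0
    for end in range(len(hits)):
        # Shrink the window until it spans at most window_seconds.
        while hits[end][0] - hits[start][0] > window_seconds:
            start += 1
        if len({cat for _, cat in hits[start:end + 1]}) >= min_categories:
            alerts.append(hits[end][0])
    return alerts
```

Counting distinct categories rather than raw reads is the design choice that matters: a developer may open one `.env` file many times, but rarely touches SSH keys, cloud config, and kubeconfig within the same minute.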

Defensive checks for Marimo users and notebook operators

If you run Marimo or any similar local-first notebook tool, the practical lesson is simple: treat it like a remote-capable execution surface once it is exposed to more than localhost.

Verify exposed services, auth boundaries, and remote execution paths

Start with the basics:

Is the service bound only to localhost?
Is authentication required before any execution path is available?
Are remote share or collaboration features enabled?
Can a browser session trigger code execution without a strong authorization check?
Are there any routes that transform notebook content into server-side actions?

These are boring questions, but they are the ones that decide whether a notebook is a personal tool or a production exposure.
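
The first question is the easiest to automate. This sketch uses Python's standard `ipaddress` module to decide whether a configured bind address is loopback-only. Treating the literal name `localhost` as loopback is an assumption about how your hosts resolve it, and the function name is mine.

```python
import ipaddress


def is_loopback_only(bind_host):
    """Return True if bind_host only accepts connections from the local machine."""
    if bind_host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Empty strings and hostnames we cannot parse are treated as exposed.
        return False
```

Wildcard addresses such as `0.0.0.0` and `::` parse successfully but are not loopback, so they are reported as exposed, which is the behavior you want from a pre-deployment check.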

A lot of teams assume that a development notebook is safe because it is “just for internal use.” That assumption usually breaks the moment the service becomes reachable from another machine, a VPN, a reverse proxy, or a shared environment.

Patch, isolate, and restrict notebook privileges

If you operate a vulnerable version, patch first. Then reduce the privileges of the runtime even if you think the patch is enough.

Good defaults include:

run as a non-root user
use a dedicated service account
avoid mounting the host filesystem
keep secrets out of the notebook process environment
disable unnecessary network egress
isolate workspaces per user or project
make notebook containers ephemeral where possible

The point is to keep a compromise from becoming an environment-wide incident. If the notebook process can only see a small workspace and a narrow set of credentials, the attacker’s post-exploitation path gets much shorter.

Reduce blast radius with container, user, and network controls

This is where operators can make a real difference.

Control	What it limits	Why it helps
Container isolation	host visibility	reduces access to system files and neighbors
Non-root execution	privilege escalation	limits damage if code runs
Read-only mounts	workspace tampering	protects local state and tooling
Egress filtering	outbound connections	makes exfiltration and agent calls harder
Secret scoping	credential exposure	shortens token lifetime and reach
Separate service accounts	lateral movement	prevents reuse across systems

If you cannot implement all of those, start with the ones that block the easiest wins for an attacker: secrets, egress, and privilege.

Detection ideas that map to the post-exploitation phase

The post-exploitation phase is where a lot of teams are weakest, because the behavior can look like normal developer activity right up until it does not.

High-signal audit events and command patterns

I would prioritize alerts on combinations rather than single events:

notebook process followed by shell spawn
shell followed by file enumeration across hidden config paths
repeated reads of secret-bearing directories
archive creation followed by outbound transfer
package manager activity from a notebook runtime
access to cloud metadata or instance identity services
outbound requests to unexpected model APIs from notebook hosts

You want to catch the sequence, not just a single step in it.

A simple mental model is:

notebook runs
notebook starts exploring the environment
notebook touches secrets or network paths
notebook stages data or calls out

The earlier you catch it, the smaller the blast radius.
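
That mental model maps naturally onto an ordered-stage tracker. The sketch below assumes your pipeline can label events with one of four hypothetical kinds that mirror the steps above. The labels and the function name are placeholders, and real correlation would also need to key on host and process tree.

```python
# Hypothetical event kinds, in the order of the mental model above.
STAGES = [
    "cell_run",
    "enumeration",
    "secret_or_network_access",
    "staging_or_egress",
]


def stage_reached(events):
    """Return how many STAGES occurred in order.

    `events` is an iterable of (timestamp, kind) pairs in any order.
    Kinds that are not the next expected stage are ignored.
    """
    reached = 0
    for _, kind in sorted(events):
        if reached < len(STAGES) and kind == STAGES[reached]:
            reached += 1
    return reached
```

A result of 2 means the session has moved from running to exploring. Escalating at 3 rather than waiting for 4 is what "catch it earlier" looks like in practice.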

Correlating notebook actions with unexpected agent-driven enumeration

The clearest sign of an agentic post-exploitation workflow is repetitive, adaptive enumeration. It looks like this:

many small read-only operations
branching based on output
sudden focus on one interesting directory
a shift from discovery to credential hunting
a network call that does not match the normal notebook workflow

If your logging can correlate notebook cell execution with process creation and network events, you can often reconstruct the exact point where the attacker shifted from “testing the environment” to “working the environment.”

That is the point where response should escalate.

A practical hardening checklist for developers

I like checklists for notebook systems because they force the uncomfortable truth: local tools become remote surfaces once they are shared, proxied, or exposed.

Secure defaults for local-first tools that can become remote surfaces

If you build or deploy notebook software, default to the safest possible stance:

bind to localhost unless remote access is explicitly required
require authentication before execution
separate content rendering from code execution paths
make unsafe features opt-in, not on by default
log notebook execution and process spawning clearly
document the trust model in plain language

The point is to make it hard to accidentally expose an execution surface and easy to see when someone does.

Secrets handling, service accounts, and environment hygiene

A compromise gets much worse when the notebook process is surrounded by secrets. Clean that up.

keep long-lived secrets out of environment variables when possible
use short-lived scoped tokens
rotate credentials after any notebook exposure
remove unused cloud credentials from developer hosts
separate notebook credentials from deployment credentials
scrub sensitive outputs from saved notebooks and checkpoints

If you only do one thing, reduce what the notebook process can inherit at startup. That alone can cut the impact of post-exploitation dramatically.
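
One way to do that is to launch the notebook server from a small wrapper that passes an allowlisted environment instead of the full parent environment. This is a sketch under assumptions: the allowlist is a starting point rather than a recommendation, the helper names are hypothetical, and some platforms need extra variables before a child process will start at all.

```python
import os
import subprocess

# Hypothetical allowlist: only what the child process genuinely needs.
ALLOWED_ENV = frozenset({"PATH", "HOME", "LANG", "LC_ALL", "TMPDIR"})


def minimal_env(parent_env=None, allowed=ALLOWED_ENV):
    """Return a copy of parent_env containing only allowlisted variables."""
    parent_env = os.environ if parent_env is None else parent_env
    return {key: value for key, value in parent_env.items() if key in allowed}


def launch(command, parent_env=None):
    """Run command with the scrubbed environment instead of inheriting everything."""
    return subprocess.run(command, env=minimal_env(parent_env), check=True)
```

A call such as `launch(["marimo", "edit", "notebook.py"])` would then start the server without cloud keys or registry tokens that happened to be exported in the parent shell. An allowlist is safer than a denylist here, because new secrets added to the parent environment later are excluded by default.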

What incident responders should preserve first

If you are responding to a notebook compromise, do not rush to reimage before you have collected the useful state. Notebook incidents often leave the most important clues in memory, logs, and execution state.

Volatile evidence, notebook state, and execution history

Preserve:

process tree and child processes
active network connections
notebook server logs
notebook execution history
shell history for the runtime user
mounted volumes and their permissions
current environment variables
recent file access and command traces if available

The notebook’s state matters because it can show which cell or session introduced the suspicious behavior. That is often more useful than the final payload.

Questions to answer before rebuilding or reimaging

Before you wipe anything, answer these questions:

What was the initial ingress path?
Which identity did the notebook run as?
What secrets were present in the environment?
Did the attacker touch local files, cloud credentials, or internal services?
Was any outbound traffic associated with a model API or agent runner?
Did the attacker stage data, or only enumerate?
How long was the notebook reachable before detection?

Those answers shape the containment plan and the follow-up rotation work. If you skip them, you will probably have to revisit the incident later with less evidence.

Closing perspective: why the post-exploit stage is the real story

The headline exploit gets attention, but the post-exploit phase is where the damage is decided.

The Marimo report is a good reminder that attackers are increasingly willing to put an LLM agent in the middle of their workflow once they have a foothold. That does not mean the agent is smarter than the attacker. It means the attacker can spend less time manually poking around and more time exploiting the environment’s own assumptions.

For defenders, the response is not to panic about “AI attackers” as a category. The response is to tighten the places where notebook software crosses trust boundaries, reduce the secrets available to the runtime, and watch for the sequence that turns code execution into environment mapping and data access.

If your notebook can reach real assets, then the post-exploitation phase is not an edge case. It is the main event.