When Attackers Use AI to Evade EDR: Hardening Build Agents Against Lateral Movement

Threat framing

On June 3, 2026, a report described a pattern defenders have been watching for a while: attackers are using AI tools to speed up Active Directory abuse and make EDR noise easier to manage. The key point is not that AI found a new bug class. It didn’t. The key point is that it lowered the effort needed to chain reconnaissance, credential abuse, and log shaping across a Windows-heavy environment.

That difference matters.

A lot of teams hear “AI attack” and picture some novel exploit hidden inside the model. In practice, the risk is usually less dramatic and more useful to an attacker: AI helps them move faster through the parts of the kill chain that already worked. It can parse host data, draft scripts, compare paths, reshape commands to look less suspicious, and summarize trust relationships. None of that replaces access control. It just makes weak boundaries easier to pressure at scale.

If your build infrastructure can reach domain services, internal APIs, secret stores, or admin tooling, a compromised runner can become a pivot point. The report’s point is that attackers already treat that pivot as a workflow problem, not just a payload problem.

Why build agents are a high-value pivot point

Build agents live in a strange spot. They are supposed to be temporary, automated, and low-friction. In practice, they often sit close to the keys.

They run code from pull requests and package builds. They fetch dependencies. They mount caches. They talk to artifact registries, internal package feeds, secret managers, and deployment APIs. In Windows-heavy shops, they may also sit on networks where domain controllers, file shares, and remote management endpoints are only a few hops away.

CI/CD trust is often broader than people think

The trust model around CI/CD tends to grow one permission at a time.

A team starts with one runner that can build code. Then it needs access to a private NuGet feed. Then it needs a token for staging deploys. Then it needs to sign artifacts. Then it needs to talk to a config service. Each change feels small on its own. Put them together and the runner now has enough reach to touch production-adjacent systems.

The usual problem is not the first credential. It is the pile of permissions around the job:

environment variables with deploy tokens
mounted files with signing keys or service credentials
cache directories that persist across jobs
network reachability to internal-only services
build-time identity that can impersonate a broader automation account

If an attacker gets code execution on that runner, they are not just inside a container or VM. They are inside a trust bundle.

A compromised runner can reach identity, secrets, and internal services

From a defender’s point of view, a build agent is risky because it often has three kinds of access at once:

Access type	Why it matters	Common failure mode
Identity	The runner can authenticate as a service account or with a federated token	Overprivileged automation account
Secrets	The job can read API keys, signing material, or deploy credentials	Secrets exposed in env vars, files, or logs
Network	The agent can talk to internal systems that are not exposed externally	Flat east-west access from a “temporary” host

That combination is what makes lateral movement possible. The runner does not need to be domain-admin by design. It only needs to be close enough to something more valuable.
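
To make the overlap concrete, here is a minimal sketch that labels a runner by how many of the three access types it holds at once. The `RunnerAccess` record, its field names, and the labels are all hypothetical; they are not taken from the report or from any CI platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerAccess:
    """Hypothetical inventory record for one build runner."""
    name: str
    has_privileged_identity: bool  # e.g. an overprivileged automation account
    has_readable_secrets: bool     # secrets in env vars, files, or logs
    has_internal_reach: bool       # flat east-west network access


def pivot_exposure(runner: RunnerAccess) -> str:
    """Return a coarse label based on how many access types overlap."""
    overlap = sum([
        runner.has_privileged_identity,
        runner.has_readable_secrets,
        runner.has_internal_reach,
    ])
    # Illustrative labels only; the thresholds are an assumption.
    labels = {0: "isolated", 1: "limited", 2: "elevated", 3: "pivot-ready"}
    return labels[overlap]
```

The label is deliberately crude. Its only job is to surface the runners where identity, secrets, and network reach all coincide, since those are the ones worth reviewing first.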

What the report is actually signaling about AI-assisted tradecraft

The useful signal in the report is not “AI can hack.” It is that AI can compress the operator workflow around known techniques.

AI as an operator accelerator, not a new exploit class

I would break AI use into three buckets:

Parsing and planning
The model helps sort through logs, inventory data, and host telemetry. That can shorten reconnaissance.
Code and script generation
The model helps draft small utilities, wrappers, or one-off automation around existing system tools.
Message and log shaping
The model helps paraphrase commands, standardize naming, and reduce obvious signatures in operator notes or script comments.

That does not create a new bypass of EDR by itself. It does make it easier for a less skilled attacker to imitate patterns that would otherwise take time to learn.

The defender takeaway is simple: do not assume clumsy tradecraft is a reliable filter anymore.

How AI helps with AD reconnaissance, phrasing, and log evasion

Active Directory environments reward patience and context. Attackers want to answer questions like:

Which machines are joined to the domain?
Which service account has unusual rights?
Which hosts can reach a domain controller?
Which endpoints expose remote management?
Which internal shares contain useful material?

AI helps because those questions often produce messy text output. It can summarize that output, turn it into a prioritized list, and suggest next steps.

The same thing applies to evasion. An operator can ask a model for alternate ways to describe an action, or to rework scripts so the process tree looks less obvious. That is not magic, but it lowers the time cost of trying variants until one blends into normal admin activity.

For defenders, this means the old assumption that “attackers will make obvious mistakes” is less dependable. You need telemetry that catches the behavior, not just the exact string.

A safe walkthrough of the attacker path from build agent to lateral movement

This is where I want to stay defensive and still be concrete. The path from runner compromise to lateral movement usually does not start with anything dramatic. It starts with a job that should never have had that much reach.

Initial foothold assumptions in CI/CD environments

In most real environments, the attacker foothold is one of these:

a poisoned dependency or build step
a malicious pull request that triggers a job
a stolen token for the CI platform
a vulnerable self-hosted runner
an abused plugin or shared automation secret

The runner itself may be ephemeral, but the credentials around it are often not. That is where the chain begins.

A practical way to think about it is this: if a job can execute code, then every secret attached to that job is potentially reachable. The question is not whether the build is “trusted.” The question is what else the job can see.

Enumerating domain context, service accounts, and reachable systems

Once a runner is compromised, the useful defensive questions are:

Is the host domain-joined?
What identity is the process running under?
Are there service-account credentials on disk or in memory?
Which internal hosts respond from this subnet?
Can the runner reach LDAP, SMB, WinRM, RDP, or Kerberos services?
Are there cached artifacts, logs, or temp files with tokens?

A safe verification loop looks like this:

Identify the job account and its privileges.
Identify the network paths available to the runner.
Identify secrets exposed to that job at runtime.
Identify whether those secrets can access directory, storage, or deployment systems.
Identify whether anything persists after the job ends.

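Here is a defensive inspection pattern you can run in a PowerShell session on a controlled Windows runner to understand context without simulating abuse. The environment listing prints values as well as names, so treat the output as sensitive:

# Identity, group memberships, and privileges of the job account
whoami /all
# Host name and network configuration (domain suffix, DNS servers)
hostname
ipconfig /all
# Environment variables visible to the job (output includes values)
Get-ChildItem Env: | Sort-Object Name
# Active SMB client connections from this host
Get-SmbConnection
# Established TCP connections and the owning process IDs
Get-NetTCPConnection -State Established | Select-Object LocalAddress,LocalPort,RemoteAddress,RemotePort,OwningProcess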

The value here is not the output itself. It is seeing how much a “temporary” agent can learn from its own runtime environment.
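
If you want the same kind of check inside a job step without printing secret values, a small sketch like the one below can help. It only reports variable names that look secret-like. The name hints in `SECRET_NAME_HINTS` are my assumption about common naming conventions, not a complete list.

```python
import os
import re

# Assumed naming hints; adjust them to your own conventions.
SECRET_NAME_HINTS = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|CREDENTIAL)", re.IGNORECASE
)


def secret_like_env_names(environ=None):
    """Return sorted variable names that look like they hold secrets.

    Only names are returned; values are never read or printed.
    """
    env = os.environ if environ is None else environ
    return sorted(name for name in env if SECRET_NAME_HINTS.search(name))


if __name__ == "__main__":
    for name in secret_like_env_names():
        print(name)
```

A long list here does not prove a leak. It does tell you how much the job would hand over to anything that gains code execution inside it.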

Where lateral movement usually starts to succeed

Lateral movement tends to work at the first boundary where the runner can talk to something privileged with a reusable credential.

Common examples:

a service account that can authenticate to multiple internal systems
a deployment token that can trigger privileged jobs
a host-based management tool that trusts the runner subnet
cached admin credentials left behind by maintenance
an internal API that accepts bearer tokens from the automation plane

When that happens, the attacker does not need a perfect exploit. They need a path that looks normal to the infrastructure.

That is why build agents are such good pivots. They often live in the gray area between developer convenience and production trust.

EDR evasion patterns defenders should expect on build infrastructure

The report’s mention of EDR evasion is worth taking seriously, but not because EDR is useless. It is because build hosts are often noisy in ways that make naive detection harder.

Process shaping, living-off-the-land binaries, and reduced noise

Attackers know endpoint tools look for unusual parent-child chains, weird command lines, and new binaries dropped into temp locations. On a build agent, many legitimate tasks already resemble admin activity:

script hosts launching compilers
package managers spawning shell helpers
archive tools extracting content
system utilities touching registry, service, or network settings

That normal noise creates cover.

The evasion pattern is usually not “turn off the EDR.” It is more subtle:

use built-in binaries instead of dropping obvious tools
borrow the same execution style as the platform’s own automation
keep commands short and modular
avoid high-volume failures that trigger attention
reuse paths and tools that already appear in build logs

Defenders should watch for when that pattern crosses from build behavior into operator behavior. A compiler invoking a shell is not always suspicious. A compiler invoking a shell that starts enumerating domain services is.
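
A rough way to express that distinction in code is to look at the ancestry chain rather than any single process. This is a sketch under assumptions: the three name sets are illustrative examples of build tools, shells, and directory utilities, and a real rule would need the lists your own runners justify.

```python
# Illustrative name sets; they are assumptions, not a vetted detection list.
BUILD_TOOLS = {"msbuild.exe", "dotnet.exe", "cl.exe", "node.exe"}
SHELLS = {"cmd.exe", "powershell.exe", "pwsh.exe"}
DIRECTORY_RECON = {"nltest.exe", "dsquery.exe", "net.exe", "setspn.exe"}


def crosses_into_operator_behavior(chain):
    """Check one process ancestry chain, ordered from ancestor to descendant.

    Returns True only when a build tool leads to a shell and that shell
    leads to a directory or domain enumeration utility.
    """
    seen_build = False
    seen_shell = False
    for name in (item.lower() for item in chain):
        if name in BUILD_TOOLS:
            seen_build = True
        elif seen_build and name in SHELLS:
            seen_shell = True
        elif seen_shell and name in DIRECTORY_RECON:
            return True
    return False
```

For example, `["MSBuild.exe", "cmd.exe", "nltest.exe"]` is flagged, while `["MSBuild.exe", "cmd.exe", "csc.exe"]` is not. The ordering requirement is the point: each step alone is ordinary on a build host.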

Scripted abuse of legitimate admin tooling and remote management paths

Remote management tools are a major concern because they are designed to be trusted. If the build network can reach them, an attacker does not need to invent a transport.

The risk categories are familiar:

PowerShell remoting where it should not exist
SMB access to administrative shares from an untrusted runner
WinRM exposure across a broad subnet
scheduled tasks or service creation used as lateral movement mechanics
directory queries from a host that should not need them

The point is not the tool itself. The point is the trust boundary around it.

A runner that can use a legitimate management path can look almost invisible unless you correlate process context with identity and network telemetry.

Why traditional endpoint detections miss runner-specific behavior

Traditional EDR detections often assume an employee workstation or a server with a stable role. Build agents break that model.

They tend to have:

high process churn
frequent archive and extraction activity
scripts from multiple languages
transient binaries and temp paths
service accounts whose misuse is hard to distinguish from legitimate automation

That means simple rules like “script host plus network activity” are too broad. But if you ignore those hosts entirely, you create a blind spot.

The better model is to baseline by runner role:

what executables should appear
what command patterns are normal
which destinations are expected
which secrets are allowed at runtime
how long the host should live

If a job runner is making domain queries at 2 a.m. from a subnet that only needs artifact access, you want that to light up.
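
One way to sketch a per-role baseline is a small record of what is expected, plus a function that lists deviations for an observed event. Everything named here (`RunnerBaseline`, the role name, the hostnames, the 60-minute lifetime) is hypothetical and only illustrates the shape of the check.

```python
from dataclasses import dataclass, field


@dataclass
class RunnerBaseline:
    """Hypothetical description of what one runner role is expected to do."""
    role: str
    executables: set = field(default_factory=set)   # lowercase image names
    destinations: set = field(default_factory=set)  # expected hostnames
    max_lifetime_minutes: int = 60


def deviations(baseline, executable, destination, age_minutes):
    """Return human-readable findings for one observed event."""
    findings = []
    if executable.lower() not in baseline.executables:
        findings.append(f"unexpected executable: {executable}")
    if destination not in baseline.destinations:
        findings.append(f"unexpected destination: {destination}")
    if age_minutes > baseline.max_lifetime_minutes:
        findings.append(
            f"host older than {baseline.max_lifetime_minutes} minutes"
        )
    return findings
```

An empty list means the event fits the role. Anything else is a candidate for correlation with identity and network data rather than an alert on its own.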

Telemetry that catches this chain early

Good detections here correlate across telemetry sources. Host-only data is not enough. Identity-only data is not enough. Network-only data is not enough. You need the chain.

Host signals: process trees, parent-child anomalies, and suspicious execution context

Start with the process tree. On build systems, the most useful questions are:

Did a build orchestrator spawn an unexpected shell?
Did a shell spawn directory tools or remote management utilities?
Did a script run from a temp directory, cache path, or artifact folder?
Did a known build binary suddenly start launching reconnaissance commands?

A useful heuristic table:

Signal	Why it matters	Example concern
Unusual parent-child chain	Indicates execution outside normal build flow	Compiler spawning interactive shell
Script from temp/cache path	Often used for transient staging	Job leaves behind live script in cache
New binary in runner workspace	Suggests dropped tooling	Unsigned executable in build folder
High entropy or renamed files	Can hide staging artifacts	Randomized filenames in job workspace
Unexpected interactive context	Rare in automated jobs	Session-like behavior from service account

You do not need every signal to fire. One strong anomaly plus matching identity and network context is enough.
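
That rule is simple enough to write down directly. The sketch below assumes each host signal has already been marked strong or weak by whatever produces it; the tuple format and the function name are hypothetical.

```python
def should_alert(host_anomalies, identity_anomaly, network_anomaly):
    """Alert when one strong host anomaly lines up with both other sources.

    host_anomalies: iterable of (signal_name, is_strong) tuples.
    identity_anomaly, network_anomaly: booleans from the other pipelines.
    """
    has_strong_host_signal = any(is_strong for _, is_strong in host_anomalies)
    return bool(has_strong_host_signal and identity_anomaly and network_anomaly)
```

Requiring all three keeps the rule quiet on noisy build hosts. The trade-off is that a gap in identity or network logging silently disables it, so coverage of those sources matters as much as the rule.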

Identity signals: unusual token use, service account misuse, and privilege jumps

Identity is where a lot of build-agent incidents become obvious in hindsight.

Look for:

a service account authenticating to systems it never normally touches
token use outside normal job windows
a runner account showing privilege changes mid-job
failed authentication bursts followed by success
multiple hosts using the same credential material

If your CI/CD platform supports workload identity or short-lived tokens, that should reduce the blast radius. If you still see long-lived credentials on the runner, treat that as a red flag.

A simple rule of thumb: automation credentials should authenticate to automation endpoints. If they start behaving like an operator account, something is off.
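
As one example from the list above, here is a sketch of the "failed authentication bursts followed by success" check. It assumes authentication events have already been exported as `(timestamp, account, succeeded)` tuples sorted by time; the threshold of five failures in five minutes is an arbitrary starting point, not a recommendation from the report.

```python
from datetime import datetime, timedelta


def failed_burst_then_success(events, min_failures=5,
                              window=timedelta(minutes=5)):
    """Return accounts that succeeded right after a burst of failures.

    events: list of (timestamp, account, succeeded), sorted by timestamp.
    An account is flagged when at least `min_failures` failures fall
    inside `window` before one of its successful authentications.
    """
    recent_failures = {}
    flagged = set()
    for ts, account, succeeded in events:
        # Keep only failures that are still inside the window.
        failures = [t for t in recent_failures.get(account, [])
                    if ts - t <= window]
        if succeeded:
            if len(failures) >= min_failures:
                flagged.add(account)
            failures = []  # a success resets the burst
        else:
            failures.append(ts)
        recent_failures[account] = failures
    return flagged
```

On a service account this pattern is more interesting than on a human account, because automation normally either works or fails consistently.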

Network signals: unexpected east-west traffic, LDAP, SMB, WinRM, and Kerberos patterns

Network telemetry often catches the pivot before endpoint telemetry does.

Watch for:

a build runner reaching LDAP or Kerberos endpoints it does not normally need
SMB connections from build subnets to administrative shares
WinRM or RDP from a runner network segment
unusual east-west connections to file servers, domain controllers, or management hosts
service accounts making requests to multiple internal systems in a short window

You can think about it in terms of expected job traffic versus suspicious internal reconnaissance:

Normal runner traffic	Suspicious runner traffic
artifact registry	domain controller queries
dependency feed	SMB to admin share
package mirror	WinRM to server subnet
secret manager	Kerberos bursts to many hosts
deployment API	repeated LDAP queries

If a build host suddenly acts like a workstation doing admin discovery, that deserves attention.
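
A minimal sketch of that comparison, assuming flow records are available as `(src_ip, dst_ip, dst_port)` tuples: flag any flow from the runner subnet to a well-known directory or remote-management port unless the destination is explicitly expected. The port map uses the standard default ports; the subnet and addresses in any usage are placeholders.

```python
import ipaddress

# Default ports for the protocols named above.
ADMIN_PORTS = {
    88: "Kerberos",
    389: "LDAP",
    636: "LDAPS",
    445: "SMB",
    3389: "RDP",
    5985: "WinRM",
    5986: "WinRM-TLS",
}


def suspicious_flows(flows, runner_cidr, expected_destinations):
    """Return (src, dst, protocol) for unexpected admin-port flows.

    flows: iterable of (src_ip, dst_ip, dst_port).
    runner_cidr: the runner subnet, e.g. "10.20.0.0/24" (placeholder).
    expected_destinations: set of destination IPs the role is allowed to use.
    """
    runner_net = ipaddress.ip_network(runner_cidr)
    findings = []
    for src, dst, port in flows:
        if ipaddress.ip_address(src) not in runner_net:
            continue  # not a runner; out of scope for this check
        if port in ADMIN_PORTS and dst not in expected_destinations:
            findings.append((src, dst, ADMIN_PORTS[port]))
    return findings
```

This does not catch services on non-default ports. It is meant as a cheap first pass over flow logs, not a replacement for protocol-aware inspection.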

Hardening build agents to resist AD pivoting

This is where most of the risk reduction happens. The goal is not to make runners impossible to compromise. The goal is to make compromise non-pivotal.

Make runners ephemeral, isolated, and narrowly scoped

Ephemeral is good, but only if it is real.

A runner should be:

recreated from a clean image
isolated from other runners
scoped to one job class or trust tier
unable to persist local state across builds
blocked from interacting with sensitive internal networks by default

If the host is long-lived, then the “temporary” assumption is fake. Once a runner becomes a persistent server, treat it like one.

Separate build identities from domain privileges

Do not let build identity drift into directory privilege.

Prefer:

short-lived tokens over reusable secrets
workload identity over static passwords
separate identities per repo, pipeline, or environment
distinct accounts for build, deploy, and sign
no membership in broad domain groups unless absolutely required

If a runner must authenticate to internal services, limit exactly which services and actions are allowed. The fewer cross-domain permissions a build account has, the less valuable it becomes after compromise.

Lock down secrets, tokens, and machine credentials

A runner usually fails at the secret boundary before it fails anywhere else.

Good controls include:

inject secrets only for the specific job that needs them
redact secrets from logs and crash dumps
avoid placing credentials in globally readable env vars
prevent secrets from being written to workspace or cache paths
rotate any secret that has to be mounted into a job

⚠️

A wildcard EDR exclusion on a build workspace is usually a gift to an attacker. Exclude only the exact path or process you can justify, and review it often.

Also pay attention to machine credentials. If the runner image or host is domain-joined, any cached ticket, token, or machine secret increases the blast radius. A compromised host should not be able to impersonate the environment around it.

Use egress control, segmentation, and allowlists to shrink movement options

If a runner can only talk to the endpoints it needs, lateral movement gets harder fast.

The safest model is deny-by-default:

allow artifact registry access
allow source control access
allow secret manager access
allow deployment API access only when needed
deny SMB, LDAP, WinRM, RDP, and general east-west by default

If some internal traffic is required, build an allowlist tied to the job role, not the whole subnet. Segmentation should reflect function, not convenience.

The practical effect is that a compromised build agent can still fail a build, but it cannot freely scan or bounce into the domain.
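
The deny-by-default model can be sketched as a lookup keyed by job role rather than by subnet. The role names and hostnames below are hypothetical, and in practice this decision lives in a firewall or proxy policy; the code only shows the logic you would want that policy to encode.

```python
def egress_decision(role, destination, port, allowlist):
    """Return "allow" only for an explicit (destination, port) entry.

    allowlist: dict mapping a job role to a set of (destination, port)
    tuples. Unknown roles and unlisted destinations are denied.
    """
    allowed = allowlist.get(role, set())
    return "allow" if (destination, port) in allowed else "deny"


# Hypothetical policy: build jobs never reach the deployment API.
EXAMPLE_ALLOWLIST = {
    "build": {
        ("artifacts.internal.example", 443),
        ("git.internal.example", 443),
    },
    "deploy": {
        ("deploy-api.internal.example", 443),
    },
}
```

Keeping the deployment API out of the `build` role is what "only when needed" looks like in practice: the permission exists, but only for the job class that uses it.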

Audit EDR exclusions and remove runner-specific blind spots

I see this mistake a lot: a team gives build agents broad EDR exclusions because they are “too noisy,” then treats the exception as permanent.

That is risky for two reasons:

it creates a direct blind spot on the exact hosts that execute untrusted code
it trains attackers to look for the same exclusion pattern across environments

Review:

excluded paths
excluded extensions
excluded hashes
excluded parent processes
runner-specific policy exceptions

If an exclusion is necessary, make it as narrow as possible and tie it to a documented business need. If you cannot explain why a runner needs it, remove it.
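
A review like this is easy to automate once the exclusions are exported. The sketch below assumes a list of dictionaries with `path` and `justification` keys, which is a made-up export format and not the schema of any particular EDR product.

```python
def risky_exclusions(exclusions):
    """Return (path, reasons) for exclusions that deserve a second look.

    exclusions: list of dicts with "path" and "justification" keys
    (hypothetical export format).
    """
    findings = []
    for item in exclusions:
        path = item.get("path", "")
        reasons = []
        if "*" in path or "?" in path:
            reasons.append("wildcard")
        if not item.get("justification", "").strip():
            reasons.append("no documented justification")
        if reasons:
            findings.append((path, reasons))
    return findings
```

Running it on a schedule turns "review it often" into something that produces a diff instead of relying on someone remembering to look.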

Verification steps you can run without simulating abuse

You do not need a live attack simulation to learn whether your CI/CD boundary is leaky.

Map every trust relationship from runner to internal network

Start with a simple inventory:

what image does the runner boot from?
what identity does it run under?
what secrets are mounted into jobs?
what internal subnets can it reach?
what admin or deployment endpoints are reachable?
what is left behind after job completion?

If you want a working review template, this is enough to begin:

Question	Evidence to collect	Why it matters
Does the runner need domain access?	network policy, auth logs	reduces directory pivot risk
Which secrets are available at runtime?	pipeline config, vault policy	shows blast radius of job compromise
Can the runner talk to admin ports?	firewall and connection logs	reveals lateral movement paths
Do logs contain credentials?	job logs, artifact review	catches accidental secret leakage
Does state persist across jobs?	disk, cache, container layers	determines if compromise survives

Check whether a low-privilege job can reach admin-only resources

Pick one low-privilege job and verify that it cannot reach anything beyond its purpose.

That means confirming it cannot:

open admin shares
query directory services beyond what it needs
hit remote management endpoints
authenticate to unrelated internal systems
access production-only secret paths

If your controls depend on policy documents but the network still allows the traffic, the policy is fiction.
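
To compare policy with reality, a low-privilege job can attempt plain TCP connections to the endpoints it should not reach and report which ones answer. This sketch uses only the standard library; the target list is something you supply from your own inventory, and it should be run only against systems you are authorized to test.

```python
import socket


def reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def unexpected_reach(targets, timeout=2.0):
    """Return the (host, port) pairs that answered but should not have.

    targets: iterable of (host, port) pairs the job must NOT reach.
    """
    return [(host, port) for host, port in targets
            if reachable(host, port, timeout)]
```

An empty result is the goal. A successful connection only proves the network path exists; whether authentication would also succeed is a separate question for the identity review.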

Review whether build logs, caches, or artifacts leak usable credentials

This is one of the cheapest checks you can run, and one of the most common misses.

Look for:

access tokens in logs
private keys in artifacts
stale environment dumps
dependency caches containing config files
debug bundles with secrets or session material

The report’s underlying theme is automation abuse. Build artifacts are often where automation leaves its fingerprints. If those fingerprints include secrets, you have already lost part of the fight.
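
A first-pass scan for this can be a handful of regular expressions run over job logs and text artifacts. The three patterns below are assumptions about what commonly leaks; they will miss plenty and produce some false positives, so treat the result as a list of lines to review rather than a verdict.

```python
import re

# Illustrative patterns only; extend them for your own token formats.
PATTERNS = {
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer token": re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
    "key=value secret": re.compile(
        r"(?i)(password|passwd|secret|token|api[_-]?key)\s*[=:]\s*\S{8,}"
    ),
}


def scan_lines(lines):
    """Return (line_number, label) for each pattern hit.

    Line numbers are 1-based. Matched text is not returned, so the
    findings list itself does not become another copy of the secret.
    """
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

Reporting line numbers instead of matched values is a deliberate choice: the scanner's output often ends up in tickets and chat, which is exactly where you do not want the credential to reappear.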

Incident response priorities if a runner is compromised

If a runner is compromised, treat it as both a host incident and an identity incident.

Contain the agent, rotate credentials, and invalidate session material

First priority: limit the blast radius.

remove the runner from service
isolate the host or scale the pool down
rotate any secrets that were available to the job
invalidate tokens, sessions, and certificates associated with the runner
revoke access to internal tools used by that automation identity

Do not wait for perfect attribution before rotating secrets. If the runner could read them, assume they are exposed.
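
If you keep a mapping of which secrets each job can read, the rotation list falls out mechanically. The sketch below assumes such a mapping exists as a dictionary; the job and secret names in any usage are placeholders.

```python
def secrets_to_rotate(job_secrets, jobs_on_compromised_runner):
    """Return the sorted set of secrets readable by any affected job.

    job_secrets: dict mapping a job name to an iterable of secret names.
    jobs_on_compromised_runner: jobs that ran on the affected runner.
    """
    exposed = set()
    for job in jobs_on_compromised_runner:
        exposed |= set(job_secrets.get(job, ()))
    return sorted(exposed)
```

The function errs on the side of rotating too much, which matches the rule above: readability by the job, not proof of theft, is the trigger.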

Preserve job logs, process data, and directory authentication evidence

You want three evidence buckets:

job and pipeline logs
process and host telemetry
identity and directory logs

Preserve:

command history
runner logs
process trees
authentication events
unusual network connections
artifacts from the job workspace and cache

If the attacker used AI-assisted tooling, the interesting artifacts may be in how quickly the environment was enumerated, what scripts were generated, and which identities were touched. That shows up in telemetry more than in a single payload.

Scope for domain tampering, secret exfiltration, and persistence

Once the host is contained, ask three questions:

Did the runner touch domain services or modify directory objects?
Did it read or export any secret material?
Did it create a persistence mechanism outside the runner lifecycle?

That scope should include:

new accounts or group changes
ticket or token misuse
unexpected access to file shares or secret stores
scheduled tasks, services, or startup changes
modified pipeline definitions or supply-chain artifacts

A compromised runner can become a launch point for a bigger incident if you only clean up the host and ignore the identity trail.

Closing the gap between CI/CD convenience and domain safety

The report about AI-assisted AD attacks and EDR evasion is worth reading as a warning about workflow, not just tooling. Attackers are using AI to compress the steps between initial access and lateral movement. That makes build agents even more sensitive, because they already sit close to secrets, identity, and internal reach.

If you want a practical defense strategy, keep it simple:

assume runners will be probed
make runner access narrow and disposable
separate build identity from privileged identity
block east-west movement by default
review exclusions as if they were temporary exceptions, not permanent policy
verify with logs, not assumptions

The convenience of CI/CD is real. So is the risk when that convenience bleeds into the domain. The job is not to make build agents impossible to compromise. The job is to make them boring after compromise.