Cursor IDE: Building a Custom Review Agent with Rules and Memory

Why a review agent in Cursor IDE is worth building

I started using Cursor as a lightweight review station instead of just an autocomplete tool. That shift matters when you want a repeatable second pass on diffs, not a one-off chat reply.

A custom review agent is useful when your team keeps missing the same things:

unsafe assumptions in backend handlers
missing tests around edge cases
sloppy error handling
client code that looks fine but hides a data-flow bug

The goal is not to have the agent make the call on a change. It is to make it inspect code the same way every time, with the same scope and output shape.

What Cursor rules and memory can actually control

Cursor gives you two different levers:

Rules: stable instructions for how the agent should behave
Memory: project-specific facts the agent should remember across sessions

Rules for repeatable review behavior

Rules are where you pin the review contract. I use them for things like:

always review only staged diffs or explicitly provided files
prefer concrete findings over generic advice
separate correctness, security, and test coverage
never rewrite code unless asked
cite exact file paths and line ranges when possible

That keeps the agent from drifting into vague but polished commentary.

Memory for project-specific context

Memory is where you store the boring truths the agent should not relearn every time:

the app uses JWTs in cookies, not local storage
admin routes must be checked server-side
src/lib/api.ts is the only approved network wrapper
tests run with Vitest, not Jest

Use memory for stable context, not active debugging notes. If you start stuffing every incident into memory, the agent gets noisy and brittle.

Designing the review workflow

The workflow should be boring on purpose.

Scope the agent to safe, read-only review tasks

Keep the agent on rails:

review code, do not edit code
inspect diffs, do not browse the whole repo unless needed
summarize findings, do not propose speculative rewrites
flag risk, do not trigger actions or external tools

That makes the agent useful in a code review loop without turning it into an unbounded assistant.
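One way to keep the input bounded is to hand the agent only a staged diff plus the list of files it touches. The sketch below assumes a git unified diff, the kind `git diff --cached` prints. `changed_files` and `SAMPLE_DIFF` are illustrative names, not part of Cursor.

```python
import re

# Matches the "+++ b/<path>" header git writes for each changed file.
# Deleted files show up as "+++ /dev/null" and are skipped on purpose.
_NEW_FILE_HEADER = re.compile(r"^\+\+\+ b/(.+)$")


def changed_files(diff_text: str) -> list[str]:
    """Return the paths touched by a git unified diff, in order, without duplicates."""
    paths: list[str] = []
    for line in diff_text.splitlines():
        match = _NEW_FILE_HEADER.match(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths


# Hypothetical two-file diff, used only to show the shape of the input.
SAMPLE_DIFF = """\
diff --git a/src/lib/api.ts b/src/lib/api.ts
--- a/src/lib/api.ts
+++ b/src/lib/api.ts
@@ -1,3 +1,4 @@
+import { getSession } from "./session";
 export async function request(path: string) {
diff --git a/src/server/payments.ts b/src/server/payments.ts
--- a/src/server/payments.ts
+++ b/src/server/payments.ts
@@ -40,2 +40,3 @@
+  const user = session.user;
"""

print(changed_files(SAMPLE_DIFF))
```

The file list doubles as a sanity check: if the agent cites a path that is not in it, the review has wandered outside the diff.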

Define the inputs, outputs, and failure modes

A review agent works better when you are explicit about structure.

Part	What to provide	Why it matters
Input	diff, file list, or pasted patch	keeps the review bounded
Output	severity, file, evidence, impact, fix suggestion	avoids vague commentary
Failure mode	“no issues found” must still explain what was checked	prevents lazy acceptance

If you do not define the output shape, the agent will usually invent one.
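To make the output shape concrete, it can help to write it down as a type and check agent output against it. This is a minimal sketch: the field names follow the review format used in this article, while the `Finding` class and the three-level severity scale are assumptions of mine.

```python
from dataclasses import dataclass

# Assumed scale; adjust to whatever your team already uses.
ALLOWED_SEVERITIES = ("low", "medium", "high")


@dataclass(frozen=True)
class Finding:
    severity: str
    file: str
    evidence: str
    impact: str
    fix: str

    def problems(self) -> list[str]:
        """Return the reasons this finding breaks the contract (empty if it is valid)."""
        issues = []
        if self.severity not in ALLOWED_SEVERITIES:
            issues.append(f"unknown severity: {self.severity!r}")
        for name in ("file", "evidence", "impact", "fix"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        return issues


finding = Finding(
    severity="high",
    file="src/server/payments.ts:42-68",
    evidence="the handler trusts `userId` from the request body",
    impact="a caller could act on another account if auth is not rechecked",
    fix="derive the user from the session and compare ownership on the server",
)
print(finding.problems())
```

A finding that fails this check is usually a sign the agent drifted from the rule set, which is worth knowing before anyone reads the review.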

Building the custom review agent

Write the rule set

A good rule file is short and strict. Mine usually reads like this:

You are a code review agent.

Review only the provided diff or files.
Focus on correctness, security, test coverage, maintainability, and hidden assumptions.
Do not rewrite code unless explicitly asked.
Do not mention style issues unless they block understanding or correctness.
For each issue, include:
- severity
- file and line reference
- why it matters
- minimal fix suggestion

If there are no issues, say what you checked and why it looks sound.

That is enough to get a useful first pass without turning the agent into a chatty consultant.

Seed useful memory without overfitting

Good memory entries are short and durable:

“API authorization is enforced in route handlers, not in the client.”
“Feature flags live in config/flags.ts.”
“Test fixtures should not call real network services.”

Bad memory entries are temporary or too specific:

“Yesterday's bug was in checkout.ts”
“Always mention the dogfood branch”
“This one endpoint is probably broken”

If you encode one bug too literally, the agent may start pattern-matching the wrong lesson into future reviews.

Test the agent on real diffs

Do not validate the agent with a toy example. Use a few real diffs with known outcomes:

one small refactor with no issues
one bug fix with a hidden regression
one auth-sensitive change
one test-only patch

Check whether the agent catches the real risk and ignores noise. The best signal is not “it found something,” but “it found the right thing for the right reason.”
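A small scoring script makes that comparison less impressionistic. The sketch below assumes you have recorded, for each test diff, which files a human reviewer confirmed as problematic and which files the agent flagged. The `Case` type and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    expected: set[str]  # files where a human confirmed a real issue
    reported: set[str]  # files the agent flagged


def score(case: Case) -> dict[str, int]:
    """Count real issues caught (hits), real issues missed, and false alarms (noise)."""
    return {
        "hits": len(case.expected & case.reported),
        "misses": len(case.expected - case.reported),
        "noise": len(case.reported - case.expected),
    }


# Made-up outcomes for three of the diff types listed above.
CASES = [
    Case("small refactor, no issues", expected=set(), reported=set()),
    Case(
        "bug fix with hidden regression",
        expected={"src/cart.ts"},
        reported={"src/cart.ts", "src/ui/Button.tsx"},
    ),
    Case("auth-sensitive change", expected={"src/server/payments.ts"}, reported=set()),
]

for case in CASES:
    print(case.name, score(case))
```

File-level matching only tells you whether the agent looked in the right place. Whether it gave the right reason still needs a human read of the finding itself.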

Example review prompts and review output format

A simple prompt template is enough:

Review this diff as a code reviewer.

Look for correctness, security, missing tests, and broken assumptions.
Only report issues supported by the diff.
Return findings in this format:

- severity:
- file:
- evidence:
- impact:
- fix:

If nothing stands out, explain what you checked.

A good output is compact and actionable:

- severity: high
  file: src/server/payments.ts:42-68
  evidence: the handler trusts `userId` from the request body
  impact: a caller could act on another account if auth is not rechecked
  fix: derive the user from the session and compare ownership on the server

That format is better than a long paragraph because it is easy to scan in a review thread.
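A fixed format also means the output can be parsed and checked by a script. The following is a rough sketch, assuming the agent sticks to the five fields above and starts every finding with `severity`. `parse_findings` is an illustrative helper, and the second finding in the sample is invented for the demo.

```python
FIELDS = ("severity", "file", "evidence", "impact", "fix")


def parse_findings(text: str) -> list[dict[str, str]]:
    """Parse 'key: value' review output into one dict per finding."""
    findings: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("- "):
            line = line[2:]  # tolerate a dash on any field, not only the first
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # skip blank lines and free-form commentary
        if key == "severity":
            findings.append({})  # "severity" opens a new finding
        if findings:
            findings[-1][key] = value.strip()
    return findings


SAMPLE_OUTPUT = """\
- severity: high
  file: src/server/payments.ts:42-68
  evidence: the handler trusts `userId` from the request body
  impact: a caller could act on another account if auth is not rechecked
  fix: derive the user from the session and compare ownership on the server

- severity: low
  file: src/lib/api.ts:10-12
  evidence: the retry loop has no upper bound
  impact: a failing endpoint could be retried indefinitely
  fix: cap the number of retries
"""

for item in parse_findings(SAMPLE_OUTPUT):
    print(item["severity"], item["file"])
```

Splitting on the first colon only keeps values such as `src/server/payments.ts:42-68` intact.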

Common mistakes when using agent memory

The biggest mistake is treating memory like a notebook.

Do not store transient incidents.
Do not store secrets.
Do not store review verdicts that depend on one branch.
Do not store contradictory rules in multiple places.

Another mistake is letting memory replace the diff. The agent still needs the code in front of it. Memory should bias the review, not stand in for a permanent map of the codebase.
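Some of the memory mistakes listed above can be caught with a crude lint pass before an entry is saved. This sketch is heuristic only: the word lists and the `lint_memory` helper are my assumptions, not a Cursor feature, and it will not detect two entries that contradict each other in different words.

```python
import re

# Assumed signals for transient or speculative entries.
TRANSIENT = re.compile(r"\b(yesterday|today|last week|this one|probably|currently)\b", re.I)
# Assumed signal for something that looks like a credential assignment.
SECRET = re.compile(r"(api[_-]?key|secret|password|token)\s*[:=]\s*\S+", re.I)


def lint_memory(entries: list[str]) -> list[tuple[str, str]]:
    """Return (entry, reason) pairs for entries that look risky to keep."""
    flagged: list[tuple[str, str]] = []
    seen: set[str] = set()
    for entry in entries:
        normalized = " ".join(entry.lower().split())
        if normalized in seen:
            flagged.append((entry, "duplicate"))
            continue
        seen.add(normalized)
        if SECRET.search(entry):
            flagged.append((entry, "looks like a secret"))
        elif TRANSIENT.search(entry):
            flagged.append((entry, "transient or speculative"))
    return flagged


ENTRIES = [
    "Tests run with Vitest, not Jest.",
    "Yesterday's bug was in checkout.ts",
    "This one endpoint is probably broken",
    "API_KEY=abc123",
    "tests run with vitest, not jest.",
]

for entry, reason in lint_memory(ENTRIES):
    print(f"{reason}: {entry}")
```

Treat the flags as prompts for a human decision, not as an automatic delete.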

How to keep the agent useful over time

Review agents drift unless you maintain them.

I usually do three things:

prune stale memory entries every so often
keep rules short enough to read in one pass
compare agent findings against human review outcomes

If the agent keeps missing the same class of bug, add that failure mode to the rule set. If it starts nagging about irrelevant issues, tighten the scope.

The useful version of this setup is not magical. It is just consistent. Cursor rules give the agent a review posture, memory gives it project context, and your test diffs tell you whether it is actually helping.

Conclusion

A custom Cursor review agent works when you constrain it. Give it a narrow job, a stable output format, and memory that reflects how your project actually behaves. That is enough to turn it from a generic chat box into a repeatable review tool that catches real mistakes without inventing new ones.