
Cursor IDE: Building a Custom Review Agent with Rules and Memory
Why a review agent in Cursor IDE is worth building
I started using Cursor as a lightweight review station instead of just an autocomplete tool. That change matters when you want a repeatable second pass on diffs, not a one-off chat reply.
A custom review agent is useful when your team keeps missing the same things:
- unsafe assumptions in backend handlers
- missing tests around edge cases
- sloppy error handling
- client code that looks fine but hides a data-flow bug
The goal is not to have the agent make decisions. The goal is to make it inspect code the same way every time, with the same scope and output shape.
What Cursor rules and memory can actually control
Cursor gives you two different levers:
- Rules: stable instructions for how the agent should behave
- Memory: project-specific facts the agent should remember across sessions
Rules for repeatable review behavior
Rules are where you pin the review contract. I use them for things like:
- always review only staged diffs or explicitly provided files
- prefer concrete findings over generic advice
- separate correctness, security, and test coverage
- never rewrite code unless asked
- cite exact file paths and line ranges when possible
That keeps the agent from drifting into vague but polished commentary.
Memory for project-specific context
Memory is where you store the boring truths the agent should not relearn every time:
- the app uses JWTs in cookies, not local storage
- admin routes must be checked server-side
- `src/lib/api.ts` is the only approved network wrapper
- tests run with Vitest, not Jest
Use memory for stable context, not active debugging notes. If you start stuffing every incident into memory, the agent gets noisy and brittle.
Designing the review workflow
The workflow should be boring on purpose.
Scope the agent to safe, read-only review tasks
Keep the agent on rails:
- review code, do not edit code
- inspect diffs, do not browse the whole repo unless needed
- summarize findings, do not propose speculative rewrites
- flag risk, do not trigger actions or external tools
That makes the agent useful in a code review loop without turning it into an unbounded assistant.
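One way to enforce the "inspect diffs" boundary after the fact is to check that every finding points at a file the diff actually touches. The sketch below is a minimal version of that check. It assumes git-style unified diffs with `+++ b/<path>` headers, and the function names `changedFiles` and `outOfScope` are hypothetical helpers, not part of Cursor.

```typescript
// Collect the paths a git-style unified diff touches.
// Assumption: new-file headers look like "+++ b/<path>"; other diff formats need different parsing.
function changedFiles(diff: string): string[] {
  const marker = "+++ b/";
  const files: string[] = [];
  for (const line of diff.split("\n")) {
    if (line.indexOf(marker) === 0) {
      const filePath = line.slice(marker.length).trim();
      if (files.indexOf(filePath) === -1) files.push(filePath);
    }
  }
  return files;
}

// Return the finding references that point at files outside the diff.
// A reference may carry a line range suffix such as ":42-68", which is stripped before comparing.
function outOfScope(findingRefs: string[], diff: string): string[] {
  const allowed = changedFiles(diff);
  return findingRefs.filter((ref) => {
    const pathOnly = ref.replace(/:\d+(-\d+)?$/, "");
    return allowed.indexOf(pathOnly) === -1;
  });
}
```

Anything this returns is a finding the agent produced by wandering outside the diff, which is a signal to tighten the scope rule rather than a reason to trust the finding.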
Define the inputs, outputs, and failure modes
A review agent works better when you are explicit about structure.
| Part | What to provide | Why it matters |
|---|---|---|
| Input | diff, file list, or pasted patch | keeps the review bounded |
| Output | findings, severity, evidence, fix suggestion | avoids vague commentary |
| Failure mode | “no issues found” must still explain what was checked | prevents lazy acceptance |
If you do not define the output shape, the agent will usually invent one.
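The table above can be written down as types, which makes the contract easy to check mechanically. This is a rough sketch, not a Cursor API: the `Finding` and `ReviewResult` shapes, the three severity levels, and `validateReview` are all assumptions made for illustration.

```typescript
// Assumed severity scale; the article only requires that a severity is present.
type Severity = "low" | "medium" | "high";

interface Finding {
  severity: Severity;
  file: string;     // path plus optional line range, e.g. "src/server/payments.ts:42-68"
  evidence: string; // what in the diff supports the finding
  fix: string;      // minimal fix suggestion
}

interface ReviewResult {
  findings: Finding[];
  checked: string[]; // what the agent says it inspected
}

// Return a list of contract violations; an empty list means the result is acceptable.
function validateReview(result: ReviewResult): string[] {
  const problems: string[] = [];
  // Failure mode from the table: "no issues found" must still explain what was checked.
  if (result.findings.length === 0 && result.checked.length === 0) {
    problems.push("no findings and no explanation of what was checked");
  }
  result.findings.forEach((f, i) => {
    if (f.file.trim() === "") problems.push(`finding ${i}: missing file reference`);
    if (f.evidence.trim() === "") problems.push(`finding ${i}: missing evidence`);
  });
  return problems;
}
```

The empty-findings check is the important one: it turns "looks fine" into something the agent has to justify.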
Building the custom review agent
Write the rule set
A good rule file is short and strict. Mine usually reads like this:
You are a code review agent.
Review only the provided diff or files.
Focus on correctness, security, test coverage, maintainability, and hidden assumptions.
Do not rewrite code unless explicitly asked.
Do not mention style issues unless they block understanding or correctness.
For each issue, include:
- severity
- file and line reference
- why it matters
- minimal fix suggestion
If there are no issues, say what you checked and why it looks sound.
That is enough to get a useful first pass without turning the agent into a chatty consultant.
Seed useful memory without overfitting
Good memory entries are short and durable:
- “API authorization is enforced in route handlers, not in the client.”
- “Feature flags live in `config/flags.ts`.”
- “Test fixtures should not call real network services.”
Bad memory entries are temporary or too specific:
- “Yesterday's bug was in `checkout.ts`”
- “Always mention the dogfood branch”
- “This one endpoint is probably broken”
If you encode one bug too literally, the agent may start pattern-matching the wrong lesson into future reviews.
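A crude lint can catch some of the bad entries before they land in memory. The sketch below flags entries containing words that tend to signal a transient or speculative note. The marker list and the `looksTransient` helper are hypothetical, and the heuristic is deliberately simple.

```typescript
// Hypothetical marker list: words that often signal a temporary or speculative note.
const TRANSIENT_MARKERS = ["yesterday", "today", "last week", "probably", "this one"];

// True when the entry contains any marker (case-insensitive substring match).
function looksTransient(entry: string): boolean {
  const lower = entry.toLowerCase();
  return TRANSIENT_MARKERS.some((marker) => lower.indexOf(marker) !== -1);
}
```

This flags “Yesterday's bug was in `checkout.ts`” and “This one endpoint is probably broken”, but not “Always mention the dogfood branch”. Treat it as a prompt for a human look, not a filter.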
Test the agent on real diffs
Do not validate the agent with a toy example. Use a few real diffs with known outcomes:
- one small refactor with no issues
- one bug fix with a hidden regression
- one auth-sensitive change
- one test-only patch
Check whether the agent catches the real risk and ignores noise. The best signal is not “it found something,” but “it found the right thing for the right reason.”
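To make "the right thing for the right reason" checkable, I find it helps to label each test diff with the issues a human confirmed and compare that with what the agent reported. A minimal sketch, assuming the labels are short strings you assign by hand; `DiffCase`, `CaseScore`, and `scoreCase` are illustrative names.

```typescript
interface DiffCase {
  name: string;
  expected: string[]; // issue labels a human reviewer confirmed; empty for a clean diff
  reported: string[]; // issue labels the agent reported
}

interface CaseScore {
  name: string;
  missed: string[]; // real issues the agent did not report
  noise: string[];  // reported issues no human confirmed
  pass: boolean;    // true only when there are no misses and no noise
}

function scoreCase(c: DiffCase): CaseScore {
  const missed = c.expected.filter((label) => c.reported.indexOf(label) === -1);
  const noise = c.reported.filter((label) => c.expected.indexOf(label) === -1);
  return { name: c.name, missed, noise, pass: missed.length === 0 && noise.length === 0 };
}
```

A clean refactor passes only if the agent reports nothing, which is the case that catches an agent that always "finds something."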
Example review prompts and review output format
A simple prompt template is enough:
Review this diff as a code reviewer.
Look for correctness, security, missing tests, and broken assumptions.
Only report issues supported by the diff.
Return findings in this format:
- severity:
- file:
- evidence:
- impact:
- fix:
If nothing stands out, explain what you checked.
A good output is compact and actionable:
- severity: high
file: src/server/payments.ts:42-68
evidence: the handler trusts `userId` from the request body
impact: a caller could act on another account if auth is not rechecked
fix: derive the user from the session and compare ownership on the server
That format is better than a long paragraph because it is easy to scan in a review thread.
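A fixed format also means the output can be parsed. The sketch below reads the `key: value` layout shown above into plain records. It assumes every finding starts with a `severity:` line and that each value fits on one line; `parseFindings` is a hypothetical helper.

```typescript
type FindingRecord = Record<string, string>;

// The keys from the prompt template above.
const FINDING_KEYS = ["severity", "file", "evidence", "impact", "fix"];

function parseFindings(text: string): FindingRecord[] {
  const findings: FindingRecord[] = [];
  let current: FindingRecord | null = null;
  for (const raw of text.split("\n")) {
    // Drop leading whitespace and an optional list dash.
    const line = raw.replace(/^\s*-?\s*/, "");
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    if (FINDING_KEYS.indexOf(key) === -1) continue; // ignore lines that are not template fields
    // A "severity" line opens a new finding.
    if (key === "severity" || current === null) {
      current = {};
      findings.push(current);
    }
    // Split on the first colon only, so "file" values keep their ":42-68" line range.
    current[key] = line.slice(sep + 1).trim();
  }
  return findings;
}
```

Once findings are records, it is straightforward to sort by severity or post them as separate review comments.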
Common mistakes when using agent memory
The biggest mistake is treating memory like a notebook.
- Do not store transient incidents.
- Do not store secrets.
- Do not store review verdicts that depend on one branch.
- Do not store contradictory rules in multiple places.
Another mistake is letting memory replace the diff. The agent still needs the code in front of it. Memory should bias the review, not impersonate a permanent codebase map.
How to keep the agent useful over time
Review agents drift unless you maintain them.
I usually do three things:
- prune stale memory entries every so often
- keep rules short enough to read in one pass
- compare agent findings against human review outcomes
If the agent starts missing the same class of bug, update the rule set with that failure mode. If it starts nagging about irrelevant issues, tighten the scope.
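Spotting "the same class of bug" is easier with a small tally. This sketch counts human-confirmed issues the agent missed, grouped by a category label, and returns the categories that cross a threshold. The `Outcome` shape, the category labels, and the threshold are assumptions you would tune for your own team.

```typescript
interface Outcome {
  category: string;          // hand-assigned label, e.g. "auth" or "error-handling"
  caughtByAgent: boolean;    // the agent reported this issue
  confirmedByHuman: boolean; // a human reviewer agreed it was a real issue
}

// Categories where the agent missed at least `threshold` human-confirmed issues.
function recurringMisses(log: Outcome[], threshold: number): string[] {
  const counts: Record<string, number> = {};
  for (const o of log) {
    if (o.confirmedByHuman && !o.caughtByAgent) {
      counts[o.category] = (counts[o.category] || 0) + 1;
    }
  }
  return Object.keys(counts).filter((category) => (counts[category] || 0) >= threshold);
}
```

Each category it returns is a candidate for a new line in the rule set.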
The useful version of this setup is not magical. It is just consistent. Cursor rules give the agent a review posture, memory gives it project context, and your test diffs tell you whether it is actually helping.
Conclusion
A custom Cursor review agent works when you constrain it. Give it a narrow job, a stable output format, and memory that reflects how your project actually behaves. That is enough to turn it from a generic chat box into a repeatable review tool that catches real mistakes without inventing new ones.


