
I Built a Code Agent with DeepSeek V4: Handling 1M Token Context in Practice
Introduction
I started playing with code agents months ago, but DeepSeek V4's 1M-token context window finally let me feed entire repos to a model without splitting them into pieces. When the V4 preview landed on April 24, 2026, I built a real agent from scratch to see how it held up under actual coding tasks, not just benchmark numbers.
Why DeepSeek V4 for a Code Agent?
1M Token Context: Not Just a Bigger Window
A bigger window only helps if the model can stay sharp while using it. DeepSeek V4 leans on token-wise compression and DeepSeek Sparse Attention (DSA), so you can hand it a 100k-token codebase and it doesn't drown in compute. In my tests, attention latency stayed manageable even when I pushed context past 700k tokens.
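A rough back-of-the-envelope comparison shows why sparse attention matters at this scale. Assume V4's DSA behaves like the version DeepSeek introduced in V3.2-Exp, where each query token attends to only a fixed number k of selected tokens, and take k = 2048 purely for illustration. The core attention work at n = 700,000 tokens then compares as follows. This ignores the lightweight selection step, which still scores every token pair but far more cheaply.

```latex
\begin{aligned}
\text{dense:}\quad  & n^2 = (7 \times 10^{5})^2 = 4.9 \times 10^{11} \text{ query-key pairs} \\
\text{sparse:}\quad & n \cdot k = 7 \times 10^{5} \times 2048 \approx 1.43 \times 10^{9} \text{ query-key pairs} \\
\text{ratio:}\quad  & \frac{n^2}{n k} = \frac{n}{k} = \frac{700{,}000}{2048} \approx 342
\end{aligned}
```

Under these assumptions, the attention step itself does roughly 340 times less work than dense attention would. Real-world speedups depend on implementation details that only DeepSeek controls.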
Agentic Coding Benchmarks Matter
Before I built anything, the numbers were already out: DeepSeek-V4-Pro leads open-weight models on agentic coding benchmarks. It holds its own against top closed-source models for tasks like generating multi-file patches, reasoning about large repos, and producing valid pull requests. That was enough to trust it as the brain of the agent.
Setting Up the Agent Architecture
Model Selection and API Integration
I used deepseek-v4-pro through the OpenAI-compatible endpoint. Switching was trivial: same base URL, just a different model name. I stuck with non-thinking mode for most tool calls to avoid wasting tokens, and switched to thinking mode only for tangled debugging sessions.
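```javascript
// Send one chat-completion request to DeepSeek's OpenAI-compatible endpoint.
// Assumes Node 18+ (global fetch) and an API key in DEEPSEEK_API_KEY.
async function callDeepSeek(messages, tools) {
  const response = await fetch('https://api.deepseek.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.DEEPSEEK_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'deepseek-v4-pro',
      messages,
      tools,
      max_tokens: 4096,
    }),
  });
  if (!response.ok) throw new Error(`DeepSeek API error: ${response.status}`);
  return response.json();
}
```
Tool Design: File System, Terminal, and Browser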
The agent needed three tools: read/write files, shell commands, and a browser for documentation lookups. I kept the tool schemas minimal, adding extra parameters only when the model started making up arguments. It handled multi-step chains well, often running 5–6 calls in a row without losing the thread.
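Minimal schemas for those three tools might look like the following, written in the OpenAI function-calling format that DeepSeek's compatible endpoint accepts. The tool names, descriptions, and parameters are hypothetical, because the article doesn't list its actual schemas. The executeToolCall helper shows how a returned tool call can be parsed and answered with a tool message.

```javascript
// Wrap a name, description, and JSON Schema properties into the
// OpenAI function-calling shape.
function defineTool(name, description, properties, required) {
  return {
    type: 'function',
    function: { name, description, parameters: { type: 'object', properties, required } },
  };
}

// Hypothetical tool definitions; the real schemas aren't published in the article.
const tools = [
  defineTool('file_io', 'Read or write a file inside the repository.', {
    action: { type: 'string', enum: ['read', 'write'] },
    path: { type: 'string', description: 'Path relative to the repo root' },
    content: { type: 'string', description: 'New file content (write only)' },
  }, ['action', 'path']),
  defineTool('run_shell', 'Run a shell command in the repo root.', {
    command: { type: 'string' },
  }, ['command']),
  defineTool('browse_docs', 'Fetch a documentation page as plain text.', {
    url: { type: 'string' },
  }, ['url']),
];

// Turn one tool call from the model's reply into a role: 'tool' message.
// `handlers` maps tool names to your own (hypothetical) implementations.
async function executeToolCall(call, handlers) {
  const name = call.function.name;
  const handler = Object.prototype.hasOwnProperty.call(handlers, name) ? handlers[name] : undefined;
  let content;
  if (!handler) {
    content = `Unknown tool: ${name}`;
  } else {
    try {
      // Arguments arrive as a JSON string, and the model can emit malformed JSON.
      const args = JSON.parse(call.function.arguments);
      content = String(await handler(args));
    } catch (err) {
      content = `Tool error: ${err.message}`;
    }
  }
  return { role: 'tool', tool_call_id: call.id, content };
}
```

Returning errors as tool content, instead of throwing, lets the model see what went wrong and correct its arguments on the next step.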
Prompt Engineering for Long-Context Coding
Even with a 1M window, I didn't dump everything in at once. I fed the agent a repo summary and a file tree, then inserted relevant files selectively as it asked for them. The system prompt told it to be conservative with context and request specific files instead of guessing.
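That context strategy can be sketched as two small helpers. The first builds the opening messages from a repo summary and a file tree. The second decides whether a requested file still fits a token budget. The system prompt wording is a paraphrase, not the author's actual prompt, and the token counts use a rough four-characters-per-token heuristic rather than DeepSeek's tokenizer.

```javascript
// Rough token estimate (~4 characters per token). A heuristic, not the real tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Hypothetical system prompt paraphrasing the instructions described above.
const SYSTEM_PROMPT =
  'You are a coding agent working in a large repository. Be conservative with context: ' +
  'request specific files by path instead of guessing their contents.';

// Opening messages: system prompt, then repo summary, file tree, and task only.
function buildInitialMessages(repoSummary, filePaths, task) {
  const tree = [...filePaths].sort().join('\n');
  return [
    { role: 'system', content: SYSTEM_PROMPT },
    { role: 'user', content: `Repository summary:\n${repoSummary}\n\nFile tree:\n${tree}\n\nTask:\n${task}` },
  ];
}

// Content for the tool message answering a file request: the file itself if it
// fits the remaining budget, otherwise a notice asking for a narrower request.
function fileResultWithinBudget(messages, path, contents, budgetTokens) {
  // Assistant messages that only carry tool_calls may have content: null.
  const used = messages.reduce((sum, m) => sum + estimateTokens(m.content ?? ''), 0);
  const block = `File: ${path}\n${contents}`;
  if (used + estimateTokens(block) > budgetTokens) {
    return `Skipped ${path}: adding it would exceed the context budget. Request a narrower set of files.`;
  }
  return block;
}
```

A budget such as 500_000 caps how large the active context can grow. Telling the model a file was skipped, rather than silently truncating it, keeps it from reasoning about code it never saw.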
Real-World Performance Testing
Repository-Wide Refactors
I threw a 14k-file TypeScript monorepo at it with a request to migrate from CommonJS require() calls to ES module import statements. The agent located the affected files, proposed edits, and ran the tests. It even noticed circular dependency pitfalls and paused to ask before breaking the build.
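The article doesn't say how the agent spotted the circular dependencies. One standard way to check for them is a depth-first search over the import graph. The sketch below assumes that graph has already been built as a map from each module path to the module paths it imports; that input is hypothetical here.

```javascript
// Find one import cycle with a depth-first search over module states.
// `graph` maps each module path to the paths it imports (built elsewhere).
function findImportCycle(graph) {
  const state = new Map(); // undefined = unvisited, 1 = on current path, 2 = finished
  const stack = [];

  function visit(node) {
    state.set(node, 1);
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      if (state.get(dep) === 1) {
        // dep is already on the current path, so the path from dep back to dep is a cycle.
        return [...stack.slice(stack.indexOf(dep)), dep];
      }
      if (state.get(dep) === undefined) {
        const cycle = visit(dep);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, 2);
    return null;
  }

  for (const node of Object.keys(graph)) {
    if (state.get(node) === undefined) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null; // no cycles found
}
```

Cycles matter for this particular migration because CommonJS and ES modules fail differently on them. A circular require() hands back a partially filled exports object. An ES module binding that is accessed before its module has finished evaluating can throw a ReferenceError instead.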
Debugging Across Thousands of Files
A production bug in a NestJS app required tracing a single malformed SQL query through eight layers of abstraction. With the full repo loaded, the model found where the malformed query originated and proposed a fix in one shot. That session used about 800k tokens, far more than a smaller context window could have held.
Token Usage and Cost Observations
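DeepSeek-V4-Pro pricing is aggressive. One refactor session burned 1.2M input tokens across its calls and cost under $0.10. Token-wise compression doesn't shrink the bill directly: you're still billed for raw input tokens, while the model processes a compressed representation internally. The savings show up as compute efficiency on DeepSeek's side, which is presumably part of why the per-token price can be this low.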
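A bill like this is easy to sanity-check by hand, because input cost depends only on how many tokens were cache hits versus cache misses. The sketch below assumes separate cache-hit and cache-miss prices like DeepSeek has used for earlier models. The prices are parameters, not V4-Pro's actual rates, which the article doesn't list.

```javascript
// Input-token cost of a session in USD. pricePerMillion holds hypothetical
// { cacheHit, cacheMiss } prices per million tokens; fill in real rates yourself.
function estimateInputCost(totalTokens, cacheHitRatio, pricePerMillion) {
  const hitTokens = totalTokens * cacheHitRatio;
  const missTokens = totalTokens - hitTokens;
  return (hitTokens * pricePerMillion.cacheHit + missTokens * pricePerMillion.cacheMiss) / 1e6;
}

// The highest blended per-million price that keeps a session under a target cost.
function maxBlendedPrice(totalTokens, targetUsd) {
  return targetUsd / (totalTokens / 1e6);
}
```

For example, maxBlendedPrice(1.2e6, 0.10) comes out to about 0.083. That means a 1.2M-token session stays under $0.10 only if the blended input price, after cache hits, is below roughly $0.083 per million tokens.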
Handling the 1M Token Window in Practice
Prompt Compression and Caching
I built a compression layer that used the model's own summarization to condense call histories. Paired with DeepSeek's context caching, which charges less for input tokens that repeat an already-cached prompt prefix, repeated tool-call loops became cheaper. Hit rates for static repository content stayed above 80%.
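The article doesn't show the compression layer, so here is a minimal sketch of one way to build it. It assumes DeepSeek's caching matches on an identical prompt prefix, as documented for earlier DeepSeek API models. For that reason, the system prompt and static repository context stay untouched at the front, and only older turns are folded into a model-written summary. summarize is a hypothetical hook that would call the model. The hit-rate helper assumes V4 reports the same prompt_cache_hit_tokens and prompt_cache_miss_tokens usage fields as earlier DeepSeek models.

```javascript
// Fold older turns into one summary message, keeping the first prefixLength
// messages byte-for-byte identical so prefix caching can still hit.
// summarize(messages) is a hypothetical async hook returning plain text.
async function compressHistory(messages, { prefixLength, keepRecent, summarize }) {
  const prefix = messages.slice(0, prefixLength);
  let cut = Math.max(prefixLength, messages.length - keepRecent);
  // Never start the kept tail on a tool result: it must stay paired with the
  // assistant message whose tool_calls produced it.
  while (cut > prefixLength && messages[cut]?.role === 'tool') cut--;
  const older = messages.slice(prefixLength, cut);
  if (older.length === 0) return messages; // nothing to compress yet
  const summary = await summarize(older);
  return [
    ...prefix,
    { role: 'user', content: `Summary of earlier steps:\n${summary}` },
    ...messages.slice(cut),
  ];
}

// Fraction of prompt tokens served from cache for one response's usage object.
function cacheHitRate(usage) {
  const hit = usage.prompt_cache_hit_tokens ?? 0;
  const miss = usage.prompt_cache_miss_tokens ?? 0;
  return hit + miss === 0 ? 0 : hit / (hit + miss);
}
```

Keeping the static content first is the key design choice. Anything that changes between requests, including the summary, belongs after it, or it would invalidate the cached prefix.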
When Context Window Isn't the Bottleneck
Even with 1M tokens, the real limit was the agent's reasoning stability. Long tool chains occasionally confused the model, making it retry failed actions. I added a hard cap of 15 tool calls per turn and a recovery prompt that told it to “stop and report what you've tried.”
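A turn loop with the hard cap and the recovery prompt might look like the following. This is a sketch, not the author's code: callModel and runTool are hypothetical hooks. Any calls the model makes beyond the cap are answered with a "skipped" tool result, because OpenAI-compatible APIs generally expect every tool call to have a matching tool message.

```javascript
const MAX_TOOL_CALLS_PER_TURN = 15;
// Paraphrase of the recovery instruction described above.
const RECOVERY_PROMPT = "Stop and report what you've tried so far. Do not call any more tools.";

// Run one agent turn with a hard cap on tool calls.
// callModel(messages, { allowTools }) resolves to an OpenAI-style assistant
// message (possibly with tool_calls); runTool(call) resolves to a
// { role: 'tool', tool_call_id, content } message. Both are hypothetical hooks.
async function runTurn(messages, callModel, runTool) {
  let toolCalls = 0;
  while (true) {
    const reply = await callModel(messages, { allowTools: true });
    messages.push(reply);
    const calls = reply.tool_calls ?? [];
    if (calls.length === 0) return reply; // final answer, no tools requested

    for (const call of calls) {
      if (toolCalls < MAX_TOOL_CALLS_PER_TURN) {
        toolCalls++;
        messages.push(await runTool(call));
      } else {
        // Over the cap: answer the call without running it, to keep pairing valid.
        messages.push({ role: 'tool', tool_call_id: call.id, content: 'Skipped: tool-call limit reached.' });
      }
    }

    if (toolCalls >= MAX_TOOL_CALLS_PER_TURN) {
      // Cap reached: ask for a status report instead of more retries.
      // The hook can honor allowTools: false, e.g. by leaving out the tools list.
      messages.push({ role: 'user', content: RECOVERY_PROMPT });
      const report = await callModel(messages, { allowTools: false });
      messages.push(report);
      return report;
    }
  }
}
```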
Challenges and Limitations
The agent isn't flawless. It stumbles on highly dynamic browser automation states and occasionally invents file paths. Filling the context window all the way also makes responses sluggish, so I learned to keep active context under 500k tokens for reasonable response times. The Flash variant (DeepSeek-V4-Flash) is faster but noticeably weaker on complex planning tasks, so I reserve it for simple file lookups.
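One cheap defense against invented file paths is to check every requested path against the repository's real file list before touching the disk. When the check fails, the model gets a few same-named candidates instead. A minimal sketch follows; resolveRequestedPath is a hypothetical helper, not part of any library.

```javascript
// Accept a requested path only if it exists in the repo's file list; otherwise
// return up to five files with the same name so the model can correct itself.
function resolveRequestedPath(requested, knownFiles) {
  // Normalize Windows separators and a leading "./".
  const normalized = requested.replace(/\\/g, '/').replace(/^\.\//, '');
  if (knownFiles.includes(normalized)) return { ok: true, path: normalized };
  const base = normalized.split('/').pop();
  const candidates = knownFiles.filter((f) => f.split('/').pop() === base).slice(0, 5);
  return { ok: false, candidates };
}
```

For a repo with thousands of files, building a Set from knownFiles once would make the exact-match check constant-time.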
Is It Worth Switching?
If your code agent has to handle large repos, yes. The 1M context alone eliminates the need for elaborate RAG pipelines in many cases. For simpler tasks, V3.2 might still cut it, but V4's agentic improvements make it the better default for any tool-using agent.
Conclusion
DeepSeek V4 isn't just a bigger model; it's a practical enabler for coding agents. The 1M window works, the attention is efficient, and the pricing doesn't punish heavy usage. I'm keeping it as my agent's brain.


