AI Agents Create Duplicate Code: The Context Window Problem
AI agents create duplicate code not because they cannot write good abstractions, but because they cannot see the entire codebase at once. When an agent is asked to add a date formatting helper, it generates one — even if a nearly identical formatDisplayDate() already exists in a file three directories away that never entered the agent’s context window. Multiply this across hundreds of agent sessions on a large codebase, and you get a specific, measurable form of technical debt: dozens of near-identical utility functions, validation helpers, and data transformation snippets, each slightly different, each working, and each adding to a codebase that becomes harder to navigate with every agent session. This page covers why this happens structurally, how to detect it before it accumulates, and the workflow changes that prevent it at the source.
Applies to any codebase where AI coding agents such as Claude Code, Cursor, or similar tools make changes across multiple files and sessions, whether the code is Python, JavaScript, TypeScript, Go, Java, Kotlin, or Rust.
TL;DR
- AI agents generate new code based on what is visible in their current context window; by default, nothing in the workflow makes them search the entire codebase for existing equivalents before writing something new
- The result is near-duplicate functions: same purpose, slightly different implementation, different names, scattered across files — each individually correct, collectively a maintenance problem
- Traditional duplicate code detectors (jscpd, PMD CPD) catch exact or structurally near-identical duplication but miss semantically equivalent code written with a different structure or different language idioms
- The fix is not “tell the agent to be more careful” — it is giving the agent a way to search for existing equivalents before generating new code
- This is a new failure mode distinct from classic DRY violations — it scales with the number of agent sessions, not with the number of human developers
- In Python and JavaScript codebases the pattern is harder to spot at a glance than in statically-typed languages like Rust or Kotlin, because looser typing and naming conventions make near-duplicates look unrelated
AI Agents Create Duplicate Code: Why Context Window Limits Cause It
AI agents create duplicate code because their context window — the portion of the codebase they can “see” during a given task — is a small fraction of most real codebases. When an agent is given a task like “add validation for the email field in the signup form,” it loads the relevant files for that task: the signup form component, maybe a validation utilities file if one is referenced nearby. It does not load every file in the project to check whether an email validation function already exists somewhere unrelated to the signup flow.
If no email validator is visible in the loaded context, the agent writes one. This is not a mistake in any individual instance — writing a validator when none is visible is the correct response to the visible information. The problem compounds across sessions: a different task touching a different part of the codebase, with a different context window, may also need email validation, also not see an existing validator, and also write one. Neither agent session did anything wrong in isolation. The duplication is an emergent property of many context-limited sessions operating on the same codebase over time.
# Conceptual: same validation logic, written independently by two agent sessions

# Session 1 (in src/features/signup/validation.py):
def is_valid_email(email: str) -> bool:
    return "@" in email and "." in email.split("@")[1]

# Session 2, weeks later (in src/features/billing/utils.py):
def validate_email_format(value: str) -> bool:
    parts = value.split("@")
    return len(parts) == 2 and "." in parts[1]

# Both functions serve the same purpose. Neither session saw the other.
# Both are "correct" in isolation, yet they already disagree on an input
# like "a@b.c@d" (accepted by the first, rejected by the second). The
# codebase now has two sources of truth for what counts as a valid email,
# and they will drift further apart the next time either one is edited.
Without a way to discover the first function before writing the second, every future change to email validation logic now has to be applied in two places — or, more realistically, gets applied in one place while the other silently continues using the old logic, creating an inconsistency that only surfaces when a user’s email passes validation in one part of the application and fails in another.
Why Does My AI Coding Assistant Keep Writing Similar Functions?
Because each task you give it is processed with whatever context is relevant to that specific task — and “relevant” is determined by file proximity, explicit references, and search results for the immediate task, not by an exhaustive search for every function that does something similar anywhere in the project. A coding assistant writing a CSV export function for a reports module has no default mechanism to discover that a different module already has a working CSV export function, unless that function happens to be imported nearby, referenced in the task description, or surfaced by a search query the agent happens to run. The assistant is not failing to look — it is looking within the scope it was given, and that scope is necessarily incomplete for a large codebase.
Does This Happen More in Python and JavaScript Than in Rust or Kotlin?
The underlying mechanism — context-limited generation — happens in every language. But the visibility of the resulting duplication differs. In Rust or Kotlin, a near-duplicate function often has a noticeably different type signature, which makes it stand out during code review or when searching by type. In Python and JavaScript, where types are often inferred or absent, two functions that do the same thing with different parameter names and slightly different implementations can look unrelated at a glance, and tooling that searches by signature is less effective. This does not mean duplication is rarer in typed languages — it means it is somewhat easier to notice once it exists, which is a detection advantage, not a prevention one.
DRY Principle Violations in AI Generated Code: The New Failure Mode
The DRY principle — Don’t Repeat Yourself — was originally framed around human discipline: a developer notices they are about to copy-paste a block of logic and chooses to extract it into a shared function instead. AI-generated DRY violations follow a different pattern. The agent is not copy-pasting — it is independently generating logic that happens to overlap with logic generated independently in a different session. There is no copy-paste moment where a human could catch it, because there was never a single moment where both versions existed in the same context.
This means classic DRY guidance — “if you’re about to copy this, extract it first” — does not apply, because the duplication was never a copy. The two implementations were each written as if they were the first and only implementation of that logic. The violation only becomes visible after the fact, when someone reads both files, or when a bug fix in one location does not propagate to the other and a user-facing inconsistency results.
// Two independently-generated retry wrappers, different languages, same project

// JavaScript - src/api/client.js
async function fetchWithRetry(url, maxAttempts = 3) {
  for (let i = 0; i < maxAttempts; i++) {
    try { return await fetch(url); }
    catch (e) { if (i === maxAttempts - 1) throw e; }
  }
}

// Go - internal/client/http.go
func FetchWithRetry(url string, maxAttempts int) (*http.Response, error) {
    var lastErr error
    for i := 0; i < maxAttempts; i++ {
        resp, err := http.Get(url)
        if err == nil { return resp, nil }
        lastErr = err
    }
    return nil, lastErr
}
Without a shared retry policy, these two implementations will diverge the first time someone needs to add exponential backoff — one team adds it to the JavaScript client, the Go client continues retrying immediately with no delay, and the two services now behave differently under the exact same upstream failure, with no error message indicating why.
AI Generated Code DRY Violations: Why They Are Harder to Catch in Review
A human-introduced DRY violation usually happens within a single pull request — a developer copies a block, and a reviewer who has recently looked at the original code may recognize the similarity. An AI-generated DRY violation often spans pull requests that are weeks apart, reviewed by different people, neither of whom has the other implementation in their recent working memory. The similarity is invisible at review time not because reviewers are careless, but because the two pieces of code were never presented side by side to anyone — human or AI — at any point in their creation.
How Many Similar Functions Before You Should Refactor?
Two near-identical functions performing the same logical operation are enough to warrant consolidation. Waiting for a third occurrence (the traditional “rule of three” for extraction) assumes the duplication was intentional copy-paste that a developer can evaluate holistically. With AI-generated duplication, by the time you have three independently-generated versions, you likely also have three sets of callers depending on three slightly different behaviors: edge case handling that differs, parameter defaults that differ, error handling that differs. Consolidating two versions before a third accumulates is meaningfully cheaper than consolidating three, because the behavioral drift between two versions is smaller and easier to reconcile than between three.
How to Detect Duplicate Code in an AI-Assisted Codebase
Traditional duplicate code detection tools (jscpd for JavaScript and TypeScript, PMD’s CPD for Java and Kotlin, similarity-rs style tools for Rust) work by comparing token sequences or abstract syntax trees for structural similarity. These tools catch the case where the duplication is structurally close: same control flow, same operations, and, depending on the tool and its settings, different variable names. This catches a meaningful fraction of AI-generated duplication, because agents often produce structurally similar code for structurally similar problems.
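To make the idea of structural comparison concrete, here is a minimal sketch in Python. It is not how jscpd or PMD CPD are implemented; it only illustrates the general technique of replacing identifiers and literals with placeholders before comparing token streams. The function name `normalized_tokens` and the three sample snippets are hypothetical.

```python
import io
import keyword
import tokenize

# Token types that carry layout only and are ignored in the comparison.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source: str) -> list[str]:
    """Return the token stream with identifiers and literals replaced by placeholders."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            result.append("ID")   # any identifier, whatever its name
        elif tok.type in (tokenize.STRING, tokenize.NUMBER):
            result.append("LIT")  # any string or numeric literal
        else:
            result.append(tok.string)  # keywords and operators are kept as-is
    return result


# Hypothetical snippets: same loop, different names.
first = "def total(items):\n    s = 0\n    for x in items:\n        s += x.price\n    return s\n"
second = "def sum_prices(rows):\n    acc = 0\n    for row in rows:\n        acc += row.price\n    return acc\n"
# Same result as the two above, but a different structure.
third = "def total(items):\n    return sum(x.price for x in items)\n"

print(normalized_tokens(first) == normalized_tokens(second))  # renamed copy: streams match
print(normalized_tokens(first) == normalized_tokens(third))   # restructured version: no match
```

The first two snippets collapse to the same normalized stream, so a structural comparison pairs them. The third computes the same value but shares almost no structure with them, so this kind of comparison does not connect it to the others.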
What these tools miss is semantic duplication — two functions that achieve the same result through different logic. is_valid_email checking for “@” and “.” is structurally different from a version using a regex pattern, even though both validate the same thing. Token-based and AST-based tools see these as unrelated. Catching this category requires a different approach: comparing function names, docstrings, and behavior on the same inputs rather than comparing implementation structure.
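One way to approximate "behavior on the same inputs" is to run two candidate functions against a shared sample set and list where they differ. The sketch below assumes two hypothetical validators and a hand-picked sample list; none of these names come from a real library, and a real check would need samples chosen for the operation in question.

```python
import re
from typing import Callable, Iterable


def by_string_checks(email: str) -> bool:
    """Hypothetical validator written with plain string checks."""
    parts = email.split("@")
    return len(parts) == 2 and "." in parts[1]


_EMAIL_RE = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


def by_regex(email: str) -> bool:
    """Hypothetical validator with the same intent, written as a regex."""
    return _EMAIL_RE.match(email) is not None


def disagreements(f: Callable[[str], bool], g: Callable[[str], bool],
                  samples: Iterable[str]) -> list[str]:
    """Return the samples on which the two functions give different answers."""
    return [s for s in samples if f(s) != g(s)]


SAMPLES = ["a@b.co", "no-at-sign", "a@b", "a@b@c.d", "@b.co", "a@.co"]

differing = disagreements(by_string_checks, by_regex, SAMPLES)
agreement = 1 - len(differing) / len(SAMPLES)
print(differing)            # inputs the two validators treat differently
print(f"{agreement:.0%}")   # share of samples with identical answers
```

High agreement on shared inputs is a hint that two structurally unrelated functions serve the same purpose. The list of differing inputs is also the list of behaviors that must be reconciled before the two can be merged.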
# Running jscpd to catch structurally similar duplication first.
# This catches the cases where two agent sessions produced
# near-identical implementations, the easiest category to fix.
npx jscpd ./src --min-lines 5 --min-tokens 50 --format "javascript,typescript"
# Output flags blocks of 5+ lines with 50+ matching tokens across files.
# Review these first, since structural similarity strongly correlates
# with "these should be the same function".
Structural duplication detection has to run regularly, not just once, or agent-generated duplicates accumulate unnoticed between scans. A codebase scanned once at the start of a project and never again keeps showing its original, possibly clean, report while silently accumulating new near-duplicates with every subsequent agent session, because the report reflects the state of the codebase at scan time, not its ongoing trajectory.
Claude Code Creates Duplicate Utility: What to Check First
When you notice a coding agent has created a utility function, the first check is a project-wide search for the function’s purpose, not its name — search for the operation (formatting a date, validating an email, retrying a request) rather than the specific function name the agent chose, since two implementations of the same operation are very unlikely to share a name. Search by what the function does, expressed in plain language, across the whole project — not just the directory the agent was working in. If a match exists, the question is not “which one is correct” but “which one has more callers and more test coverage” — that version becomes the canonical one, and the newly created duplicate gets replaced with a call to it before it accumulates its own callers.
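The "which one has more callers" question can be answered mechanically. The following sketch counts call sites by function name using Python's `ast` module. It assumes an in-memory mapping of hypothetical file paths to source text so that it stays self-contained; in a real project the sources would be read from disk, and matching by name alone can overcount when unrelated functions share a name.

```python
import ast
from collections import Counter


def count_call_sites(sources: dict[str, str], names: set[str]) -> Counter:
    """Count calls to each name across {path: source}, matching by name only."""
    counts = Counter({name: 0 for name in names})
    for source in sources.values():
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            # Bare calls like f(x) use ast.Name; module.f(x) uses ast.Attribute.
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if called in names:
                counts[called] += 1
    return counts


# Hypothetical project contents, keyed by path.
PROJECT = {
    "signup/views.py": "from signup.validation import is_valid_email\n"
                       "ok = is_valid_email(form.email)\n",
    "profile/views.py": "import signup.validation as v\n"
                        "if v.is_valid_email(addr):\n"
                        "    save(addr)\n",
    "billing/api.py": "from billing.utils import validate_email_format\n"
                      "valid = validate_email_format(payload['email'])\n",
}

counts = count_call_sites(PROJECT, {"is_valid_email", "validate_email_format"})
canonical = max(counts, key=counts.get)
print(counts, canonical)
```

Caller count is only one of the two signals named above; test coverage would need to be checked separately before picking the canonical version.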
AI Agent Codebase Awareness: Why Context Limits Are Structural, Not a Bug
It is tempting to treat this as something that will be solved entirely by larger context windows — if the agent could see the whole codebase, it would not duplicate anything. Larger context windows help, but they do not fully solve this for codebases that exceed even a very large context window, which includes most production codebases beyond a small project. The structural fix is not “fit more code into context” — it is “give the agent a search mechanism that can find relevant existing code regardless of context size,” which is a retrieval problem, not a context size problem. A codebase with a million lines will never fully fit in any context window; what matters is whether the agent can retrieve the right ten lines from that million when it needs them.
Semantic Code Search Before Generation: Preventing Duplication at the Source
The most effective prevention is making “search for an existing implementation” a step that happens before code generation, not a cleanup step that happens after. This means the agent’s workflow for “add an email validator” should include a search step — querying the codebase for functions related to email validation — before the generation step, with the search results included in the context that informs whether to write new code or call existing code.
This is the same problem RAG (retrieval-augmented generation) systems solve for question-answering — retrieve relevant context before generating a response, rather than generating from a fixed context alone. Applied to code generation, the retrieval target is not documentation but the codebase itself: an embedding-based search over function signatures, names, and docstrings that can surface “this function already validates emails” even when the existing function’s name does not contain the word “email” — for example, a function named checkContactField that happens to validate email format internally.
# Conceptual workflow change - search before generation, not after
# WITHOUT semantic search step:
# task: "add email validation to the signup form"
# -> agent loads signup form file + immediate imports
# -> no validator visible -> agent writes new validator
# WITH semantic search step:
# task: "add email validation to the signup form"
# -> agent runs: search_codebase("email validation OR format check")
# -> search returns: src/shared/validators.py: is_valid_email()
# (even though this file was not in the original context)
# -> agent calls existing function instead of writing a new one
Without the search step, the agent’s decision of “write new vs. reuse existing” is made with zero information about what exists outside its current context — it cannot choose to reuse something it does not know exists. The search step does not need to be perfect; even a search that finds the right function 60-70% of the time meaningfully reduces the rate of new duplication compared to a workflow with no search step at all.
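A rough sketch of the search step follows. It uses plain keyword overlap over function names and docstrings as a stand-in for embedding similarity, which keeps the example dependency-free but is much weaker than a real semantic index. `FunctionRecord`, `search_codebase`, the index contents, and the `min_overlap` threshold are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionRecord:
    path: str
    name: str
    doc: str


def words(text: str) -> set[str]:
    """Split snake_case, camelCase, and prose into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {w.lower() for w in re.findall(r"[A-Za-z]+", spaced)}


def search_codebase(query: str, index: list[FunctionRecord],
                    min_overlap: int = 2) -> list[FunctionRecord]:
    """Rank records by how many query words appear in their name or docstring."""
    wanted = words(query)
    scored = [(len(wanted & words(f"{r.name} {r.doc}")), r) for r in index]
    hits = [(score, r) for score, r in scored if score >= min_overlap]
    return [r for score, r in sorted(hits, key=lambda pair: -pair[0])]


# Hypothetical index entries; a real one would be built from the codebase.
INDEX = [
    FunctionRecord("src/shared/contacts.py", "checkContactField",
                   "Validate the format of an email address."),
    FunctionRecord("src/shared/dates.py", "formatDisplayDate",
                   "Format a date for display."),
]

hits = search_codebase("email format validation", INDEX)
action = "reuse" if hits else "generate"
print(action, [r.name for r in hits])
```

The docstring is what lets the query reach `checkContactField`, whose name never mentions email. The sketch also shows the limit of keyword matching: "validation" and "validate" do not count as the same word here, which is the kind of gap an embedding-based search is meant to close.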
Tools and Configuration for Codebase-Aware Agent Workflows
Some agentic coding tools support project-level configuration files that point the agent toward canonical locations for shared utilities. A convention like “all validation logic lives in src/shared/validators/, search there first” gives the agent a starting point that does not depend on semantic search infrastructure. This is lower-effort than building embedding-based codebase search and captures a meaningful fraction of the benefit for codebases with reasonably consistent organization. For codebases without consistent organization, this convention-based approach has limited value. That is itself useful information: inconsistent organization is an underlying condition that worsens this entire problem and is worth addressing independently.
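If a canonical directory exists, a short catalog of what it contains can be generated and placed wherever the tool reads project-level instructions. The sketch below builds such a catalog from Python sources with the `ast` module. The paths, the function `catalog_utilities`, and the entry format are assumptions for illustration, and the sources are passed in as a dictionary so the example runs without touching the filesystem.

```python
import ast


def catalog_utilities(sources: dict[str, str]) -> list[str]:
    """Return one 'path: name(args) - first docstring line' entry per top-level function."""
    entries = []
    for path, source in sorted(sources.items()):
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                doc = (ast.get_docstring(node) or "undocumented").splitlines()[0]
                args = ", ".join(a.arg for a in node.args.args)
                entries.append(f"{path}: {node.name}({args}) - {doc}")
    return entries


# Hypothetical contents of a shared validators directory.
SHARED = {
    "src/shared/validators/email.py": '''
def is_valid_email(email):
    """Return True if email looks like an address."""
    return "@" in email
''',
    "src/shared/validators/dates.py": '''
def parse_iso_date(text):
    return text
''',
}

for line in catalog_utilities(SHARED):
    print(line)
```

Entries that come out as "undocumented" are the ones least likely to be found by a purpose-based search, so the catalog doubles as a list of helpers that need a docstring.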
How TypeScript and Java Type Signatures Help Search
In statically-typed languages, searching for “a function that takes a string and returns a boolean, related to validation” is a more constrained and more effective query than the equivalent search in a dynamically-typed language, where the same logical search has to rely entirely on names and docstrings since type signatures provide no narrowing. This is a practical argument for adding type annotations to shared utility code even in codebases that do not enforce typing everywhere — type-annotated utility functions are more discoverable by both search tooling and by agents reading code, because the signature itself communicates intent that an untyped signature does not.
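As an illustration of searching by signature shape, this sketch finds Python functions whose annotations match a requested parameter and return type. The helper `find_by_signature` and the sample source are hypothetical, and the comparison is a plain string match on the annotation text, so aliases or equivalent spellings of a type are not recognized.

```python
import ast


def find_by_signature(source: str, param_types: list[str], return_type: str) -> list[str]:
    """Names of functions whose annotations match the given shape exactly."""
    matches = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef) or node.returns is None:
            continue  # functions without a return annotation cannot be matched
        annotations = [
            ast.unparse(a.annotation) if a.annotation else None
            for a in node.args.args
        ]
        if annotations == param_types and ast.unparse(node.returns) == return_type:
            matches.append(node.name)
    return matches


# Hypothetical module with one typed validator, one typed non-validator,
# and one untyped validator.
SOURCE = '''
def is_valid_email(email: str) -> bool:
    return "@" in email

def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

def check_contact(value):
    return "@" in value
'''

print(find_by_signature(SOURCE, ["str"], "bool"))
```

The untyped `check_contact` performs the same kind of check as `is_valid_email` but is invisible to this query, which is the discoverability gap that annotations on shared utilities close.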
AI Coding Agent Workflow Changes That Reduce Duplication
Beyond search-before-generation, three workflow changes reduce the rate of AI-generated duplication without requiring new tooling.
Maintain a single, discoverable location for shared utilities. A project convention of “general-purpose helpers live in one well-known directory, with descriptive names and docstrings” gives both agents and humans a first place to look. This sounds basic, but its absence is one of the strongest predictors of duplication — if shared logic is scattered across feature directories with no canonical home, every search for “does this already exist” returns nothing useful, and every agent session defaults to writing new code.
Search for existing equivalents of every newly-created utility function before merging. Before merging a PR that introduces a new utility function, search the codebase for its purpose using different terminology than the function’s own name: if the agent named it parseDate, search for “format”, “datetime”, and “timestamp” as well. This single check, performed at PR review time, catches a meaningful fraction of duplication before it merges, at a point where consolidation is cheapest.
Periodically run structural duplication detection as a scheduled check, not a one-time audit. Run jscpd, PMD CPD, or the equivalent for your language stack on a schedule — weekly or per-release — and track the trend in duplication count over time, not just the absolute number. A codebase using AI agents heavily without any search-before-generation step will show this number increasing steadily; a codebase with the workflow changes above should show it stabilizing or decreasing.
# Tracking duplication trend over time, not just a single snapshot.
# Run on a schedule (CI weekly job), append to a log file.
npx jscpd ./src --format "javascript,typescript,python" --reporters json --output ./duplication-reports/$(date +%Y-%m-%d)
# Compare against the previous week's report:
# a growing duplicate count over consecutive weeks indicates
# the codebase needs a search-before-generation step,
# regardless of which agent or model produced the duplicates.
Without tracking the trend, a single duplication report is just a number with no context — 47 duplicate blocks could be stable, improving, or worsening, and only the trend tells you whether your current workflow is containing the problem or whether it is accumulating faster than it is being addressed.
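Once a duplicate-block count has been pulled out of each scheduled report, classifying the trend takes only a few lines. The sketch below assumes the counts are already available as a chronological list; how they are extracted from a given tool's report is left out, and the `tolerance` value and sample numbers are hypothetical.

```python
def classify_trend(counts: list[int], tolerance: int = 2) -> str:
    """Classify a chronological series of duplicate-block counts."""
    if len(counts) < 2:
        return "unknown"  # a single snapshot says nothing about direction
    change = counts[-1] - counts[0]
    if change > tolerance:
        return "worsening"
    if change < -tolerance:
        return "improving"
    return "stable"


# The same latest value (47) can sit at the end of very different histories.
print(classify_trend([47]))          # one report only
print(classify_trend([41, 44, 47]))  # count rising week over week
print(classify_trend([47, 46, 47]))  # count hovering
print(classify_trend([59, 52, 47]))  # count falling
```

Comparing only the first and last values is a deliberate simplification; a longer history might call for a moving average so that one noisy scan does not flip the result.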
FAQ: AI Agents and Duplicate Code
Why do AI coding agents create duplicate code?
AI agents generate code based on their current context window, which is a subset of the full codebase. If an existing implementation of similar logic is not visible in that context, the agent writes a new one — not as an error, but as the correct response to the information it has. Across many sessions on a large codebase, this produces multiple independent implementations of the same logic, each individually reasonable but collectively redundant.
How is AI-generated code duplication different from regular DRY violations?
Regular DRY violations typically originate from copy-paste, where a developer had both versions visible and chose not to extract a shared function. AI-generated duplication originates from independent generation — the duplicate implementations were never visible to the same session at the same time, so there was no copy-paste moment for anyone to catch. This makes it harder to detect in code review, since the two versions are rarely reviewed close together in time.
Can duplicate code detection tools catch AI-generated duplication?
Tools like jscpd and PMD CPD catch structural duplication — code with similar token sequences or AST structure, even with different variable names. This covers a meaningful share of AI-generated duplication, since agents often produce structurally similar code for structurally similar tasks. They miss semantic duplication — different implementations achieving the same result through different logic, such as one email validator using string checks and another using a regex. Catching semantic duplication requires comparing function purpose, not implementation structure.
How do I prevent AI agents from writing duplicate functions?
Add a search-before-generation step to the agent’s workflow — before writing a new function, the agent searches the codebase for existing implementations of similar logic, using the operation being performed rather than guessing a function name. This mirrors how retrieval-augmented generation works for question-answering, applied to code: retrieve relevant existing code before generating new code. Even an imperfect search step meaningfully reduces new duplication compared to no search step.
How many duplicate functions before I should consolidate them?
Two is enough — do not wait for a third occurrence. With AI-generated duplication, each additional independently-written version tends to accumulate its own callers and its own slight behavioral differences (different edge case handling, different defaults). Consolidating two versions, before a third accumulates its own dependents, is meaningfully cheaper than consolidating three versions that have each drifted independently.
Does this problem get worse with larger codebases?
Yes, and not linearly. The probability that any given agent session’s context window contains an existing relevant implementation decreases as the codebase grows, because the context window is a roughly fixed size while the codebase grows around it. A small project might fit entirely in context, making duplication rare. A large codebase guarantees the agent sees only a small fraction of existing code on any given task, making duplication structurally more likely with every additional file the codebase grows by.
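A back-of-the-envelope model shows the shape of this relationship. It assumes a fixed context window and, unrealistically, that the lines loaded into context are a uniform random sample of the codebase; real agents load files by relevance, so the numbers are illustrative only. The function name and the sizes are hypothetical.

```python
def chance_helper_in_context(window_lines: int, codebase_lines: int) -> float:
    """Chance that one existing helper is in context, under the uniform-sample assumption."""
    return min(1.0, window_lines / codebase_lines)


WINDOW = 20_000  # hypothetical number of lines the agent can hold at once

for size in (10_000, 200_000, 2_000_000):
    print(size, chance_helper_in_context(WINDOW, size))
```

Under this model the chance of seeing an existing helper falls in inverse proportion to codebase size once the codebase outgrows the window: ten times more code means one tenth the chance.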
Does adding type annotations help reduce AI-generated duplication?
Indirectly, yes. Type signatures make shared utility functions more discoverable — both by search tooling that can filter by signature shape, and by agents reading code, since a typed signature communicates intent more explicitly than an untyped one. This does not prevent duplication on its own, but it makes existing implementations easier to find during a search-before-generation step, which is the actual prevention mechanism. Codebases in TypeScript or Java have a search advantage over equivalent JavaScript or Python codebases for this reason specifically.
Should I refactor existing AI-generated duplicates immediately when I find them?
Not always immediately, but flag them for consolidation before the next related change touches either version. If both versions are stable and unlikely to be modified soon, consolidating can wait without cost. If either version is likely to be touched again — a feature actively under development — consolidate before that next change, because making the same fix in two places (or, more likely, making it in one and forgetting the other) is exactly the scenario that turns silent duplication into a user-facing inconsistency.