The question nobody asks before changing code with AI

AI agents refactor confidently and fast. They also have zero institutional knowledge about what their changes will break. Here's how to fix that.

A developer I work with asked Claude to clean up a utility module. The module had grown over two years: a dozen functions, some redundant, a few with signatures that no longer matched how they were actually called. Claude refactored it, simplified the interfaces, removed dead code and wrote tests for every surviving function. All tests passed. The PR looked clean. Merged on a Friday.

On Monday, a downstream service started throwing type errors. One of the "dead" functions Claude had removed was imported by a service in a separate repository — one Claude had never seen and had no reason to know about. The function was called exactly once, in an error recovery path that only triggered under specific failure conditions. No test in the main repository exercised that path. The connection between these two files existed only in the minds of two engineers who'd both since left the company.

"What else depends on this, and how do I know?" That's the question nobody asks before handing code changes to AI.

The knowledge gap

A human developer who's been on a team for a year carries a mental map of the codebase. Not a complete map, but a map of the dangerous places. They know the billing module is touchy. They know the config parser is used by fourteen services. They know the error types in core/errors.rs are part of the public contract even though nobody formally documented them as such. None of this is in any file. It's institutional memory, built up through experience, incident retrospectives and hallway conversation.

An AI agent has none of it. The agent sees code as it exists right now, in the files it can access. It can't know that the function it wants to remove is called by a different repository. It can't know that the last time someone changed the signature of parse_config, three teams spent a day fixing downstream breakage. It can't know that the file it considers trivial is, in practice, the most dangerous file in the codebase.

This isn't about model intelligence. It's about context. And the problem compounds as codebases grow, as organisations scale, and as implicit dependencies multiply.

Two layers of understanding

When I started building Cartographer, a tool for understanding codebase structure and history, the core insight was simple: you need two distinct layers of information to predict the impact of a code change.

The first layer is structural: what depends on what? Parse the source code, extract the import graph, and you get a directed dependency map. Function A calls function B. Module X imports module Y. This is what an IDE's "find references" gives you, and it's genuinely useful. But import graphs only capture explicit, compile-time dependencies. They miss runtime dependencies, configuration-driven relationships and cross-repository contracts.

The second layer is dynamic: what actually changes together? This comes from Adam Tornhill's work on behavioural code analysis. Mine the git history and you find files consistently modified in the same commits, even when they share no structural dependency. These co-change patterns reveal coupling the code itself doesn't express. Two files that always change together are coupled, whether or not one imports the other. A file that changes frequently and correlates with bug-fix commits is a hotspot, a source of instability that deserves extra scrutiny when anything nearby gets modified.

Cartographer builds both layers. It uses tree-sitter to parse source code into a dependency graph, backed by petgraph for traversal and SQLite for persistence. Then it uses git2 to mine commit history for co-change frequency, blame-based ownership (who actually maintains each file) and change velocity. The result is a queryable model: "If I change this file, what else is likely to break?" It draws on both what the code declares and what the history reveals.

The code tells you what depends on what. The git history tells you what actually changes together. You need both to understand the real blast radius of a change.

Why this matters more with AI

Human developers are slow. That's usually framed as a problem, but slowness creates checkpoints. A human about to refactor a core module will pause, ask a colleague, scan related files, or remember the last time this area caused trouble. These informal checkpoints are impact analysis, even if nobody calls them that.

AI agents don't pause. They move fast, they move confidently, and they don't get the nagging feeling that a file might be more important than it looks. An agent refactoring a utility module will do it cleanly and thoroughly. That's exactly the problem. It'll remove every unused function, simplify every interface, standardise every naming convention — all without the caution a human developer would apply to a file they know is load-bearing.

I'm not arguing against using AI for refactoring. I'm arguing for giving the AI, or the human reviewing its work, the information needed to understand consequences. If the agent knows that error_types.rs has fourteen downstream dependents across three repositories, it'll approach the refactoring differently. If the reviewer can see the file being modified has a co-change relationship with the billing service, they'll review the PR with more care.

Making the invisible visible

Cartographer exposes its analysis through a CLI and an MCP server. The CLI offers seven commands: index to build the model, blast-radius to show affected files, hotspots to find high-change-frequency code, co-changes to reveal hidden coupling, who-owns to identify maintainers, deps to visualise the dependency graph, and serve to run the MCP server. The MCP server gives AI agents five tools they can call directly, making codebase understanding part of the agent's own workflow.

In practice, blast-radius is the most useful command. Point it at a file you're about to change and it returns every file that could be affected: direct dependents from the structural graph and co-change partners from the history. When we ran it against ripgrep as a test case, it mapped 100 files and 69 import edges, surfacing relationships invisible from any single file's perspective.

Scope is currently limited to Rust, with Python and TypeScript planned. The git analysis layer is language-agnostic; co-change patterns and ownership work on any repository. The structural parser is the bottleneck. Extending it to a new language means wiring up that language's tree-sitter grammar and writing the extraction queries for it — straightforward, but not free.

The broader point

I don't want to overstate what a tool like this can do. It won't replace human judgement. It won't catch every possible interaction. It doesn't know about verbal agreements, Slack messages, or architectural decisions made in a meeting room and never written down. Software has a social layer no static analysis can fully capture.

But here's what it does: it surfaces the structural and historical information that already exists in your codebase, information most teams never look at. The dependency graph is there but nobody queries it. The co-change patterns are there but nobody mines them. The ownership data is there but nobody aggregates it. All of it exists. It just isn't accessible at the moment it would matter most — the moment before a change gets made.

As AI agents write more code, change velocity will keep climbing. How will your team answer "what will this affect?" before merging? The teams that invest in answering that question systematically will ship faster and break less. Everyone else will discover their blast radius the hard way, one Monday morning at a time.
