Last autumn I watched an agent delete a migration file it had created ten minutes earlier. It had written the migration, run it against a development database, noticed the schema didn't match expectations, and decided the cleanest fix was to drop the file and start over. Reasonable logic. Except it had already committed the migration to a shared branch, and another agent running in a parallel session on the same repository had built three files on top of it. When the first agent force-pushed its "clean" branch, the second agent's work vanished. Nobody was paged. Nobody noticed until morning.
Nothing was lost that couldn't be recovered. But that incident changed how I think about agent access. The question isn't whether to give agents access — that ship sailed, and the productivity gains are real. The question is how to design boundaries so that access doesn't become a liability.
The access paradox
Agents are most useful when they have broad access. An agent that can read your codebase, run tests, query your database, and deploy to staging is wildly more capable than one that can only edit a single file. But broad access creates a surface area that multiplies with every tool you expose. Each tool is an entry point for the agent's intended actions, and also for every edge case, hallucination, and prompt injection the model might produce.
Most teams' instinct is to restrict access. Limit the agent to a sandbox, give it read-only permissions, disable anything that touches production. Safe. Works. Also eliminates most of the value. You end up with an expensive autocomplete that can't actually do the work you hired it to do.
The teams getting the most out of agents take a different approach: give agents real access, then build the containment infrastructure that makes it safe. More work up front, but you get agents that actually ship code.
Containment as a design discipline
When we built Sparks, a multi-agent orchestrator for coding work, the first architectural decision was simple: every agent execution runs in its own container. A real Docker container with a hardened security profile.
The specifics matter. We drop all Linux capabilities by default (CAP_DROP ALL) and add back only the handful the agent actually needs. The root filesystem is read-only — the agent can write to its working directory, but it can't modify system binaries, install packages outside its workspace, or tamper with its own runtime. We set hard limits on memory and process count, so a runaway loop or recursive tool call hits a ceiling instead of consuming the host. Network access is restricted to an allowlist: the agent can reach your repository host and your LLM provider, but it can't make arbitrary HTTP requests to internal services.
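To make that profile concrete, here's a sketch of how the same constraints map onto a `docker run` invocation. The image name, network name, capability choice, and limits are illustrative assumptions, not Sparks's actual configuration — the point is the shape of the profile, not the exact numbers.

```python
# Illustrative hardened container launch. Every value here (image name,
# the "agent-egress" network, the 2g/256 limits) is an assumption.

def hardened_run_args(image: str, workdir: str) -> list[str]:
    """Build a docker run command with a locked-down security profile."""
    return [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",               # drop every Linux capability...
        "--cap-add", "CHOWN",              # ...add back only what's needed
        "--read-only",                     # immutable root filesystem
        "--tmpfs", "/tmp:rw,size=64m",     # bounded scratch space
        "-v", f"{workdir}:/workspace:rw",  # the only writable mount
        "--memory", "2g",                  # hard memory ceiling
        "--pids-limit", "256",             # cap process count (runaway loops)
        "--network", "agent-egress",       # custom network with egress allowlist
        "--security-opt", "no-new-privileges",
        image,
    ]

cmd = hardened_run_args("agent-runtime:latest", "/srv/jobs/1234")
```

The egress allowlist itself lives in the network configuration (firewall rules on the `agent-egress` bridge), which is why the container sees only the repository host and the LLM provider.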
None of this is novel. Basic container security, the kind of thing any ops team would apply to an untrusted workload. Here's the part that surprises people, though: most agent frameworks treat the agent as a trusted process. They run it on the developer's machine, with the developer's credentials, in the developer's shell. The agent inherits every permission the developer has. SSH keys, cloud credentials, database access, production deployment tokens. If the model hallucinates a destructive command, the only thing between that hallucination and reality is whatever confirmation prompt the framework happens to show.
That assumption, the agent as a trusted process, is the single largest unexamined risk in the AI tooling ecosystem.
Your tool handlers are an attack surface
There's a subtler risk that even container isolation doesn't fully address: the tool handler layer. When an agent calls a tool — say, a function that reads a file — the model supplies the arguments. Those arguments are generated text, not validated input. If the model is asked to read config/database.yml and instead produces ../../../etc/passwd, your tool handler is the last line of defence.
Not theoretical. Path traversal, SSRF (where the agent gets tricked into making requests to internal services), and sensitive file exposure are all documented patterns in agent security research. In Sparks, every tool input passes through a validation layer that blocks path traversal sequences, checks URLs against an allowlist, and rejects requests for files matching sensitive patterns like private keys, environment files, and credential stores. The scanner runs in two modes: flag (log and allow, for development) and block (reject, for production).
Look, the mindset shift is straightforward: treat every tool handler as if it were an API endpoint exposed to the public internet. Because from a security perspective, it is. The model generating those inputs isn't under your control. It's a probabilistic system that will, given enough time, produce every possible combination of arguments. Including the adversarial ones.
If you can't see it, it didn't happen
The migration incident had a second failure mode beyond the code loss: we didn't know it had happened. The agents logged their actions, but the logs were per-session. Each agent had its own trace, nobody was aggregating across sessions, and the conflict stayed invisible until a human read the git history the next morning.
That changed how we think about observability. Sparks now emits a structured event stream with roughly twenty event types, from routine heartbeats and tool invocations to mood changes (when the agent's confidence shifts mid-task) and explicit escalation signals. Every event is timestamped, tagged with the agent's identity, and written to a shared store. A Telegram integration lets you review sessions in real time, replay what an agent did and why, and set alerts for patterns like repeated failures, unexpected tool calls, or confidence drops.
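The core of such a stream is just a typed, timestamped, identity-tagged record plus a filter over it. A sketch under assumed field names (this is not Sparks's actual event schema):

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class AgentEvent:
    """One entry in the shared event stream. Field names are illustrative."""
    agent_id: str
    event_type: str   # e.g. "heartbeat", "tool_invocation", "mood_change", "escalation"
    payload: dict
    ts: float = field(default_factory=time.time)

    def to_line(self) -> str:
        """Serialize as one JSON line for an append-only shared store."""
        return json.dumps(asdict(self), sort_keys=True)

def alert_filter(events, watched=frozenset({"escalation", "mood_change"})):
    """Select the events worth surfacing to a human in real time."""
    return [e for e in events if e.event_type in watched]
```

Writing events as JSON lines to one shared store is what fixes the per-session blind spot from the migration incident: a cross-session query becomes a grep, not an archaeology project.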
The purpose isn't surveillance. It's legibility. When an agent does something surprising, you need to reconstruct the chain of reasoning that got it there. Without that, you're operating on faith. Faith isn't an engineering strategy.
The uncomfortable middle
There's a temptation to treat this as solved — lock everything down, add logging, move on. The reality is messier. Every constraint you add to an agent reduces its capability. Every permission you grant increases your risk surface. The real work isn't finding a set of rules and enforcing them forever. It's continuously calibrating the boundary between useful and safe, for each agent and each repository, adjusted by risk tier.
In Sparks, we use what we call KPI-driven routing. Historical success and rollback rates per repository determine how much autonomy each agent gets. An agent with a strong track record on a stable repository earns broader permissions. One working on a fragile codebase with a history of rollbacks gets more constraints and more human checkpoints. It adapts based on what actually happens.
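Stripped to its essentials, that routing is a mapping from two historical rates to an autonomy tier. The thresholds and tier names below are illustrative assumptions, not Sparks's actual values:

```python
# Sketch of KPI-driven autonomy routing. Thresholds and tier names are
# assumptions for illustration, not Sparks's real configuration.

def autonomy_tier(success_rate: float, rollback_rate: float) -> str:
    """Map a repository's historical KPIs to an autonomy tier."""
    if success_rate >= 0.9 and rollback_rate <= 0.05:
        return "autonomous"    # broad permissions, merge on green CI
    if success_rate >= 0.7 and rollback_rate <= 0.15:
        return "checkpointed"  # human approves each merge
    return "supervised"        # human approves each risky tool call
```

Because the inputs are recomputed from what actually shipped and what got rolled back, the boundary moves on its own: a repository that stabilizes earns its agents more room, and one that starts churning pulls them back toward supervision.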
I don't think this is the final answer. Honestly, I think the real answer involves better tooling, better model alignment, and better standards across the whole ecosystem. But right now, the teams building production agent infrastructure are making judgment calls every week about where to draw lines. The quality of those calls separates teams that ship from teams that stay stuck in the sandbox.
The goal isn't making agents safe by making them useless. It's making them safe enough to be genuinely useful, while staying honest about the gap.
If you're building agent systems today, the highest-value move isn't adding another feature. Build the infrastructure that tells you exactly what your agents did, why they did it, and what you can do when they're wrong. Access without observability is just risk. Access with containment, validation, and legibility is how you get actual work done.