Where vibe coding breaks

Vibe coding is a genuine unlock for personal tools and prototypes. For systems that touch production data, send real messages, and make real API calls, it's a different story.

Andrej Karpathy coined the term in early 2025: "vibe coding." You describe what you want in natural language, the model writes the code, you glance at the output, and if it feels right you keep going. You don't read every line. You don't fully understand every function. Does the output look correct? Does the app seem to work? Ship it.

Here's where I stand: vibe coding is one of the most important shifts in how software gets made. For personal tools and prototypes, it's genuinely transformative. I've built entire utilities this way. Tools that would've taken me a full day now take forty minutes, because I'm describing intent instead of typing syntax. The barrier between "I wish this existed" and "it exists" has never been lower.

But there's a category of software where vibes aren't enough. And that category is growing fast.

When the system has consequences

The distinction isn't between simple and complex. It's between software that has consequences and software that doesn't. A personal script that renames files on your desktop? No consequences beyond your own machine. A CLI tool that converts data formats? No consequences beyond the terminal. If these break, you notice, you fix them, nobody else is affected. Vibe coding is perfect here.

Agent systems are different. An agent system touches real data: customer records, financial transactions, medical information, internal documents. It makes real API calls to billing services and communication platforms. It sends real messages to customers and external partners. It deploys real code to staging, sometimes to production. The gap between "it seems to work" and "it works correctly under every condition I haven't imagined" is exactly where consequences live.

I've seen a vibe-coded notification agent send duplicate messages to three hundred customers. The deduplication logic had a race condition that only showed up under concurrent load. A vibe-coded data pipeline I reviewed overwrote a day's worth of analytics because the upsert logic silently dropped the WHERE clause on one edge case. And then there was the vibe-coded billing integration that charged a test customer's real credit card because the environment detection code used a string comparison that was case-sensitive in production and case-insensitive in development.
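The billing bug is worth pausing on, because the failure class is so mundane. A minimal sketch of that pattern — the function names and the "production" string are hypothetical, not the actual code I reviewed:

```python
def is_sandbox_buggy(env_name: str) -> bool:
    # Bug: exact, case-sensitive match. A platform that reports
    # "PRODUCTION" falls through and gets treated as sandbox,
    # so real charges go through.
    return env_name != "production"

def is_sandbox_fixed(env_name: str) -> bool:
    # Fix: normalise before comparing, so casing differences between
    # environments can't flip the decision.
    return env_name.strip().lower() != "production"
```

Two lines of code, and the difference between a test charge and a real one. Nothing about the buggy version looks wrong at a glance — which is exactly the point.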

None of these were failures of the model's capability. The model wrote reasonable code in each case. They were failures of the development mode: a mode that prioritises speed and intuition over scrutiny. For personal tools, that tradeoff is excellent. For systems with consequences, it's debt that compounds until it explodes.

What agent systems actually demand

Building production agent systems means holding several domains in your head at once. Most of it is straightforward engineering. But the domains don't naturally overlap, and skipping any one of them creates a hole the others can't cover.

Security. Every tool handler is an attack surface. Path traversal, prompt injection, SSRF, credential exposure. These aren't theoretical risks. They're documented patterns. If you don't understand container isolation and input validation, your agent is a liability.
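Path traversal is the simplest of these to show. A sketch of a guard for a file-reading tool handler, assuming a hypothetical sandbox root (`/srv/agent-workspace` is illustrative):

```python
from pathlib import Path

ALLOWED_ROOT = Path("/srv/agent-workspace")  # hypothetical sandbox root

def resolve_safe(user_path: str) -> Path:
    # Resolve the requested path, then reject anything that escapes
    # the sandbox root — this is what blocks "../../etc/passwd".
    candidate = (ALLOWED_ROOT / user_path).resolve()
    if not candidate.is_relative_to(ALLOWED_ROOT.resolve()):
        raise ValueError(f"path escapes workspace: {user_path!r}")
    return candidate
```

The important design choice is validating the *resolved* path, not the raw string: checking for `".."` in the input misses symlinks and encoded variants, while checking the final destination does not.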

Observability. Your agent operates autonomously, potentially for hours. Can you see what it did? Every tool call, every decision? When something goes wrong (and it will), the difference between "we can reconstruct what happened" and "we have no idea" is the difference between a fixable incident and one that destroys trust.
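The minimum viable version of this is one structured record per tool call. A sketch — the field names are illustrative, not a standard schema:

```python
import json
import time
import uuid

def log_tool_call(tool: str, args: dict, outcome: str, sink=print) -> None:
    # Emit one audit record per tool call. With these records you can
    # reconstruct what the agent did, in order, after the fact.
    record = {
        "event": "tool_call",
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "tool": tool,
        "args": args,
        "outcome": outcome,
    }
    sink(json.dumps(record))
```

A real system would ship these to a log aggregator rather than stdout, but even this much turns "we have no idea" into "we can reconstruct what happened."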

State management. Memory across sessions, context within sessions, the ability to resume interrupted work. Cache invalidation and consistency guarantees. Classic engineering problems that don't respond well to vibes.
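Resumability, for instance, starts with a checkpoint that can't be corrupted mid-write. A minimal sketch using an atomic rename (the write-temp-then-replace pattern; file layout is illustrative):

```python
import json
from pathlib import Path

def save_checkpoint(path: Path, state: dict) -> None:
    # Write to a temp file, then rename over the target. The rename is
    # atomic on POSIX, so a crash mid-write leaves the previous
    # checkpoint intact instead of half-overwritten.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)

def load_checkpoint(path: Path):
    # Returns None when there is nothing to resume from.
    return json.loads(path.read_text()) if path.exists() else None
```

A naive `open(path, "w")` followed by a crash gives you a truncated file and an agent that can neither resume nor start fresh — precisely the kind of edge a vibe-coded version never considers.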

Error handling at the boundaries. Look, the model will hallucinate. The API will rate-limit you. The tool will return malformed output. Every boundary between your agent and the outside world needs retry logic and graceful degradation. These boundaries fail routinely. The only question is whether your system handles it or your user does.
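The shape of that boundary handling is well known: retry with backoff, then degrade rather than crash. A sketch, with deliberately generic names:

```python
import random
import time

def call_with_retry(fn, attempts=3, base_delay=0.5, fallback=None):
    # Exponential backoff with jitter between attempts. After the last
    # failure, return a fallback value so the agent can degrade
    # gracefully instead of taking the whole run down.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                return fallback
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

In a real system you'd catch specific exception types and log each failure (a bare `except Exception` hides bugs); the sketch keeps only the retry-then-degrade skeleton.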

Product design. When should the agent act on its own? When should it ask for confirmation? What does "I don't know" look like to the end user? How do you communicate confidence levels? These are product decisions that shape the user's trust in the system. You can't vibe them into existence.
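Even the confirmation question ends up encoded in the system somewhere. One possible policy sketch — actions tagged by blast radius, risky ones routed to the user; the action names and tiers are illustrative, not from any particular framework:

```python
RISK = {
    "read_file": "low",       # reversible, stays on the machine
    "send_email": "high",     # leaves the system, can't be unsent
    "delete_record": "high",  # destructive
}

def needs_confirmation(action: str) -> bool:
    # Fail closed: an action we haven't classified is treated as
    # high-risk and routed to the user rather than auto-executed.
    return RISK.get(action, "high") == "high"
```

The fail-closed default is the product decision in miniature: when the system doesn't know, it asks instead of acting.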

Vibe coding is a fantastic way to start building. It's a terrible way to ship something that has access to your customers' data and your company's billing API.

The accidental polymath

Earlier in my career, I wrote about T-shaped professionals: people with deep expertise in one domain and broad competence across several. At the time, it was a conceptual framework. Now I build agent infrastructure, and I experience it as a practical requirement. You can't build production agent systems from a single discipline. You need security thinking, systems engineering, observability practice, and product design instinct all at once, in the same codebase, serving the same user.

Honestly, this isn't a gatekeeping argument. I'm not saying you need ten years of experience in each domain. I'm saying you need enough awareness to recognise when something is missing. Enough knowledge of input validation to feel uneasy when there isn't any. Enough observability practice to notice when the agent's actions are invisible. You should be able to look at a bare try/catch and know it isn't a strategy.

Vibe coding bypasses this cross-domain awareness. It lets you move fast precisely because you don't examine the code closely. And the code you don't examine is where the security holes and the un-designed user experience live.

The honest position

I vibe-code all the time. I vibe-coded a file conversion tool last week and a Telegram bot for personal use the week before. These tools work, they serve their purpose, and I didn't need to understand every line. I wouldn't give it up.

But when I sit down to work on agent infrastructure — systems that run autonomously, touch real data, and need to earn the trust of organisations with compliance requirements — I switch modes. I read every line. I think about failure modes. I validate inputs. I test edge cases. Not because I'm a perfectionist, but because the consequences demand it.

The gap between these two modes isn't about skill or experience. It's about whether the software you're building has an audience of one (you) or an audience that includes people who didn't choose to trust your vibes.

For the first audience, vibes are great. For the second, vibes are where the conversation starts.
