Context Engineering vs Vibe Coding
Software engineering is changing fast. One person with AI can now prototype what used to take a small team weeks. But there's a catch — the way you work with AI matters more than which AI you use.
Two approaches have emerged: vibe coding and context engineering. Understanding when to use each is the key skill of AI-assisted development.
What Is Vibe Coding?
The term was coined by Andrej Karpathy in early 2025. The idea is simple: instead of specifying exactly how something should be built, you describe what you want in natural language. The AI figures out the implementation.
You don't say "create a React component with useState and useEffect for data fetching with proper error handling." You say "I want a button that shows a modal with a user profile when clicked."
Then you iterate: "make the button bigger", "add a loading spinner", "change the colors to match our brand." You react to something concrete instead of designing from scratch.
The workflow:
1. Describe the desired experience
2. AI generates initial code
3. Review — does it work? Does it match intent?
4. Describe adjustments — what's wrong, what to change
5. Repeat until the prototype is ready
6. Apply engineering rigor before shipping to production
Step 6 is the one people skip. That's where things go wrong.
What Is Context Engineering?
Context engineering is about giving AI the right information and guardrails — not just in a single prompt, but as a persistent, structured environment.
Prompt engineering is crafting one good message. Context engineering is setting up the entire system around AI so it consistently produces good results.
This includes:
- Persistent instructions — files like `CLAUDE.md`, Cursor rules, or Copilot instructions that define your project, tech stack, and conventions. Set them up once, and they apply to every session.
- File-based context — your project structure, documentation, and examples already carry information. AI reads them.
- Progressive disclosure — loading context on demand instead of dumping everything at once. Don't waste tokens on irrelevant information.
- Reusable prompts — slash commands and templates for tasks you do repeatedly.
A simple CLAUDE.md example:

```markdown
## Project: E-commerce App
## Tech Stack: Next.js, TypeScript, Prisma
## Rules:
- Use functional components
- All API calls go through /api routes
- Follow existing naming conventions
```

Once this is set up, AI automatically follows these guidelines. No repeating yourself every session.
The key insight: better context beats a bigger model. I've seen teams chase the latest model when the real problem was poor context. An older model with great context outperforms a newer model with bad context every time.
The "Lost in the Middle" Problem
Large prompts have an attention distribution problem. AI pays the most attention to what's at the beginning and at the end. Information in the middle gets less weight — similar to how humans remember first and last items in a list better.
The practical rule:
- Beginning = key rules, system instructions (highest attention)
- End = current task or question (latest message)
- Middle = reference materials, background (model may overlook)
If you bury a critical instruction in the middle of a large prompt, the model is likely to miss it.
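One way to make this ordering concrete is to encode it directly in how you assemble prompts. The helper below is a sketch, not any particular tool's API — the section names and inputs are illustrative:

```python
def build_prompt(rules: str, reference: str, task: str) -> str:
    """Assemble a prompt so the high-attention zones carry the critical parts.

    Order follows the 'lost in the middle' rule: key rules at the start,
    bulky reference material in the middle, the current task at the end.
    """
    return "\n\n".join([
        f"# Rules (read first)\n{rules}",      # beginning: highest attention
        f"# Reference material\n{reference}",  # middle: may be overlooked
        f"# Current task\n{task}",             # end: latest message
    ])

prompt = build_prompt(
    rules="Use TypeScript. Never log secrets.",
    reference="...long API docs...",
    task="Add retry logic to the fetch helper.",
)
```

The point is not the helper itself but the discipline: anything the model absolutely must follow should never end up sandwiched between pages of reference material.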
When to Vibe vs When to Be Precise
It's not either/or — it's a spectrum. You move along it as you go from exploration to production.
| Vibe Coding works for | Context Engineering needed for |
|---|---|
| Prototyping new ideas | API integrations |
| Exploring UI variations | Security-critical features |
| Internal tools | Specific business logic |
| Initial scaffolding | Data transformations |
| MVPs where speed matters | Compliance requirements |
Start with vibe to explore quickly, then switch to precision as the stakes increase.
Test-Driven Development with AI: The RGR Pattern
One of the most effective ways to improve AI code quality is surprisingly old-school: Test-Driven Development.
The classic Red-Green-Refactor (RGR) cycle translates remarkably well to AI-assisted workflows:
- Red — Write a failing test that defines the expected behavior
- Green — Let the AI agent write the minimal implementation to pass the test
- Refactor — Clean up the code while keeping all tests green
Why does this work so well with AI? Because tests are context. When an agent has a clear, executable specification of what "correct" looks like, the output quality jumps significantly. The tests act as guardrails — the agent can run them, see failures, and iterate until everything passes. There's no ambiguity about what "done" means.
The workflow in practice:
- Write a clear spec — describe the function signature, inputs, outputs, and edge cases
- Have the agent write tests first — or write them yourself if the logic is critical
- Let the agent implement — it now has concrete success criteria
- Agent runs tests, iterates — the feedback loop is automatic
- Refactor — simplify, extract, rename — tests catch regressions
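A single red-green cycle might look like this. `slugify` is a hypothetical example function, and the tests are plain assertions so the sketch stays self-contained:

```python
import re

# Red: tests written first define the contract the agent must satisfy.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Trim  me  ") == "trim-me"
    assert slugify("Already-slugged") == "already-slugged"

# Green: minimal implementation that makes the tests pass.
def slugify(text: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace/hyphens into single hyphens."""
    return re.sub(r"[\s-]+", "-", text.strip().lower())

test_slugify()  # all assertions pass; refactoring can now proceed safely
```

With the tests in place, the refactor step is low-risk: any simplification that breaks the contract fails immediately, and the failing assertion tells the agent exactly what regressed.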
Yes, this consumes more context tokens. You're sending test files, test output, and failure messages back and forth. But the trade-off is worth it — the agent produces code that actually meets your requirements instead of code that merely looks plausible.
The key principles:
- Small iterations. Don't ask the agent to build an entire module at once. One function, one test suite, one cycle. Smaller scope means fewer hallucinations and clearer feedback loops.
- Clear specifications. TDD with AI only works if you know what you want. Vague requirements produce vague tests, which produce vague code. Write precise input/output expectations.
- Let tests drive the conversation. Instead of describing what's wrong in natural language, let the test output speak. A failing assertion is unambiguous — far more useful than "the sorting doesn't work right."
This approach combines the best of both worlds. You keep the speed advantage of AI-generated code while maintaining the quality assurance of a disciplined engineering process. The agent isn't just generating code — it's generating code against a verifiable contract.
One practical tip: if you're using an AI coding agent (Claude Code, Cursor, etc.), include a rule in your context file that says "always write tests before implementation" or "follow TDD — red, green, refactor." This nudges the agent toward the pattern automatically.
The Productivity Paradox
The research on AI-assisted development tells a surprising story. Studies show contradictory results:
| Study | Context | Finding |
|---|---|---|
| Google RCT (2024) | Unfamiliar codebase | +21% faster |
| Microsoft/Accenture (2024) | Controlled tasks | +26% faster |
| METR Study (2025) | Own large repos | -19% slower |
The pattern makes sense when you look closer: AI helps in unfamiliar territory (new codebase, new framework). It can actually hurt when you already have deep context about your own code.
Junior developers see 35–39% speed gains. Seniors? Only 8–16% — or even negative.
The most striking finding is the perception gap: developers believed they were 20% faster. The actual measurement showed they were 19% slower. Don't trust your gut feeling about AI productivity — measure it.
The Learning Trade-off
A 2026 study by Shen & Tamkin at Anthropic looked at developers learning a new async Python library with and without AI. The finding: AI use impairs skill development. Full delegation resulted in zero learning.
They identified six interaction patterns:
| Preserves learning | Impairs learning |
|---|---|
| Asking about concepts | Full delegation |
| Debugging step by step | Generate first, understand later |
| Gradually increasing AI use | Asking AI to explain code for you |
The surprise: having AI explain code to you doesn't build the same understanding as figuring it out yourself.
Stay cognitively engaged. Use AI as a thinking partner, not a replacement for thinking.
Security: Where It Gets Dangerous
AI-generated code introduces real risks:
- ~40% of Copilot-generated code contains security weaknesses
- Developers using AI wrote less secure code but were more confident it was secure
- AI-generated code gets rewritten 2x faster than human code — suggesting hidden quality issues
- AI can hallucinate package names. Attackers register those names with malware. You `npm install` and get compromised.
Common vulnerabilities: SQL injection, path injection, hardcoded credentials, OS command injection.
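A minimal illustration of the first of these, using Python's built-in `sqlite3` (the table and attacker input are contrived):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Vulnerable pattern AI often generates: string interpolation into SQL.
# The OR clause makes the WHERE condition true for every row.
leaky = conn.execute(
    f"SELECT name FROM users WHERE name = '{user_input}'"
).fetchall()

# Safe pattern: a parameterized query; the driver treats the value as data.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(leaky)  # the injection matched rows it should not have
print(safe)   # [] — the literal string matches no user
```

Both versions compile and run, which is exactly why this class of bug survives a casual review of AI output: only the parameterized version is actually correct.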
The overconfidence problem is the most dangerous part. The code looks professional, compiles, runs — but hides subtle vulnerabilities.
Anti-Patterns to Avoid
- "Vibe to production" — shipping AI-generated code without engineering review. The whole point of vibe coding is fast iteration, not skipping quality gates.
- "Rubber stamping" — approving AI code without actually reading it. "It came from AI, looks fine" defeats the purpose of review.
- "Context neglect" — explaining the same things to AI every day instead of setting up persistent context files.
- "Trust without verification" — assuming AI output is correct because it sounds confident. AI can be confidently wrong.
- "One-and-done prompts" — accepting the first output without iterating. Prompts need refinement, like code.
What to Do Instead
- Invest in context first. Set up your `CLAUDE.md`, project docs, and persistent instructions. This has the highest ROI of anything you can do with AI tools.
- Know which mode you're in. Are you exploring or building for production? Switch consciously.
- Use TDD as your quality gate. Let tests define correctness before the agent writes a single line of implementation. Iterate in small cycles.
- Review AI code more carefully than human code. Humans make predictable mistakes. AI makes different, subtle ones.
- Run static analysis. Tools like CodeQL, Bandit, or ESLint catch what your eyes might miss.
- Build a prompt library. When something works, save it. Share it with your team. Iterate on it.
- Version control everything. AI-generated code goes through the same git workflow as any other code.
MCP: Connecting AI to Your Tools (With a Caveat)
Model Context Protocol (MCP) is an open standard by Anthropic for connecting AI tools to external services — databases, APIs, GitHub, Jira.
Think of it as USB for AI. Before USB, every device had its own connector. MCP standardizes how AI tools talk to external systems. Write an MCP server once, and any compatible AI client can use it.
The trap: MCP servers can silently eat your context window. Every tool definition, every response from an external service — it all counts as tokens. Connect a few MCP servers and suddenly a large chunk of your context is consumed before you even start working. I've seen setups where MCP integrations used tens of thousands of tokens just for tool definitions alone.
Use MCP with intent. Do you actually need this connection for the current task? Not every session needs access to Jira, your database, and three other services at once. Load what you need, when you need it.
Skills: The New Context Engineering Pattern
Skills solve something MCP doesn't: teaching AI how to do things consistently, not just connecting it to external systems.
A skill is a folder containing instructions, scripts, and resources that AI discovers and loads dynamically when relevant to a task. Think of them as reusable expertise packages — a brand guidelines skill, a code review skill, a deployment checklist skill.
What makes skills interesting for context engineering:
- Progressive disclosure by design. Only metadata loads first (~100 tokens). Full instructions load only when the skill is actually needed. This prevents the context bloat problem that plagues MCP.
- Reusable across conversations. If you find yourself repeating the same instructions to AI, that's a signal to turn them into a skill.
- Simple to create. Markdown files with some YAML metadata. No protocol specification, no server infrastructure.
- Composable. Multiple skills combine naturally. A project can have a coding conventions skill, a testing skill, and a deployment skill working together.
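A minimal skill might look like this — the folder layout and frontmatter fields follow Anthropic's published SKILL.md format, while the contents are illustrative:

```markdown
---
name: code-review
description: Conventions for reviewing pull requests in this repo
---

# Code Review Skill

When reviewing code:
- Check error handling on every external call
- Flag any raw SQL string interpolation into queries
- Verify new functions ship with tests
```

Only the `name` and `description` load up front; the body loads when the model decides the skill is relevant, which is the progressive disclosure described above.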
The key difference from MCP: MCP provides connectivity (access to tools and data). Skills provide expertise (how to use that access well). A well-structured skill setup means AI knows your team's conventions, your project's patterns, and your preferred approaches — without repeating them every session.
Simon Willison called skills "maybe a bigger deal than MCP" — and the reasoning is sound. Skills are closer to how context engineering works in practice: packaging knowledge so it's available on demand, not all at once.
The Paradox: Understanding Code Matters More Than Ever
In an era where AI writes more and more of our code, deep understanding of what the code actually does has become more valuable — not less.
When you wrote every line yourself, you understood it by default. You lived through the decisions, the trade-offs, the bugs. Now, AI hands you hundreds of lines in seconds. If you can't read them critically, you're shipping code you don't understand into systems you're responsible for.
This is where the skill gap will widen. Not between people who use AI and people who don't — but between those who can evaluate AI output and those who can't. Two areas matter most:
Security. AI generates code that compiles, runs, and passes basic tests — while quietly introducing vulnerabilities that look correct at first glance. SQL injection, improper input validation, hardcoded secrets. You need to understand attack vectors, trust boundaries, and data flow well enough to spot what the AI missed.
Performance. AI optimizes for "works" — not for "works well at scale." It will generate an O(n²) solution, an unnecessary database round-trip, or a memory leak that only shows up under load. Knowing the difference between code that runs and code that runs well requires understanding no amount of prompting can replace.
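A contrived sketch of the scale difference (names and data are illustrative): both functions return the same result, but the list-scanning version does linear work per lookup while the set version does constant-time work.

```python
import timeit

emails = [f"user{i}@example.com" for i in range(10_000)]
lookups = emails[::10]

# Pattern AI often produces: list membership inside a comprehension.
# Each `in` check scans the list, so the total cost is O(n * m).
def dedupe_slow(items, seen_list):
    return [x for x in items if x not in seen_list]

# Same result via a set: O(1) average per membership check.
def dedupe_fast(items, seen_list):
    seen = set(seen_list)
    return [x for x in items if x not in seen]

assert dedupe_slow(lookups, emails) == dedupe_fast(lookups, emails)
slow = timeit.timeit(lambda: dedupe_slow(lookups, emails), number=3)
fast = timeit.timeit(lambda: dedupe_fast(lookups, emails), number=3)
print(f"list: {slow:.3f}s  set: {fast:.3f}s")
```

Both pass a functional test; only profiling or reading the code critically reveals that one of them collapses under load.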
The developers who thrive won't be the ones who generate the most code. They'll be the ones who know when AI is wrong — even when the output looks confident, professional, and complete. Especially then.
AI is a power tool. A power tool in the hands of someone who doesn't understand the material is just a faster way to make mistakes.
Key Takeaways
- Context quality > model selection. Better context beats a bigger model.
- Vibe for prototyping, precision for production. Know when to switch.
- TDD works even better with AI. Tests give the agent clear guardrails and a verifiable definition of done. Iterate in small cycles with clear specs.
- Always review AI code — especially for security. It's more important, not less.
- Stay cognitively engaged. Full delegation = zero learning.
- Measure, don't assume. Your perception of AI productivity may be wrong.
- Set up your context once, benefit every session. Highest-leverage thing you can do.
References & Further Reading
Research:
- Shen & Tamkin (Anthropic, 2026): How AI Impacts Skill Formation
- METR Study (2025): Measuring AI Impact on Developer Productivity
- Security Implications of AI Code Assistants
- Stanford User Study: AI assistants and code security
- GitClear (2025): AI Code Quality Research
- Google RCT & Microsoft/Accenture: referenced in The Reality of AI-Assisted Software
Articles:
- Simon Willison: Claude Skills are awesome, maybe a bigger deal than MCP
- Anthropic: Skills explained — How Skills compares to prompts, Projects, MCP, and subagents
- Martin Fowler: Context Engineering for Coding Agents