Why Context Engineering Matters More Than Model Capability
There's a common assumption in AI-assisted development: better models will eventually figure everything out. In practice, context engineering, not model capability, is the primary bottleneck for AI coding productivity.
The same agent with different context produces dramatically different results. The question that matters isn't "which model is best" but "what context does this agent have access to, and how did it get there." That context falls into six distinct pillars.
Knowledge vs. Expertise
Before the pillars themselves, a distinction worth being explicit about.
Knowledge is what you can derive from reading the code. What exists, how it's structured, which functions call which, what patterns appear. Any agent with file access can build this picture, given enough time and tokens.
Expertise is what surrounds the code but isn't in it. Why this approach was chosen over the alternatives that were considered. What constraints were discovered the hard way. What broke in production six months ago and why the current implementation guards against it. Which patterns are intentional and which are accidents that nobody got around to fixing.
Reading code gives you knowledge. Working in a codebase for years gives you expertise. The gap between the two is what makes a senior engineer's output materially different from a competent newcomer's, even when they're using the same tools.
The same gap exists for AI coding agents. They can acquire knowledge - that's what codebase analysis is for. They cannot acquire expertise unless someone captures it for them. Most teams don't, which is why agent output so often compiles, passes tests, and is wrong for the project.
The Core Questions
Every piece of context an agent needs ultimately answers one of these questions:
| Question | Context Type |
|---|---|
| WHAT should I build? | Intent Context |
| WHY should I build it this way? | Historical Context |
| HOW do we do things here? | Convention Context |
| WHAT exists already? | Structural Context |
| WHAT could go wrong? | Operational Context |
| HOW do I know it's right? | Verification Context |
The key observation: all of this context already exists in every software organization. It's scattered across people's heads, documents that drift from reality, implicit patterns in code, and conversations that disappear. The opportunity is to materialize it, version it, and make it queryable.
The Six Pillars of Development Context
1. Intent Context
Intent context answers: WHAT should I build and WHY at the task level?
This includes goals, acceptance criteria, and explicit non-goals. The bulk of intent gets established before any code is written - during the planning conversation where the approach is chosen, alternatives are weighed, and constraints are surfaced. The remainder emerges during implementation, when the developer or agent discovers something the plan didn't anticipate.
The practical question is where this intent lives. If it lives only in a chat window that gets closed, it's gone the moment the session ends. If it lives in a planning doc that nobody links from the code, the next person to touch the work has to reconstruct it.
2. Structural Context
Structural context answers: WHAT exists and HOW is it organized?
Architecture, patterns, dependencies, module boundaries, data flow, API contracts. This is highly extractable from the codebase itself - file structure, imports, ASTs, type signatures. It's also the substrate everything else hangs off: a convention or a verification rule has to apply to something, and that something is a module or a file or a service. Structural context is what gives the rest of the system a coordinate system to anchor to.
This is the primary value of cold-start extraction. The agent immediately knows where new code goes and what exists to build on, which is most of what it needs to stop guessing.
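As a rough illustration of how extractable structural context is, the sketch below uses Python's standard `ast` module to pull imports and top-level function signatures out of a source string. The function name `summarize_module` and the sample module are hypothetical, and a real extractor would cover far more (classes, call graphs, type information).

```python
import ast

def summarize_module(source: str) -> dict:
    """Return the imports and top-level function signatures found in `source`."""
    tree = ast.parse(source)
    imports, functions = [], []
    for node in tree.body:
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as `from . import x` have module=None.
            imports.append(node.module or ".")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = [a.arg for a in node.args.args]
            functions.append(f"{node.name}({', '.join(args)})")
    return {"imports": imports, "functions": functions}

# Hypothetical module text standing in for a real file.
example = '''
import json
from pathlib import Path

def load_config(path, strict):
    return json.loads(Path(path).read_text())
'''

summary = summarize_module(example)
print(summary)
```

Nothing here requires running the code being analyzed, which is part of why this pillar is cheap to populate from a cold start.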
3. Convention Context
Convention context answers: HOW do we do things here?
Code style, design patterns, error handling approaches, testing strategy, commit conventions. Conventions are what make code look like it belongs. Without them, agent output is obviously AI-generated even when it's technically correct - wrong indentation, generic error messages, the import order nobody on the team would write.
Some conventions are extractable from existing code (the linter config, the recurring patterns). Others have to be captured deliberately because they only show up in code review comments or team discussions.
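One way to picture the extractable half: the sketch below counts which exception types existing code raises, as a crude signal of the team's error handling convention. The helper name, the sample code, and the `ValidationError` class are all hypothetical, and a tally like this only suggests a convention; it cannot say whether the pattern is intentional.

```python
import ast
from collections import Counter

def raised_exception_counts(source: str) -> Counter:
    """Count exception class names used in `raise` statements."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Raise) or node.exc is None:
            continue  # skip non-raise nodes and bare `raise`
        # `raise Foo("msg")` is a Call wrapping the name; `raise Foo` is the name itself.
        target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
        if isinstance(target, ast.Name):
            counts[target.id] += 1
        elif isinstance(target, ast.Attribute):
            counts[target.attr] += 1
    return counts

# Hypothetical snippet standing in for existing project code.
existing_code = '''
def get_user(uid):
    if uid is None:
        raise ValidationError("uid is required")

def get_order(oid):
    if oid is None:
        raise ValidationError("oid is required")
    if oid < 0:
        raise ValueError("oid must be non-negative")
'''

counts = raised_exception_counts(existing_code)
preferred = counts.most_common(1)[0][0]
print(preferred)
```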
4. Historical Context
Historical context answers: WHY are things the way they are?
Decisions made, decisions rejected, system evolution, bug patterns, intentional workarounds. This is the pillar that sits closest to expertise. Knowing what was rejected is often more valuable than knowing what was chosen - it prevents the agent from re-litigating settled debates, and it reveals constraints that aren't visible in the current code.
Historical context is rarely needed for daily coding. It becomes critical for refactoring, migrations, hard debugging, and any change that requires understanding why the code is shaped the way it is rather than just how it currently behaves.
The hard part has always been capture. Historical reasoning happens during development sessions and conversations, then gets discarded. This is the gap Contextual Commits addresses: it uses git commit bodies as the medium for capturing structured reasoning, so the WHY travels with the code instead of disappearing with the chat window.
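To make "structured reasoning in a commit body" concrete, here is a minimal parsing sketch. The `kind(scope): text` line format, the three kinds (borrowed from the decisions, rejected options, and constraints named in this article), and the sample commit message are assumptions for illustration, not a statement of the actual Contextual Commits format.

```python
import re
from dataclasses import dataclass

# Assumed line shape: kind(scope): free text
LINE = re.compile(
    r"^(?P<kind>decision|rejected|constraint)\((?P<scope>[^)]+)\):\s*(?P<text>.+)$"
)

@dataclass(frozen=True)
class ContextLine:
    kind: str
    scope: str
    text: str

def parse_commit_body(body: str) -> list[ContextLine]:
    """Extract structured reasoning lines; ordinary prose lines are ignored."""
    entries = []
    for raw in body.splitlines():
        match = LINE.match(raw.strip())
        if match:
            entries.append(ContextLine(**match.groupdict()))
    return entries

# Hypothetical commit message, invented for this example.
body = """Add retry wrapper around payment client

decision(payments): retry with exponential backoff, max 3 attempts
rejected(payments): circuit breaker, too heavy for a single call site
constraint(payments): provider rate limit applies per API key
"""

entries = parse_commit_body(body)
for entry in entries:
    print(entry.kind, "|", entry.scope, "|", entry.text)
```

The `rejected` line is the one an agent could never recover from the diff alone, which is the point of capturing it.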
5. Operational Context
Operational context answers: WHAT happens in production?
Constraints, failure modes, performance baselines, security boundaries. This context type largely lives outside coding sessions - production reality isn't in the IDE. It enters the development context only when something from production gets captured back: an incident postmortem that reveals a constraint, a load test that establishes a baseline, a security review that identifies a boundary.
6. Verification Context
Verification context answers: HOW do I know it's right?
Quality criteria, test strategy, review checklists, acceptance tests, known risks. This is where current agents fail most dramatically. Without verification context, an agent can write code but cannot evaluate whether the code is good - it can only check whether it compiles and passes whatever tests already exist. It can't know that this team requires error-handling tests for any new endpoint, or that there's a known N+1 query risk in the data access layer that any change should be checked against.
This is what separates a coding agent from a competent developer: not the ability to write code, but the ability to know whether the code that was written meets the actual standard.
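A team rule like "every new endpoint needs an error-handling test" becomes checkable once it is written down. The sketch below assumes a hypothetical naming rule (`test_<endpoint>_error...`) and a hypothetical `Change` record; real teams would encode the rule however their test layout demands.

```python
from dataclasses import dataclass

@dataclass
class Change:
    new_endpoints: list[str]  # endpoint handler names added by the change
    test_names: list[str]     # test function names present after the change

def missing_error_tests(change: Change) -> list[str]:
    """Return new endpoints with no test named test_<endpoint>_error* (assumed rule)."""
    missing = []
    for endpoint in change.new_endpoints:
        prefix = f"test_{endpoint}_error"
        if not any(name.startswith(prefix) for name in change.test_names):
            missing.append(endpoint)
    return missing

# Hypothetical change: two new endpoints, only one with an error-path test.
change = Change(
    new_endpoints=["create_invoice", "list_invoices"],
    test_names=[
        "test_create_invoice_ok",
        "test_create_invoice_error_on_missing_amount",
        "test_list_invoices_ok",
    ],
)

gaps = missing_error_tests(change)
print(gaps)
```

Both endpoints would pass the existing test suite; only the captured rule reveals that `list_invoices` falls short of the team's standard.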
What Makes Context High-Impact?
Not all context types have equal impact on coding session quality:
| Impact Level | Context Types |
|---|---|
| CRITICAL | Structural, Convention, Intent (Goal), Verification |
| SIGNIFICANT | Historical (Decisions, Rejected, Constraints) |
| MARGINAL | Operational (Constraints), Historical (Bug patterns) |
| MINIMAL | Intent (Motivation), Operational (Runtime) |
Different context types influence different quality dimensions:
- Correctness depends on Intent and Structural context
- Compliance depends on Convention and Verification context
- Safety depends on Historical and Verification context
- Maintainability depends on Convention and Historical context
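The dependency list above can be read as a lookup table. As a small sketch (the function name and the sample session are assumptions), the code below reports which quality dimensions are at risk given the context types an agent actually has.

```python
# Quality dimension -> context types it depends on, taken from the list above.
DIMENSIONS = {
    "correctness": {"Intent", "Structural"},
    "compliance": {"Convention", "Verification"},
    "safety": {"Historical", "Verification"},
    "maintainability": {"Convention", "Historical"},
}

def dimensions_at_risk(available: set[str]) -> list[str]:
    """Return dimensions missing at least one context type they depend on."""
    return sorted(
        name for name, needed in DIMENSIONS.items() if not needed <= available
    )

# Hypothetical session: the agent has a task description, a code index,
# and a style guide, but no captured history or verification criteria.
available = {"Intent", "Structural", "Convention"}
at_risk = dimensions_at_risk(available)
print(at_risk)
```

In this assumed session only correctness is fully covered, which matches the earlier observation that agent output often compiles, passes tests, and is still wrong for the project.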
Where the Mechanism Has to Live
A pattern worth being honest about: most attempts to capture context have failed because they require discipline that doesn't survive contact with real development.
Wikis go stale. Architecture decision records get written for the first three decisions and abandoned. Inline comments answer "what" but rarely "why." Slack threads contain genuine reasoning but are unsearchable in any practical sense.
The capture mechanism has to live where the work already happens. Git is the obvious candidate - every developer already commits, every commit already has a body field, every commit body already travels with the code through history. The infrastructure is in place. What's been missing is the convention for what goes into those bodies and how downstream tools should read them.
This is the bet behind Contextual Commits as a standard: enrich the artifact developers already produce, instead of asking them to maintain a parallel system.
Better Models Amplify Context, They Don't Replace It
The assumption that sufficiently advanced agents will simply figure everything out gets the relationship backwards: better models raise the value of good context rather than removing the need for it.
More capable models are better at capturing context: they recognize what's significant in a conversation, extract cleaner decision rationale, identify patterns worth preserving. They're also better at utilizing context: reasoning over larger windows, synthesizing information from multiple sources, applying historical knowledge more precisely.
The bottleneck was never the model's ability to use context - it was getting the right context to the model in the first place.
This creates a compounding relationship. As models improve, the same context infrastructure delivers progressively better outcomes. Teams that accumulate this context gain increasing advantage as the agents consuming it grow more sophisticated.
The Fundamental Limitation
Agents can reverse-engineer the WHAT through codebase analysis, AST parsing, and pattern recognition. They cannot reverse-engineer the WHY.
No amount of codebase analysis will reveal:
- Why GraphQL was rejected
- What production incident led to that defensive timeout
- Which testing patterns the team values versus tolerates
This reasoning lives only in human heads and disappearing conversations. It's the expertise layer, and it is available to an agent only if someone chose to capture it at the moment it was produced.
The Path Forward
Context doesn't need to be exhaustive or perfect. It needs to be captured where the work already happens, scoped to the parts of the codebase it actually applies to, and accessible to whatever agent is doing the work next.
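"Scoped to the parts of the codebase it actually applies to" can be as simple as attaching a path pattern to each captured note. A minimal sketch, with hypothetical notes and paths, using the standard library's `fnmatch` (whose `*` also matches across `/`):

```python
from fnmatch import fnmatchcase

# (path pattern, captured note) pairs; every entry here is invented for illustration.
NOTES = [
    ("src/payments/*", "constraint: never retry non-idempotent calls"),
    ("src/*", "convention: raise domain errors, not bare exceptions"),
    ("docs/*", "convention: one sentence per line"),
]

def notes_for(path: str, notes=NOTES) -> list[str]:
    """Return the notes whose pattern matches the file an agent is about to edit."""
    return [text for pattern, text in notes if fnmatchcase(path, pattern)]

relevant = notes_for("src/payments/client.py")
for note in relevant:
    print(note)
```

The agent editing the payments client receives two notes and never sees the documentation rule, which keeps the supplied context small without requiring it to be exhaustive.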
When an agent combines its native ability to search and analyze code with accumulated context about why things are the way they are, the result is not a different kind of agent. It's the same agent, operating with the institutional knowledge that previously existed only in senior developers' heads.
The goal is simple: help agents produce code that a thoughtful human teammate would produce - code that fits, respects history, anticipates problems, and meets the team's actual quality bar.
