I Tested Context Engineering for 30 Days — Here's What Happened
Three weeks ago, our new hire asked me why the payment service sometimes charges customers twice. I pointed her to the relevant files. She read through 2,000 lines of TypeScript. An hour later, she still didn't understand the idempotency key logic.
That's when it hit me: we've been engineering code for decades, but we're terrible at engineering context.
Context engineering is the practice of deliberately structuring, capturing, and maintaining the information developers need to understand and modify code. Not documentation. Not comments. The actual context that connects "what this code does" to "why it exists" to "how it fits into everything else."
I spent 30 days implementing context engineering practices on our 150k LOC TypeScript monorepo. Here's what happened.
Week 1: Measuring the Baseline
First, I measured how long it took our team to complete common tasks:
Understanding a feature end-to-end: 3-4 hours
Fixing a bug in unfamiliar code: 2-6 hours
Onboarding a new engineer to make their first meaningful PR: 5-7 days
I also tracked AI assistant failures. Copilot would suggest code that violated our authentication patterns. Claude would write database queries that ignored our soft-delete convention. GPT-4 would confidently generate API responses in the wrong format.
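The soft-delete misses were typical. They looked roughly like this (a simplified sketch; `db` stands in for our node-postgres-style client, and the table and column names are illustrative):

```typescript
import { Pool } from 'pg';

const db = new Pool();
declare const userId: string; // illustrative

// What the assistant suggested: a query that silently includes
// soft-deleted rows because it omits the deleted_at filter.
const orders = await db.query('SELECT * FROM orders WHERE user_id = $1', [userId]);

// What our convention requires: soft-deleted rows never reach application code.
const activeOrders = await db.query(
  'SELECT * FROM orders WHERE user_id = $1 AND deleted_at IS NULL',
  [userId]
);
```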
The problem wasn't the AI. The problem was that we'd never given these tools proper context about our codebase.
Our README had installation instructions. Our wiki had architecture diagrams from 2022. Individual files had comments explaining what the code did, but not why decisions were made or how pieces connected.
Week 2: Building a Context Map
I started by identifying what developers actually need to know:
System Context: What does this service do? How does it interact with other services?
Feature Context: What business problem does this solve? What are the edge cases?
Code Context: Why was it implemented this way? What alternatives were considered?
Historical Context: What changed? Why? What broke last time someone modified this?
Then I created a context map. For each major feature, I documented:
```markdown
# Payment Idempotency

**Business Context**: Prevent duplicate charges when customers retry failed payments or refresh checkout pages.

**Implementation**: Uses Redis-backed idempotency keys (24hr TTL). Key format: `idempotency:${userId}:${cartHash}`.

**Why This Way**: Considered DB-based approach, but Redis TTL handles cleanup automatically. Tried client-generated keys but got collisions with browser caching.

**Edge Cases**:
- Key expires mid-transaction → accept duplicate charge risk
- Redis unavailable → fall back to synchronous DB check (slower)
- Cart modified after key generated → hash mismatch, new key

**Related Code**:
- `/src/payments/idempotency.ts` - core logic
- `/src/payments/checkout.controller.ts` - key generation
- `/tests/integration/duplicate-charges.test.ts` - scenarios

**Last Major Change**: Dec 2023 - added cart hash to key format after Black Friday incident
```
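To see what those notes translate to, here's a minimal sketch of the duplicate check, assuming an ioredis client and a hypothetical `checkDbForCharge` fallback (the real logic lives in `/src/payments/idempotency.ts`; this is just the shape of it):

```typescript
import Redis from 'ioredis';

const redis = new Redis();
const DAY_IN_SECONDS = 60 * 60 * 24; // the 24hr TTL from the map

// Hypothetical fallback: look for an existing charge in the database.
declare function checkDbForCharge(userId: string, cartHash: string): Promise<boolean>;

// Returns whether this (user, cart) pair has already been processed;
// if not, claims the key so exactly one charge can proceed.
async function isDuplicateCharge(userId: string, cartHash: string): Promise<boolean> {
  const key = `idempotency:${userId}:${cartHash}`;
  try {
    // NX = only set if absent, EX = expire after 24h. The set is atomic,
    // so two concurrent retries can't both claim the key.
    const claimed = await redis.set(key, 'pending', 'EX', DAY_IN_SECONDS, 'NX');
    return claimed !== 'OK';
  } catch {
    // Edge case from the map: Redis unavailable → synchronous DB check (slower).
    return checkDbForCharge(userId, cartHash);
  }
}
```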
Writing the context map for our core payment flow took 8 hours. But here's the thing: I didn't write most of it from memory.
I used Glue to analyze our codebase and generate the initial context map. It identified the relevant files, tracked code churn to find what changes frequently, and pulled in commit messages that explained architectural decisions. I spent my time validating and enhancing what it generated, not starting from scratch.
Week 3: Context-Aware AI Actually Works
With context maps in place, I fed them to our AI tools via MCP (Model Context Protocol). This is where things got interesting.
Before context engineering, Copilot would suggest something like this for a payment retry (names simplified):
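```typescript
// Illustrative stand-ins for our real types and payment client.
type Cart = { items: unknown[]; total: number };
declare const paymentProvider: {
  charge(userId: string, amount: number, opts?: { idempotencyKey: string }): Promise<unknown>;
};

// The typical suggestion: blind retries with no idempotency key. A retry
// after a charge that succeeded but timed out bills the customer twice.
async function retryPayment(userId: string, cart: Cart) {
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      return await paymentProvider.charge(userId, cart.total);
    } catch (err) {
      if (attempt === 2) throw err;
    }
  }
}
```

With the context map loaded, the same prompt produced suggestions shaped like this (reusing the stubs above):

```typescript
import Redis from 'ioredis';
import { createHash } from 'node:crypto';

const redis = new Redis();

// The cart hash keeps the key stable across retries but fresh for edited carts.
function hashCart(cart: Cart): string {
  return createHash('sha256').update(JSON.stringify(cart.items)).digest('hex');
}

// Retry guarded by our Redis-backed idempotency key (24hr TTL), using the
// key format from the context map: idempotency:${userId}:${cartHash}.
async function retryPayment(userId: string, cart: Cart) {
  const key = `idempotency:${userId}:${hashCart(cart)}`;

  const existing = await redis.get(key);
  if (existing) return JSON.parse(existing); // duplicate attempt: reuse the prior result

  const result = await paymentProvider.charge(userId, cart.total, {
    idempotencyKey: key,
  });
  await redis.set(key, JSON.stringify(result), 'EX', 60 * 60 * 24);
  return result;
}
```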
It understood our patterns. It knew about the cart hash requirement. It followed our conventions.
I tracked AI suggestion acceptance rates:
Week 1: 23% of Copilot suggestions accepted
Week 3: 67% of Copilot suggestions accepted
The AI didn't get smarter. We just gave it the context it needed.
Week 4: The Real Win Was Humans
AI improvements were cool. But the bigger impact was on our team.
Remember that new hire struggling with the payment service? I handed her the context map. 20 minutes later, she understood not just what the code did, but why it existed and how it connected to checkout, refunds, and subscription billing.
Her first PR was solid. No "why did you do it this way?" comments. No "this breaks the idempotency system" reviews. She had the context she needed from day one.
I measured onboarding time again:
Before: 5-7 days to first meaningful PR
After: 3 days to first meaningful PR
That's roughly half the time. And these engineers weren't just moving faster; they were making better decisions because they understood the broader system.
The Context Engineering Patterns That Worked
After 30 days, here's what actually moved the needle:
1. Context lives next to code
Don't put important context in Notion or Confluence. Put it in the repo. We created a /docs/context directory with markdown files that sit alongside the code they describe. When code changes, the context file shows up in the same PR.
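Ours looks roughly like this (file names illustrative):

```
docs/context/
  payments/
    idempotency.md
    refunds.md
  checkout/
    cart.md
  auth/
    sessions.md
```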
2. Context answers "why", not "what"
Code already tells you what it does. Good context explains why decisions were made. "We use Redis here because..." is more valuable than "This function uses Redis."
3. Context includes failure modes
The most valuable context often comes from what went wrong. "We tried approach X but it failed because Y" prevents future developers from repeating mistakes.
4. Context is discoverable
We use Glue's context search to make it easy to find relevant context when you're looking at a specific file or feature. Type "payment idempotency" and you get the context map, related files, recent changes, and who to ask for details.
5. Context shows ownership
Knowing who has context about a system is sometimes more valuable than the documentation itself. We annotate our context maps with "Primary: @jane, Secondary: @bob" so people know who to ask.
The Numbers After 30 Days
I'm not big on vanity metrics, but these mattered:
50% faster onboarding: New engineers productive in 3 days vs 6
60% fewer context-related bugs: Bugs caused by misunderstanding system behavior
3x AI acceptance rate: Jump from 23% to 67% for AI code suggestions
2 hours saved per code review: Less "why did you do this?" back-and-forth
Zero "we tried this before": Stopped repeating failed experiments
The time investment was real. About 40 hours total over 30 days to create context maps for our core systems. But we've already saved more time than we invested.
What I'd Do Differently
Start smaller. I tried to context-map everything at once. Bad idea. Focus on the areas where lack of context causes the most pain:
Features that multiple teams touch
Systems with complex business logic
Code that changes frequently
Areas where onboarding is painful
Also, keep context maps short. My first drafts were too detailed. Aim for "enough context to make good decisions" not "complete knowledge transfer." A 200-word context map that people actually read beats a 2,000-word document that sits unread in the wiki.
The Thing Nobody Talks About
Here's what surprised me: context engineering exposed how much institutional knowledge lives only in people's heads.
When I asked senior engineers to review context maps, they'd add things like "oh yeah, we can't modify orders after they hit the warehouse system" or "the payment provider has a 30-second timeout we need to stay under." Critical information that existed nowhere in our codebase or documentation.
Capturing this knowledge before it walks out the door during turnover? That's the real value.
Is This Worth Your Time?
If your team is small (2-3 engineers all working on the same code), probably not. Everyone has the context already.
But if you're at the point where:
Onboarding takes more than a week
Engineers are afraid to touch certain parts of the codebase
You keep having "didn't we try this before?" conversations
Your AI tools generate code that violates your patterns
Then yeah, context engineering is worth it.
Start with one painful area. Create a context map. Give it to your AI tools through MCP. Watch what happens.
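If you want a concrete starting point, here's a minimal sketch of an MCP server that serves context maps out of the repo, using the official TypeScript SDK; the tool name and file layout are our conventions from above, not anything the protocol prescribes:

```typescript
import { readFile } from 'node:fs/promises';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const server = new McpServer({ name: 'context-maps', version: '1.0.0' });

// Let the assistant pull a context map by feature name, e.g.
// get_context_map({ feature: 'payments/idempotency' }).
server.tool('get_context_map', { feature: z.string() }, async ({ feature }) => {
  const text = await readFile(`docs/context/${feature}.md`, 'utf8');
  return { content: [{ type: 'text' as const, text }] };
});

// Speak MCP over stdio so any MCP-capable client can launch it.
await server.connect(new StdioServerTransport());
```

Point an MCP-capable client at that server and the model reads the same context maps your engineers do.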
I've kept this up beyond the 30-day experiment. It's now part of how we work. New features don't just get code — they get context. And that context makes everything else easier.