AI for Software Development: Hidden Truths Nobody Tells You
Everyone's using AI to write code now. GitHub Copilot, ChatGPT, Claude, Cursor. The demos look incredible. Type a comment, get a function. Ask a question, get an explanation.
But after six months of real-world use, most teams discover the same problems. The AI suggests code that doesn't match your patterns. It hallucinates APIs that don't exist. It confidently explains features you deprecated months ago. And when you ask it to refactor something complex, it produces technically correct code that violates every convention your team has established over three years.
The demos don't show these failures because demos use toy projects. Your codebase has 200,000 lines across 50 repositories with undocumented decisions made by engineers who left two years ago.
The Context Problem Nobody Talks About
Here's what actually happens when you use AI coding tools:
You're working on a payment flow. You ask your AI assistant to add retry logic. It generates beautiful code with exponential backoff, jitter, circuit breakers—textbook perfect. Except your company already has a retry library. It's used in 47 other places. It has specific error handling for your payment provider's quirks. It integrates with your observability stack.
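To make that concrete, here's roughly the kind of thing the assistant hands you: a minimal, generic retry helper with exponential backoff and full jitter (circuit breaker omitted for brevity). Every name in it is made up; the point is that nothing here knows about your shared retry library, your payment provider's quirks, or your observability stack.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Generic retry with exponential backoff and full jitter.

    Textbook-correct, and completely unaware of the in-house retry
    library, the provider-specific error handling, or the metrics hooks.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))

# Usage (hypothetical): retry_with_backoff(lambda: charge_card(payment))
```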
The AI doesn't know this exists. Why would it? The AI has a 128k token context window. Your codebase is millions of tokens. Even if you could fit everything, the AI can't distinguish between the important patterns and the legacy code you're actively migrating away from.
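A rough back-of-envelope calculation makes the mismatch obvious. The tokens-per-line figure below is an assumption (it varies by language and formatting), but the order of magnitude is the point:

```python
# Rough, assumed numbers: ~10 tokens per line of code on average.
lines_of_code = 200_000
tokens_per_line = 10          # assumption; varies by language and style
codebase_tokens = lines_of_code * tokens_per_line   # ~2,000,000 tokens

context_window = 128_000
print(codebase_tokens / context_window)   # ~15.6: the codebase is ~15x the window
```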
So you use the AI-generated code. Three months later, someone hits a payment bug that the shared retry library would have caught. They dig through the codebase and find one payment handler using its own retry logic. Now they don't know which approach is correct.
This happens constantly. AI tools optimize for appearing helpful in the moment. They don't optimize for consistency across your entire codebase.
The Hallucination Tax
AI vendors call them "hallucinations" like it's a quirky bug. In production codebases, hallucinations are expensive.
An engineer asks the AI about your authentication middleware. The AI confidently explains a feature that almost exists—you discussed it in a design doc but never implemented it. The engineer builds on this assumption. Two weeks later during code review, someone catches it. That's two weeks burned.
Or worse: nobody catches it until production.
I've seen teams spend entire sprint retrospectives discussing AI-generated bugs. The pattern is always the same. The AI suggested something plausible. The engineer trusted it because it looked right. The tests passed because the tests also don't know your full context.
The real cost isn't the bugs. It's the trust erosion. After a few hallucinations, engineers start fact-checking everything the AI suggests. Now your "productivity tool" adds overhead.
The Token Economics Nobody Shows You
Let's talk about money.
You're debugging a complex issue. You paste code into ChatGPT. You iterate back and forth. Each round trip costs tokens. By the time you solve the problem, you've sent the same context five times because the conversation grew too long and you had to start fresh.
Now multiply this across your team. Twenty engineers doing this daily. The token costs add up fast. More importantly, the cognitive overhead of managing context windows becomes a tax on every interaction.
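Here's a sketch of that math with purely illustrative numbers; the per-token price and the usage pattern are assumptions, not anyone's actual bill:

```python
# Illustrative cost of re-sending the same context on every round trip.
# All numbers below are assumptions for the sake of the estimate.
context_tokens = 8_000          # code + logs pasted into each message
round_trips = 5                 # times the same context gets re-sent
price_per_million_input = 3.00  # assumed $ per 1M input tokens

cost_per_session = context_tokens * round_trips * price_per_million_input / 1_000_000
engineers, sessions_per_day, workdays = 20, 3, 22
monthly = cost_per_session * engineers * sessions_per_day * workdays
print(f"${monthly:.0f}/month")  # ~$158/month on re-sent context alone
```

The dollar figure is modest. The bigger tax is the time spent rebuilding context every time a conversation resets.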
Some teams try to solve this by pasting their entire codebase into prompts. This fails immediately because:
Most codebases exceed context limits
LLMs perform worse with huge contexts—they lose details in the middle
You're paying to process mostly irrelevant code
The alternative vendors push is RAG (Retrieval-Augmented Generation): search your codebase, pull the most relevant chunks, add them to the prompt. Sounds good. Works poorly.
Why? Because similarity search finds code that looks like your query, not the code your team actually relies on. Your AI searches for "retry logic" and gets every file with a retry loop, including that spike project from two years ago that never shipped.
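A minimal sketch of why, using an off-the-shelf embedding model as a stand-in for whatever retriever your RAG pipeline uses (the model choice, file names, and snippets are all illustrative assumptions):

```python
from sentence_transformers import SentenceTransformer, util

# Two chunks a retriever might index: the maintained library and a dead spike.
# The file names and snippets are made up for illustration.
chunks = {
    "libs/payments/retry.py (canonical, used in 47 places)":
        "def retry_payment(call): ...  # provider-aware retries, metrics hooks",
    "experiments/spike_2023/retry_loop.py (never shipped)":
        "while attempts < 5: ...  # quick retry loop for a prototype",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
query = model.encode("retry logic for payment calls", convert_to_tensor=True)
docs = model.encode(list(chunks.values()), convert_to_tensor=True)

for name, score in zip(chunks, util.cos_sim(query, docs)[0]):
    print(f"{float(score):.2f}  {name}")
# Both chunks score as "about retries". Nothing in the score says which
# one your team actually maintains, because that fact isn't in the text.
```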
What Actually Works
The teams that get value from AI coding tools do one thing differently: they build context deliberately.
They don't expect the AI to magically understand their codebase. They treat context as a product. They create architectural decision records. They document patterns. They maintain clear ownership boundaries. They use tools that understand their specific codebase structure.
This is where platforms like Glue become relevant. Instead of hoping the AI will figure out your codebase, Glue indexes it explicitly. It discovers features through AI analysis, maps relationships between components, tracks which code is actually maintained versus abandoned, and exposes this through a knowledge graph that any AI tool can query.
The difference is architectural. Most AI tools operate in isolation—they only know what you paste into each conversation. Glue builds persistent knowledge about your codebase that accumulates over time. When you ask about retry logic through an AI that's integrated with Glue's context, it knows which implementation is current, who owns it, what depends on it, and how it's evolved.
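I can't show Glue's internals here, but the shape of the idea is easy to sketch. Everything below (the node fields, the statuses, the query helper) is a hypothetical illustration of a codebase knowledge graph, not Glue's actual schema or API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One entity in a hypothetical codebase knowledge graph."""
    kind: str                       # "feature", "module", "team", ...
    name: str
    status: str = "active"          # "active", "deprecated", "abandoned"
    owner: str | None = None
    edges: dict = field(default_factory=dict)   # relation -> list of names

graph = {
    "payments.retry": Node("module", "payments.retry", owner="payments-team",
                           edges={"used_by": ["checkout", "invoicing"],
                                  "implements": ["retry-policy"]}),
    "spike.retry_loop": Node("module", "spike.retry_loop", status="abandoned",
                             edges={"implements": ["retry-policy"]}),
}

def current_implementations(graph, feature):
    """Return only actively maintained modules that implement a feature."""
    return [n for n in graph.values()
            if feature in n.edges.get("implements", []) and n.status == "active"]

print([n.name for n in current_implementations(graph, "retry-policy")])
# ['payments.retry'] -- the abandoned spike is filtered out by status,
# which is exactly the signal similarity search can't give you.
```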
The Integration Problem
Here's another truth: your team uses multiple AI tools. Some engineers love Cursor. Others prefer Copilot. Your designers use Claude. Your PMs use ChatGPT.
Each tool learns nothing from the others. An engineer using Cursor figures out a tricky pattern. That knowledge dies in that conversation. Another engineer hits the same problem in Copilot the next day. They solve it differently.
Over time, your codebase develops inconsistencies. Not because engineers are careless, but because they're getting different guidance from different AIs, none of which know what the others suggested.
The solution isn't standardizing on one tool. Engineers should use what makes them productive. The solution is standardizing the context all tools receive.
Glue's MCP (Model Context Protocol) integration does this. It provides a consistent context layer that works with Cursor, Copilot, Claude, or whatever tool ships next month. When an engineer discovers something useful, it can inform the context available to the entire team.
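To make "a consistent context layer" concrete, here's a minimal sketch of an MCP server built with the official Python SDK's FastMCP helper. The tool, the guidance table behind it, and the wording are all hypothetical; this shows the shape of the integration, not Glue's actual server:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("codebase-context")   # any MCP-capable client can connect

# Hypothetical stand-in for a real, continuously updated codebase index.
KNOWN_PATTERNS = {
    "retry": "Use libs/payments/retry.py (owned by payments-team); "
             "do not hand-roll backoff in payment handlers.",
}

@mcp.tool()
def lookup_pattern(topic: str) -> str:
    """Return the team's canonical guidance for a topic, if any."""
    return KNOWN_PATTERNS.get(topic.lower(), "No recorded guidance for this topic.")

if __name__ == "__main__":
    mcp.run()
```

Any MCP-capable client that connects gets the same answer to "how do we do retries here?", which is the whole point.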
The Documentation Lie
AI vendors love to say their tools make documentation optional. "Just ask the AI!" Except the AI is learning from your undocumented code and propagating its implicit assumptions.
Good documentation doesn't just explain what code does. It explains why decisions were made. What alternatives were considered. What constraints existed. What assumptions might break.
AI can't infer this from code. When it tries, it hallucinates plausible-sounding rationales that might be completely wrong.
The teams that succeed use AI to generate documentation, then human engineers to verify and enrich it. Glue's documentation generation works this way—it produces initial docs from code analysis, but treats them as drafts. The real value comes from the workflow: AI does the tedious work of writing basic descriptions, humans add the context that actually matters.
Real Numbers From Real Teams
I talked to a team at a Series B startup. Twenty engineers. They'd been using Copilot for eight months. I asked about productivity gains.
"Hard to measure," the engineering manager said. "Some tasks are definitely faster. But we're also fixing more AI-introduced bugs. And our code consistency has gotten worse."
They weren't tracking token costs. When we added it up, they were spending about $400/month on ChatGPT and API usage from engineers pasting code in for explanations. Not a huge number, but it came on top of the Copilot licenses they were already paying for. The combined cost was real.
More importantly: three of their senior engineers had stopped using AI tools. Not because they were AI skeptics. Because the tools kept suggesting patterns that violated the team's architecture. It was faster to write code themselves than to fix AI suggestions.
This is the reality that demos don't show.
What to Actually Do
If you're using AI coding tools—and you should be—here's what works:
First, accept that AI tools are assistants, not replacements. They're great for boilerplate, converting formats, explaining unfamiliar syntax, brainstorming approaches. They're terrible at understanding your specific context without help.
Second, invest in context infrastructure. This means documentation, but it also means tools that understand your codebase's structure. Architecture decision records. Clear ownership. An explicit record of which code is actively maintained and which is deprecated. Tools like Glue that build this understanding systematically rather than hoping each engineer discovers it individually.
Third, standardize the context layer, not the AI tool. Let engineers use whatever makes them productive. Focus on ensuring all tools receive consistent, accurate information about your codebase.
Fourth, track the real costs. Not just token usage, but time spent fixing AI-introduced bugs, inconsistencies that emerge from AI suggestions, and context management overhead.
Fifth, build feedback loops. When an AI suggests something wrong, that should improve the context available to everyone. When an engineer discovers a useful pattern, other engineers should benefit.
The Real Promise
AI coding tools aren't overhyped. They're genuinely useful. But they're useful the way a power tool is useful—in the hands of someone who knows what they're building and has the right setup.
The promise isn't that AI will write your code for you. It's that AI can handle the mechanical parts while you focus on the interesting problems. But only if the AI understands enough about your specific codebase to suggest things that actually fit.
This is why the winning approach isn't better prompts or bigger context windows. It's building systems that give AI tools the specific knowledge they need about your specific codebase. Knowledge graphs instead of text search. Structured understanding instead of pattern matching.
The teams that figure this out will compound their AI productivity gains. The teams that keep copy-pasting code into ChatGPT will keep getting inconsistent results.
The technology works. But only if you do the work of teaching it about your code.