Most AI application tutorials show you how to call an API. They don't show you what happens when that API serves 10,000 concurrent users, each bringing a different amount of context, each expecting sub-second responses.
Here's what we learned building Glue's AI infrastructure: the patterns that survived production and the ones that didn't.
The Context Window Problem
Every AI application eventually hits the same wall: context windows are finite, but user context is not.
A developer asking "how does authentication work in this codebase?" might need context from 50+ files, 200+ functions, and years of git history. You can't shove all of that into a single prompt.
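To make the mismatch concrete, here is a rough back-of-the-envelope sketch. It assumes about 4 characters per token (a common heuristic, not a real tokenizer), 50 hypothetical files of 8,000 characters each, and a hypothetical 32,000-token window with 2,000 tokens reserved for the answer. None of these numbers come from Glue's system; they only illustrate the arithmetic.

```python
# Rough heuristic: ~4 characters per token. Real tokenizers vary,
# so treat this as an illustrative assumption, not a measurement.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fits_in_window(documents: list[str], window_tokens: int,
                   reserved_tokens: int = 0) -> bool:
    """Check whether all documents fit in the window after reserving
    some tokens for the question and the model's answer."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= window_tokens - reserved_tokens


# 50 hypothetical source files of 8,000 characters each.
files = ["x" * 8_000 for _ in range(50)]
total_tokens = sum(estimate_tokens(f) for f in files)

print(total_tokens)  # 100000
print(fits_in_window(files, window_tokens=32_000, reserved_tokens=2_000))  # False
```

Even with these modest file sizes, the raw context is roughly three times the assumed budget, and that is before counting function-level detail or git history. Something has to select what goes in.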
Pattern 1: Hierarchical RAG
Instead of flat vector search, build a hierarchy: