Blast Radius Oracle FAQ: Building Code Change Impact Analysis
You change one line in auth.service.ts and suddenly the checkout flow breaks. Nobody warned you. The tests passed. Your PR got approved. Production is on fire.
This happens because we're terrible at predicting consequences. We treat codebases like deterministic machines when they're more like ecological systems—everything affects everything else in ways that aren't obvious until something dies.
I spent the last year building a blast radius oracle for Glue. Not the "here are your direct dependencies" kind that every IDE already does. The kind that tells you "if you change this authentication helper, these seven customer-facing features will need testing, and three of them are already in bad shape."
The naive approach is simple: parse the code, build a dependency graph, walk it. Done.
Except that misses almost everything that matters.
Consider a database schema change. Your static analysis sees zero callers because it's data, not code. But sixteen features will break. Or a CSS class rename that your bundler doesn't catch. Or a feature flag removal that affects behavior without changing call sites.
The real problem is semantic relationships that transcend syntax. You need to answer "what does this code mean in the context of what users can do?"
That requires building multiple graphs and overlaying them:
The syntax graph (imports, function calls, type references)
The runtime graph (what actually executes together)
The semantic graph (which files contribute to which features)
The organizational graph (who owns what, who's changed what recently)
Most tools stop after the first graph. That's why they're not useful.
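To make the overlay concrete, here's a minimal sketch of one way to merge the four layers into a single edge set. The type names and the combination rule are illustrative assumptions, not Glue's internals; the point is that each edge remembers which layers assert the relationship and how strongly.

```typescript
// Hypothetical types for overlaying the four relationship layers.
type GraphLayer = "syntax" | "runtime" | "semantic" | "organizational";

interface BlastEdge {
  from: string;            // file or symbol id, e.g. "auth.service.ts"
  to: string;
  layers: Set<GraphLayer>; // which graphs assert this relationship
  weight: number;          // combined strength across layers, 0..1
}

interface LayerEdge {
  layer: GraphLayer;
  from: string;
  to: string;
  weight: number;
}

// Merge per-layer edge lists into one overlaid graph keyed by "from->to".
function overlay(edges: LayerEdge[]): Map<string, BlastEdge> {
  const merged = new Map<string, BlastEdge>();
  for (const e of edges) {
    const key = `${e.from}->${e.to}`;
    const existing =
      merged.get(key) ?? { from: e.from, to: e.to, layers: new Set<GraphLayer>(), weight: 0 };
    existing.layers.add(e.layer);
    // Treat each layer's weight as independent evidence for the relationship.
    existing.weight = 1 - (1 - existing.weight) * (1 - e.weight);
    merged.set(key, existing);
  }
  return merged;
}
```

An edge asserted by three layers ends up heavier than one asserted by the syntax graph alone, which is exactly the behavior you want when ranking what to test.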
File-to-Feature Mapping: The Foundation
You can't predict blast radius without knowing what features exist. Not "services" or "modules"—actual user-facing capabilities.
We do this through multi-signal feature discovery:
API boundary analysis. Every HTTP endpoint is a potential feature boundary. We trace from routes backward through middleware, services, and data access layers. An endpoint like POST /api/checkout/complete maps to a checkout feature cluster.
UI component trees. In frontend codebases, features are component hierarchies. We identify root components (usually route-level) and build downward, tracking props flow and event handlers. The CheckoutFlow component and everything it imports is checkout-related.
Behavioral clustering. Files that change together probably belong to the same feature. If payment.service.ts, order.repository.ts, and inventory.service.ts are repeatedly modified in the same PRs, they form a logical unit even if the imports don't make that obvious (a sketch of this co-change mining follows this list).
Domain language extraction. We parse identifiers, comments, and docs looking for domain terms. Files containing "checkout", "payment", "cart" cluster together. This sounds crude but works surprisingly well when combined with other signals.
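To illustrate the behavioral signal, here's a rough sketch of mining co-change pairs from git history. It assumes Node and plain git log; the output shape and the commit-size cutoff are illustrative choices, not Glue's actual pipeline.

```typescript
import { execSync } from "node:child_process";

// Count how often pairs of files are modified in the same commit.
// Pairs with high co-change counts are candidates for the same feature cluster.
function coChangeCounts(repoPath: string, maxCommits = 2000): Map<string, number> {
  // --name-only plus a custom delimiter so commits can be split apart.
  const log = execSync(
    `git -C ${repoPath} log -n ${maxCommits} --name-only --pretty=format:"@@commit"`,
    { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 }
  );
  const pairCounts = new Map<string, number>();
  for (const commit of log.split("@@commit")) {
    const files = commit.split("\n").map((l) => l.trim()).filter(Boolean);
    if (files.length > 40) continue; // skip bulk refactors and vendored drops
    for (let i = 0; i < files.length; i++) {
      for (let j = i + 1; j < files.length; j++) {
        const key = [files[i], files[j]].sort().join("|");
        pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
      }
    }
  }
  return pairCounts;
}
```

Normalizing each pair count by the total churn of the two files keeps perpetually busy files (lockfiles, config) from dominating every cluster.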
The output is a probabilistic mapping: each file gets feature tags with confidence scores. auth.service.ts might be 90% auth-feature, 40% checkout-feature (because checkout uses auth), 20% admin-feature.
This fuzzy mapping is crucial. Clean boundaries don't exist in real codebases. Everything bleeds into everything else.
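The mapping itself doesn't need to be fancy; a per-file tag table with confidence scores is enough. A minimal sketch, with field names that are assumptions rather than Glue's schema:

```typescript
// A file's feature tags with confidence scores in [0, 1].
// Scores don't sum to 1: features overlap, so a file can legitimately be
// "mostly auth" and "partly checkout" at the same time.
interface FeatureTags {
  file: string;
  tags: { feature: string; confidence: number }[];
}

const example: FeatureTags = {
  file: "auth.service.ts",
  tags: [
    { feature: "auth", confidence: 0.9 },
    { feature: "checkout", confidence: 0.4 }, // checkout uses auth
    { feature: "admin", confidence: 0.2 },
  ],
};

// When a file changes, the features worth testing are those above a threshold.
function impactedFeatures(tags: FeatureTags, threshold = 0.3): string[] {
  return tags.tags.filter((t) => t.confidence >= threshold).map((t) => t.feature);
}
```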
Call Graph Analysis That Doesn't Lie
Static call graphs are easy to build and mostly useless.
They tell you functionA calls functionB. They don't tell you if that call happens once during initialization or a million times in the hot path. They don't tell you if it's behind a feature flag that's always false. They don't distinguish between error handling code that never runs and critical business logic.
We build a weighted call graph from four signals (a sketch of how they combine follows this list):
Static analysis as the base layer. Parse the AST, extract calls, build edges. This gives you topology but not importance.
Execution frequency estimates. We analyze code patterns to guess hotness. Code inside loops scores higher. Error handlers score lower. Event handlers score based on how critical the events are (mouse clicks > resize events). This is heuristic but better than uniform weighting.
Churn correlation. If functionA and functionB change in the same commits frequently, they're operationally coupled even if the static link is indirect. Maybe they both touch the same database table. Maybe they're different implementations of the same protocol. Whatever—they move together, so changes to one affect the other.
Test coverage overlay. A function with zero tests is higher risk. If it's also called by other untested code, you've found a blast radius amplifier. One change potentially breaks many things with no safety net.
For Glue's own codebase, this revealed that our feature discovery pipeline—seemingly a background job—was actually called inline during several API requests. The static graph showed the calls, but the weighted graph showed those were hot paths. One optimization there yielded a 200ms latency improvement in three different endpoints.
Churn-Based Risk Scoring
Not all dependencies matter equally. If file A imports file B, the risk depends on whether B is stable or actively evolving.
We calculate file-level risk scores from:
Change frequency. Files modified in >10% of recent PRs are high-churn. They're either hotspots of activity or sources of bugs (usually both).
Change recency. A file changed yesterday is higher risk than one changed six months ago. Fresh changes haven't been battle-tested yet.
Author diversity. Files touched by many different people have integration risk. Everyone has slightly different assumptions. More authors means more potential for misunderstanding.
Defect density. We look for commits that touch the file and carry keywords like "fix", "bug", or "revert" in the message. High defect density means the code is either complex or misunderstood—both increase blast radius risk.
Coupling instability. If a file's dependencies change frequently, it's indirectly unstable even if it doesn't change much itself. The ground is shifting underneath it.
These combine into a 0-100 risk score. When you're about to modify a file, we show you not just what depends on it, but which of those dependencies are in good shape vs. which are already fragile.
Example from a real analysis: authMiddleware.ts had 47 direct dependents. Twelve of them were high-risk (score >70). Three of those were in features that had been changed by four different people in the last sprint. Those three got flagged for extra testing attention.
The Organizational Layer
Code doesn't exist in a vacuum. Teams exist.
We add organizational context to blast radius predictions:
Ownership boundaries. If your change affects files owned by another team, that's higher coordination cost and higher risk. Different teams have different testing practices, different deployment schedules, different domain knowledge.
Expertise mapping. Has anyone on your team changed these files before? If not, you're flying blind. We flag "first-time edits" separately from "routine maintenance."
Communication gaps. If your change affects code owned by a team you don't normally work with (measured by commit co-authorship, PR reviews, Slack activity), that's a red flag. The organizational boundary creates information loss.
Glue surfaces this in the PR interface. You see not just technical blast radius but organizational blast radius: which teams need to be looped in, which code is outside your team's expertise, which changes touch poorly-understood areas.
This has caught issues we would have missed. Someone on our API team modified a database query. Static analysis showed minimal impact—just one service file. But that service was owned by our analytics team and had complex performance characteristics we didn't understand. The organizational view flagged it. We talked to them before merging. Turned out the query change would have broken their ETL pipeline.
What Doesn't Work
I've tried a lot of things that sound good but fail in practice:
Machine learning for everything. Yes, you can train models to predict blast radius. They work okay until you refactor something, then they hallucinate wildly because the patterns changed. Rule-based heuristics are more reliable and debuggable.
Asking developers to annotate. Nobody does this consistently. It rots immediately. Any solution requiring ongoing human maintenance fails.
Test execution tracing. In theory you could run tests and instrument what code they touch, building a perfect runtime map. In practice this is too slow, requires test infrastructure investment most teams don't have, and doesn't work for integration tests.
Perfect precision. False positives are annoying but tolerable. False negatives (missing a real impact) are catastrophic. We tune aggressively for recall over precision. Better to over-warn than under-warn.
Real-World Results
Since deploying this in Glue, we've caught:
23 changes that would have broken features owned by other teams
7 cascading refactors where the blast radius was 3-4x larger than estimated
Dozens of "this file is scarier than it looks" warnings that prompted extra testing
Multiple cases where low-risk changes were being over-scrutinized because humans are bad at intuiting complexity
The system isn't perfect. It sometimes flags things as high-risk that turn out fine. It occasionally misses subtle impacts in dynamically generated code. But it's right often enough to be essential. We can't imagine doing large refactors without it now.
Building Your Own
You don't need Glue to implement this. The core ideas work with open-source tools:
Start with feature mapping. Even crude clustering (files that change together) gives you 70% of the value.
Build a weighted call graph. Static analysis plus git history analysis gets you most of the way there.
Add churn metrics. This is just git log parsing (a sketch follows this list): track changes per file, authors per file, and time since last change.
Layer in test coverage data. Most languages have coverage tools that export structured data.
Make it visible during code review. Blast radius is useless if nobody sees it until after merge.
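For the churn step, here's a sketch of the git log parsing. It uses only standard git flags (--name-only and a custom --pretty format); the surrounding structure is an illustrative assumption, not a reference implementation.

```typescript
import { execSync } from "node:child_process";

interface ChurnStats {
  commits: number;
  authors: Set<string>;
  lastChanged: Date;
}

// Per-file churn metrics straight from git log: change count,
// distinct authors, and the most recent change date.
function churnByFile(repoPath: string, since = "6 months ago"): Map<string, ChurnStats> {
  const log = execSync(
    `git -C ${repoPath} log --since="${since}" --name-only --pretty=format:"@@ %an|%aI"`,
    { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 }
  );
  const stats = new Map<string, ChurnStats>();
  let author = "";
  let date = new Date(0);
  for (const line of log.split("\n")) {
    if (line.startsWith("@@ ")) {
      // Commit header: author name and ISO date.
      const parts = line.slice(3).split("|");
      author = parts[0];
      date = new Date(parts[1]);
      continue;
    }
    const file = line.trim();
    if (!file) continue;
    const s = stats.get(file) ?? { commits: 0, authors: new Set<string>(), lastChanged: new Date(0) };
    s.commits += 1;
    s.authors.add(author);
    if (date > s.lastChanged) s.lastChanged = date;
    stats.set(file, s);
  }
  return stats;
}
```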
The hard part isn't the technology. It's getting the signal-to-noise ratio right so developers trust it. Start conservative—only flag the obviously dangerous changes. Build trust. Gradually increase sensitivity.