When a developer asks "which files handle authentication?" they expect a precise answer. Not a keyword search. Not a directory listing. An actual traced answer: these 14 files, across 4 directories, form the authentication feature. Here's how they connect.
Building this requires solving a graph clustering problem. Here's exactly how we do it.
The Problem
A codebase is a graph. Files import other files. Functions call other functions. Types reference other types. This creates a dense dependency network.
The question is: which clusters of files form coherent features? "Authentication" isn't a directory — it's files spread across controllers, services, middleware, types, and utils. The grouping is structural (based on dependencies), not spatial (based on file paths).
A directory layout tells you about architecture. It doesn't tell you about features. The authentication feature is authController.ts + authService.ts + authMiddleware.ts + User.ts + parts of the session configuration. These files live in 4 different directories, but they form one cohesive unit.
The Graph Approach
Step 1: Build the Dependency Graph
We extract every import, function call, and type reference in the codebase. Each file becomes a node. Each dependency becomes an edge. The edge weight reflects the strength of the connection:
Import: weight 1.0 (direct dependency)
Function call: weight 0.8 (runtime dependency)
Type reference: weight 0.5 (structural dependency)
Test file to source: weight 0.3 (test dependency)
For a typical 4,000-file codebase, this produces a graph with 4,000 nodes and 15,000-40,000 edges.
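As a rough illustration of this step, here is a minimal sketch that turns a list of extracted dependencies into a weighted graph using the weights listed above. The `Dependency` record, the `buildGraph` helper, the choice to treat edges as undirected, and the choice to sum weights when two files are connected in more than one way are all assumptions made for the example, not a description of the production pipeline.

```typescript
// Dependency kinds and their weights, taken from the list above.
type DependencyKind = "import" | "call" | "typeRef" | "test";

const EDGE_WEIGHTS: Record<DependencyKind, number> = {
  import: 1.0,
  call: 0.8,
  typeRef: 0.5,
  test: 0.3,
};

// Hypothetical record a parser might emit for each dependency it finds.
interface Dependency {
  from: string; // file that depends on `to`
  to: string;
  kind: DependencyKind;
}

// Undirected weighted graph: node -> neighbor -> accumulated weight.
type WeightedGraph = Map<string, Map<string, number>>;

function addWeight(graph: WeightedGraph, a: string, b: string, w: number): void {
  const neighbors = graph.get(a) ?? new Map<string, number>();
  neighbors.set(b, (neighbors.get(b) ?? 0) + w);
  graph.set(a, neighbors);
}

function buildGraph(deps: Dependency[]): WeightedGraph {
  const graph: WeightedGraph = new Map();
  for (const { from, to, kind } of deps) {
    if (from === to) continue; // ignore self-references
    const w = EDGE_WEIGHTS[kind];
    // Assumption: store the edge in both directions and sum repeated links.
    addWeight(graph, from, to, w);
    addWeight(graph, to, from, w);
  }
  return graph;
}

// Example: a controller that both imports and calls a service.
const exampleGraph = buildGraph([
  { from: "authController.ts", to: "authService.ts", kind: "import" },
  { from: "authController.ts", to: "authService.ts", kind: "call" },
  { from: "authService.ts", to: "User.ts", kind: "typeRef" },
]);
```

Summing weights means two files linked by an import and a call end up more strongly connected than two files linked by an import alone, which is one plausible reading of "strength of the connection".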
Step 2: Louvain Community Detection
Louvain is a modularity optimization algorithm. It groups nodes into communities that maximize internal connections and minimize external connections. In our case: files that depend heavily on each other get grouped together.
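To make "modularity" concrete: for a weighted, undirected graph it compares the edge weight that falls inside each community with the weight you would expect there if edges were placed at random while keeping node degrees fixed. The sketch below computes that score for a given grouping. The `WeightedGraph` shape and the `fromEdges` and `modularity` helpers are names invented for this example.

```typescript
// Undirected weighted graph: node -> neighbor -> weight (stored in both directions).
type WeightedGraph = Map<string, Map<string, number>>;

function fromEdges(edges: [string, string, number][]): WeightedGraph {
  const graph: WeightedGraph = new Map();
  const add = (a: string, b: string, w: number): void => {
    const neighbors = graph.get(a) ?? new Map<string, number>();
    neighbors.set(b, (neighbors.get(b) ?? 0) + w);
    graph.set(a, neighbors);
  };
  for (const [a, b, w] of edges) {
    add(a, b, w);
    add(b, a, w);
  }
  return graph;
}

// Q = sum over communities of: inside / 2m - (total / 2m)^2
function modularity(graph: WeightedGraph, community: Map<string, string>): number {
  let twoM = 0; // sum of all weighted degrees, i.e. 2 * total edge weight
  const inside = new Map<string, number>(); // weight inside each community (each edge seen twice)
  const total = new Map<string, number>(); // sum of member degrees per community

  for (const [node, neighbors] of graph) {
    const c = community.get(node);
    if (c === undefined) throw new Error(`no community for ${node}`);
    for (const [neighbor, w] of neighbors) {
      twoM += w;
      total.set(c, (total.get(c) ?? 0) + w);
      if (community.get(neighbor) === c) {
        inside.set(c, (inside.get(c) ?? 0) + w);
      }
    }
  }
  if (twoM === 0) return 0;

  let q = 0;
  for (const [c, tot] of total) {
    q += (inside.get(c) ?? 0) / twoM - (tot / twoM) ** 2;
  }
  return q;
}

// Two triangles joined by a single edge.
const twoTriangles = fromEdges([
  ["a", "b", 1], ["a", "c", 1], ["b", "c", 1],
  ["d", "e", 1], ["d", "f", 1], ["e", "f", 1],
  ["c", "d", 1],
]);
const split = new Map([
  ["a", "left"], ["b", "left"], ["c", "left"],
  ["d", "right"], ["e", "right"], ["f", "right"],
]);
const q = modularity(twoTriangles, split); // 5/14, roughly 0.357
```

Putting all six nodes into one community scores 0, so the split into two triangles is the better grouping by this measure.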
The algorithm works in two phases:
Phase 1 (Local): Each node starts in its own community. For each node, evaluate moving it into each neighbor's community and accept the move with the largest modularity gain, provided that gain is positive; otherwise leave the node where it is. Repeat over all nodes until no move improves modularity.
Phase 2 (Aggregation): Collapse each community into a single super-node. Build a new graph where the edge weight between two super-nodes is the sum of the weights of the edges between their constituent nodes, and edges inside a community become a self-loop on its super-node. Go back to Phase 1.
This repeats until modularity stabilizes. The result: a hierarchical clustering of files into features.
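The local phase is the heart of the algorithm, so here is a minimal sketch of it alone (the aggregation phase is omitted). It assumes an undirected weighted graph stored symmetrically with no self-loops; `localMoving` and `fromEdges` are illustrative names, and a production run would more likely rely on an established Louvain library than on hand-rolled code like this.

```typescript
type WeightedGraph = Map<string, Map<string, number>>;

function fromEdges(edges: [string, string, number][]): WeightedGraph {
  const graph: WeightedGraph = new Map();
  const add = (a: string, b: string, w: number): void => {
    const neighbors = graph.get(a) ?? new Map<string, number>();
    neighbors.set(b, (neighbors.get(b) ?? 0) + w);
    graph.set(a, neighbors);
  };
  for (const [a, b, w] of edges) {
    add(a, b, w);
    add(b, a, w);
  }
  return graph;
}

// Phase 1 only: move nodes between communities while modularity improves.
function localMoving(graph: WeightedGraph): Map<string, string> {
  const community = new Map<string, string>(); // node -> community id
  const degree = new Map<string, number>(); // weighted degree of each node
  const commTotal = new Map<string, number>(); // sum of member degrees per community
  let twoM = 0;

  for (const [node, neighbors] of graph) {
    let k = 0;
    for (const w of neighbors.values()) k += w;
    community.set(node, node); // every node starts alone
    degree.set(node, k);
    commTotal.set(node, k);
    twoM += k;
  }
  if (twoM === 0) return community;

  let improved = true;
  while (improved) {
    improved = false;
    for (const [node, neighbors] of graph) {
      const current = community.get(node)!;
      const k = degree.get(node)!;

      // Total edge weight from this node into each neighboring community.
      const linkWeight = new Map<string, number>();
      for (const [neighbor, w] of neighbors) {
        const c = community.get(neighbor);
        if (c === undefined || neighbor === node) continue;
        linkWeight.set(c, (linkWeight.get(c) ?? 0) + w);
      }

      // Take the node out of its community, then score every candidate.
      // The score is the modularity gain of joining, up to a constant factor.
      commTotal.set(current, commTotal.get(current)! - k);
      const gain = (c: string): number =>
        (linkWeight.get(c) ?? 0) - ((commTotal.get(c) ?? 0) * k) / twoM;

      let best = current;
      let bestGain = gain(current);
      for (const c of linkWeight.keys()) {
        const g = gain(c);
        if (g > bestGain + 1e-12) {
          best = c;
          bestGain = g;
        }
      }

      commTotal.set(best, (commTotal.get(best) ?? 0) + k);
      if (best !== current) {
        community.set(node, best);
        improved = true;
      }
    }
  }
  return community;
}

// Two triangles joined by one edge settle into one community per triangle.
const bridged = fromEdges([
  ["a", "b", 1], ["a", "c", 1], ["b", "c", 1],
  ["d", "e", 1], ["d", "f", 1], ["e", "f", 1],
  ["c", "d", 1],
]);
const communities = localMoving(bridged);
```

Because a node only moves when the gain is strictly better than staying put, every move raises modularity and the loop is guaranteed to stop.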
Step 3: Feature Labeling
Louvain gives us clusters, not names. We label features by analyzing the cluster contents:
Extract the most common domain terms from file names and function names in the cluster
Identify the "entry point" files (most imported by other clusters) — these define the feature's public interface
Use the entry point names and domain terms to generate a feature label
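The labeling steps above could look roughly like the sketch below. It is deliberately simplified: it mines terms from file names only (not function names), and it combines entry points with term counts by letting entry-point files count twice, which is an assumption for the example rather than the actual labeling rule. `ImportEdge`, `termsOf`, `topTerms`, `entryPoints`, and `labelCluster` are all hypothetical names.

```typescript
// Hypothetical import record: `from` imports `to`.
interface ImportEdge {
  from: string;
  to: string;
}

// "controllers/authController.ts" -> ["auth", "controller"]
function termsOf(filePath: string): string[] {
  const base = filePath.split("/").pop() ?? filePath;
  const stem = base.replace(/\.[^.]+$/, ""); // drop the extension
  return stem
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // split camelCase
    .split(/[^A-Za-z0-9]+/)
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase());
}

// Most frequent terms across the given files, ties broken alphabetically.
function topTerms(files: string[], n: number): string[] {
  const counts = new Map<string, number>();
  for (const file of files) {
    for (const term of termsOf(file)) {
      counts.set(term, (counts.get(term) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map(([term]) => term);
}

// Files in the cluster that are imported most often from outside it.
function entryPoints(cluster: Set<string>, imports: ImportEdge[], n: number): string[] {
  const external = new Map<string, number>();
  for (const { from, to } of imports) {
    if (cluster.has(to) && !cluster.has(from)) {
      external.set(to, (external.get(to) ?? 0) + 1);
    }
  }
  return [...external.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map(([file]) => file);
}

// Assumption: entry-point files are counted twice so their terms weigh more.
function labelCluster(files: string[], imports: ImportEdge[]): string {
  const entries = entryPoints(new Set(files), imports, 2);
  return topTerms([...files, ...entries], 1)[0] ?? "unlabeled";
}

const authFiles = [
  "controllers/authController.ts",
  "services/authService.ts",
  "middleware/authMiddleware.ts",
  "types/User.ts",
];
const authImports: ImportEdge[] = [
  { from: "routes/index.ts", to: "controllers/authController.ts" },
  { from: "app.ts", to: "controllers/authController.ts" },
  { from: "app.ts", to: "middleware/authMiddleware.ts" },
  { from: "controllers/authController.ts", to: "services/authService.ts" },
];
const authLabel = labelCluster(authFiles, authImports); // "auth"
```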
92% accuracy when validated against our team's mental model of feature boundaries
Processing time: 4.2 seconds for the full clustering
On a client codebase (~4,000 files, Node.js monolith):
41 features detected
134 sub-features
Revealed 3 unexpected dependencies between features the team thought were independent
Processing time: 11.8 seconds
Edge Cases and Refinements
Utility Files
Files like utils.ts or helpers.ts connect to everything. They create noise in the clustering. We handle this by:
Detecting high-degree nodes (imported by >30% of the codebase)
Reducing their edge weights by a dampening factor
Allowing them to be assigned to the cluster they're most strongly connected to
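A minimal sketch of this dampening, under a few stated assumptions: the graph is undirected, "imported by >30% of the codebase" is approximated as "directly connected to more than 30% of the other files", and the dampening factor of 0.25 is a placeholder since the actual value is not given here. `findHubs` and `dampenHubs` are illustrative names.

```typescript
type WeightedGraph = Map<string, Map<string, number>>;

// Nodes connected to more than `threshold` of all other nodes.
function findHubs(graph: WeightedGraph, threshold: number): Set<string> {
  const hubs = new Set<string>();
  const others = graph.size - 1;
  for (const [node, neighbors] of graph) {
    if (others > 0 && neighbors.size / others > threshold) hubs.add(node);
  }
  return hubs;
}

// Returns a copy of the graph with every edge touching a hub scaled down.
// Hubs stay in the graph, so clustering can still assign them to the
// community they are most strongly connected to.
function dampenHubs(
  graph: WeightedGraph,
  threshold = 0.3, // share of the codebase, from the text above
  factor = 0.25, // hypothetical dampening factor
): WeightedGraph {
  const hubs = findHubs(graph, threshold);
  const result: WeightedGraph = new Map();
  for (const [node, neighbors] of graph) {
    const scaled = new Map<string, number>();
    for (const [neighbor, w] of neighbors) {
      const touchesHub = hubs.has(node) || hubs.has(neighbor);
      scaled.set(neighbor, touchesHub ? w * factor : w);
    }
    result.set(node, scaled);
  }
  return result;
}
```

Scaling rather than deleting hub edges keeps utils.ts in the graph while reducing its ability to pull unrelated features into one community.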
Test Files
Test files mirror source files structurally but shouldn't dominate clustering. We assign them to their source file's cluster with reduced edge weight.
Configuration Files
Config files (next.config.ts, .env, tsconfig.json) are excluded from clustering — they're infrastructure, not features.
Why This Matters for Developers
Feature clustering powers everything else in Glue:
"What files handle authentication?" → query the authentication cluster
"What depends on authentication?" → check cross-cluster edges pointing into the auth cluster
"What's the blast radius of changing this auth function?" → trace dependencies within and across clusters
"Which features are most complex?" → measure intra-cluster density and cross-cluster coupling
It turns a bag of 4,000 files into a structured, queryable knowledge graph. That's the foundation that makes every other intelligence feature possible.
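Once every file has a feature assigned, queries like these reduce to simple passes over the edge list. The sketch below shows two of them, under assumptions: `featureOf` is a hypothetical file-to-feature map produced by the clustering, edges are plain directed imports, density is measured as internal edges over possible directed pairs, and coupling as the share of a feature's edges that cross its boundary. None of these names or metric definitions are taken from Glue itself.

```typescript
// Hypothetical import record: `from` imports `to`.
interface ImportEdge {
  from: string;
  to: string;
}

// "What depends on this feature?": other features with an edge pointing into it.
function dependentFeatures(
  feature: string,
  featureOf: Map<string, string>,
  imports: ImportEdge[],
): Set<string> {
  const dependents = new Set<string>();
  for (const { from, to } of imports) {
    const source = featureOf.get(from);
    if (featureOf.get(to) === feature && source !== undefined && source !== feature) {
      dependents.add(source);
    }
  }
  return dependents;
}

interface FeatureStats {
  files: number;
  internalEdges: number;
  crossEdges: number;
  density: number; // internal edges / possible directed pairs
  coupling: number; // cross edges / all edges touching the feature
}

// "Which features are most complex?": density inside, coupling across.
function featureStats(
  feature: string,
  featureOf: Map<string, string>,
  imports: ImportEdge[],
): FeatureStats {
  let files = 0;
  for (const f of featureOf.values()) if (f === feature) files++;

  let internalEdges = 0;
  let crossEdges = 0;
  for (const { from, to } of imports) {
    const fromInside = featureOf.get(from) === feature;
    const toInside = featureOf.get(to) === feature;
    if (fromInside && toInside) internalEdges++;
    else if (fromInside || toInside) crossEdges++;
  }

  const possible = files * (files - 1);
  const touching = internalEdges + crossEdges;
  return {
    files,
    internalEdges,
    crossEdges,
    density: possible > 0 ? internalEdges / possible : 0,
    coupling: touching > 0 ? crossEdges / touching : 0,
  };
}
```

A feature with high density and low coupling is self-contained; one with high coupling is a likely source of surprising blast radius.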
Keep Reading
Feature clustering is the foundation that eliminates the Understanding Tax. When you can instantly answer "which files handle authentication?", the 30-90 minutes of manual context gathering disappear.
Glue runs Louvain clustering during codebase indexing, building the feature map that powers every pre-code intelligence query — from ticket analysis to blast radius to tribal knowledge surfacing.