You can't improve what you don't measure. But you can definitely measure the wrong things.
Most teams track coverage and call it "quality." That's like measuring a restaurant by counting plates served. Let me show you what actually indicates code health.
The Metrics That Lie
Test Coverage
80% coverage means 80% of lines were executed during tests. It says nothing about:
- Whether edge cases are tested
- Whether tests actually assert anything useful
- Whether the tests are maintainable
I've seen 95% coverage codebases that were unmaintainable disasters.
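To make that concrete, here's a hypothetical example (`applyDiscount` and the test are made up; Jest-style syntax assumed). The test executes every line, so line coverage reports 100%, yet it never checks that the function actually computes the right number.

```typescript
// applyDiscount.ts (hypothetical)
export function applyDiscount(price: number, percent: number): number {
  if (percent < 0 || percent > 100) {
    throw new Error('percent must be between 0 and 100');
  }
  return price - (price * percent) / 100;
}

// applyDiscount.test.ts
// Both branches above get executed, so line coverage is 100% --
// but nothing asserts on the computed value.
import { applyDiscount } from './applyDiscount';

test('applyDiscount runs', () => {
  applyDiscount(100, 10);                          // no assertion on the result
  expect(() => applyDiscount(100, 200)).toThrow(); // "covers" the error path, checks nothing specific
});
```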
Lines of Code
More lines ≠ worse code. Sometimes explicit is better than clever. A 20-line function that's readable beats a 5-line function nobody understands.
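For example (illustrative code, not from any real codebase), both versions below group orders by status. The one-liner is shorter; the loop is the one a new teammate can read without stopping to unpack it.

```typescript
interface Order { id: string; status: string; }

// Short and "clever": correct, but you have to mentally unpack the reduce to trust it
const groupByStatusClever = (orders: Order[]) =>
  orders.reduce<Record<string, Order[]>>(
    (acc, o) => ({ ...acc, [o.status]: [...(acc[o.status] ?? []), o] }),
    {}
  );

// Longer, but every step is obvious at a glance
function groupByStatus(orders: Order[]): Record<string, Order[]> {
  const groups: Record<string, Order[]> = {};
  for (const order of orders) {
    if (!groups[order.status]) {
      groups[order.status] = [];
    }
    groups[order.status].push(order);
  }
  return groups;
}
```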
Cyclomatic Complexity
"Keep complexity under 10" is cargo cult programming. A switch statement with 15 cases might be the clearest solution. Context matters more than numbers.
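A contrived illustration: this lookup-style switch has a cyclomatic complexity around 15, and splitting it into strategy objects or a class hierarchy would add indirection without adding clarity.

```typescript
// Complexity well over the usual threshold, yet every branch is a one-line, self-explanatory mapping
function httpStatusLabel(code: number): string {
  switch (code) {
    case 200: return 'OK';
    case 201: return 'Created';
    case 204: return 'No Content';
    case 301: return 'Moved Permanently';
    case 302: return 'Found';
    case 400: return 'Bad Request';
    case 401: return 'Unauthorized';
    case 403: return 'Forbidden';
    case 404: return 'Not Found';
    case 409: return 'Conflict';
    case 429: return 'Too Many Requests';
    case 500: return 'Internal Server Error';
    case 502: return 'Bad Gateway';
    case 503: return 'Service Unavailable';
    default: return 'Unknown';
  }
}
```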
The Metrics That Matter
After building Glue and analyzing hundreds of codebases, I've found these are the metrics that actually predict maintainability:
1. Change Frequency (Churn)
How often does a file change? Files that change constantly are either:
- Central to the product (expected)
- Poorly designed (problem)
```typescript
// How we track this in Glue (healthInsights.ts)
interface FileMetrics {
  file_path: string;
  line_count: number;
  change_count: number;       // Git commits touching the file
  contributor_count: number;  // Unique developers
  symbol_count: number;       // Functions/classes in the file
  avg_symbol_complexity: number;
}
```
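Populating those fields doesn't require anything fancy. A rough sketch of the Git side (assuming Node with `git` on the PATH; this is not Glue's actual pipeline):

```typescript
import { execSync } from 'node:child_process';

// Count commits and unique authors touching a file over the last quarter.
// Assumes the process runs inside the repository; paths are repo-relative.
function churnForFile(filePath: string): { changeCount: number; contributorCount: number } {
  const log = execSync(
    `git log --since="90 days ago" --format=%ae -- "${filePath}"`,
    { encoding: 'utf8' }
  );
  const authors = log.split('\n').filter(Boolean);
  return {
    changeCount: authors.length,              // one %ae line per commit
    contributorCount: new Set(authors).size,  // unique author emails
  };
}
```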
Benchmark:
- Normal: < 20 changes/quarter
- Watch: 20-50 changes/quarter
- Critical: > 50 changes/quarter (for non-config files)
2. Contributor Collision
How many people touch the same file?
```typescript
// From our hotspot detection (hotspotInsights.ts)
// Collision Hotspot Detection
if (file.contributor_count >= 4 && file.change_count >= 30) {
  insights.push({
    type: 'collision_hotspot',
    message: `${file.file_path} is a collision hotspot - ` +
      `${file.contributor_count} contributors, ${file.change_count} changes`
  });
}
```
Many contributors + frequent changes = merge conflict hell.
Benchmark:
- Healthy: 1-2 primary contributors per module
- Watch: 3-4 contributors modifying same files
- Critical: 5+ contributors, frequent conflicts
3. God Object Detection
Large files with many symbols that change frequently:
```typescript
// Our detection algorithm (healthInsights.ts:134)
if (file.change_count >= 80 && file.line_count >= 1000 && file.symbol_count >= 17) {
  insights.push({
    type: 'god_object',
    severity: 'high',
    message: `God object detected: ${file.line_count} lines, ` +
      `${file.change_count} changes, ${file.symbol_count} symbols. ` +
      `Consider extracting focused classes.`
  });
}
```
Benchmark:
- Normal: < 500 lines, < 15 symbols per file
- Watch: 500-1000 lines, 15-25 symbols
- Critical: > 1000 lines with 25+ symbols
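The `line_count` and `symbol_count` inputs can come from a real parser or, as a first approximation, from something much cruder. A hypothetical sketch (regex-based, so treat the numbers as estimates; a production pipeline would use the TypeScript compiler API or tree-sitter):

```typescript
import { readFileSync } from 'node:fs';

// Crude approximation of file size and symbol count.
function roughFileStats(filePath: string): { lineCount: number; symbolCount: number } {
  const source = readFileSync(filePath, 'utf8');
  const lineCount = source.split('\n').length;
  const symbolCount =
    (source.match(/\b(function|class)\s+\w+/g) ?? []).length +          // declared functions/classes
    (source.match(/\bconst\s+\w+\s*=\s*(async\s*)?\(/g) ?? []).length;  // arrow-function constants
  return { lineCount, symbolCount };
}
```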
4. Blast Radius
What breaks when you change this?
```sql
-- Our call graph analysis (symbols/[symbolId]/call-graph/route.ts)
-- Uses a PostgreSQL recursive CTE to trace 10 levels deep
WITH RECURSIVE call_tree AS (
  -- Base case: direct callers of the changed symbol
  SELECT caller_symbol_id, 1 AS depth
  FROM code_call_paths
  WHERE callee_symbol_id = $1

  UNION ALL

  -- Recursive case: callers of callers
  SELECT cp.caller_symbol_id, ct.depth + 1
  FROM code_call_paths cp
  JOIN call_tree ct ON cp.callee_symbol_id = ct.caller_symbol_id
  WHERE ct.depth < 10
)
SELECT * FROM call_tree;
```
Benchmark:
- Low risk: < 10 transitive callers
- Medium risk: 10-50 transitive callers
- High risk: > 50 transitive callers (requires careful testing)
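Turning that query into a risk label is just a count plus the thresholds above. A sketch using the `pg` client (the `blastRadius` helper is hypothetical, not Glue's actual route handler):

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the environment

type BlastRisk = 'low' | 'medium' | 'high';

// Count a symbol's transitive callers and map the count to a risk band.
async function blastRadius(symbolId: string): Promise<{ callers: number; risk: BlastRisk }> {
  const { rows } = await pool.query(
    `WITH RECURSIVE call_tree AS (
       SELECT caller_symbol_id, 1 AS depth
       FROM code_call_paths
       WHERE callee_symbol_id = $1
       UNION ALL
       SELECT cp.caller_symbol_id, ct.depth + 1
       FROM code_call_paths cp
       JOIN call_tree ct ON cp.callee_symbol_id = ct.caller_symbol_id
       WHERE ct.depth < 10
     )
     SELECT COUNT(DISTINCT caller_symbol_id) AS callers FROM call_tree`,
    [symbolId]
  );
  const callers = Number(rows[0].callers);
  const risk: BlastRisk = callers > 50 ? 'high' : callers >= 10 ? 'medium' : 'low';
  return { callers, risk };
}
```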
5. Knowledge Distribution
Who knows this code?
```typescript
// We track contributors per module
{
  module: 'payments/',
  contributors: [
    { name: 'alice', commits: 67, percentage: 0.67 },
    { name: 'bob', commits: 22, percentage: 0.22 },
    { name: 'charlie', commits: 11, percentage: 0.11 }
  ],
  risk: 'single_point_of_failure' // alice owns 67%
}
```
Benchmark:
- Healthy: No single contributor > 50% of a critical module
- Watch: One contributor owns 50-80%
- Critical: One contributor > 80% (bus factor = 1)
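Deriving that distribution from raw commit counts is a few lines. A sketch (the `ownershipFor` helper is hypothetical; the cutoffs mirror the benchmark bands above and may differ from Glue's):

```typescript
interface Contributor { name: string; commits: number; }

interface ModuleOwnership {
  module: string;
  contributors: (Contributor & { percentage: number })[];
  risk: 'healthy' | 'watch' | 'single_point_of_failure';
}

// Turn raw commit counts into a percentage breakdown and a bus-factor flag.
function ownershipFor(module: string, raw: Contributor[]): ModuleOwnership {
  const total = raw.reduce((sum, c) => sum + c.commits, 0);
  const contributors = raw
    .map(c => ({ ...c, percentage: total === 0 ? 0 : c.commits / total }))
    .sort((a, b) => b.percentage - a.percentage);
  const top = contributors[0]?.percentage ?? 0;
  const risk = top > 0.8 ? 'single_point_of_failure' : top > 0.5 ? 'watch' : 'healthy';
  return { module, contributors, risk };
}
```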
Building a Quality Dashboard
Here's how we structure code health in Glue:
```typescript
// Overall health score calculation
function calculateHealthScore(metrics: FileMetrics[]): number {
  let score = 100;

  // Deduct for god objects
  const godObjects = metrics.filter(f =>
    f.line_count >= 1000 && f.symbol_count >= 17
  );
  score -= godObjects.length * 5;

  // Deduct for high churn
  const highChurn = metrics.filter(f => f.change_count >= 50);
  score -= highChurn.length * 3;

  // Deduct for collision hotspots
  const collisions = metrics.filter(f =>
    f.contributor_count >= 4 && f.change_count >= 30
  );
  score -= collisions.length * 4;

  return Math.max(0, score);
}
```
Status Thresholds:
- 🟢 Healthy (70-100): Ship with confidence
- 🟡 Watch (50-69): Monitor these areas
- 🔴 Critical (0-49): Address before adding features
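Mapping the score onto those bands is then trivial (a minimal sketch mirroring the thresholds above):

```typescript
type HealthStatus = 'healthy' | 'watch' | 'critical';

function healthStatus(score: number): HealthStatus {
  if (score >= 70) return 'healthy';
  if (score >= 50) return 'watch';
  return 'critical';
}
```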
The Complete Measurement Stack
| Layer | What to Measure | Tool |
|-------|-----------------|------|
| Syntax | Linting violations | ESLint/SonarQube |
| Types | Type coverage | TypeScript |
| Tests | Meaningful coverage | Jest + mutation testing |
| Churn | Change frequency | Git analysis |
| Architecture | Coupling, dependencies | Graph analysis (Glue) |
| Knowledge | Contributor distribution | Git + org data |
Actionable Benchmarks
Weekly Review:
- Which files changed most?
- Any new collision hotspots?
- Any tests skipped/deleted?
Monthly Review:
- Knowledge distribution changes
- New god objects emerging
- Architecture drift (new unexpected dependencies)
Quarterly Review:
- Overall health score trend
- Major refactoring candidates
- Team topology vs code ownership alignment
The Implementation
We store all this in our api_request_logs partitioned table and calculate insights on demand:
```typescript
// From apiMetricsService.ts
interface ApiMetrics {
  endpoint: string;
  method: string;
  avg_duration_ms: number;
  p95_duration_ms: number;
  p99_duration_ms: number;
  error_rate: number;
  request_count: number;
}
```
Combine these with the code metrics above and you get a complete picture of system health.
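What "combined" looks like depends on how you map endpoints to code. A hypothetical sketch, reusing the `ApiMetrics` and `FileMetrics` interfaces above and assuming you can resolve each endpoint to its handler file (the 500 ms cutoff is illustrative):

```typescript
interface EndpointRisk {
  endpoint: string;
  p95_duration_ms: number;
  error_rate: number;
  change_count: number;
  contributor_count: number;
}

// Flag endpoints that are slow at p95 or sit on high-churn code.
function riskyEndpoints(
  api: ApiMetrics[],
  files: FileMetrics[],
  handlerFileFor: (endpoint: string) => string // hypothetical endpoint -> file mapping
): EndpointRisk[] {
  const byPath = new Map<string, FileMetrics>(files.map(f => [f.file_path, f]));
  return api
    .map(a => ({ a, f: byPath.get(handlerFileFor(a.endpoint)) }))
    .filter((x): x is { a: ApiMetrics; f: FileMetrics } => x.f !== undefined)
    .filter(({ a, f }) => a.p95_duration_ms > 500 || f.change_count >= 50)
    .map(({ a, f }) => ({
      endpoint: a.endpoint,
      p95_duration_ms: a.p95_duration_ms,
      error_rate: a.error_rate,
      change_count: f.change_count,
      contributor_count: f.contributor_count,
    }));
}
```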
Stop Measuring Theater
The goal isn't green dashboards. It's shipping faster with confidence.
If your metrics don't help you decide:
- Where to invest refactoring time
- Which code needs more testing
- Who should review which PRs
...then you're measuring the wrong things.
Start with churn and contributor distribution. Those two metrics alone will show you where the real problems are.