Your dashboard has 47 metrics. You look at 3. The other 44 are vanity metrics.
Let me tell you which 3 actually matter.
The Vanity Metrics Trap
- **Lines of Code.** More lines ≠ progress. Sometimes deleting code is the best contribution.
- **Commits per Day.** Measures activity, not value. One thoughtful commit beats ten "fix typo" commits.
- **Story Points Completed.** Points are made up. Teams inflate them. Comparing points across teams is meaningless.
- **Test Coverage Percentage.** 80% coverage with bad tests is worse than 50% coverage with good tests.
These metrics look good in reports. They don't predict software quality.
The Metrics That Matter
After building Glue and analyzing hundreds of codebases, here are the metrics that actually predict whether you'll ship well:
1. Change Failure Rate
How often do deployments cause incidents?
Change Failure Rate = Deployments Causing Issues / Total Deployments
Benchmarks:
- Elite: < 5%
- High: 5-10%
- Medium: 10-20%
- Low: > 20%
This directly measures deployment quality. No gaming it.
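A minimal sketch of how you might compute this from deployment records, assuming your CI/CD system can tell you which deployments were linked to an incident or rollback (the `Deployment` shape and helper are illustrative, not Glue's API):

```typescript
// Hypothetical deployment record pulled from a CI/CD system.
interface Deployment {
  id: string;
  deployedAt: Date;
  causedIncident: boolean; // incident or rollback linked within your window
}

// Change Failure Rate = deployments causing issues / total deployments
function changeFailureRate(deployments: Deployment[]): number {
  if (deployments.length === 0) return 0;
  const failed = deployments.filter(d => d.causedIncident).length;
  return failed / deployments.length;
}

// Example: 2 incident-causing deployments out of 30 ≈ 6.7% → "High" tier.
```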
2. Lead Time for Changes
How long from commit to production?
Lead Time = Time(commit merged) → Time(in production)
Benchmarks:
- Elite: < 1 day
- High: 1-7 days
- Medium: 1 week - 1 month
- Low: > 1 month
Long lead times indicate process problems, not code problems.
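A sketch of the calculation, assuming you can pair each change's merge timestamp with its production deploy timestamp (the `Change` shape is illustrative). The median is used so one stuck release doesn't hide an otherwise fast pipeline:

```typescript
// Hypothetical change record: when the commit merged, when it reached production.
interface Change {
  mergedAt: Date;
  deployedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Lead Time = Time(commit merged) → Time(in production), reported as a median.
function medianLeadTimeDays(changes: Change[]): number {
  const days = changes
    .map(c => (c.deployedAt.getTime() - c.mergedAt.getTime()) / MS_PER_DAY)
    .sort((a, b) => a - b);
  if (days.length === 0) return 0;
  const mid = Math.floor(days.length / 2);
  return days.length % 2 ? days[mid] : (days[mid - 1] + days[mid]) / 2;
}
```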
3. Code Churn Rate
How much code changes repeatedly?
```typescript
// What we track in Glue (healthInsights.ts)
interface FileMetrics {
  file_path: string;
  change_count: number;      // Total commits touching the file
  line_count: number;
  contributor_count: number;
  symbol_count: number;      // Functions/classes defined in the file
}

// High churn detection
if (file.change_count >= 50 && file.line_count >= 500) {
  insights.push({
    type: 'active_churn',
    severity: 'medium',
    message: `${file.file_path}: ${file.change_count} changes - ` +
      `consider architectural review`
  });
}
```
Benchmarks:
- Healthy: < 20 changes/quarter for most files
- Watch: 20-50 changes/quarter
- Critical: > 50 changes/quarter
Files that churn constantly are either central (fine) or poorly designed (problem).
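The benchmarks are per quarter while the snippet above tracks lifetime counts, so for a quarterly view you can bucket commits over a three-month window of git history. A rough sketch (the `ParsedCommit` shape and helper names are illustrative):

```typescript
// One parsed commit, e.g. from `git log --since="3 months ago" --name-only`.
interface ParsedCommit {
  date: Date;
  files: string[];
}

// Count how many commits touched each file in the window (changes/quarter).
function changesPerFile(commits: ParsedCommit[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const commit of commits) {
    for (const file of commit.files) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }
  return counts;
}

// Bucket a file against the churn benchmarks above.
function churnBucket(changes: number): 'healthy' | 'watch' | 'critical' {
  if (changes > 50) return 'critical';
  if (changes >= 20) return 'watch';
  return 'healthy';
}
```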
4. Blast Radius per Change
What's the impact of typical changes?
```sql
-- How we calculate this (call-graph/route.ts)
-- Recursive CTE traces callers up to 10 levels
WITH RECURSIVE call_tree AS (
  SELECT caller_symbol_id, 1 AS depth
  FROM code_call_paths
  WHERE callee_symbol_id = $1
  UNION ALL
  SELECT cp.caller_symbol_id, ct.depth + 1
  FROM code_call_paths cp
  JOIN call_tree ct ON cp.callee_symbol_id = ct.caller_symbol_id
  WHERE ct.depth < 10
)
SELECT COUNT(DISTINCT caller_symbol_id) AS blast_radius FROM call_tree;
```
Benchmarks:
- Well-isolated: Average blast radius < 10 files
- Moderately coupled: 10-30 files
- Tightly coupled: > 30 files (changes ripple everywhere)
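To get the "average blast radius" number used in those benchmarks, one approach is to run that query for every symbol touched in a recent window and average the results. A sketch using node-postgres, assuming the same `code_call_paths` table and that `recentlyChangedSymbolIds` comes from your own change tracking:

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from PG* environment variables

// Same recursive CTE as above: transitive callers of one symbol, up to 10 levels.
const BLAST_RADIUS_SQL = `
  WITH RECURSIVE call_tree AS (
    SELECT caller_symbol_id, 1 AS depth
    FROM code_call_paths
    WHERE callee_symbol_id = $1
    UNION ALL
    SELECT cp.caller_symbol_id, ct.depth + 1
    FROM code_call_paths cp
    JOIN call_tree ct ON cp.callee_symbol_id = ct.caller_symbol_id
    WHERE ct.depth < 10
  )
  SELECT COUNT(DISTINCT caller_symbol_id) AS blast_radius FROM call_tree;
`;

// Average blast radius across the symbols changed in a recent window.
async function averageBlastRadius(recentlyChangedSymbolIds: number[]): Promise<number> {
  if (recentlyChangedSymbolIds.length === 0) return 0;
  let total = 0;
  for (const id of recentlyChangedSymbolIds) {
    const { rows } = await pool.query(BLAST_RADIUS_SQL, [id]);
    total += Number(rows[0].blast_radius); // COUNT comes back as a string
  }
  return total / recentlyChangedSymbolIds.length;
}
```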
5. Hotspot Score
Which files are most problematic?
```typescript
// Our god object detection (healthInsights.ts:134)
function detectGodObjects(files: FileMetrics[]) {
  return files.filter(f =>
    f.change_count >= 80 &&   // Changes frequently
    f.line_count >= 1000 &&   // Very large
    f.symbol_count >= 17      // Many responsibilities
  );
}
```
Track the count of hotspots over time. Going up = accumulating debt.
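A minimal sketch of tracking that trend, assuming you snapshot the hotspot count on a regular cadence (the `HotspotSnapshot` shape and helper are illustrative):

```typescript
// One point in the hotspot time series, e.g. captured by a weekly job.
interface HotspotSnapshot {
  capturedAt: Date;
  hotspotCount: number;
}

// Compare the latest snapshot to the earliest one in the window you pass in.
function hotspotTrend(snapshots: HotspotSnapshot[]): 'growing' | 'shrinking' | 'flat' {
  const sorted = [...snapshots].sort(
    (a, b) => a.capturedAt.getTime() - b.capturedAt.getTime()
  );
  if (sorted.length < 2) return 'flat';
  const delta = sorted[sorted.length - 1].hotspotCount - sorted[0].hotspotCount;
  if (delta > 0) return 'growing';
  if (delta < 0) return 'shrinking';
  return 'flat';
}
```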
The Dashboard That Works
Instead of 47 metrics, track these:
| Metric | What It Tells You | Action |
|--------|-------------------|--------|
| Change Failure Rate | Deployment quality | Improve testing, rollback |
| Lead Time | Process efficiency | Remove bottlenecks |
| High-Churn Files | Architectural issues | Refactor or split |
| Hotspot Count | Tech debt accumulation | Prioritize cleanup |
| Knowledge Silos | Bus factor risk | Cross-training |
Five metrics. All actionable.
How to Track Them
Automated Collection
```typescript
// We collect this automatically from:
// 1. Git history → churn, contributors
// 2. Code analysis → symbols, complexity, dependencies
// 3. Call graph → blast radius
// 4. CI/CD → lead time, failure rate
interface WeeklyReport {
  changeFailureRate: number;
  avgLeadTimeDays: number;
  newHotspots: string[];
  resolvedHotspots: string[];
  highChurnFiles: FileMetrics[];
  knowledgeSilos: ModuleOwnership[];
}
```
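The `newHotspots` and `resolvedHotspots` fields fall out of a simple diff between two consecutive weekly snapshots; a sketch (the helper is illustrative, not Glue's API):

```typescript
// Diff last week's hotspot file paths against this week's.
function diffHotspots(
  previousWeek: string[],
  currentWeek: string[]
): { newHotspots: string[]; resolvedHotspots: string[] } {
  const prev = new Set(previousWeek);
  const curr = new Set(currentWeek);
  return {
    newHotspots: currentWeek.filter(path => !prev.has(path)),
    resolvedHotspots: previousWeek.filter(path => !curr.has(path)),
  };
}
```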
Manual Review (Weekly)
- Did any deployments cause issues?
- What's our lead time trend?
- Any new hotspots? Any resolved?
- Which files churned most?
Quarterly Assessment
- Are hotspots increasing or decreasing?
- Are knowledge silos getting worse?
- What does the call graph look like?
- Where should we invest refactoring time?
Avoiding Gaming
Any metric can be gamed. Here's how to prevent it:
Coverage gaming: "We hit 80%!"
- Measure mutation testing score instead
- Track what percentage of bugs were in tested code
Velocity gaming: "We did 50 points!"
- Measure customer outcomes instead
- Track bugs shipped, not features shipped
Commit gaming: "I committed 10 times today!"
- Don't track commits at all
- Track merged PRs or deployed changes
The best metrics are ones that are hard to game because they measure outcomes, not activity.
The Integration Point
Metrics become powerful when connected to code reality:
```typescript
// Bad: "test coverage is 75%"
// Good: "test coverage is 75%, but these 5 hotspot files have 40%"
const insights = {
  coverage: {
    overall: 0.75,
    hotspotCoverage: 0.40,
    criticalPaths: [
      { path: 'src/payments/processPayment.ts', coverage: 0.35 },
      { path: 'src/auth/validateToken.ts', coverage: 0.42 }
    ]
  }
};
```
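One way to act on that context is to flag hotspot files whose coverage trails the overall number by some margin; a rough sketch (the shapes and the threshold are illustrative):

```typescript
interface PathCoverage {
  path: string;
  coverage: number; // 0..1
}

// Flag hotspot files far enough behind overall coverage to justify test work now.
function coverageGaps(
  overall: number,
  hotspots: PathCoverage[],
  gapThreshold = 0.2 // illustrative: 20 percentage points behind overall
): PathCoverage[] {
  return hotspots.filter(h => overall - h.coverage >= gapThreshold);
}

// coverageGaps(0.75, insights.coverage.criticalPaths) flags both files above.
```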
Context transforms metrics from numbers to decisions.
The Bottom Line
Stop collecting metrics nobody uses.
Start with five:
- Change Failure Rate — Are we breaking prod?
- Lead Time — How fast do we ship?
- Churn Rate — Where are the problems?
- Blast Radius — How coupled is our code?
- Hotspot Count — Is debt growing?
Track these weekly. Review trends monthly. Make architectural decisions quarterly.
Everything else is noise.