Your dashboard has 47 metrics. You look at 3. The other 44 are vanity metrics.
Let me tell you which 3 actually matter.
The Vanity Metrics Trap
- **Lines of Code.** More lines ≠ progress. Sometimes deleting code is the best contribution.
- **Commits per Day.** Measures activity, not value. One thoughtful commit beats ten "fix typo" commits.
- **Story Points Completed.** Points are made up. Teams inflate them. Comparing points across teams is meaningless.
- **Test Coverage Percentage.** 80% coverage with bad tests is worse than 50% coverage with good tests.
These metrics look good in reports. They don't predict software quality.
The Metrics That Matter
After building Glue and analyzing hundreds of codebases, here are the metrics that actually predict whether you'll ship well:
1. Change Failure Rate
How often do deployments cause incidents?
Change Failure Rate = Deployments Causing Issues / Total Deployments
Benchmarks:
- Elite: < 5%
- High: 5-10%
- Medium: 10-20%
- Low: > 20%
This directly measures deployment quality. No gaming it.
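A minimal sketch of how you might compute this from deployment records, assuming your CI/CD system can tell you which deployments were linked to an incident or rollback (the `Deployment` shape and helper are illustrative, not Glue's API):

```typescript
// Hypothetical deployment record pulled from a CI/CD system.
interface Deployment {
  id: string;
  deployedAt: Date;
  causedIncident: boolean; // incident or rollback linked within your window
}

// Change Failure Rate = deployments causing issues / total deployments
function changeFailureRate(deployments: Deployment[]): number {
  if (deployments.length === 0) return 0;
  const failed = deployments.filter(d => d.causedIncident).length;
  return failed / deployments.length;
}

// Example: 2 incident-causing deployments out of 30 ≈ 6.7% → "High" tier.
```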
2. Lead Time for Changes
How long from commit to production?
Lead Time = Time(commit merged) → Time(in production)
Benchmarks:
- Elite: < 1 day
- High: 1-7 days
- Medium: 1 week - 1 month
- Low: > 1 month
Long lead times indicate process problems, not code problems.
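A sketch of the calculation, assuming you can pair each change's merge timestamp with its production deploy timestamp (the `Change` shape is illustrative). The median is used so one stuck release doesn't hide an otherwise fast pipeline:

```typescript
// Hypothetical change record: when the commit merged, when it reached production.
interface Change {
  mergedAt: Date;
  deployedAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Lead Time = Time(commit merged) → Time(in production), reported as a median.
function medianLeadTimeDays(changes: Change[]): number {
  const days = changes
    .map(c => (c.deployedAt.getTime() - c.mergedAt.getTime()) / MS_PER_DAY)
    .sort((a, b) => a - b);
  if (days.length === 0) return 0;
  const mid = Math.floor(days.length / 2);
  return days.length % 2 ? days[mid] : (days[mid - 1] + days[mid]) / 2;
}
```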
3. Code Churn Rate
How much code changes repeatedly?
```typescript
// What we track in Glue (healthInsights.ts)
interface FileMetrics {
  file_path: string;
  change_count: number;      // Total commits touching the file
  line_count: number;
  contributor_count: number;
  symbol_count: number;      // Functions/classes defined in the file
}

// High churn detection
if (file.change_count >= 50 && file.line_count >= 500) {
  insights.push({
    type: 'active_churn',
    severity: 'medium',
    message: `${file.file_path}: ${file.change_count} changes - ` +
      `consider architectural review`
  });
}
```
Benchmarks:
- Healthy: < 20 changes/quarter for most files
- Watch: 20-50 changes/quarter
- Critical: > 50 changes/quarter
Files that churn constantly are either central (fine) or poorly designed (problem).
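The benchmarks are per quarter while the snippet above tracks lifetime counts, so for a quarterly view you can bucket commits over a three-month window of git history. A rough sketch (the `ParsedCommit` shape and helper names are illustrative):

```typescript
// One parsed commit, e.g. from `git log --since="3 months ago" --name-only`.
interface ParsedCommit {
  date: Date;
  files: string[];
}

// Count how many commits touched each file in the window (changes/quarter).
function changesPerFile(commits: ParsedCommit[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const commit of commits) {
    for (const file of commit.files) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
  }
  return counts;
}

// Bucket a file against the churn benchmarks above.
function churnBucket(changes: number): 'healthy' | 'watch' | 'critical' {
  if (changes > 50) return 'critical';
  if (changes >= 20) return 'watch';
  return 'healthy';
}
```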
4. Blast Radius per Change
What's the impact of typical changes?
```sql
-- How we calculate this (call-graph/route.ts)
-- Recursive CTE traces callers up to 10 levels
WITH RECURSIVE call_tree AS (
  SELECT caller_symbol_id, 1 AS depth
  FROM code_call_paths
  WHERE callee_symbol_id = $1
  UNION ALL
  SELECT cp.caller_symbol_id, ct.depth + 1
  FROM code_call_paths cp
  JOIN call_tree ct ON cp.callee_symbol_id = ct.caller_symbol_id
  WHERE ct.depth < 10
)
SELECT COUNT(DISTINCT caller_symbol_id) AS blast_radius FROM call_tree;
```
Benchmarks:
- Well-isolated: Average blast radius < 10 files
- Moderately coupled: 10-30 files
- Tightly coupled: > 30 files (changes ripple everywhere)
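To get the "average blast radius" number used in those benchmarks, one approach is to run that query for every symbol touched in a recent window and average the results. A sketch using node-postgres, assuming the same `code_call_paths` table and that `recentlyChangedSymbolIds` comes from your own change tracking:

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from PG* environment variables

// Same recursive CTE as above: transitive callers of one symbol, up to 10 levels.
const BLAST_RADIUS_SQL = `
  WITH RECURSIVE call_tree AS (
    SELECT caller_symbol_id, 1 AS depth
    FROM code_call_paths
    WHERE callee_symbol_id = $1
    UNION ALL
    SELECT cp.caller_symbol_id, ct.depth + 1
    FROM code_call_paths cp
    JOIN call_tree ct ON cp.callee_symbol_id = ct.caller_symbol_id
    WHERE ct.depth < 10
  )
  SELECT COUNT(DISTINCT caller_symbol_id) AS blast_radius FROM call_tree;
`;

// Average blast radius across the symbols changed in a recent window.
async function averageBlastRadius(recentlyChangedSymbolIds: number[]): Promise<number> {
  if (recentlyChangedSymbolIds.length === 0) return 0;
  let total = 0;
  for (const id of recentlyChangedSymbolIds) {
    const { rows } = await pool.query(BLAST_RADIUS_SQL, [id]);
    total += Number(rows[0].blast_radius); // COUNT comes back as a string
  }
  return total / recentlyChangedSymbolIds.length;
}
```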
5. Hotspot Score
Which files are most problematic?
```typescript
// Our god object detection (healthInsights.ts:134)
function detectGodObjects(files: FileMetrics[]) {
  return files.filter(f =>
    f.change_count >= 80 &&   // Changes frequently
    f.line_count >= 1000 &&   // Very large
    f.symbol_count >= 17      // Many responsibilities
  );
}
```
Track the count of hotspots over time. Going up = accumulating debt.
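A minimal sketch of tracking that trend, assuming you snapshot the hotspot count on a regular cadence (the `HotspotSnapshot` shape and helper are illustrative):

```typescript
// One point in the hotspot time series, e.g. captured by a weekly job.
interface HotspotSnapshot {
  capturedAt: Date;
  hotspotCount: number;
}

// Compare the latest snapshot to the earliest one in the window you pass in.
function hotspotTrend(snapshots: HotspotSnapshot[]): 'growing' | 'shrinking' | 'flat' {
  const sorted = [...snapshots].sort(
    (a, b) => a.capturedAt.getTime() - b.capturedAt.getTime()
  );
  if (sorted.length < 2) return 'flat';
  const delta = sorted[sorted.length - 1].hotspotCount - sorted[0].hotspotCount;
  if (delta > 0) return 'growing';
  if (delta < 0) return 'shrinking';
  return 'flat';
}
```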
The Dashboard That Works
Instead of 47 metrics, track these:
| Metric | What It Tells You | Action |
|--------|-------------------|--------|
| Change Failure Rate | Deployment quality | Improve testing, rollback |
| Lead Time | Process efficiency | Remove bottlenecks |
| High-Churn Files | Architectural issues | Refactor or split |
| Hotspot Count | Tech debt accumulation | Prioritize cleanup |
| Knowledge Silos | Bus factor risk | Cross-training |
Five metrics. All actionable.
How to Track Them
Automated Collection
```typescript
// We collect this automatically from:
// 1. Git history → churn, contributors
// 2. Code analysis → symbols, complexity, dependencies
// 3. Call graph → blast radius
// 4. CI/CD → lead time, failure rate
interface WeeklyReport {
  changeFailureRate: number;
  avgLeadTimeDays: number;
  newHotspots: string[];
  resolvedHotspots: string[];
  highChurnFiles: FileMetrics[];
  knowledgeSilos: ModuleOwnership[];
}
```
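The `newHotspots` and `resolvedHotspots` fields fall out of a simple diff between two consecutive weekly snapshots; a sketch (the helper is illustrative, not Glue's API):

```typescript
// Diff last week's hotspot file paths against this week's.
function diffHotspots(
  previousWeek: string[],
  currentWeek: string[]
): { newHotspots: string[]; resolvedHotspots: string[] } {
  const prev = new Set(previousWeek);
  const curr = new Set(currentWeek);
  return {
    newHotspots: currentWeek.filter(path => !prev.has(path)),
    resolvedHotspots: previousWeek.filter(path => !curr.has(path)),
  };
}
```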
Manual Review (Weekly)
- Did any deployments cause issues?
- What's our lead time trend?
- Any new hotspots? Any resolved?
- Which files churned most?
Quarterly Assessment
- Are hotspots increasing or decreasing?
- Are knowledge silos getting worse?
- What does the call graph look like?
- Where should we invest refactoring time?
Avoiding Gaming
Any metric can be gamed. Here's how to prevent it:
Coverage gaming: "We hit 80%!"
- Measure mutation testing score instead
- Track what percentage of bugs were in tested code
Velocity gaming: "We did 50 points!"
- Measure customer outcomes instead
- Track bugs shipped, not features shipped
Commit gaming: "I committed 10 times today!"
- Don't track commits at all
- Track merged PRs or deployed changes
The best metrics are ones that are hard to game because they measure outcomes, not activity.
The Integration Point
Metrics become powerful when connected to code reality:
```typescript
// Bad: "test coverage is 75%"
// Good: "test coverage is 75%, but these 5 hotspot files have 40%"
const insights = {
  coverage: {
    overall: 0.75,
    hotspotCoverage: 0.40,
    criticalPaths: [
      { path: 'src/payments/processPayment.ts', coverage: 0.35 },
      { path: 'src/auth/validateToken.ts', coverage: 0.42 }
    ]
  }
};
```
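One way to act on that context is to flag hotspot files whose coverage trails the overall number by some margin; a rough sketch (the shapes and the threshold are illustrative):

```typescript
interface PathCoverage {
  path: string;
  coverage: number; // 0..1
}

// Flag hotspot files far enough behind overall coverage to justify test work now.
function coverageGaps(
  overall: number,
  hotspots: PathCoverage[],
  gapThreshold = 0.2 // illustrative: 20 percentage points behind overall
): PathCoverage[] {
  return hotspots.filter(h => overall - h.coverage >= gapThreshold);
}

// coverageGaps(0.75, insights.coverage.criticalPaths) flags both files above.
```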
Context transforms metrics from numbers to decisions.
The Bottom Line
Stop collecting metrics nobody uses.
Start with five:
- Change Failure Rate — Are we breaking prod?
- Lead Time — How fast do we ship?
- Churn Rate — Where are the problems?
- Blast Radius — How coupled is our code?
- Hotspot Count — Is debt growing?
Track these weekly. Review trends monthly. Make architectural decisions quarterly.
Everything else is noise.