Your Bus Factor Is Probably 1 (And Other Lies We Tell Ourselves About Knowledge Transfer)

```python
# Bad: Only Sarah touches the billing code
class PaymentProcessor:
    def process_payment(self, amount, customer_id):
        # Sarah's magic lives here
        # TODO: Document this mysterious retry logic
        pass
```

```python
# Better: Multiple people have touched this code
class PaymentProcessor:
    def process_payment(self, amount, customer_id):
        """
        Process payment with exponential backoff retry.

        Note: The 47-second initial delay is intentional.
        See incident #2847 - shorter delays triggered
        upstream rate limiting in the banking API.

        Last modified by: Sarah (v1.0), Mike (v1.1), Alex (v1.2)
        """
        pass
```
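
One way to spot the "only Sarah touches this" pattern before it bites is to count distinct authors per file. This is a minimal sketch, assuming you have already extracted (path, author) pairs from your version-control history (for example by parsing `git log` output); the function name `single_author_files` and the sample history are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def single_author_files(commits: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Return {path: author} for every file only one person has ever changed.

    `commits` is an iterable of (path, author) pairs, one per file touched
    per commit (hypothetical input format; adapt to your own log parsing).
    """
    authors_by_file: dict[str, set[str]] = defaultdict(set)
    for path, author in commits:
        authors_by_file[path].add(author)
    # Keep only files with exactly one distinct author, in path order.
    return {
        path: next(iter(authors))
        for path, authors in sorted(authors_by_file.items())
        if len(authors) == 1
    }


# Hypothetical history: billing has only ever been touched by Sarah.
history = [
    ("billing/payment_processor.py", "Sarah"),
    ("billing/payment_processor.py", "Sarah"),
    ("sessions/store.py", "Mike"),
    ("sessions/store.py", "Alex"),
]
print(single_author_files(history))  # {'billing/payment_processor.py': 'Sarah'}
```

Counting distinct authors rather than commits is deliberate: fifty commits by one person still leave you with one person who understands the file.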

```python
import random

# Useless comment
def calculate_retry_delay(attempt):
    return min(300, (2 ** attempt) + random.uniform(0, 1))
```

```python
import random

# Useful context
def calculate_retry_delay(attempt):
    """
    Exponential backoff with jitter for payment retries.

    Max delay is 300s (5min) because our payment provider
    automatically cancels pending requests after 6 minutes.

    Jitter prevents a thundering herd when multiple instances
    restart simultaneously (learned this during the Q3 2023
    outage when all workers synchronized their retries).
    """
    return min(300, (2 ** attempt) + random.uniform(0, 1))
```
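
For context on how a delay function like this is usually consumed, here is a minimal retry loop. It is a sketch under assumptions: `charge` stands in for whatever call actually talks to the payment provider, `TransientError` is a hypothetical exception type for retryable failures, and `max_attempts=5` is an illustrative default, not a value from the original code. The `sleep` parameter is injectable so the loop can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def calculate_retry_delay(attempt: int) -> float:
    # Same formula as above: exponential growth plus up to 1s of jitter, capped at 300s.
    return min(300, (2 ** attempt) + random.uniform(0, 1))


def retry_with_backoff(
    charge: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `charge` until it succeeds or `max_attempts` calls have failed."""
    for attempt in range(max_attempts):
        try:
            return charge()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(calculate_retry_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

With this formula, attempt 0 waits between 1 and 2 seconds, and the 300-second cap only kicks in from attempt 9 onward (2 ** 9 = 512), so with a small `max_attempts` the cap is a safety net rather than the common case.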

```markdown
# ADR-0023: Why We Use Redis for Session Storage

## Problem
Our session storage was hitting scaling limits. Users were getting
logged out at random during peak traffic.

## Options Considered
1. Sticky sessions with in-memory storage
2. Database-backed sessions
3. Redis cluster

## Decision
Redis cluster. Database-backed sessions were too slow (40ms average
vs 2ms for Redis). Sticky sessions lose all session data when an
instance restarts.

## Tradeoffs
- Added complexity (another service to monitor)
- Eventual consistency issues during Redis failover
- But: 95% reduction in session-related support tickets
```

```python
# billing/payment_processor.py

"""
SHADOW DOCS - What we've figured out about this module:

1. The retry_count field in payments table is NOT the actual retry count.
   It's some kind of composite score. Real retry count is in logs.
   (Discovered during March incident - see Slack thread #payments-hell)

2. process_refund() has a race condition with process_payment() when
   called within 30 seconds. Workaround: check payment.status first.
   (Found this by breaking production twice - don't ask)

3. The "test mode" flag doesn't actually prevent real charges.
   It just changes which email gets the receipt.
   (Sarah apparently set this up for demo purposes in 2019???)
"""
```

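The workaround in item 2 of the shadow docs ("check payment.status first") can be turned into an explicit guard. This is a minimal sketch: the `Payment` class, the status names, and this `process_refund` signature are hypothetical stand-ins, since the module's real interfaces aren't shown. Note that a check-then-act guard only closes the race if the status read and the update happen under the same lock (or the same database transaction); the per-payment lock below covers a single process only.

```python
import threading
from dataclasses import dataclass, field
from enum import Enum


class PaymentStatus(Enum):
    # Hypothetical states; the real module's status values aren't documented.
    PENDING = "pending"
    SETTLED = "settled"
    REFUNDED = "refunded"


class RefundNotAllowed(Exception):
    """Raised when a refund is requested for a payment that isn't settled."""


@dataclass
class Payment:
    payment_id: str
    status: PaymentStatus = PaymentStatus.PENDING
    # Per-payment lock: the status check and the status update happen
    # atomically within one process. Across processes or hosts you would
    # need a database row lock or transaction instead.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def process_refund(payment: Payment) -> Payment:
    """Refund only after process_payment() has finished settling the payment."""
    with payment.lock:
        if payment.status is not PaymentStatus.SETTLED:
            raise RefundNotAllowed(
                f"{payment.payment_id} is {payment.status.value}; retry after it settles"
            )
        # The provider's refund call would go here (omitted in this sketch).
        payment.status = PaymentStatus.REFUNDED
    return payment
```

Raising instead of silently skipping forces the caller to decide whether to retry later, and the same check also blocks a second refund of a payment that has already been refunded.
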
The Documentation Delusion

Why Knowledge Transfer Always Fails

What Actually Works: The Gradual Transition

1. Rotation, Not Ownership

2. Context in Code, Not Documents

3. Decision Records, Not Architecture Docs

4. Pairing, Not Solo Work

The Nuclear Option: Shadow Documentation

Metrics That Actually Matter
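
The title's claim can be checked rather than guessed. One simple heuristic: the bus factor is the smallest number of people whose departure leaves more than half of the files with no remaining author. The sketch below estimates it greedily, repeatedly removing whoever still covers the most files. It assumes the same hypothetical (path, author) input as any git-log parser would produce, and it treats anyone who ever touched a file as knowing it, which is generous; the function name, the 0.5 threshold default, and the sample files are illustrative.

```python
from collections import defaultdict
from typing import Iterable


def bus_factor(commits: Iterable[tuple[str, str]], threshold: float = 0.5) -> int:
    """Greedy estimate of how many people must leave before more than
    `threshold` of all files have no remaining author.

    `commits` is an iterable of (path, author) pairs (hypothetical format).
    Anyone who ever touched a file counts as knowing it, which is generous.
    """
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    remaining: dict[str, set[str]] = defaultdict(set)
    for path, author in commits:
        remaining[path].add(author)
    total = len(remaining)
    if total == 0:
        return 0

    removed = 0
    while True:
        orphaned = sum(1 for authors in remaining.values() if not authors)
        if orphaned / total > threshold:
            return removed
        # Count how many files each remaining author still covers.
        coverage: dict[str, int] = defaultdict(int)
        for authors in remaining.values():
            for author in authors:
                coverage[author] += 1
        # Remove the author covering the most files (ties: alphabetically first).
        top = max(sorted(coverage), key=lambda a: coverage[a])
        for authors in remaining.values():
            authors.discard(top)
        removed += 1


# Hypothetical history: three billing files only Sarah has touched.
history = [
    ("billing/payment_processor.py", "Sarah"),
    ("billing/refunds.py", "Sarah"),
    ("billing/invoices.py", "Sarah"),
    ("sessions/store.py", "Mike"),
    ("sessions/store.py", "Alex"),
]
print(bus_factor(history))  # 1
```

Because the greedy choice always produces a valid set of departures, its result is an upper bound on the true minimum, so a greedy answer of 1 means the real bus factor is 1 as well.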

The Uncomfortable Truth
