Most M&A tech due diligence is theater.
You get a data room full of PowerPoints claiming "robust scalable architecture" and "comprehensive feature set." The engineering team spends three months digging through code trying to figure out what the hell this thing actually does. By the time you have answers, the deal's either dead or someone else has bought it.
We fixed this. In three days, not three months.
The Problem Nobody Talks About
The real bottleneck in tech M&A isn't financial DD or legal review — it's answering the basic question: "What does this product actually do?"
Sounds simple. It's not.
In my experience, startup teams can't even list their own features accurately (try it — ask your PM to inventory everything your product does. Watch them sweat). Now imagine doing this for a competitor's 500k-line codebase with no tribal knowledge.
The traditional approach is brutal:
- Send junior devs to read through repositories
- Schedule "architecture walkthrough" calls (where they lie)
- Build spreadsheets of "discovered capabilities"
- Three months later: "We think they have payments, but we're not sure if it handles subscriptions"
Meanwhile, your competition just signed the LOI.
What We Built Instead
We built Boostr to solve our own M&A problem. The core insight: codebases don't lie. Marketing decks do.
Here's how it works in practice:
Step 1: Automatic Feature Discovery (30 minutes)
Point our system at the target's GitHub repos. We index everything:
// Real code from our indexer
const analyzeCodebase = async (repositories: Repository[]) => {
  const features = await Promise.all(
    repositories.map(async (repo) => {
      // Extract all symbols, API routes, database schemas
      const symbols = await extractSymbols(repo);
      const routes = await discoverWebRoutes(repo);
      const dbSchema = await analyzeDatabase(repo);

      // Graph-based clustering to find feature boundaries
      return clusterIntoFeatures({
        symbols,
        routes,
        database: dbSchema,
        callGraph: await buildCallGraph(symbols)
      });
    })
  );

  return features.flat();
};
The algorithm is clever: we build a graph where files are nodes and method calls are edges. Add API-based connections (frontend calling backend endpoints). Then run community detection to find natural feature clusters.
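To make the clustering step concrete, here's a minimal label-propagation sketch over a file graph. This is illustrative, not our production pipeline; the types and names are hypothetical, and a production system would use a stronger community-detection algorithm like Louvain, but the idea is the same:

// Illustrative sketch: label propagation over the file graph.
// Files sharing a final label end up in the same feature cluster.
type FileGraph = Map<string, Set<string>>; // file -> files it calls

const clusterFiles = (graph: FileGraph, iterations = 10): Map<string, string> => {
  // Start with every file in its own cluster
  const labels = new Map<string, string>();
  for (const file of graph.keys()) labels.set(file, file);

  for (let i = 0; i < iterations; i++) {
    for (const [file, neighbors] of graph) {
      if (neighbors.size === 0) continue;

      // Adopt the most common label among this file's neighbors
      const counts = new Map<string, number>();
      for (const neighbor of neighbors) {
        const label = labels.get(neighbor) ?? neighbor;
        counts.set(label, (counts.get(label) ?? 0) + 1);
      }
      const best = [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
      labels.set(file, best);
    }
  }
  return labels;
};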
Output: 15-25 concrete features with all related code, routes, and database tables. No guessing.
Step 2: Competitive Gap Analysis (2 hours)
Now the magic happens. We know what the target has. We know what competitors have (from our database). AI does the gap analysis:
const analyzeCompetitiveGaps = async (targetFeatures: Feature[], competitors: string[]) => {
  const gaps = [];

  for (const competitor of competitors) {
    const competitorFeatures = await researchCompetitor(competitor);

    for (const competitorFeature of competitorFeatures) {
      const coverage = calculateFeatureCoverage(targetFeatures, competitorFeature);

      if (coverage < 0.6) { // Less than 60% coverage = gap
        gaps.push({
          feature: competitorFeature.name,
          coverage: coverage,
          revenueImpact: await calculateRevenueImpact(competitorFeature),
          evidence: competitorFeature.sources,
          recommendation: generateRecommendation(coverage)
        });
      }
    }
  }

  return gaps.sort((a, b) => b.revenueImpact - a.revenueImpact);
};
AI crawls competitor changelogs, pricing pages, API docs. It finds launch dates, monetization proof, customer demand signals. Outputs gaps ranked by revenue impact (0-100 score).
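For illustration, here's a hypothetical version of calculateFeatureCoverage based on simple token overlap. A production version would lean on richer signals (embeddings, actual code evidence), but this shows the 0-to-1 contract the gap loop above expects:

// Hypothetical sketch of calculateFeatureCoverage: best token overlap
// between a competitor feature and the target's discovered features.
interface Feature { name: string; description: string; }

const tokenize = (text: string): Set<string> =>
  new Set(text.toLowerCase().split(/\W+/).filter((t) => t.length > 2));

const calculateFeatureCoverage = (
  targetFeatures: Feature[],
  competitorFeature: Feature
): number => {
  const wanted = tokenize(competitorFeature.name + " " + competitorFeature.description);
  let best = 0;
  for (const feature of targetFeatures) {
    const have = tokenize(feature.name + " " + feature.description);
    const overlap = [...wanted].filter((t) => have.has(t)).length;
    best = Math.max(best, wanted.size ? overlap / wanted.size : 0);
  }
  return best; // 1.0 = some target feature fully covers the competitor's
};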
Step 3: Technical Debt Assessment (automated)
While analyzing features, we're also scoring technical quality:
- Architecture patterns: Are they using modern frameworks or jQuery spaghetti?
- Database design: Normalized schemas or NoSQL chaos?
- API consistency: RESTful design or random endpoints?
- Test coverage: Do they actually test anything?
- Security patterns: Input validation, auth, encryption
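As a sketch of how checks like these roll up into a report, here's an illustrative scoring pass. The thresholds are made up for the example, not our calibrated cutoffs:

// Illustrative sketch: rolling individual quality checks into debt flags.
interface QualityChecks {
  frameworkYearsOutOfDate: number;
  testCoveragePercent: number;
  plaintextSecretsFound: boolean;
  maxTableColumnCount: number;
}

const assessTechnicalDebt = (checks: QualityChecks): string[] => {
  const flags: string[] = [];
  if (checks.frameworkYearsOutOfDate > 3)
    flags.push(`Framework ${checks.frameworkYearsOutOfDate} years out of date`);
  if (checks.testCoveragePercent < 20)
    flags.push(`Test coverage at ${checks.testCoveragePercent}%`);
  if (checks.plaintextSecretsFound)
    flags.push("Plaintext credentials in code or database");
  if (checks.maxTableColumnCount > 100)
    flags.push(`Widest table has ${checks.maxTableColumnCount} columns`);
  return flags; // each flag feeds the cost estimate in the final report
};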
Real example from a recent acquisition target:
{
  "technicalDebt": {
    "framework": "Legacy Spring 2.x (7 years out of date)",
    "database": "MySQL with 847-column user table",
    "security": "Passwords stored in plaintext",
    "testing": "3% code coverage",
    "recommendation": "Complete rewrite required - $2M+ engineering cost"
  }
}
That killed the deal. Saved us $50M.
The Real Implementation Details
The hardest part wasn't the AI (Claude handles that fine). It was building reliable code indexing across multiple languages.
We needed microservices because each language ecosystem is its own mess:
Java/Kotlin indexer (separate service):
- Eclipse JDT for parsing
- JavaParser for method bodies
- Spring Boot route detection
- Dependency injection analysis
TypeScript/Node indexer:
- TypeScript Compiler API
- Next.js route discovery
- React component analysis
- Package.json dependency parsing
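To give a flavor of the TypeScript side, here's a minimal symbol-extraction pass using the TypeScript Compiler API. The real indexer covers much more than this (routes, components, dependencies), but the core walk looks roughly like this:

// Minimal sketch: extracting exported function symbols via the Compiler API.
import * as ts from "typescript";

const extractFunctionSymbols = (fileNames: string[]): string[] => {
  const program = ts.createProgram(fileNames, { allowJs: true });
  const symbols: string[] = [];

  for (const sourceFile of program.getSourceFiles()) {
    if (sourceFile.isDeclarationFile) continue; // skip .d.ts noise

    const visit = (node: ts.Node): void => {
      if (ts.isFunctionDeclaration(node) && node.name) {
        symbols.push(`${sourceFile.fileName}#${node.name.text}`);
      }
      ts.forEachChild(node, visit);
    };
    visit(sourceFile);
  }
  return symbols;
};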
Database schema discovery:
- Connect to production DBs (with permission)
- Extract table structures, relationships
- Identify data patterns and volumes
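Schema discovery is less exotic than it sounds. A minimal sketch, assuming a Postgres target and the node-postgres (pg) client:

// Minimal sketch: schema discovery against Postgres via information_schema.
import { Client } from "pg";

const discoverSchema = async (connectionString: string) => {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    // One row per column; group client-side into table -> columns
    const { rows } = await client.query(
      `SELECT table_name, column_name, data_type
         FROM information_schema.columns
        WHERE table_schema = 'public'
        ORDER BY table_name, ordinal_position`
    );
    const tables = new Map<string, { column: string; type: string }[]>();
    for (const row of rows) {
      const cols = tables.get(row.table_name) ?? [];
      cols.push({ column: row.column_name, type: row.data_type });
      tables.set(row.table_name, cols);
    }
    return tables;
  } finally {
    await client.end();
  }
};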
Each indexer runs as a Google Cloud Run service. Isolated, scalable, language-optimized.
The frontend subscribes to real-time progress via Server-Sent Events:
// Real SSE implementation
const streamIndexingProgress = (workspaceId: string) => {
  const eventSource = new EventSource(`/api/workspaces/${workspaceId}/indexing/stream`);

  eventSource.onmessage = (event) => {
    const progress = JSON.parse(event.data);
    updateUI({
      filesIndexed: progress.filesIndexed,
      symbolsExtracted: progress.symbolsExtracted,
      progressPercent: progress.progressPercent
    });
  };
};
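The server side is plain SSE. A minimal Express sketch, with getIndexingProgress standing in for however the job state is actually read (queue, database, whatever):

// Minimal Express sketch of the SSE endpoint.
import express from "express";

const app = express();

// Stand-in for reading real job state
const getIndexingProgress = async (workspaceId: string) => ({
  filesIndexed: 0,
  symbolsExtracted: 0,
  progressPercent: 100 // stub: pretend indexing already finished
});

app.get("/api/workspaces/:id/indexing/stream", (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const timer = setInterval(async () => {
    const progress = await getIndexingProgress(req.params.id);
    res.write(`data: ${JSON.stringify(progress)}\n\n`); // SSE frame format
    if (progress.progressPercent >= 100) {
      clearInterval(timer);
      res.end();
    }
  }, 1000);

  req.on("close", () => clearInterval(timer)); // client disconnected
});

app.listen(3000);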
Executives love watching the numbers climb in real-time. Makes the process feel magical instead of mysterious.
The Model Context Protocol Advantage
The breakthrough was using Anthropic's Model Context Protocol (MCP). We built 60+ specialized tools that Claude can call:
// Some of our MCP tools
const mcpTools = {
  search_symbols: (query: string) => findCodeSymbols(query),
  get_call_graph: (symbolId: string) => buildExecutionPath(symbolId),
  find_api_endpoints: (workspaceId: string) => listAllRoutes(workspaceId),
  analyze_database_schema: () => getSchemaStructure(),
  search_features: (query: string) => findDiscoveredFeatures(query),
  get_competitive_gaps: () => fetchGapAnalysis()
};
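Wiring a tool like search_symbols into MCP is mostly boilerplate. A minimal sketch using the official MCP TypeScript SDK; the server name and the findCodeSymbols stub here are placeholders:

// Minimal sketch: exposing one tool via the MCP TypeScript SDK.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Stand-in for the real symbol search
const findCodeSymbols = async (query: string) => [{ symbol: query, file: "unknown" }];

const server = new McpServer({ name: "code-intel", version: "1.0.0" });

server.tool(
  "search_symbols",
  { query: z.string().describe("Symbol name or pattern to search for") },
  async ({ query }) => ({
    // MCP tool results are content blocks; we return JSON as text
    content: [{ type: "text" as const, text: JSON.stringify(await findCodeSymbols(query)) }]
  })
);

await server.connect(new StdioServerTransport());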
This means Claude understands code structure, not just text. It can answer questions like:
- "How does their payment flow work?" (traces execution paths)
- "What APIs are exposed?" (lists all endpoints with parameters)
- "How complex is their auth system?" (analyzes all auth-related code)
The AI generates executive summaries that are technically accurate because they're based on code analysis, not speculation.
What This Actually Costs
Traditional DD: 3 months * $200k/month (senior dev team) = $600k per deal
Our approach: 3 days * $5k (compute + AI tokens) = $15k per deal
That's a 40x cost reduction. More importantly, we can evaluate 20 targets in the time it used to take for one.
But the real win isn't cost — it's speed and accuracy. We killed three deals in the last year based on technical findings that wouldn't have surfaced until post-acquisition. Saved probably $100M+ in bad acquisitions.
The Architecture Reality Check
This wasn't built in a weekend. The core system is 80+ database tables, 500+ API endpoints, multiple microservices. Real engineering.
But it works. Last month we analyzed a 2M-line fintech codebase in 6 hours. Found 23 features, identified 8 major security issues, and discovered they had duplicate payment processing (built two different systems over 3 years).
The acquisition team used our report to negotiate the price down $12M. And no, we don't license this. We use it internally for our own deals and consulting work; the competitive advantage is too valuable to sell.
The Uncomfortable Truth
Most startups think their code is an asset. From an M&A perspective, it's usually a liability.
We've seen:
- 500k-line React app that could be replaced by a 50k-line Next.js rewrite
- "AI company" with hardcoded if-statements instead of ML models
- "Scalable microservices" that were just a monolith split randomly across Docker containers
The founders genuinely believe their technical stories. The code tells a different story.
Our system removes the emotion and politics. Shows you exactly what you're buying, what it's worth, and what it'll cost to maintain.
That's worth $50M saved on a bad deal. Every time.