Your engineering team wants AI agents. Your security team wants to sleep at night. Here's how to give them both what they want.
AI coding agents — Copilot, Cursor, Claude Code, Glue — need access to your code to be useful. That code contains business logic, API keys (hopefully not, but often yes), infrastructure patterns, and competitive advantages. The question isn't whether to use AI agents. It's how to use them without handing your intellectual property to a training pipeline.
The Threat Model
Before you lock anything down, understand what you're actually protecting against:
1. Training Data Exposure
Will your code be used to train the AI model? This is the headline risk, but it's also the most manageable. Most enterprise AI tools now offer zero-retention policies. Copilot for Business, Claude API, and Cursor's privacy mode all contractually guarantee your code isn't used for training.
Action: Read the data processing agreement. If it doesn't explicitly exclude training, assume your code is being used for it.
2. Context Window Leakage
AI agents send code snippets to remote APIs for processing. Those snippets pass through network infrastructure, load balancers, and potentially logging systems. Even with zero-retention, your code exists momentarily in third-party memory.
Action: Classify which code is acceptable to send to external APIs and which isn't. Not all code is equally sensitive.
3. Prompt Injection and Exfiltration
A malicious dependency or compromised file could contain hidden instructions that cause AI agents to exfiltrate data through their responses or actions.
Action: Review AI agent permissions. An agent that can read your codebase AND make network requests is a potential exfiltration vector.
4. Over-Permissioned Access
AI agents often request broad repository access to be maximally useful. But does the documentation agent really need access to your infrastructure-as-code repo? Does the code completion tool need to read your .env files?
Action: Apply least-privilege principles to AI agent access, just like you would for human team members.
The Data Classification Framework
Not all code needs the same protection level. Classify your repositories into tiers (a sketch of this mapping in code follows the tier descriptions):
Tier 1: Unrestricted
Open-source code
Public documentation
Generic utility libraries
Test fixtures with synthetic data
AI policy: Full access for all AI tools. No restrictions needed.
Tier 2: Standard
Application business logic
Internal APIs and services
Frontend code
Non-sensitive configuration
AI policy: AI agents with enterprise agreements (zero-retention, SOC2). Code can be sent to external APIs for processing.
Tier 3: Sensitive
Authentication and authorization systems
Payment processing logic
PII handling code
Infrastructure configuration
AI policy: Only AI agents that process code locally or operate under enhanced data agreements. Consider self-hosted models for this tier.
Tier 4: Restricted
Encryption key management
Security vulnerability details
Penetration testing results
Compliance-critical systems
AI policy: No external AI processing. Self-hosted models only, or no AI assistance at all.
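To make the tiers enforceable rather than aspirational, encode them somewhere a tool (or a proxy) can check. Here's a minimal sketch in Python; the repo names, tier assignments, and processing-mode labels are illustrative placeholders, not a standard.

```python
# Minimal sketch of a tier policy check. Repo-to-tier assignments and the
# processing-mode labels ("external-api", "zero-retention", "self-hosted")
# are illustrative, not a standard.

from enum import IntEnum

class Tier(IntEnum):
    UNRESTRICTED = 1
    STANDARD = 2
    SENSITIVE = 3
    RESTRICTED = 4

# Which processing modes are acceptable at each tier.
TIER_ALLOWED_MODES = {
    Tier.UNRESTRICTED: {"external-api", "zero-retention", "self-hosted"},
    Tier.STANDARD:     {"zero-retention", "self-hosted"},
    Tier.SENSITIVE:    {"self-hosted"},
    Tier.RESTRICTED:   set(),  # no external AI processing at all
}

REPO_TIERS = {
    "acme/docs-site":        Tier.UNRESTRICTED,
    "acme/web-app":          Tier.STANDARD,
    "acme/payments-service": Tier.SENSITIVE,
    "acme/key-management":   Tier.RESTRICTED,
}

def tool_may_access(repo: str, tool_mode: str) -> bool:
    """Return True if a tool running in `tool_mode` may read `repo`."""
    tier = REPO_TIERS.get(repo, Tier.RESTRICTED)  # default-deny unknown repos
    return tool_mode in TIER_ALLOWED_MODES[tier]

assert tool_may_access("acme/web-app", "zero-retention")
assert not tool_may_access("acme/payments-service", "zero-retention")
```

Whether this lives in Python, YAML, or your identity provider's group structure matters less than the principle: "which tier is this repo?" should have exactly one answer, and unknown repos should default to the most restrictive tier.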
The Technical Controls
1. Network Proxy
Route all AI agent traffic through a corporate proxy (a filtering sketch follows this list). This gives you:
Visibility into what code is being sent where
Ability to block requests containing sensitive patterns (API keys, PII)
Audit trail for compliance
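Here is what the blocking piece might look like as an mitmproxy addon. The AI API hostnames and the secret patterns are examples only; a real deployment would load both from the tools and credential formats your organization actually uses.

```python
# Minimal sketch of a corporate-proxy filter as an mitmproxy addon.
# Hostnames and secret patterns below are illustrative placeholders.

import re
from mitmproxy import http

AI_API_HOSTS = {"api.openai.com", "api.anthropic.com"}  # example hosts

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key id
    re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),  # private key material
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                   # GitHub personal access token
]

class BlockSecrets:
    def request(self, flow: http.HTTPFlow) -> None:
        # Only inspect traffic bound for known AI endpoints.
        if flow.request.pretty_host not in AI_API_HOSTS:
            return
        body = flow.request.get_text(strict=False) or ""
        if any(p.search(body) for p in SECRET_PATTERNS):
            # Refuse to forward the request; the 403 also leaves an audit trail.
            flow.response = http.Response.make(
                403,
                b"Blocked: request to AI API contained a secret-like pattern",
                {"Content-Type": "text/plain"},
            )

addons = [BlockSecrets()]
```

You'd run something like this with mitmdump -s block_ai_secrets.py and route developer machines through it; the script name is just an example.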
2. Repository-Level Access Controls
Don't give AI agents blanket access. Configure per-repository:
.aiignore files — like .gitignore but for AI tools. Exclude sensitive files from AI context (example after this list).
Repository-level permissions — only grant AI agents access to repos matching their tier policy.
Branch restrictions — AI agents on main/production branches only, not feature branches with experimental code.
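A typical exclusion file reads much like a .gitignore. The exact filename and semantics vary by tool (Cursor reads .cursorignore, for example), so confirm what each of your tools actually honors before relying on it; the entries below are illustrative.

```
# Keep secrets and sensitive infrastructure out of AI context
.env
.env.*
*.pem
*.key
terraform/
infrastructure/
config/credentials/
```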
3. Secret Scanning
AI agents will inevitably encounter secrets in code. Layer your defenses:
Pre-commit hooks — catch secrets before they're committed (git-secrets, detect-secrets)
AI context filters — strip environment variables and known secret patterns before sending to AI APIs (a redaction sketch follows this list)
Runtime monitoring — alert when AI agent responses contain patterns that look like secrets
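The context-filter layer can be as simple as a redaction pass over any snippet before it leaves the developer's machine. A sketch with illustrative patterns; in practice you'd share one pattern list with the proxy filter above rather than maintaining two.

```python
# Sketch of an outbound context filter: replace secret-like values with
# placeholders before a snippet is sent to an AI API. Patterns are examples.

import re

REDACTIONS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<AWS_ACCESS_KEY_ID>"),
    (re.compile(r"ghp_[A-Za-z0-9]{36}"), "<GITHUB_TOKEN>"),
    # Redact the value of any env-style assignment whose name looks secret.
    (re.compile(r"(?m)^(\w*(?:SECRET|TOKEN|PASSWORD|API_KEY)\w*)=.*$"), r"\1=<REDACTED>"),
]

def scrub(snippet: str) -> str:
    """Return the snippet with secret-like values replaced by placeholders."""
    for pattern, replacement in REDACTIONS:
        snippet = pattern.sub(replacement, snippet)
    return snippet

print(scrub("AWS_SECRET_ACCESS_KEY=abc123\nquery = db.lookup(user_id)"))
# -> AWS_SECRET_ACCESS_KEY=<REDACTED>
#    query = db.lookup(user_id)
```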
4. Audit and Monitoring
You can't secure what you can't see. Log at least the following (a structured-logging sketch follows the list):
Every request from AI agents to external APIs (endpoint, payload size, response time)
Which files and repositories AI agents access
Which developers are using which AI tools on which repos
Any anomalous patterns (sudden spike in external API calls, access to unusual repos)
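Structured, per-request records make that last point workable: anomaly detection becomes a query over the log rather than a forensics exercise. A minimal sketch, one JSON line per event; the field names are illustrative.

```python
# Sketch of a structured audit record for each AI agent request.
# Field names are illustrative placeholders, not a standard schema.

import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit = logging.getLogger("ai_audit")

def log_ai_request(user: str, tool: str, repo: str, files: list[str],
                   endpoint: str, payload_bytes: int, duration_ms: int) -> None:
    audit.info(json.dumps({
        "ts": time.time(),
        "user": user,                    # which developer
        "tool": tool,                    # which AI tool
        "repo": repo,                    # which repository was in context
        "files": files,                  # which files were read
        "endpoint": endpoint,            # where the request went
        "payload_bytes": payload_bytes,  # how much was sent
        "duration_ms": duration_ms,
    }))

log_ai_request("jane", "cursor", "acme/web-app",
               ["src/billing/invoice.py"], "api.example-ai.com",
               payload_bytes=18_432, duration_ms=740)
```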
The Compliance Angle
SOC2
If you're SOC2 compliant, AI agent usage falls under your existing access control and data handling policies. Document:
Which AI tools are approved
What data classification tiers they can access
How audit logs are maintained
GDPR / CCPA
If your code processes PII, AI agents that send code to external APIs may constitute a data transfer. Ensure your AI vendor's DPA covers this.
HIPAA
Healthcare companies: most AI coding tools are NOT HIPAA-compliant by default. You need a BAA (Business Associate Agreement) with the vendor, or use self-hosted models.
What Glue Does Differently
Most AI coding tools need to send your code to external APIs for every interaction. Glue's approach:
Index locally, query remotely. Your codebase is indexed and the knowledge graph is built locally. Only structured queries (not raw code) are sent to AI for natural language processing.
No code in prompts. When you ask Glue a question, it retrieves relevant context from the local index and sends summarized, structured data — not raw source files.
Audit trail built in. Every query, every retrieval, every AI interaction is logged with the user, timestamp, and scope.
The Bottom Line
Protecting company data when using AI agents is not about saying no to AI. It's about saying yes with guardrails:
Classify your code by sensitivity
Match AI tools to appropriate tiers
Route through proxies for visibility
Use .aiignore for granular exclusion
Audit everything
Review quarterly as AI tools evolve
The companies that figure this out first get the productivity benefits of AI agents without the security incidents. The ones that don't either ban AI tools entirely (and fall behind) or adopt them recklessly (and regret it).
Keep Reading
Security concerns often mask a deeper issue: AI tools need code access to be useful, but most send raw code to external APIs. The real question is how to get pre-code intelligence without exposing your intellectual property.
Glue takes a privacy-first approach to pre-code intelligence — indexing locally and sending only structured queries, not raw code, to AI for natural language processing.