Building Your First AI Agent with CrewAI: A Practical Guide
Most AI coding tools are just fancy autocomplete. You ask Claude to write a function, it hallucinates some code, you fix the imports, rinse and repeat. CrewAI is different. It lets you build agents that work together, have actual context about your system, and execute multi-step tasks without you babysitting them.
But here's the problem: agents are only as smart as the context you give them. Point CrewAI at your codebase blind and you'll get the same generic nonsense as any other LLM. The real power comes when agents understand your architecture, your patterns, your team's conventions.
Let me show you how to build something useful.
What Makes CrewAI Different
CrewAI is a framework for orchestrating multiple AI agents that collaborate on tasks. Think of it as a team of specialists instead of one overwhelmed generalist.
You define agents with specific roles—a researcher, a coder, a reviewer. Each agent has tools, a backstory, and goals. They communicate, delegate work, and produce results that are better than what any single LLM call could achieve.
The key concepts:
Agents: Autonomous entities with specific roles and capabilities
Tasks: Discrete work items with clear success criteria
Tools: Functions agents can call (search files, run tests, query APIs)
Crews: Orchestrated groups of agents working toward a goal
Here's a minimal example:
from crewai import Agent, Task, Crew
code_reviewer = Agent(
role="Senior Code Reviewer",
goal="Identify bugs, security issues, and code smells",
backstory="You've reviewed thousands of PRs and have strong opinions about clean code.",
    tools=[read_file, search_codebase],  # placeholder tool functions you define yourself; real examples below
verbose=True
)
review_task = Task(
description="Review the authentication module for security vulnerabilities",
agent=code_reviewer,
expected_output="List of issues with severity ratings and fix suggestions"
)
crew = Crew(
agents=[code_reviewer],
tasks=[review_task]
)
result = crew.kickoff()
This works. Sort of. The agent will read files and make suggestions. But it doesn't know which authentication module you're talking about, what framework you use, or what your team considers a "security vulnerability" versus an acceptable tradeoff.
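The quickest patch is to spell that context out in the task itself. A minimal sketch; the file path and the compliance note are placeholders for your own details:

review_task = Task(
    description=(
        "Review src/auth/jwt_service.py for security vulnerabilities. "
        "Context: JWT tokens expire after 7 days by compliance mandate, "
        "so do not flag the expiration window itself. All routes pass "
        "through AuthMiddleware before reaching handlers."
    ),
    agent=code_reviewer,
    expected_output="List of issues with severity ratings and fix suggestions",
)

This works, but hand-writing context into every task description doesn't scale past a handful of agents.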
The Context Problem
I've seen teams spend weeks building elaborate agent systems only to discover they produce generic advice you could get from Stack Overflow. The agent doesn't know your codebase is using JWT tokens with a 7-day expiration because that's what your compliance team mandated. It doesn't know the AuthService class has been flagged for refactoring because three different teams keep modifying it.
This is where code intelligence platforms like Glue make a difference. Instead of having agents blindly search through files, they can query a knowledge graph of your actual codebase—features, ownership, churn hotspots, documentation gaps. Your agents start with real context, not a clean slate.
Building a Feature Gap Agent
Let's build something practical: an agent that analyzes your codebase for feature inconsistencies. If you've implemented pagination in three services but not the fourth, it should catch that. If every API endpoint has rate limiting except one forgotten route, it should flag it.
First, define your tools:
from crewai_tools import tool
import requests

@tool("Search Features")
def search_features(query: str) -> str:
    """
    Searches the codebase for features matching the query.
    Returns feature metadata, locations, and implementation status.
    """
    # In production, this would query Glue's feature API
    response = requests.post(
        "https://api.glue.tools/features/search",
        json={"query": query, "repo": "your-org/your-repo"},
    )
    return response.text  # agents consume tool output as text, so return the raw JSON string

@tool("Analyze Code Health")
def analyze_code_health(file_path: str) -> str:
    """
    Returns complexity, churn rate, and ownership data for a file.
    """
    response = requests.get(
        f"https://api.glue.tools/health/{file_path}",
        params={"repo": "your-org/your-repo"},
    )
    return response.text

@tool("List Services")
def list_services() -> str:
    """
    Returns all services in the codebase with their dependencies.
    """
    response = requests.get(
        "https://api.glue.tools/services",
        params={"repo": "your-org/your-repo"},
    )
    return response.text
Now create specialized agents:
feature_researcher = Agent(
role="Feature Analyst",
goal="Identify patterns and gaps in feature implementation across services",
backstory="""You've worked on distributed systems for years.
You know that consistency matters—if nine services handle auth one way
and one does it differently, that's where the bugs hide.""",
tools=[search_features, list_services],
verbose=True
)
code_health_analyst = Agent(
role="Technical Debt Specialist",
goal="Assess whether feature gaps correlate with code health issues",
backstory="""You connect the dots between high churn, complexity,
and missing features. Often the service with the gap is the one
everyone's afraid to touch.""",
tools=[analyze_code_health, search_features],
verbose=True
)
report_writer = Agent(
role="Engineering Lead",
goal="Synthesize findings into actionable recommendations",
backstory="""You translate technical findings into clear priorities.
You know when to file a ticket versus when to schedule a team discussion.""",
tools=[],
verbose=True
)
Define the workflow:
research_task = Task(
description="""Identify all services that implement authentication.
For each service, document the auth mechanism used.
Flag any services that lack authentication entirely.""",
agent=feature_researcher,
expected_output="Structured report of auth implementation by service"
)
health_task = Task(
description="""For services flagged with missing or inconsistent auth,
analyze their code health metrics. Are they high churn? Complex?
Who owns them? Is the gap intentional or accidental neglect?""",
agent=code_health_analyst,
expected_output="Health assessment with ownership and risk ratings",
context=[research_task] # This task depends on research_task output
)
recommendation_task = Task(
description="""Based on feature gaps and health analysis,
create prioritized recommendations. Which gaps are high risk?
Which teams should address them? What's the estimated effort?""",
agent=report_writer,
expected_output="Prioritized action items with team assignments",
context=[research_task, health_task]
)
crew = Crew(
agents=[feature_researcher, code_health_analyst, report_writer],
tasks=[research_task, health_task, recommendation_task],
verbose=True
)
Run it:
result = crew.kickoff()
print(result)
What you get is a multi-page analysis that would take a human engineer hours of manual code spelunking. The agent knows which services exist, which have auth, how each implements it, where the code smells bad, and who should fix it.
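If you need the report programmatically, recent CrewAI versions return a CrewOutput object from kickoff() rather than a plain string. Attribute names vary by release, so treat this as a sketch and check your installed version:

result = crew.kickoff()

# The final synthesized report as text
print(result.raw)

# Per-task outputs, useful for logging or piping into other systems
for task_output in result.tasks_output:
    print(task_output.description)
    print(task_output.raw)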
Making Agents Actually Useful
Here's what I've learned building these systems in production:
1. Constrain the scope brutally
Don't build an agent that "improves code quality." Build one that "checks if API responses include trace IDs for debugging." Narrow goals produce useful output. Broad goals produce essays.
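Concretely, a trace ID check fits in one narrowly scoped task. The directory, header name, and output format below are hypothetical stand-ins for your own conventions:

trace_id_task = Task(
    description=(
        "Inspect every API handler under src/api/ and report which ones "
        "do not attach an X-Trace-Id header to their responses. "
        "Output only file paths and handler names, nothing else."
    ),
    agent=code_reviewer,
    expected_output="One handler per line: file path, handler name",
)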
2. Tools are everything
Your agent is only as good as its tools. Generic tools like "read file" and "run command" force the agent to fumble around. Specialized tools that query structured data (like Glue's feature index or code health APIs) let agents make informed decisions immediately.
3. Context is more important than prompts
You can spend days prompt-engineering perfect instructions. Or you can give the agent rich context about your actual codebase and mediocre prompts will work fine. When an agent knows the PaymentService has 47 changes this month and is owned by the checkout team, it makes better decisions than when it's flying blind.
4. Agents should delegate, not hallucinate
The worst AI agent systems are ones where a single agent tries to do everything. It searches, analyzes, codes, tests, and formats—and probably hallucinates at each step. Better to have a researcher agent that hands off to a coder agent that hands off to a reviewer agent. Each has a narrow job and does it well.
5. Humans in the loop matter
Even the best agent should propose, not execute. Have it generate a PR diff and Slack you the link. Have it create tickets, not merge code. The value is in automating the tedious research and analysis, not in removing human judgment entirely.
Real-World Workflow
Here's how we use this feature gap agent on our team:
Every Monday morning, the agent runs against main. It checks for common patterns—auth, logging, error handling, rate limiting, observability. It produces a report in our #engineering Slack channel with the top five gaps ranked by risk.
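The wiring behind that is mundane: a scheduled job runs the crew and posts the result to a Slack incoming webhook. A rough sketch, assuming a SLACK_WEBHOOK_URL environment variable (the webhook and the truncation limit are our choices, not anything CrewAI provides):

import os
import requests

def run_weekly_audit():
    # Run the feature gap crew defined above against the default branch
    result = crew.kickoff()

    # Post the report to Slack; truncate to stay under message size limits
    requests.post(
        os.environ["SLACK_WEBHOOK_URL"],
        json={"text": f"Weekly feature gap report:\n{str(result)[:3000]}"},
    )

# Schedule with cron, GitHub Actions, or your job runner of choice,
# e.g. a Monday 9am cron entry: 0 9 * * 1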
Last month it caught that our new ReportingService wasn't emitting metrics like every other service. That service was being built by a team in a different timezone. They had no idea our convention was to use Prometheus with specific label patterns. The agent flagged it before it hit production.
The agent doesn't file tickets automatically. It just surfaces the gaps. Our tech lead reviews the report, filters out false positives (sometimes a service intentionally doesn't have a feature), and assigns work. Takes 20 minutes instead of hours of manual auditing.
Getting Started
Start small. Pick one repetitive task you do weekly—checking for outdated dependencies, validating API documentation, finding tests with no assertions, whatever.
Build a single agent with 2-3 good tools. Run it manually. Iterate on the prompt until it produces something useful. Then add a second agent to validate the first agent's work.
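That validation step is just another Task wired in through context. A minimal sketch reusing review_task and the placeholder read_file tool from the first example:

validator = Agent(
    role="Findings Validator",
    goal="Check the reviewer's findings for false positives",
    backstory="You verify every claim against the actual code before anyone acts on it.",
    tools=[read_file],
    verbose=True,
)

validation_task = Task(
    description="Re-check each reported issue and drop anything that isn't real.",
    agent=validator,
    expected_output="The original report with unverified findings removed",
    context=[review_task],  # feeds the reviewer's output into this task
)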
If you're building these tools from scratch, you'll spend most of your time indexing code and extracting structure. That's where platforms like Glue help—they've already done the hard work of parsing your codebase, mapping ownership, tracking changes, and discovering features. You just wire up the agents.
CrewAI handles the orchestration. Your job is to give it the right context and tools to do something your team actually needs.
Most AI agent demos are solving fake problems. "Look, it can write a sorting algorithm!" Cool. You know what's harder? Making sure all 47 microservices in your platform follow the same authentication pattern. That's the kind of problem agents should solve.