Codebase Explorer Guru

A codebase exploration specialist that rapidly builds a complete mental model of unfamiliar codebases and presents clear, actionable summaries through a structured six-phase discovery process.

When to Use This Agent

Choose Codebase Explorer Guru when:

Onboarding to a new project and needing to understand its structure quickly
Evaluating an open-source library's architecture before adopting it
Conducting a codebase audit for acquisition or technical due diligence
Mapping dependencies, data flows, and integration points in an unfamiliar system
Creating documentation for a codebase that has little or none

Consider alternatives when:

Looking for a specific function or class (use grep/glob directly)
Understanding one specific feature's implementation (use a code reader agent)
Making changes to code (use a development agent after exploring)

Quick Start


# .claude/agents/codebase-explorer-guru.yml
name: Codebase Explorer Guru
description: Explore and document unfamiliar codebases
model: claude-sonnet
tools:
  - Read
  - Glob
  - Grep
  - Bash

Example invocation:

claude "Explore the codebase in src/ and create a comprehensive map of the architecture, key modules, data flow, and integration points"

Core Concepts

Six-Phase Discovery Process

Phase	Focus	Key Actions
1. Project Discovery	What is this?	Read package.json, README, configs
2. Architecture Mapping	How is it structured?	Scan directories, identify layers
3. Dependency Analysis	What does it depend on?	Parse lockfiles, map internal imports
4. Data Flow Tracing	How does data move?	Follow request paths, trace state
5. Integration Points	What does it connect to?	Find API calls, DB connections, queues
6. Quality Assessment	How healthy is it?	Check tests, types, linting, patterns

Phase 1: Project Discovery


# Files to read first (in priority order)
1. package.json / pyproject.toml / go.mod    # Dependencies + scripts
2. README.md                                  # Intent and setup
3. .env.example / .env.template               # External dependencies
4. docker-compose.yml                         # Service topology
5. tsconfig.json / webpack.config.js          # Build configuration
6. .github/workflows/*.yml                    # CI/CD pipeline
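
The priority list above can be turned into a reading list for a specific checkout. The following is a minimal sketch, not part of the agent itself: the function name `discovery_reading_list` and the group labels are assumptions made for illustration, and only the file names come from the list above.

```python
from pathlib import Path

# Priority order mirrors the list above; each entry groups alternative files.
# The labels are illustrative names, not something the agent defines.
PRIORITY_FILES = [
    ("manifest", ["package.json", "pyproject.toml", "go.mod"]),
    ("readme", ["README.md"]),
    ("env template", [".env.example", ".env.template"]),
    ("service topology", ["docker-compose.yml"]),
    ("build config", ["tsconfig.json", "webpack.config.js"]),
    ("ci pipeline", [".github/workflows/*.yml"]),
]


def discovery_reading_list(root):
    """Return (label, relative path) pairs for the priority files that exist."""
    root = Path(root)
    found = []
    for label, patterns in PRIORITY_FILES:
        for pattern in patterns:
            # sorted() keeps the result stable when a glob matches several files
            for path in sorted(root.glob(pattern)):
                if path.is_file():
                    found.append((label, path.relative_to(root).as_posix()))
    return found
```

Calling `discovery_reading_list(".")` from a repository root returns only the files that are actually present, in the order they should be read.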

Architecture Map Output


## Architecture Summary: E-Commerce Platform

### Tech Stack
- Runtime: Node.js 20 + TypeScript 5.3
- Framework: Next.js 14 (App Router)
- Database: PostgreSQL 16 via Drizzle ORM
- Cache: Redis 7 (sessions + rate limiting)
- Queue: BullMQ (order processing, emails)
- Auth: NextAuth.js v5 (Google, email)

### Directory Structure
src/
  app/           → Next.js App Router pages and layouts
  components/    → React components (154 files)
    ui/          → Shared UI primitives (Button, Modal, etc.)
    features/    → Feature-specific components
  lib/           → Core utilities and configurations
    db/          → Drizzle schema, migrations, queries
    auth/        → Authentication configuration
    stripe/      → Payment integration
  server/        → Server-side logic
    actions/     → Server actions (mutations)
    api/         → API route handlers
  types/         → TypeScript type definitions

### Data Flow: Order Placement
User → Checkout Page → createOrder (server action)
  → Validate cart items (DB query)
  → Create Stripe PaymentIntent
  → Insert order record (DB)
  → Enqueue order.created event (BullMQ)
  → Worker: send confirmation email
  → Worker: update inventory counts

Configuration

Parameter	Description	Default
`depth`	Exploration depth (quick, standard, deep)	`standard`
`focus`	Specific area to explore (backend, frontend, infra)	All
`output_format`	Report format (markdown, json, diagram)	`markdown`
`include_metrics`	Include code metrics (LOC, complexity, coverage)	`true`
`dependency_depth`	How deep to trace dependencies	`2`
`ignore_patterns`	Directories/files to skip	`["node_modules", "dist"]`

Best Practices

Follow the entry point chain to understand request flow. Start from the top-level entry point (main.ts, app.tsx, index.py) and trace how requests flow through middleware, routers, controllers, services, and data access layers. This reveals the actual architecture, which often differs from the directory structure's implied architecture. Document the flow as you trace it.
Map the dependency graph before reading implementation details. Use import/require statements to build a module dependency graph. Identify high-fan-in modules (many modules depend on them, so they are likely core abstractions) and high-fan-out modules (they depend on many modules, so they are likely orchestrators). This tells you where to start reading and which modules are most critical to understand.
Check test files to understand intended behavior. Test files often contain the clearest documentation of what a module does, what inputs it accepts, and what edge cases the developers considered. Read test files alongside source files, especially for complex business logic. The assertions act as a specification that stays current for as long as the tests pass, although the description strings around them can drift, so confirm each description against what the test actually checks.
Look for patterns in a few representative files of each directory. Most codebases have consistent internal patterns: how services are structured, how errors are handled, how data is validated. Read three representative files from a directory to extract the pattern, then skim the rest to verify consistency. This is far faster than reading every file and captures most of the same structural understanding.
Document as you explore rather than exploring first and documenting later. Write findings immediately into a structured document as you discover them. This forces you to articulate your understanding, reveals gaps in your mental model, and produces a deliverable artifact as a natural byproduct of exploration. Waiting until after exploration is complete often results in incomplete or disorganized documentation.
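
The fan-in and fan-out idea from the dependency-graph practice above can be sketched in a few lines. This is an illustrative example under stated assumptions: the import graph is taken as already extracted (module name mapped to the modules it imports), and the module names and the `fan_in_out` helper are hypothetical.

```python
from collections import Counter


def fan_in_out(imports):
    """Map each module to (fan_in, fan_out).

    `imports` maps a module name to the modules it imports.
    fan_in  = how many other modules import this one.
    fan_out = how many distinct modules this one imports.
    """
    fan_in = Counter()
    fan_out = {}
    for module, deps in imports.items():
        unique = set(deps) - {module}  # ignore duplicates and self-imports
        fan_out[module] = len(unique)
        for dep in unique:
            fan_in[dep] += 1
    modules = set(imports) | set(fan_in)
    return {m: (fan_in[m], fan_out.get(m, 0)) for m in sorted(modules)}


# Hypothetical import graph, loosely modeled on a typical web application layout.
graph = {
    "server/actions/orders": ["lib/db", "lib/stripe", "lib/auth"],
    "server/api/webhooks": ["lib/db", "lib/stripe"],
    "lib/stripe": ["lib/db"],
    "lib/auth": ["lib/db"],
    "lib/db": [],
}

counts = fan_in_out(graph)
core = max(counts, key=lambda m: counts[m][0])          # highest fan-in
orchestrator = max(counts, key=lambda m: counts[m][1])  # highest fan-out
print(core, orchestrator)  # lib/db server/actions/orders
```

In this toy graph, `lib/db` is the module to understand first, while `server/actions/orders` is a good place to see how the pieces are combined.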

Common Issues

Getting lost in large codebases with thousands of files. Start with the build configuration and entry points, not the file tree. The main or scripts.start field in package.json tells you where execution begins; trace from there. Use commit history, or file modification dates where history is unavailable, to separate actively developed areas from legacy code. Skip test files and generated code during the initial structural scan, since they add volume without architectural insight, and return to the tests once you start reading individual modules.
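
As a small illustration of starting from the declared entry points, the sketch below pulls the `main` and `scripts.start` fields out of a package.json document. The helper name `entry_points` and the sample manifest are assumptions made for the example.

```python
import json


def entry_points(package_json_text):
    """Return the entry points declared in a package.json document.

    Only the `main` and `scripts.start` fields are inspected; missing or
    non-string values are skipped rather than treated as errors.
    """
    pkg = json.loads(package_json_text)
    candidates = {}
    if isinstance(pkg.get("main"), str):
        candidates["main"] = pkg["main"]
    scripts = pkg.get("scripts")
    if isinstance(scripts, dict) and isinstance(scripts.get("start"), str):
        candidates["scripts.start"] = scripts["start"]
    return candidates


# Hypothetical manifest used only to show the shape of the result.
sample = """
{
  "name": "shop",
  "main": "dist/server.js",
  "scripts": {"start": "next start", "build": "next build"}
}
"""
print(entry_points(sample))
# {'main': 'dist/server.js', 'scripts.start': 'next start'}
```

The `scripts.start` value is a shell command rather than a file path, so it usually needs one more step (finding what that command runs) before tracing can begin.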

Misidentifying the architecture pattern from directory names alone. A directory named controllers/ does not guarantee MVC architecture. A services/ directory might contain god classes that do everything. Verify patterns by reading the actual code. Check whether "controllers" truly only handle HTTP concerns and delegate to services, or whether they contain business logic. Report the architecture the code actually implements, not the one its directory names suggest.

Missing hidden dependencies and side effects. Not all dependencies are visible in import statements. Dynamic requires, dependency injection containers, environment variable lookups, and runtime plugins create invisible connections. Search for process.env, require() calls with non-literal arguments, and DI container registrations. Check initialization code and middleware chains for implicit dependencies that the import graph does not reveal.
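
Two of the searches described above, environment variable lookups and `require()` calls whose argument is not a plain string literal, can be approximated with regular expressions. This is a rough sketch rather than a parser: it works line by line, it will miss multi-line calls, and it does not attempt to detect DI container registrations. The function name and sample source are hypothetical.

```python
import re

# process.env.NAME or process.env["NAME"] / process.env['NAME']
ENV_RE = re.compile(r"""process\.env(?:\.|\[['"])(\w+)""")

# require(...) whose argument does not start with a quoted string literal.
# Template literals (backticks) are flagged too, since they may interpolate.
DYNAMIC_REQUIRE_RE = re.compile(r"""\brequire\((?!\s*['"])[^)]+\)""")


def scan_hidden_dependencies(source):
    """Return (line number, kind, detail) tuples for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in ENV_RE.finditer(line):
            findings.append((lineno, "env", match.group(1)))
        for match in DYNAMIC_REQUIRE_RE.finditer(line):
            findings.append((lineno, "dynamic-require", match.group(0)))
    return findings


# Hypothetical JavaScript source used only to exercise the patterns.
sample = '''const db = connect(process.env.DATABASE_URL);
const fs = require("fs");
const plugin = require(pluginDir + "/" + name);
'''

for finding in scan_hidden_dependencies(sample):
    print(finding)
```

Each finding is a starting point for manual inspection, not proof of a hidden dependency; the static `require("fs")` on the second line is deliberately left out of the results.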
