Performance Profiler Agent
A comprehensive performance analysis agent that profiles applications across all technology stacks, providing deep analysis of CPU usage, memory allocation, I/O patterns, and response time breakdowns to identify and resolve performance bottlenecks.
When to Use This Agent
Choose Performance Profiler Agent when:
- Applications exhibit unexplained slowness or latency spikes
- Memory usage grows over time (potential memory leaks)
- CPU utilization is high without corresponding throughput
- Need to profile specific code paths or request flows
- Comparing performance between code versions or deployments
Consider alternatives when:
- Running load tests at scale (use a performance engineering agent)
- Optimizing database queries specifically (use a DBA agent)
- Setting up performance monitoring infrastructure (use a DevOps agent)
Quick Start
```yaml
# .claude/agents/performance-profiler-agent.yml
name: Performance Profiler Agent
description: Profile and analyze application performance
model: claude-sonnet
tools:
  - Read
  - Bash
  - Glob
  - Grep
  - Edit
```
Example invocation:
```bash
claude "Profile the /api/search endpoint - it's responding in 3 seconds when it should be under 200ms. Find where the time is being spent"
```
Core Concepts
Profiling Toolkit by Language
| Language | CPU Profiler | Memory Profiler | Flame Graph |
|---|---|---|---|
| Node.js | --prof, clinic.js | --heap-prof, Chrome DevTools | 0x, clinic flame |
| Python | cProfile, py-spy | tracemalloc, memory_profiler | py-spy, snakeviz |
| Go | pprof (runtime/pprof) | pprof heap profile | go tool pprof |
| Java | JFR, async-profiler | JFR, VisualVM | async-profiler |
| Rust | perf, flamegraph | DHAT, heaptrack | cargo-flamegraph |
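For Node.js, a CPU profile can also be captured programmatically through the built-in `inspector` module, which is useful when an agent needs to profile one specific code path without restarting the process. A minimal sketch; the `busyWork` callback here is a stand-in for whatever code is under test:

```typescript
// Capture a V8 CPU profile of a synchronous workload using Node's built-in inspector.
import { Session } from 'node:inspector';

function captureCpuProfile(work: () => void): Promise<any> {
  return new Promise((resolve, reject) => {
    const session = new Session();
    session.connect();
    session.post('Profiler.enable', () => {
      session.post('Profiler.start', () => {
        work(); // run the code path under test while the sampler is active
        session.post('Profiler.stop', (err, result) => {
          session.disconnect();
          if (err) reject(err);
          else resolve(result.profile); // { nodes, samples, timeDeltas, ... }
        });
      });
    });
  });
}

// Usage: profile a hypothetical hot loop
captureCpuProfile(() => {
  let acc = 0;
  for (let i = 0; i < 1e7; i++) acc += Math.sqrt(i);
}).then((profile) => {
  console.log(`captured ${profile.nodes.length} profile nodes`);
});
```

The resulting profile object is the same format Chrome DevTools consumes, so it can be saved as a `.cpuprofile` file and opened in the Performance panel.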
Node.js Profiling Workflow
```bash
# CPU profiling with clinic.js
npx clinic doctor -- node server.js
# Generates a report showing event loop delays, CPU usage, memory

# Flame graph generation
npx clinic flame -- node server.js
# Run load while the profiler captures, then view the flame graph

# Heap snapshot for memory leaks
node --inspect server.js
# Open chrome://inspect, take heap snapshots before and after load
# Compare snapshots to find objects that were not garbage collected

# Built-in profiling
node --prof server.js
# Run load, then process the output:
node --prof-process isolate-*.log > profile.txt
```
Request Latency Breakdown
```typescript
// Middleware to break down request time by phase.
// Assumes `db` (the app's database client) is in scope.
import { performance } from 'perf_hooks';

function requestProfiler(req, res, next) {
  const timings: Record<string, number> = {};
  const start = performance.now();

  // Track middleware phase (a custom event the app emits when its chain finishes)
  req.on('middleware-complete', () => {
    timings.middleware = performance.now() - start;
  });

  // Track database phase by wrapping db.query for the duration of this request
  const origQuery = db.query.bind(db);
  let dbTime = 0;
  db.query = async (...args) => {
    const queryStart = performance.now();
    const result = await origQuery(...args);
    dbTime += performance.now() - queryStart;
    return result;
  };

  // Track response
  res.on('finish', () => {
    db.query = origQuery; // restore so wrappers don't stack across requests
    timings.total = performance.now() - start;
    timings.database = dbTime;
    timings.application = timings.total - timings.database - (timings.middleware || 0);
    console.log(JSON.stringify({
      method: req.method,
      path: req.path,
      statusCode: res.statusCode,
      timings,
    }));
  });

  next();
}
```
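The same phase-timing idea can be packaged as a small reusable helper that wraps any async operation and accumulates its duration under a label. This is a generic sketch, not part of the middleware above; the `timed` name and shape are illustrative:

```typescript
// Generic async timing helper: runs fn and adds its elapsed time to timings[label].
import { performance } from 'node:perf_hooks';

async function timed<T>(
  timings: Record<string, number>,
  label: string,
  fn: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Accumulate, so repeated calls (e.g. many queries) sum into one phase total
    timings[label] = (timings[label] ?? 0) + (performance.now() - start);
  }
}

// Usage (db.query is a hypothetical client call):
//   const rows = await timed(timings, 'database', () => db.query(sql));
```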
Configuration
| Parameter | Description | Default |
|---|---|---|
| profiler_type | Profiling focus (cpu, memory, io, all) | all |
| language | Application language/runtime | Auto-detect |
| duration | Profiling capture duration | 30s |
| sample_rate | CPU sampling frequency (Hz) | 99 |
| output_format | Profile output (flamegraph, text, json) | flamegraph |
| compare_baseline | Compare against baseline profile | false |
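As an illustration, a memory-leak investigation might override the defaults like this (parameter names are from the table above; the values are hypothetical):

```yaml
# Hypothetical run: 5-minute memory capture compared against a saved baseline
profiler_type: memory
language: node
duration: 300s
sample_rate: 99
output_format: flamegraph
compare_baseline: true
```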
Best Practices
- Profile in an environment that matches production. Profiling on a developer laptop with 32GB RAM and an SSD gives different results than production with 4GB RAM and network-attached storage. Use staging environments with production-equivalent hardware and realistic data volumes. At minimum, ensure the same CPU architecture, memory limits, and I/O subsystem are present.
- Capture profiles under load, not at idle. A profile of an idle application shows framework overhead and initialization, not the actual bottleneck. Generate realistic load (using k6, ab, or production traffic replay) while the profiler captures. Aim for profiles that capture at least 30 seconds of sustained load to smooth out one-time initialization costs and show steady-state behavior.
- Read flame graphs from the bottom up. The widest bars at the bottom of a flame graph represent functions that consume the most total CPU time (including their children). Look for wide bars at the top; these are "hot" leaf functions doing actual work. Narrow towers represent deep call stacks. Wide plateaus represent functions spending time in a single child; investigate that child.
- Take multiple heap snapshots at different intervals to detect memory leaks. A single heap snapshot shows current memory state but does not reveal trends. Take snapshots at T+0, T+5 minutes, and T+30 minutes under load. Objects whose count grows between snapshots without shrinking are likely leaks. Compare snapshots using Chrome DevTools' "Comparison" view to highlight growing allocations.
- Profile both hot paths and cold paths. The first request to a service (cold start, empty caches, JIT not yet warm) can be 10x slower than subsequent requests. Profile both scenarios. Cold path optimization matters for serverless functions, user-facing first-load experiences, and services that restart frequently. Hot path optimization matters for sustained throughput.
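The interval-snapshot idea can be approximated with a cheap trend check on sampled heap sizes before committing to full snapshot comparisons. A minimal sketch; the sample count threshold is illustrative, and a positive result should still be confirmed with real heap snapshots:

```typescript
// Flag sustained heap growth: if every sample is larger than the previous one,
// the process is a leak candidate worth a full snapshot comparison.
function looksLikeSustainedGrowth(heapSamples: number[], minSamples = 5): boolean {
  if (heapSamples.length < minSamples) return false;
  for (let i = 1; i < heapSamples.length; i++) {
    if (heapSamples[i] <= heapSamples[i - 1]) return false; // any dip: GC reclaimed memory
  }
  return true;
}

// Collecting samples under load (e.g. once a minute):
//   samples.push(process.memoryUsage().heapUsed);
```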
Common Issues
Profiling overhead distorts the performance characteristics. Detailed profilers can add 2-10x overhead, making a 100ms request appear as 500ms and changing the relative cost of operations. Use low-overhead sampling profilers (py-spy, async-profiler, 0x) for production-like profiling. Reserve instrumentation-based profilers (cProfile, clinic doctor) for development environments where accuracy of relative costs matters more than absolute timing.
Memory leak investigations show memory growing but all objects appear reachable. Not all memory growth is a leak; some is expected caching, buffering, or data structure growth. Compare the heap against expected sizes: if a cache is configured for 1,000 entries but holds 100,000, the cache eviction is broken, not a "leak." Look for event listener accumulation, closures capturing large scopes, and timers/intervals that are never cleared.
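Event listener accumulation in particular is easy to check directly in Node. A sketch, with the default threshold of 10 chosen to mirror EventEmitter's built-in max-listeners warning; the function name is illustrative:

```typescript
import { EventEmitter } from 'node:events';

// Report events whose listener count exceeds a threshold: a common leak signature
// when handlers are registered per-request but never removed.
function findListenerBuildup(emitter: EventEmitter, threshold = 10): string[] {
  return emitter
    .eventNames()
    .filter((name) => emitter.listenerCount(name) > threshold)
    .map(String);
}

// Simulating the bug: a handler added on every "request" that is never removed
const bus = new EventEmitter();
bus.setMaxListeners(0); // silence the built-in warning for this demo
for (let i = 0; i < 25; i++) bus.on('data', () => {});
console.log(findListenerBuildup(bus)); // → [ 'data' ]
```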
CPU profile shows "anonymous" or "unknown" functions consuming significant time. Minified code, eval'd code, and dynamically generated functions appear without useful names in profiles. Use source maps in Node.js (--enable-source-maps), avoid eval() and new Function(), and use named function expressions instead of arrow functions for functions that appear in profiles. Named functions make profiles readable and actionable.
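The naming advice can be verified directly: V8 exposes the name a profiler will display through `Function.prototype.name`, and an inline anonymous expression has none. The function names below are illustrative:

```typescript
// An anonymous function expression passed inline has an empty .name,
// so it appears as "(anonymous)" in CPU profiles.
const handlers = [function () { return 1; }];

// A named function expression keeps its name in profiler output and stack traces.
const scoreBatch = function scoreBatch() { return 1; };

console.log(JSON.stringify(handlers[0].name)); // → ""
console.log(JSON.stringify(scoreBatch.name));  // → "scoreBatch"
```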