
AgentCliptics · development tools · v1.0.0 · MIT

Performance Profiler Agent

A comprehensive performance analysis agent that profiles applications across all technology stacks, providing deep analysis of CPU usage, memory allocation, I/O patterns, and response time breakdowns to identify and resolve performance bottlenecks.

When to Use This Agent

Choose Performance Profiler Agent when:

  • Applications exhibit unexplained slowness or latency spikes
  • Memory usage grows over time (potential memory leaks)
  • CPU utilization is high without corresponding throughput
  • Need to profile specific code paths or request flows
  • Comparing performance between code versions or deployments

Consider alternatives when:

  • Running load tests at scale (use a performance engineering agent)
  • Optimizing database queries specifically (use a DBA agent)
  • Setting up performance monitoring infrastructure (use a DevOps agent)

Quick Start

```yaml
# .claude/agents/performance-profiler-agent.yml
name: Performance Profiler Agent
description: Profile and analyze application performance
model: claude-sonnet
tools:
  - Read
  - Bash
  - Glob
  - Grep
  - Edit
```

Example invocation:

```bash
claude "Profile the /api/search endpoint: it's responding in 3 seconds when it should be under 200ms. Find where the time is being spent."
```

Core Concepts

Profiling Toolkit by Language

| Language | CPU Profiler | Memory Profiler | Flame Graph |
| --- | --- | --- | --- |
| Node.js | `--prof`, clinic.js | `--heap-prof`, Chrome DevTools | 0x, `clinic flame` |
| Python | cProfile, py-spy | tracemalloc, memory_profiler | py-spy, snakeviz |
| Go | pprof (`runtime/pprof`) | pprof heap profile | `go tool pprof` |
| Java | JFR, async-profiler | JFR, VisualVM | async-profiler |
| Rust | perf, flamegraph | DHAT, heaptrack | cargo-flamegraph |

Node.js Profiling Workflow

```bash
# CPU profiling with clinic.js
npx clinic doctor -- node server.js
# Generates a report showing event loop delays, CPU usage, memory

# Flame graph generation
npx clinic flame -- node server.js
# Run load while the profiler captures, then view the flame graph

# Heap snapshot for memory leaks
node --inspect server.js
# Open chrome://inspect, take heap snapshots before and after load
# Compare snapshots to find objects that were not garbage collected

# Built-in profiling
node --prof server.js
# Run load, then process the output:
node --prof-process isolate-*.log > profile.txt
```

Request Latency Breakdown

```typescript
// Middleware to break down request time by phase
import { performance } from 'perf_hooks';
import type { Request, Response, NextFunction } from 'express';

// The application's database client, assumed to be in scope
declare const db: { query: (...args: unknown[]) => Promise<unknown> };

function requestProfiler(req: Request, res: Response, next: NextFunction): void {
  const timings: Record<string, number> = {};
  const start = performance.now();

  // Track the middleware phase (assumes an upstream hook emits this event)
  req.on('middleware-complete', () => {
    timings.middleware = performance.now() - start;
  });

  // Track the database phase by wrapping db.query.
  // Caveat: patching a shared client is per-process, so under concurrent
  // requests the attribution is approximate.
  const origQuery = db.query.bind(db);
  let dbTime = 0;
  db.query = async (...args: unknown[]) => {
    const queryStart = performance.now();
    const result = await origQuery(...args);
    dbTime += performance.now() - queryStart;
    return result;
  };

  // Emit the breakdown once the response has been sent
  res.on('finish', () => {
    db.query = origQuery; // restore the unwrapped query function
    timings.total = performance.now() - start;
    timings.database = dbTime;
    timings.application = timings.total - timings.database - (timings.middleware ?? 0);
    console.log(JSON.stringify({
      method: req.method,
      path: req.path,
      statusCode: res.statusCode,
      timings,
    }));
  });

  next();
}
```

Register it early (`app.use(requestProfiler)`) so the timer starts before other middleware runs and every request logs a per-phase breakdown on finish.

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `profiler_type` | Profiling focus (`cpu`, `memory`, `io`, `all`) | `all` |
| `language` | Application language/runtime | Auto-detect |
| `duration` | Profiling capture duration | `30s` |
| `sample_rate` | CPU sampling frequency (Hz) | `99` |
| `output_format` | Profile output (`flamegraph`, `text`, `json`) | `flamegraph` |
| `compare_baseline` | Compare against a baseline profile | `false` |
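These parameters might be set in the agent definition itself; the `parameters` block below is an assumption for illustration, not a documented schema:

```yaml
# .claude/agents/performance-profiler-agent.yml (hypothetical parameters block)
name: Performance Profiler Agent
parameters:
  profiler_type: cpu        # cpu | memory | io | all
  language: node            # or omit to auto-detect
  duration: 60s             # capture window under sustained load
  sample_rate: 99           # Hz; 99 avoids lockstep with 100 Hz system timers
  output_format: flamegraph
  compare_baseline: true    # diff against a stored baseline profile
```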

Best Practices

  1. Profile in an environment that matches production. Profiling on a developer laptop with 32GB RAM and an SSD gives different results than production with 4GB RAM and network-attached storage. Use staging environments with production-equivalent hardware and realistic data volumes. At minimum, ensure the same CPU architecture, memory limits, and I/O subsystem are present.

  2. Capture profiles under load, not at idle. A profile of an idle application shows framework overhead and initialization, not the actual bottleneck. Generate realistic load (using k6, ab, or production traffic replay) while the profiler captures. Aim for profiles that capture at least 30 seconds of sustained load to smooth out one-time initialization costs and show steady-state behavior.

  3. Read flame graphs from the bottom up. The widest bars at the bottom of a flame graph represent functions that consume the most total CPU time (including their children). Look for wide bars at the top: these are "hot" leaf functions doing actual work. Narrow towers represent deep call stacks. Wide plateaus represent functions spending their time in a single child; investigate that child.

  4. Take multiple heap snapshots at different intervals to detect memory leaks. A single heap snapshot shows current memory state but does not reveal trends. Take snapshots at T+0, T+5 minutes, and T+30 minutes under load. Objects whose count grows between snapshots without shrinking are likely leaks. Compare snapshots using Chrome DevTools' "Comparison" view to highlight growing allocations.

  5. Profile both hot paths and cold paths. The first request to a service (cold start, empty caches, JIT not yet warm) can be 10x slower than subsequent requests. Profile both scenarios. Cold path optimization matters for serverless functions, user-facing first-load experiences, and services that restart frequently. Hot path optimization matters for sustained throughput.
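Practice 2 above asks for sustained load while the profiler captures. When a dedicated load tool isn't handy, a minimal concurrent driver can be sketched in a few lines (names and rates here are illustrative assumptions, not part of the agent):

```typescript
// Run `concurrency` workers, each issuing requests back-to-back until
// `durationMs` elapses; returns the number of completed requests.
async function generateLoad(
  task: () => Promise<unknown>,
  concurrency: number,
  durationMs: number,
): Promise<number> {
  const deadline = Date.now() + durationMs;
  let completed = 0;
  const worker = async () => {
    while (Date.now() < deadline) {
      await task();
      completed++;
    }
  };
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return completed;
}

// Example: drive the endpoint under profile for 30 seconds
// generateLoad(() => fetch('http://localhost:3000/api/search?q=test'), 20, 30_000);
```

This deliberately lacks ramp-up, pacing, and latency percentiles; use k6 or ab when those matter.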
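The T+0 / T+5 / T+30 snapshot schedule from practice 4 can be automated in Node.js with the built-in `v8.writeHeapSnapshot`; a sketch, assuming the process stays under load for the full window:

```typescript
import { writeHeapSnapshot } from 'v8';

// Write a labeled heap snapshot to disk; open the files in Chrome
// DevTools' Memory tab and diff them with the "Comparison" view.
function snapshotAt(label: string): string {
  return writeHeapSnapshot(`heap-${label}.heapsnapshot`);
}

// Schedule the T+0 / T+5 / T+30 captures; unref() keeps the pending
// timers from blocking a normal process exit.
function scheduleLeakSnapshots(): void {
  const schedule: ReadonlyArray<readonly [string, number]> = [
    ['t0', 0],
    ['t5', 5 * 60_000],
    ['t30', 30 * 60_000],
  ];
  for (const [label, delayMs] of schedule) {
    setTimeout(() => snapshotAt(label), delayMs).unref();
  }
}
```

Note that `writeHeapSnapshot` is synchronous and briefly pauses the process, which will itself show up as a latency blip in any concurrent capture.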
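The cold-versus-hot comparison in practice 5 can be made concrete with a small timing harness; `fn` here stands in for any request or code path under test (substitute a real `fetch()` or handler call):

```typescript
import { performance } from 'perf_hooks';

// Time the first ("cold") invocation separately from warmed-up runs,
// so cold-start cost and steady-state cost are reported side by side.
async function coldVsWarm<T>(
  fn: () => Promise<T>,
  warmRuns = 5,
): Promise<{ coldMs: number; warmAvgMs: number }> {
  const t0 = performance.now();
  await fn();
  const coldMs = performance.now() - t0;

  let warmTotal = 0;
  for (let i = 0; i < warmRuns; i++) {
    const s = performance.now();
    await fn();
    warmTotal += performance.now() - s;
  }
  return { coldMs, warmAvgMs: warmTotal / warmRuns };
}
```

A large cold/warm gap points at initialization, cache fill, or JIT warm-up rather than the steady-state code path.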

Common Issues

Profiling overhead distorts the performance characteristics. Detailed profilers can add 2-10x overhead, making a 100ms request appear as 500ms and changing the relative cost of operations. Use low-overhead sampling profilers (py-spy, async-profiler, 0x) for production-like profiling. Reserve instrumentation-based profilers (cProfile, clinic doctor) for development environments where accuracy of relative costs matters more than absolute timing.

Memory leak investigations show memory growing but all objects appear reachable. Not all memory growth is a leak; some is expected caching, buffering, or data structure growth. Compare the heap against expected sizes: if a cache is configured for 1,000 entries but holds 100,000, the cache eviction is broken, not a "leak." Look for event listener accumulation, closures capturing large scopes, and timers/intervals that are never cleared.
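The listener-accumulation pattern mentioned above is worth seeing side by side; a minimal sketch with Node's `EventEmitter` (the names are illustrative):

```typescript
import { EventEmitter } from 'events';

const bus = new EventEmitter();

// Leaky pattern: every call registers a new listener that is never
// removed, so listeners, and anything their closures capture, pile up.
function subscribeLeaky(payload: { big: number[] }): void {
  bus.on('update', () => payload.big.length); // closure pins `payload`
}

// Fixed pattern: keep a reference to the handler and remove it when done.
function subscribeAndRelease(payload: { big: number[] }): () => void {
  const onUpdate = () => payload.big.length;
  bus.on('update', onUpdate);
  return () => bus.off('update', onUpdate); // call to unsubscribe
}
```

In a heap snapshot comparison, the leaky version shows up as a steadily growing count of closure objects retained by the emitter's listener array.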

CPU profile shows "anonymous" or "unknown" functions consuming significant time. Minified code, eval'd code, and dynamically generated functions appear without useful names in profiles. Use source maps in Node.js (--enable-source-maps), avoid eval() and new Function(), and use named function expressions instead of arrow functions for functions that appear in profiles. Named functions make profiles readable and actionable.
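The arrow-function versus named-function-expression distinction looks like this in practice; both compute the same result, but only the second carries a name into the profile:

```typescript
function transform(x: number): number {
  return x * 2;
}

// Inline arrow: recorded as "(anonymous)" in a V8 CPU profile
const viaArrow = [1, 2, 3].map((x) => transform(x));

// Named function expression: recorded as "transformItem", so the hot
// frame is identifiable in flame graphs and call trees
const viaNamed = [1, 2, 3].map(function transformItem(x: number) {
  return transform(x);
});
```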
