
AgentCliptics · development tools · v1.0.0 · MIT

Performance Profiler Agent

A comprehensive performance analysis agent that profiles applications across all technology stacks, providing deep analysis of CPU usage, memory allocation, I/O patterns, and response time breakdowns to identify and resolve performance bottlenecks.

When to Use This Agent

Choose Performance Profiler Agent when:

  • Applications exhibit unexplained slowness or latency spikes
  • Memory usage grows over time (potential memory leaks)
  • CPU utilization is high without corresponding throughput
  • Need to profile specific code paths or request flows
  • Comparing performance between code versions or deployments

Consider alternatives when:

  • Running load tests at scale (use a performance engineering agent)
  • Optimizing database queries specifically (use a DBA agent)
  • Setting up performance monitoring infrastructure (use a DevOps agent)

Quick Start

```yaml
# .claude/agents/performance-profiler-agent.yml
name: Performance Profiler Agent
description: Profile and analyze application performance
model: claude-sonnet
tools:
  - Read
  - Bash
  - Glob
  - Grep
  - Edit
```

Example invocation:

```bash
claude "Profile the /api/search endpoint: it's responding in 3 seconds when it should be under 200ms. Find where the time is being spent."
```

Core Concepts

Profiling Toolkit by Language

| Language | CPU Profiler | Memory Profiler | Flame Graph |
| --- | --- | --- | --- |
| Node.js | `--prof`, clinic.js | `--heap-prof`, Chrome DevTools | 0x, `clinic flame` |
| Python | cProfile, py-spy | tracemalloc, memory_profiler | py-spy, snakeviz |
| Go | pprof (`runtime/pprof`) | pprof heap profile | `go tool pprof` |
| Java | JFR, async-profiler | JFR, VisualVM | async-profiler |
| Rust | perf, flamegraph | DHAT, heaptrack | cargo-flamegraph |

Node.js Profiling Workflow

```bash
# CPU profiling with clinic.js
npx clinic doctor -- node server.js
# Generates a report showing event loop delays, CPU usage, memory

# Flame graph generation
npx clinic flame -- node server.js
# Run load while the profiler captures, then view the flame graph

# Heap snapshot for memory leaks
node --inspect server.js
# Open chrome://inspect, take heap snapshots before and after load
# Compare snapshots to find objects that were not garbage collected

# Built-in profiling
node --prof server.js
# Run load, then process the output:
node --prof-process isolate-*.log > profile.txt
```

Request Latency Breakdown

```typescript
// Middleware to break down request time by phase
import { performance } from 'perf_hooks';
import type { Request, Response, NextFunction } from 'express';

// The application's database client, assumed to be in scope
declare const db: { query: (...args: unknown[]) => Promise<unknown> };

function requestProfiler(req: Request, res: Response, next: NextFunction): void {
  const timings: Record<string, number> = {};
  const start = performance.now();

  // Track the middleware phase (assumes an upstream hook emits this event)
  req.on('middleware-complete', () => {
    timings.middleware = performance.now() - start;
  });

  // Track the database phase by wrapping db.query.
  // Caveat: patching a shared client is per-process, so under concurrent
  // requests the attribution is approximate.
  const origQuery = db.query.bind(db);
  let dbTime = 0;
  db.query = async (...args: unknown[]) => {
    const queryStart = performance.now();
    const result = await origQuery(...args);
    dbTime += performance.now() - queryStart;
    return result;
  };

  // Emit the breakdown once the response has been sent
  res.on('finish', () => {
    db.query = origQuery; // restore the unwrapped query function
    timings.total = performance.now() - start;
    timings.database = dbTime;
    timings.application = timings.total - timings.database - (timings.middleware ?? 0);
    console.log(JSON.stringify({
      method: req.method,
      path: req.path,
      statusCode: res.statusCode,
      timings,
    }));
  });

  next();
}
```

Register it early (`app.use(requestProfiler)`) so the timer starts before other middleware runs and every request logs a per-phase breakdown on finish.

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `profiler_type` | Profiling focus (`cpu`, `memory`, `io`, `all`) | `all` |
| `language` | Application language/runtime | Auto-detect |
| `duration` | Profiling capture duration | `30s` |
| `sample_rate` | CPU sampling frequency (Hz) | `99` |
| `output_format` | Profile output (`flamegraph`, `text`, `json`) | `flamegraph` |
| `compare_baseline` | Compare against a baseline profile | `false` |
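These parameters might be set in the agent definition itself; the `parameters` block below is an assumption for illustration, not a documented schema:

```yaml
# .claude/agents/performance-profiler-agent.yml (hypothetical parameters block)
name: Performance Profiler Agent
parameters:
  profiler_type: cpu        # cpu | memory | io | all
  language: node            # or omit to auto-detect
  duration: 60s             # capture window under sustained load
  sample_rate: 99           # Hz; 99 avoids lockstep with 100 Hz system timers
  output_format: flamegraph
  compare_baseline: true    # diff against a stored baseline profile
```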

Best Practices

  1. Profile in an environment that matches production. Profiling on a developer laptop with 32GB RAM and an SSD gives different results than production with 4GB RAM and network-attached storage. Use staging environments with production-equivalent hardware and realistic data volumes. At minimum, ensure the same CPU architecture, memory limits, and I/O subsystem are present.

  2. Capture profiles under load, not at idle. A profile of an idle application shows framework overhead and initialization, not the actual bottleneck. Generate realistic load (using k6, ab, or production traffic replay) while the profiler captures. Aim for profiles that capture at least 30 seconds of sustained load to smooth out one-time initialization costs and show steady-state behavior.

  3. Read flame graphs from the bottom up. The widest bars at the bottom of a flame graph represent functions that consume the most total CPU time (including their children). Look for wide bars at the top: these are "hot" leaf functions doing actual work. Narrow towers represent deep call stacks. Wide plateaus represent functions spending their time in a single child; investigate that child.

  4. Take multiple heap snapshots at different intervals to detect memory leaks. A single heap snapshot shows current memory state but does not reveal trends. Take snapshots at T+0, T+5 minutes, and T+30 minutes under load. Objects whose count grows between snapshots without shrinking are likely leaks. Compare snapshots using Chrome DevTools' "Comparison" view to highlight growing allocations.

  5. Profile both hot paths and cold paths. The first request to a service (cold start, empty caches, JIT not yet warm) can be 10x slower than subsequent requests. Profile both scenarios. Cold path optimization matters for serverless functions, user-facing first-load experiences, and services that restart frequently. Hot path optimization matters for sustained throughput.
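Practice 2 above asks for sustained load while the profiler captures. When a dedicated load tool isn't handy, a minimal concurrent driver can be sketched in a few lines (names and rates here are illustrative assumptions, not part of the agent):

```typescript
// Run `concurrency` workers, each issuing requests back-to-back until
// `durationMs` elapses; returns the number of completed requests.
async function generateLoad(
  task: () => Promise<unknown>,
  concurrency: number,
  durationMs: number,
): Promise<number> {
  const deadline = Date.now() + durationMs;
  let completed = 0;
  const worker = async () => {
    while (Date.now() < deadline) {
      await task();
      completed++;
    }
  };
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return completed;
}

// Example: drive the endpoint under profile for 30 seconds
// generateLoad(() => fetch('http://localhost:3000/api/search?q=test'), 20, 30_000);
```

This deliberately lacks ramp-up, pacing, and latency percentiles; use k6 or ab when those matter.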
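The T+0 / T+5 / T+30 snapshot schedule from practice 4 can be automated in Node.js with the built-in `v8.writeHeapSnapshot`; a sketch, assuming the process stays under load for the full window:

```typescript
import { writeHeapSnapshot } from 'v8';

// Write a labeled heap snapshot to disk; open the files in Chrome
// DevTools' Memory tab and diff them with the "Comparison" view.
function snapshotAt(label: string): string {
  return writeHeapSnapshot(`heap-${label}.heapsnapshot`);
}

// Schedule the T+0 / T+5 / T+30 captures; unref() keeps the pending
// timers from blocking a normal process exit.
function scheduleLeakSnapshots(): void {
  const schedule: ReadonlyArray<readonly [string, number]> = [
    ['t0', 0],
    ['t5', 5 * 60_000],
    ['t30', 30 * 60_000],
  ];
  for (const [label, delayMs] of schedule) {
    setTimeout(() => snapshotAt(label), delayMs).unref();
  }
}
```

Note that `writeHeapSnapshot` is synchronous and briefly pauses the process, which will itself show up as a latency blip in any concurrent capture.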
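The cold-versus-hot comparison in practice 5 can be made concrete with a small timing harness; `fn` here stands in for any request or code path under test (substitute a real `fetch()` or handler call):

```typescript
import { performance } from 'perf_hooks';

// Time the first ("cold") invocation separately from warmed-up runs,
// so cold-start cost and steady-state cost are reported side by side.
async function coldVsWarm<T>(
  fn: () => Promise<T>,
  warmRuns = 5,
): Promise<{ coldMs: number; warmAvgMs: number }> {
  const t0 = performance.now();
  await fn();
  const coldMs = performance.now() - t0;

  let warmTotal = 0;
  for (let i = 0; i < warmRuns; i++) {
    const s = performance.now();
    await fn();
    warmTotal += performance.now() - s;
  }
  return { coldMs, warmAvgMs: warmTotal / warmRuns };
}
```

A large cold/warm gap points at initialization, cache fill, or JIT warm-up rather than the steady-state code path.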

Common Issues

Profiling overhead distorts the performance characteristics. Detailed profilers can add 2-10x overhead, making a 100ms request appear as 500ms and changing the relative cost of operations. Use low-overhead sampling profilers (py-spy, async-profiler, 0x) for production-like profiling. Reserve instrumentation-based profilers (cProfile, clinic doctor) for development environments where accuracy of relative costs matters more than absolute timing.

Memory leak investigations show memory growing but all objects appear reachable. Not all memory growth is a leak; some is expected caching, buffering, or data structure growth. Compare the heap against expected sizes: if a cache is configured for 1,000 entries but holds 100,000, the cache eviction is broken, not a "leak." Look for event listener accumulation, closures capturing large scopes, and timers/intervals that are never cleared.
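The listener-accumulation pattern mentioned above is worth seeing side by side; a minimal sketch with Node's `EventEmitter` (the names are illustrative):

```typescript
import { EventEmitter } from 'events';

const bus = new EventEmitter();

// Leaky pattern: every call registers a new listener that is never
// removed, so listeners, and anything their closures capture, pile up.
function subscribeLeaky(payload: { big: number[] }): void {
  bus.on('update', () => payload.big.length); // closure pins `payload`
}

// Fixed pattern: keep a reference to the handler and remove it when done.
function subscribeAndRelease(payload: { big: number[] }): () => void {
  const onUpdate = () => payload.big.length;
  bus.on('update', onUpdate);
  return () => bus.off('update', onUpdate); // call to unsubscribe
}
```

In a heap snapshot comparison, the leaky version shows up as a steadily growing count of closure objects retained by the emitter's listener array.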

CPU profile shows "anonymous" or "unknown" functions consuming significant time. Minified code, eval'd code, and dynamically generated functions appear without useful names in profiles. Use source maps in Node.js (--enable-source-maps), avoid eval() and new Function(), and use named function expressions instead of arrow functions for functions that appear in profiles. Named functions make profiles readable and actionable.
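The arrow-function versus named-function-expression distinction looks like this in practice; both compute the same result, but only the second carries a name into the profile:

```typescript
function transform(x: number): number {
  return x * 2;
}

// Inline arrow: recorded as "(anonymous)" in a V8 CPU profile
const viaArrow = [1, 2, 3].map((x) => transform(x));

// Named function expression: recorded as "transformItem", so the hot
// frame is identifiable in flame graphs and call trees
const viaNamed = [1, 2, 3].map(function transformItem(x: number) {
  return transform(x);
});
```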
