# Performance Engineer Companion
A senior performance engineering agent that optimizes system performance by identifying bottlenecks, conducting load tests, tuning database queries, and ensuring applications meet scalability and latency requirements.
## When to Use This Agent
Choose Performance Engineer Companion when:
- Application response times exceed SLA thresholds
- Preparing for expected traffic increases (launches, promotions, seasonal)
- Conducting load and stress testing before production deployment
- Optimizing database query performance and indexing strategy
- Profiling CPU, memory, and I/O usage to identify bottlenecks
Consider alternatives when:
- Debugging functional bugs (use a debugging agent)
- Setting up monitoring infrastructure (use a DevOps agent)
- Optimizing build/compilation times (use a build engineering agent)
## Quick Start

```yaml
# .claude/agents/performance-engineer-companion.yml
name: Performance Engineer Companion
description: Optimize application performance and scalability
model: claude-sonnet
tools:
  - Read
  - Edit
  - Bash
  - Glob
  - Grep
```

Example invocation:

```bash
claude "Profile our API endpoints, identify the slowest ones, and optimize the top 3 performance bottlenecks"
```
## Core Concepts

### Performance Analysis Framework
| Layer | Metrics | Tools |
|---|---|---|
| Network | Latency, bandwidth, DNS resolution | curl timing, tcpdump |
| Application | Response time, throughput, error rate | APM (Datadog, New Relic) |
| Database | Query time, connection pool, lock contention | EXPLAIN ANALYZE, pg_stat |
| Infrastructure | CPU, memory, disk I/O, network I/O | top, vmstat, iostat |
| Frontend | LCP, FID, CLS, TTFB | Lighthouse, WebPageTest |
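The network-layer rows above can be measured with curl's `-w` write-out timing variables (`time_namelookup`, `time_connect`, `time_starttransfer`, `time_total`). A minimal Python sketch that turns one such timing line into per-layer deltas — the sample numbers are made up for illustration:

```python
# Break a curl -w timing line into per-layer latency contributions.
# A line like this is produced by:
#   curl -o /dev/null -s \
#     -w "%{time_namelookup} %{time_connect} %{time_starttransfer} %{time_total}" <url>
sample = "0.012 0.034 0.180 0.210"  # hypothetical measurements, in seconds


def layer_breakdown(line: str) -> dict:
    namelookup, connect, starttransfer, total = map(float, line.split())
    return {
        "dns_ms": namelookup * 1000,                       # DNS resolution
        "tcp_connect_ms": (connect - namelookup) * 1000,   # TCP handshake
        "server_ttfb_ms": (starttransfer - connect) * 1000,  # server think time
        "transfer_ms": (total - starttransfer) * 1000,     # body transfer
    }


print(layer_breakdown(sample))
```

Comparing the deltas tells you which layer to attack first: a large `server_ttfb_ms` points at the application or database, a large `dns_ms` or `tcp_connect_ms` at the network.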
### Load Testing Configuration

```javascript
// k6 load test script
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up
    { duration: '5m', target: 50 },   // Sustained load
    { duration: '2m', target: 200 },  // Spike test
    { duration: '5m', target: 200 },  // Sustained spike
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
    http_reqs: ['rate>100'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
```
### Database Query Optimization

```sql
-- Before: Full table scan (12 seconds)
EXPLAIN ANALYZE
SELECT u.*, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 50;

-- After: Index-optimized (45ms)
-- Step 1: Add supporting indexes
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Step 2: Restructure the query so the aggregation runs over the
-- indexed orders table before joining back to users
EXPLAIN ANALYZE
SELECT u.*, sub.order_count
FROM users u
JOIN (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY user_id
  ORDER BY order_count DESC
  LIMIT 50
) sub ON sub.user_id = u.id
WHERE u.created_at > '2025-01-01'
ORDER BY sub.order_count DESC;
```
## Configuration

| Parameter | Description | Default |
|---|---|---|
| `target_latency_p95` | Target P95 response time in ms | 500 |
| `target_latency_p99` | Target P99 response time in ms | 1000 |
| `load_test_tool` | Load testing framework (k6, artillery, locust) | k6 |
| `profiler` | Application profiler (clinic.js, py-spy, pprof) | Auto-detect |
| `apm_provider` | APM tool for production monitoring | Auto-detect |
| `database` | Database engine for query optimization | Auto-detect |
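Assuming these parameters live alongside the agent definition shown in Quick Start, an explicitly tuned configuration might look like the sketch below. The `config:` block and key placement are illustrative assumptions, not a documented schema; check your agent runtime for the exact shape.

```yaml
# .claude/agents/performance-engineer-companion.yml (hypothetical tuning section)
name: Performance Engineer Companion
config:
  target_latency_p95: 500    # ms
  target_latency_p99: 1000   # ms
  load_test_tool: k6
  profiler: py-spy           # override auto-detect for a Python service
  database: postgres         # override auto-detect
```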
## Best Practices

- Profile before optimizing: the bottleneck is rarely where you think. Developers intuit that "the database is slow" or "the algorithm is inefficient" when the actual bottleneck is serialization overhead, connection pool exhaustion, or unnecessary middleware. Run a profiler (clinic.js for Node.js, py-spy for Python, pprof for Go) and let the data identify the hotspot. Spend optimization effort proportional to each component's contribution to total latency.
- Measure latency in percentiles, not averages. An average response time of 200ms hides the fact that 1% of users experience 5-second responses. Always measure and optimize P95 and P99 latency. The worst-case user experience determines customer satisfaction, not the average. Set SLOs based on percentiles ("P95 < 500ms, P99 < 1000ms") and alert when they are breached.
- Load test with realistic traffic patterns, not uniform requests. Real traffic includes a mix of endpoint frequencies, authenticated and anonymous users, cache-cold and cache-warm requests, and bursts. Model your load test on production traffic analysis. A uniform 100 req/s to a single endpoint does not predict how the system behaves under real-world mixed workloads.
- Optimize database queries first, application code second. In most web applications, database queries account for 60-80% of response time. Use `EXPLAIN ANALYZE` to verify that queries use indexes, avoid sequential scans on large tables, and do not suffer from wildly inaccurate row estimates. Adding an index is almost always a bigger win than optimizing the application code that processes the query results.
- Implement caching at the layer closest to the consumer. Cache hierarchy from fastest to slowest: browser cache → CDN → application cache (Redis) → database query cache. Cache at the highest applicable layer: static assets belong on a CDN, API responses that are the same for all users belong in Redis, and query results that are expensive to compute belong in a database materialized view. Each layer reduces load on the layers below it.
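The percentile discipline above can be sketched with Python's stdlib. This toy example uses simulated, hypothetical latency samples (real numbers would come from your load-test or APM export); on a skewed distribution, P95 and P99 diverge sharply from the mean:

```python
import random
import statistics

# Simulated latency samples in ms; real latency is typically right-skewed,
# which a log-normal distribution approximates.
random.seed(42)
latencies = [random.lognormvariate(5, 0.6) for _ in range(10_000)]

# quantiles(n=100) returns 99 cut points: index 94 is P95, index 98 is P99.
cuts = statistics.quantiles(latencies, n=100)
p95, p99 = cuts[94], cuts[98]
mean = statistics.fmean(latencies)

print(f"mean={mean:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
```

With a skewed distribution like this, P99 can be several times the mean — exactly the tail an average-based SLO would hide.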
## Common Issues
Performance degrades gradually over time as data volume grows. Queries that run fast with 10,000 rows become slow with 10 million rows. Database indexes that covered initial query patterns do not cover new access patterns. Monitor query performance trends weekly, not just during incidents. Add indexes proactively when table sizes cross thresholds. Implement data archival or partitioning strategies before slow queries impact users.
Load tests pass but production fails at the same traffic levels. Load tests often use a clean database, warm caches, and uniform traffic. Production has fragmented data, cold caches after deployments, and traffic spikes. Run load tests against production-scale data volumes. Include cache-miss scenarios by flushing caches before testing. Add spike tests that simulate real-world burst patterns (10x traffic in 30 seconds).
Optimizing one endpoint creates a bottleneck elsewhere. Adding aggressive caching to a frequently-read endpoint reduces database load but increases Redis memory usage and creates cache invalidation complexity. Adding connection pooling to handle more concurrent requests reveals that the downstream payment API has a rate limit. Performance optimization is a system-wide exercise: measure the impact on all components, not just the one being optimized.