# Performance Engineer Companion
A senior performance engineering agent that optimizes system performance by identifying bottlenecks, conducting load tests, tuning database queries, and ensuring applications meet scalability and latency requirements.
## When to Use This Agent
Choose Performance Engineer Companion when:
- Application response times exceed SLA thresholds
- Preparing for expected traffic increases (launches, promotions, seasonal)
- Conducting load and stress testing before production deployment
- Optimizing database query performance and indexing strategy
- Profiling CPU, memory, and I/O usage to identify bottlenecks
Consider alternatives when:
- Debugging functional bugs (use a debugging agent)
- Setting up monitoring infrastructure (use a DevOps agent)
- Optimizing build/compilation times (use a build engineering agent)
## Quick Start

```yaml
# .claude/agents/performance-engineer-companion.yml
name: Performance Engineer Companion
description: Optimize application performance and scalability
model: claude-sonnet
tools:
  - Read
  - Edit
  - Bash
  - Glob
  - Grep
```

Example invocation:

```bash
claude "Profile our API endpoints, identify the slowest ones, and optimize the top 3 performance bottlenecks"
```
## Core Concepts

### Performance Analysis Framework
| Layer | Metrics | Tools |
|---|---|---|
| Network | Latency, bandwidth, DNS resolution | curl timing, tcpdump |
| Application | Response time, throughput, error rate | APM (Datadog, New Relic) |
| Database | Query time, connection pool, lock contention | EXPLAIN ANALYZE, pg_stat |
| Infrastructure | CPU, memory, disk I/O, network I/O | top, vmstat, iostat |
| Frontend | LCP, FID, CLS, TTFB | Lighthouse, WebPageTest |
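The network-layer rows above can be measured with curl's `-w` write-out timing variables (`time_namelookup`, `time_connect`, `time_starttransfer`, `time_total`). A minimal Python sketch that turns one such timing line into per-layer deltas — the sample numbers are made up for illustration:

```python
# Break a curl -w timing line into per-layer latency contributions.
# A line like this is produced by:
#   curl -o /dev/null -s \
#     -w "%{time_namelookup} %{time_connect} %{time_starttransfer} %{time_total}" <url>
sample = "0.012 0.034 0.180 0.210"  # hypothetical measurements, in seconds


def layer_breakdown(line: str) -> dict:
    namelookup, connect, starttransfer, total = map(float, line.split())
    return {
        "dns_ms": namelookup * 1000,                       # DNS resolution
        "tcp_connect_ms": (connect - namelookup) * 1000,   # TCP handshake
        "server_ttfb_ms": (starttransfer - connect) * 1000,  # server think time
        "transfer_ms": (total - starttransfer) * 1000,     # body transfer
    }


print(layer_breakdown(sample))
```

Comparing the deltas tells you which layer to attack first: a large `server_ttfb_ms` points at the application or database, a large `dns_ms` or `tcp_connect_ms` at the network.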
### Load Testing Configuration

```javascript
// k6 load test script
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 50 },   // Ramp up
    { duration: '5m', target: 50 },   // Sustained load
    { duration: '2m', target: 200 },  // Spike test
    { duration: '5m', target: 200 },  // Sustained spike
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
    http_reqs: ['rate>100'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
```
### Database Query Optimization

```sql
-- Before: Full table scan (12 seconds)
EXPLAIN ANALYZE
SELECT u.*, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 50;

-- After: Index-optimized (45ms)
-- Step 1: Add supporting indexes
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Step 2: Restructure the query so the aggregation runs over the
-- indexed orders table before joining back to users
EXPLAIN ANALYZE
SELECT u.*, sub.order_count
FROM users u
JOIN (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY user_id
  ORDER BY order_count DESC
  LIMIT 50
) sub ON sub.user_id = u.id
WHERE u.created_at > '2025-01-01'
ORDER BY sub.order_count DESC;
```
## Configuration

| Parameter | Description | Default |
|---|---|---|
| `target_latency_p95` | Target P95 response time in ms | 500 |
| `target_latency_p99` | Target P99 response time in ms | 1000 |
| `load_test_tool` | Load testing framework (k6, artillery, locust) | k6 |
| `profiler` | Application profiler (clinic.js, py-spy, pprof) | Auto-detect |
| `apm_provider` | APM tool for production monitoring | Auto-detect |
| `database` | Database engine for query optimization | Auto-detect |
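Assuming these parameters live alongside the agent definition shown in Quick Start, an explicitly tuned configuration might look like the sketch below. The `config:` block and key placement are illustrative assumptions, not a documented schema; check your agent runtime for the exact shape.

```yaml
# .claude/agents/performance-engineer-companion.yml (hypothetical tuning section)
name: Performance Engineer Companion
config:
  target_latency_p95: 500    # ms
  target_latency_p99: 1000   # ms
  load_test_tool: k6
  profiler: py-spy           # override auto-detect for a Python service
  database: postgres         # override auto-detect
```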
## Best Practices

- Profile before optimizing: the bottleneck is rarely where you think. Developers intuit that "the database is slow" or "the algorithm is inefficient" when the actual bottleneck is serialization overhead, connection pool exhaustion, or unnecessary middleware. Run a profiler (clinic.js for Node.js, py-spy for Python, pprof for Go) and let the data identify the hotspot. Spend optimization effort proportional to each component's contribution to total latency.
- Measure latency in percentiles, not averages. An average response time of 200ms hides the fact that 1% of users experience 5-second responses. Always measure and optimize P95 and P99 latency. The worst-case user experience determines customer satisfaction, not the average. Set SLOs based on percentiles ("P95 < 500ms, P99 < 1000ms") and alert when they are breached.
- Load test with realistic traffic patterns, not uniform requests. Real traffic includes a mix of endpoint frequencies, authenticated and anonymous users, cache-cold and cache-warm requests, and bursts. Model your load test on production traffic analysis. A uniform 100 req/s to a single endpoint does not predict how the system behaves under real-world mixed workloads.
- Optimize database queries first, application code second. In most web applications, database queries account for 60-80% of response time. Use `EXPLAIN ANALYZE` to verify that queries use indexes, avoid sequential scans on large tables, and do not suffer from wildly inaccurate row estimates. Adding an index is almost always a bigger win than optimizing the application code that processes the query results.
- Implement caching at the layer closest to the consumer. Cache hierarchy from fastest to slowest: browser cache → CDN → application cache (Redis) → database query cache. Cache at the highest applicable layer: static assets belong on a CDN, API responses that are the same for all users belong in Redis, and query results that are expensive to compute belong in a database materialized view. Each layer reduces load on the layers below it.
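The percentile discipline above can be sketched with Python's stdlib. This toy example uses simulated, hypothetical latency samples (real numbers would come from your load-test or APM export); on a skewed distribution, P95 and P99 diverge sharply from the mean:

```python
import random
import statistics

# Simulated latency samples in ms; real latency is typically right-skewed,
# which a log-normal distribution approximates.
random.seed(42)
latencies = [random.lognormvariate(5, 0.6) for _ in range(10_000)]

# quantiles(n=100) returns 99 cut points: index 94 is P95, index 98 is P99.
cuts = statistics.quantiles(latencies, n=100)
p95, p99 = cuts[94], cuts[98]
mean = statistics.fmean(latencies)

print(f"mean={mean:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
```

With a skewed distribution like this, P99 can be several times the mean — exactly the tail an average-based SLO would hide.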
## Common Issues
Performance degrades gradually over time as data volume grows. Queries that run fast with 10,000 rows become slow with 10 million rows. Database indexes that covered initial query patterns do not cover new access patterns. Monitor query performance trends weekly, not just during incidents. Add indexes proactively when table sizes cross thresholds. Implement data archival or partitioning strategies before slow queries impact users.
Load tests pass but production fails at the same traffic levels. Load tests often use a clean database, warm caches, and uniform traffic. Production has fragmented data, cold caches after deployments, and traffic spikes. Run load tests against production-scale data volumes. Include cache-miss scenarios by flushing caches before testing. Add spike tests that simulate real-world burst patterns (10x traffic in 30 seconds).
Optimizing one endpoint creates a bottleneck elsewhere. Adding aggressive caching to a frequently-read endpoint reduces database load but increases Redis memory usage and creates cache invalidation complexity. Adding connection pooling to handle more concurrent requests reveals that the downstream payment API has a rate limit. Performance optimization is a system-wide exercise: measure the impact on all components, not just the one being optimized.