
PostgreSQL DBA Strategist

An expert agent for PostgreSQL database administration covering query optimization, index design, replication, partitioning, vacuum tuning, and production troubleshooting for high-performance PostgreSQL deployments.

When to Use This Agent

Choose PostgreSQL DBA Strategist when:

  • Optimizing PostgreSQL query performance with EXPLAIN ANALYZE
  • Designing partitioning strategies for large tables
  • Configuring streaming replication and failover
  • Tuning autovacuum for high-write workloads
  • Troubleshooting connection issues, bloat, and lock contention

Consider alternatives when:

  • Working with SQL Server or MySQL (use database-specific agents)
  • Using Neon serverless PostgreSQL specifically (use a Neon agent)
  • Designing application data models without DBA concerns (use a backend dev agent)

Quick Start

```yaml
# .claude/agents/postgresql-dba-strategist.yml
name: PostgreSQL DBA
model: claude-sonnet-4-20250514
tools:
  - Read
  - Write
  - Bash
  - Glob
  - Grep
prompt: |
  You are a PostgreSQL DBA expert. Optimize queries, design indexes,
  configure replication, tune autovacuum, and troubleshoot production
  issues. Always use EXPLAIN ANALYZE for diagnosis and verify changes
  don't degrade other queries.
```

Example invocation:

claude --agent postgresql-dba-strategist "Our orders table has 500M rows and sequential scans are killing performance. Design a partitioning strategy by order_date and migrate the existing data with minimal downtime."

Core Concepts

Query Optimization Workflow

```sql
-- Step 1: Capture the execution plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT *
FROM orders
WHERE created_at > '2024-01-01'
  AND status = 'pending';

-- Key metrics to examine:
--   Planning Time vs Execution Time
--   Seq Scan vs Index Scan (look for unexpected Seq Scans)
--   Rows estimated vs actual (large gaps = stale statistics)
--   Buffers: shared hit vs read (cache effectiveness)
```

Index Design Guidelines

| Query Pattern | Index Type | Example |
| --- | --- | --- |
| Equality filter | B-tree | `CREATE INDEX ON orders (status)` |
| Range filter | B-tree (column order matters) | `CREATE INDEX ON orders (created_at)` |
| Pattern matching | GIN trigram | `CREATE INDEX ON users USING GIN (name gin_trgm_ops)` |
| JSON queries | GIN | `CREATE INDEX ON events USING GIN (data jsonb_path_ops)` |
| Full-text search | GIN tsvector | `CREATE INDEX ON docs USING GIN (to_tsvector('english', body))` |
| Geospatial | GiST | `CREATE INDEX ON places USING GIST (location)` |
| Partial data | Partial B-tree | `CREATE INDEX ON orders (id) WHERE status = 'pending'` |
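As a concrete sketch of the partial-index pattern (table and index names are illustrative), create the index without blocking writes and then confirm the planner actually uses it:

```sql
-- Hypothetical example: a partial index for a hot "pending" queue.
-- CONCURRENTLY avoids locking the table against writes during the build.
CREATE INDEX CONCURRENTLY idx_orders_pending_created
    ON orders (created_at)
    WHERE status = 'pending';

-- Verify the planner picks it for the target query
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, created_at
FROM orders
WHERE status = 'pending'
  AND created_at > now() - interval '1 day';

-- Confirm the index accumulates scans before retiring older indexes
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'orders';
```

Note that the partial index only qualifies when the query's WHERE clause implies the index predicate, so keep the predicate aligned with the real query pattern.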

Partitioning Strategy

```sql
-- Range partitioning by date
CREATE TABLE orders (
    id bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    customer_id bigint,
    total numeric(10,2)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
    FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- Automate future partition creation
```
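The "automate future partition creation" step can be sketched as a scheduled `DO` block (names and scheduling are assumptions; in production the pg_partman extension is the more common choice):

```sql
-- Illustrative sketch: create next quarter's partition if it is missing.
-- Run periodically from cron or pg_cron; pg_partman automates all of this.
DO $$
DECLARE
    start_date date := date_trunc('quarter', now() + interval '3 months');
    end_date   date := start_date + interval '3 months';
    part_name  text := format('orders_%s_q%s',
                              extract(year from start_date),
                              extract(quarter from start_date));
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders
         FOR VALUES FROM (%L) TO (%L)',
        part_name, start_date, end_date);
END $$;
```

Creating partitions ahead of time matters because an insert with no matching partition fails outright.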

Configuration

| Parameter | Description | Recommended starting point |
| --- | --- | --- |
| shared_buffers | Shared memory for caching | 25% of RAM |
| effective_cache_size | OS cache estimate for planner | 75% of RAM |
| work_mem | Memory per sort/hash operation | 64MB |
| maintenance_work_mem | Memory for VACUUM, CREATE INDEX | 512MB |
| wal_level | Write-ahead log detail | replica |
| max_connections | Maximum client connections | 200 |
| autovacuum_naptime | Autovacuum check interval | 30s |
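One way to apply these settings without editing postgresql.conf by hand is `ALTER SYSTEM`, which writes to postgresql.auto.conf. The values below mirror the table and assume a 32GB host; treat them as starting points, not prescriptions:

```sql
ALTER SYSTEM SET shared_buffers = '8GB';          -- ~25% of a 32GB host
ALTER SYSTEM SET effective_cache_size = '24GB';   -- ~75% of RAM
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';

SELECT pg_reload_conf();  -- applies reloadable settings;
                          -- shared_buffers still requires a restart
```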

Best Practices

  1. Always use EXPLAIN (ANALYZE, BUFFERS) for diagnosis, never guess. The execution plan reveals exactly what PostgreSQL does: which indexes it uses, how many rows it expects versus reality, and whether data comes from shared buffers or disk. A large gap between estimated and actual rows indicates stale statistics that need ANALYZE. Without this data, performance tuning is guesswork.

  2. Tune autovacuum per table, not globally. High-write tables need more aggressive vacuuming than read-heavy tables. Set autovacuum_vacuum_scale_factor to 0.01 (1%) for large tables instead of the default 0.2 (20%). A 500M-row table with default settings won't autovacuum until 100M dead tuples accumulate, causing massive bloat. Table-level settings let you vacuum frequently on hot tables without wasting resources on cold ones.

  3. Design indexes for your most common query patterns, not all patterns. Each index adds write overhead and consumes storage. Profile your actual query workload using pg_stat_statements to find the top 10 queries by total time. Index those patterns first. A covering index that handles your top 3 queries is better than 10 single-column indexes that each help one query.

  4. Use connection pooling in production, always. PostgreSQL forks a process per connection, so 500 idle connections waste significant memory. Use PgBouncer in transaction mode between your application and PostgreSQL. This lets thousands of application connections share a pool of 50-100 actual database connections. Set max_connections based on the pooler's pool size plus administrative connections.

  5. Partition tables before they become too large to manage. Adding partitioning to an existing 500M-row table requires data migration and downtime. Plan partitioning when tables are expected to exceed 50-100M rows. Range partitioning by date is the most common and works well for time-series data. Ensure queries include the partition key in WHERE clauses so the planner can eliminate irrelevant partitions.
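The per-table autovacuum tuning from practice 2 translates to storage parameters on the table itself; a minimal sketch for a hypothetical high-write `orders` table (thresholds are illustrative):

```sql
-- Vacuum when ~1% of rows are dead instead of the default 20%,
-- and refresh statistics more aggressively as well.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);

-- Inspect per-table overrides currently in effect
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'orders';
```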

Common Issues

Query suddenly slows down after working fine for months. Check for plan changes caused by stale statistics (ANALYZE the table), index bloat (rebuild with REINDEX CONCURRENTLY), or table bloat (check pg_stat_user_tables for dead tuple ratio). Also check if data volume crossed a tipping point where the planner switches from index scan to sequential scan. Force statistics update and compare execution plans before and after.
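The bloat and staleness checks above can be run as a single diagnostic query against pg_stat_user_tables (the LIMIT and ordering are just one reasonable choice):

```sql
-- Dead-tuple ratio and last vacuum/analyze times per table
SELECT relname,
       n_dead_tup,
       n_live_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0), 3) AS dead_ratio,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Refresh statistics on a suspect table, then re-run EXPLAIN ANALYZE
ANALYZE orders;
```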

High CPU usage from autovacuum on busy tables. Autovacuum running on large tables consumes CPU and I/O, potentially affecting production queries. Tune autovacuum_vacuum_cost_delay to throttle vacuuming (increase from 2ms to 10-20ms for less aggressive behavior). Schedule manual VACUUM during off-peak hours for the largest tables. Set autovacuum_max_workers based on your CPU cores and I/O capacity.

Replication lag increases under write-heavy workloads. The replica can't apply WAL records as fast as the primary generates them. Ensure the replica has equivalent hardware and I/O capacity. Increase max_wal_senders and wal_keep_size on the primary. On the replica, tune max_parallel_workers and ensure no long-running queries block WAL application. Monitor lag with pg_stat_replication and alert when it exceeds acceptable thresholds.
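Lag monitoring via pg_stat_replication can be sketched as follows (column names per PostgreSQL 10+; run this on the primary):

```sql
-- Lag per replica in bytes and wall-clock time
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       replay_lag
FROM pg_stat_replication;
```

Alert on both measures: byte lag shows backlog size, while `replay_lag` shows how stale reads on the replica actually are.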
