Comprehensive Server Management

A practical skill for managing production servers, covering process management, system monitoring, log analysis, security hardening, backup strategies, and troubleshooting of common server issues.

When to Use This Skill

Choose this skill when:

Setting up and managing Node.js, Python, or Go applications in production
Configuring process managers (PM2, systemd) for application reliability
Monitoring server resources (CPU, memory, disk, network)
Implementing log rotation, centralized logging, and analysis
Troubleshooting performance issues, crashes, and connectivity problems

Consider alternatives when:

Working with container orchestration → use a Kubernetes or Docker skill
Managing cloud infrastructure → use an AWS/GCP/Azure skill
Setting up CI/CD pipelines → use a DevOps skill
Administering databases → use a DBA skill

Quick Start


# PM2 process management for Node.js
npm install -g pm2

# Start application with cluster mode
pm2 start app.js --name "api" -i max --max-memory-restart 500M

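// ecosystem.config.js: PM2 configuration
// Start with: pm2 start ecosystem.config.js --env production
// (the env_production block is applied only when --env production is passed)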
module.exports = {
  apps: [{
    name: 'api',
    script: './dist/server.js',
    instances: 'max',
    exec_mode: 'cluster',
    max_memory_restart: '500M',
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000,
    },
    error_file: './logs/err.log',
    out_file: './logs/out.log',
    merge_logs: true,
    log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
  }],
};

Core Concepts

Process Management Tools

Tool	Best For	Key Features
PM2	Node.js applications	Cluster mode, log management, monitoring
systemd	System services	Boot integration, dependency management
supervisor	Python applications	Process groups, event listeners
nginx	Reverse proxy, static files	Load balancing, SSL termination

System Monitoring Script


#!/bin/bash
# Server health check script

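# CPU usage (100 minus the idle percentage reported by top)
CPU=$(LC_ALL=C top -bn1 | grep "Cpu(s)" | sed 's/.*, *\([0-9.]*\)%* id.*/\1/' | awk '{print 100 - $1}')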
echo "CPU Usage: ${CPU}%"

# Memory usage
MEM=$(free -m | awk 'NR==2{printf "%.1f%%", $3*100/$2}')
echo "Memory Usage: $MEM"

# Disk usage
DISK=$(df -h / | awk 'NR==2{print $5}')
echo "Disk Usage: $DISK"

# Top processes by CPU
echo -e "\nTop 5 CPU processes:"
ps aux --sort=-%cpu | head -6

# Open TCP/UDP sockets (tail drops the header line that ss prints)
CONNECTIONS=$(ss -tun | tail -n +2 | wc -l)
echo -e "\nActive connections: $CONNECTIONS"

# Application health check (curl reports 000 if there is no response within 5 s)
HTTP_CODE=$(curl -s -o /dev/null --max-time 5 -w "%{http_code}" http://localhost:3000/health)
echo "App health: HTTP $HTTP_CODE"

Nginx Reverse Proxy Configuration


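# Send "Connection: upgrade" only for WebSocket upgrade requests; for normal
# requests the header is cleared so upstream keepalive connections are reused.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      '';
}

upstream app_servers {
    least_conn;
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    keepalive 32;
}

server {
    # On nginx 1.25.1+, prefer "listen 443 ssl;" together with "http2 on;".
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Security headers ("always" also adds them to error responses)
    add_header X-Frame-Options DENY always;
    add_header X-Content-Type-Options nosniff always;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Gzip compression
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
    gzip_min_length 1000;

    location / {
        proxy_pass http://app_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 60s;
    }

    location /static/ {
        alias /var/www/static/;
        expires 30d;
        # A location with its own add_header inherits none from the server
        # block, so the security headers are repeated here.
        add_header Cache-Control "public, immutable";
        add_header X-Frame-Options DENY always;
        add_header X-Content-Type-Options nosniff always;
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    }
}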
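nginx builds `X-Forwarded-For` from `$proxy_add_x_forwarded_for`, which is the incoming header with `$remote_addr` appended. As a result, only the rightmost entries were written by proxies you control, and anything further left may have been supplied by the client. The sketch below reads the client address under that assumption. `clientIp` is a hypothetical helper, and `trustedHops` is the number of proxies in front of the app (1 for the config above).

```javascript
// Hypothetical helper.
// forwardedFor: raw X-Forwarded-For header value (string or undefined)
// socketAddress: req.socket.remoteAddress
// trustedHops: number of proxies you control in front of the app
function clientIp(forwardedFor, socketAddress, trustedHops = 1) {
  if (!forwardedFor) return socketAddress; // request did not come through a proxy
  const hops = forwardedFor
    .split(',')
    .map((part) => part.trim())
    .filter(Boolean);
  if (hops.length === 0) return socketAddress;
  // Each trusted proxy appended the address it received the request from, so
  // the entry at (length - trustedHops) is what the outermost trusted proxy saw.
  const index = hops.length - trustedHops;
  // Fewer entries than trusted proxies means a misconfiguration; fall back to
  // the leftmost entry rather than indexing out of range.
  return index >= 0 ? hops[index] : hops[0];
}
```

This assumes the app accepts connections only from the proxy, for example by listening on 127.0.0.1 as the `upstream` block implies. Otherwise a client could send the header directly. In Express, `app.set('trust proxy', 1)` applies the same one-hop rule when computing `req.ip`.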

Configuration

Parameter	Type	Default	Description
`processManager`	string	`'pm2'`	Process manager: pm2, systemd, or supervisor
`clusterInstances`	string or number	`'max'`	Worker instances: max, a number, or 0 (fork)
`maxMemoryRestart`	string	`'500M'`	Auto-restart threshold per worker
`logRotation`	string	`'daily'`	Log rotation: daily, weekly, or size-based
`monitoringInterval`	number	`60`	Health check interval in seconds
`sslProvider`	string	`'letsencrypt'`	SSL certificate: letsencrypt or custom
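One way to apply these settings is to merge user overrides onto the table's defaults and reject out-of-range values at startup. In this sketch, `resolveServerConfig` and `workerCount` are hypothetical. The K/M/G suffix check for `maxMemoryRestart` is an assumption based on PM2's memory format. `logRotation` is left unvalidated because the table does not name the size-based value.

```javascript
const os = require('node:os');

// Defaults copied from the configuration table.
const DEFAULTS = Object.freeze({
  processManager: 'pm2',
  clusterInstances: 'max',
  maxMemoryRestart: '500M',
  logRotation: 'daily',
  monitoringInterval: 60,
  sslProvider: 'letsencrypt',
});

// Hypothetical validator: merge overrides onto DEFAULTS, then reject values
// outside the ranges the table describes.
function resolveServerConfig(overrides = {}) {
  const cfg = { ...DEFAULTS, ...overrides };
  const oneOf = (key, allowed) => {
    if (!allowed.includes(cfg[key])) {
      throw new Error(`${key} must be one of: ${allowed.join(', ')}`);
    }
  };
  oneOf('processManager', ['pm2', 'systemd', 'supervisor']);
  oneOf('sslProvider', ['letsencrypt', 'custom']);
  const n = cfg.clusterInstances;
  if (n !== 'max' && !(Number.isInteger(n) && n >= 0)) {
    throw new Error("clusterInstances must be 'max' or a non-negative integer");
  }
  // Assumption: thresholds use PM2-style K/M/G suffixes, e.g. '500M' or '1G'.
  if (!/^\d+[KMG]$/.test(cfg.maxMemoryRestart)) {
    throw new Error('maxMemoryRestart must look like 500M or 1G');
  }
  if (!(typeof cfg.monitoringInterval === 'number' && cfg.monitoringInterval > 0)) {
    throw new Error('monitoringInterval must be a positive number of seconds');
  }
  return cfg;
}

// Processes to run: 'max' = one per CPU core, 0 = a single fork-mode process.
function workerCount(clusterInstances, cpuCount = os.cpus().length) {
  if (clusterInstances === 'max') return cpuCount;
  return clusterInstances === 0 ? 1 : clusterInstances;
}
```

Failing fast on a typo such as `processManager: 'pm3'` is cheaper than discovering it after a deploy, when the process manager silently falls back or refuses to start.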

Best Practices

Run applications behind a reverse proxy instead of exposing them directly: Nginx or Caddy handles SSL termination, static file serving, rate limiting, and load balancing more efficiently than application code. Bind the application to 127.0.0.1 so that only the proxy can reach it.
Use cluster mode to utilize all CPU cores: A Node.js process runs JavaScript on a single thread, so one process uses only one core for application code. PM2's cluster mode (instances: 'max') spawns one worker per core and distributes incoming connections among them. Set max_memory_restart so that a leaking worker is restarted before it exhausts memory; this contains a leak but does not fix it.
Implement structured logging with rotation: JSON-formatted logs enable automated parsing and alerting. Rotate logs daily and retain them for 30 to 90 days. Ship logs to a centralized system (ELK, Loki) for cross-server analysis and long-term storage.
Automate server hardening with reproducible scripts: Disable root SSH login, configure fail2ban, enable unattended security updates, and restrict firewall rules. Use Ansible or shell scripts so that every server gets an identical security configuration.
Set up health checks that verify application logic, not just port availability: A health endpoint should verify database connectivity, cache availability, and critical service dependencies. A process listening on port 3000 with a broken database connection is not healthy.
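Structured logging does not require a library to get started. In this minimal sketch, each entry becomes one JSON object per line, so log shippers such as Promtail (Loki) or Filebeat (ELK) can parse it without custom patterns. `formatLogLine` and `log` are hypothetical helpers, and the field names `time`, `level`, and `msg` are a convention choice rather than a standard.

```javascript
// Hypothetical helper: serialize one log entry as a single JSON line.
// time/level/msg are set first; caller fields cannot overwrite them.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  const entry = { time: now.toISOString(), level, msg: message };
  for (const [key, value] of Object.entries(fields)) {
    if (!Object.prototype.hasOwnProperty.call(entry, key)) entry[key] = value;
  }
  return JSON.stringify(entry);
}

// Errors go to stderr (PM2's error_file); everything else to stdout (out_file).
function log(level, message, fields) {
  const stream = level === 'error' ? process.stderr : process.stdout;
  stream.write(formatLogLine(level, message, fields) + '\n');
}

// Example: log('info', 'request completed', { route: '/api/users', status: 200, ms: 42 });
```

Because every line is self-describing JSON, an alert such as "more than N entries with `level: error` in five minutes" becomes a query instead of a regex. In production, libraries such as pino emit newline-delimited JSON with less overhead.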
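The health-check practice can be sketched as an aggregator that runs every dependency check in parallel, bounds each one with a timeout, and returns 503 if any check fails. `runHealthChecks` and `withTimeout` are hypothetical names. Each check is assumed to be a function that resolves when its dependency is usable.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// checks: { name: () => Promise | value }. All checks run in parallel; a
// rejection, a synchronous throw, or a timeout marks the instance unhealthy.
async function runHealthChecks(checks, timeoutMs = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    names.map((name) => withTimeout(Promise.resolve().then(checks[name]), timeoutMs)),
  );
  const report = {};
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      report[names[i]] = 'ok';
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      report[names[i]] = `fail: ${reason}`;
    }
  });
  const healthy = results.every((result) => result.status === 'fulfilled');
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'unhealthy', checks: report },
  };
}
```

A `/health` route would send `statusCode` along with `JSON.stringify(body)`. The individual checks depend on your drivers, for example a `SELECT 1` through your database pool or a `PING` to Redis. With this in place, the curl line in the monitoring script reports 503 instead of 200 when a dependency is down.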

Common Issues

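Application restarts in a loop (crash loop): Check the error output with pm2 logs --err. Common causes are missing environment variables, unhandled promise rejections, and a port that is already in use. Set max_restarts: 10 and restart_delay: 5000 in ecosystem.config.js so that PM2 waits 5 seconds between attempts and stops after 10 consecutive unstable restarts instead of looping indefinitely.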

Server runs out of disk space: Unrotated logs and temporary files fill disks gradually. Set up logrotate for all log files (for PM2's own logs, the pm2-logrotate module installed with pm2 install pm2-logrotate), clean /tmp periodically, and alert when disk usage reaches 80%. Use du -sh /var/log/* | sort -h to find the largest consumers.
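The 80% disk alert can also be raised from inside a Node.js service. This sketch assumes Node.js 18.15 or newer, where `fs.statfsSync` is available. `diskUsagePercent` and `checkDisk` are hypothetical helpers. The percentage is computed like `df`'s Use% column: used blocks divided by the sum of used blocks and blocks available to non-root users, rounded up.

```javascript
const fs = require('node:fs');

// df-style usage: used / (used + available to unprivileged users), rounded up.
// Root-reserved blocks are excluded, which is why df can show 100% before the
// filesystem is literally full.
function diskUsagePercent({ blocks, bfree, bavail }) {
  const used = blocks - bfree;
  const denominator = used + bavail;
  return denominator === 0 ? 0 : Math.ceil((used * 100) / denominator);
}

// Requires Node.js 18.15+ (fs.statfsSync). Threshold defaults to 80%.
function checkDisk(path = '/', threshold = 80) {
  const percent = diskUsagePercent(fs.statfsSync(path));
  return { path, percent, alert: percent >= threshold };
}

// Example: const { percent, alert } = checkDisk('/var/log');
```

Running this from the same interval that drives other health metrics avoids a separate cron job. Wiring the `alert` flag into a notification channel is left to your monitoring setup.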

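High memory usage with no apparent leak: V8 tends to let the Node.js heap grow toward its size limit before running full garbage collections, so high memory usage alone does not indicate a leak. Set --max-old-space-size (in megabytes) to cap the old-generation heap. To find genuine leaks, sample process.memoryUsage().heapUsed over time and compare heap snapshots taken with v8.writeHeapSnapshot() or Chrome DevTools.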
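Sampling over time can be automated by recording `heapUsed` and flagging only a rising baseline, since a large but stable heap is normal for V8. This sketch assumes that the minimum of a window of samples approximates the heap size right after garbage collection. `isSteadilyGrowing` and `startHeapSampler` are hypothetical names, and the thresholds are placeholders to tune.

```javascript
// Flag a leak only when the post-GC baseline rises: compare the minimum of the
// oldest window of samples with the minimum of the newest window.
function isSteadilyGrowing(samples, windowSize = 5, minGrowthBytes = 1024 * 1024) {
  if (samples.length < windowSize * 2) return false; // not enough data yet
  const oldest = Math.min(...samples.slice(0, windowSize));
  const newest = Math.min(...samples.slice(-windowSize));
  return newest - oldest >= minGrowthBytes;
}

// Record heapUsed every intervalMs, keeping at most maxSamples readings.
function startHeapSampler(intervalMs = 60_000, maxSamples = 60) {
  const samples = [];
  const timer = setInterval(() => {
    samples.push(process.memoryUsage().heapUsed);
    if (samples.length > maxSamples) samples.shift();
    if (isSteadilyGrowing(samples)) {
      console.warn('heapUsed baseline is rising; take heap snapshots and compare');
    }
  }, intervalMs);
  timer.unref(); // sampling alone should not keep the process alive
  return { samples, stop: () => clearInterval(timer) };
}
```

When the warning fires, write two heap snapshots some minutes apart with `v8.writeHeapSnapshot()` and compare them in Chrome DevTools to see which objects accumulate.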

Similar Templates

Full-Stack Code Reviewer

Test Suite Generator

Pro Architecture Workspace