
Comprehensive Server Management

A comprehensive skill covering server management principles and decisions. Includes structured workflows, validation checks, and reusable patterns for development.

Skill | Cliptics | development | v1.0.0 | MIT

A practical skill for managing production servers covering process management, system monitoring, log analysis, security hardening, backup strategies, and troubleshooting common server issues.

When to Use This Skill

Choose this skill when:

  • Setting up and managing Node.js, Python, or Go applications in production
  • Configuring process managers (PM2, systemd) for application reliability
  • Monitoring server resources (CPU, memory, disk, network)
  • Implementing log rotation, centralized logging, and analysis
  • Troubleshooting performance issues, crashes, and connectivity problems

Consider alternatives when:

  • Working with container orchestration → use a Kubernetes or Docker skill
  • Managing cloud infrastructure → use an AWS/GCP/Azure skill
  • Setting up CI/CD pipelines → use a DevOps skill
  • Need database administration → use a DBA skill

Quick Start

# PM2 process management for Node.js
npm install -g pm2

# Start application with cluster mode
pm2 start app.js --name "api" -i max --max-memory-restart 500M

// ecosystem.config.js: PM2 configuration
module.exports = {
  apps: [{
    name: 'api',
    script: './dist/server.js',
    instances: 'max',
    exec_mode: 'cluster',
    max_memory_restart: '500M',
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000,
    },
    error_file: './logs/err.log',
    out_file: './logs/out.log',
    merge_logs: true,
    log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
  }],
};

Core Concepts

Process Management Tools

| Tool | Best For | Key Features |
|---|---|---|
| PM2 | Node.js applications | Cluster mode, log management, monitoring |
| systemd | System services | Boot integration, dependency management |
| supervisor | Python applications | Process groups, event listeners |
| nginx | Reverse proxy, static files | Load balancing, SSL termination |

System Monitoring Script

#!/bin/bash
# Server health check script

# CPU usage (user-space share as reported by top; field layout varies by top version)
CPU=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}')
echo "CPU Usage: ${CPU}%"

# Memory usage
MEM=$(free -m | awk 'NR==2{printf "%.1f%%", $3*100/$2}')
echo "Memory Usage: $MEM"

# Disk usage
DISK=$(df -h / | awk 'NR==2{print $5}')
echo "Disk Usage: $DISK"

# Top processes by CPU
echo -e "\nTop 5 CPU processes:"
ps aux --sort=-%cpu | head -6

# Open connections (skip the header line that ss prints)
CONNECTIONS=$(ss -tun | tail -n +2 | wc -l)
echo -e "\nActive connections: $CONNECTIONS"

# Application health check (give up after 5 seconds so the script cannot hang)
HTTP_CODE=$(curl -s -o /dev/null --max-time 5 -w "%{http_code}" http://localhost:3000/health)
echo "App health: HTTP $HTTP_CODE"

Nginx Reverse Proxy Configuration

# Send "Connection: upgrade" only for WebSocket requests; an empty value
# lets ordinary requests reuse the upstream keepalive connections.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      '';
}

upstream app_servers {
    least_conn;
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    keepalive 32;
}

server {
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Gzip compression
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
    gzip_min_length 1000;

    location / {
        proxy_pass http://app_servers;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 60s;
    }

    location /static/ {
        alias /var/www/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
        # add_header in this block replaces the server-level headers, so repeat them
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    }
}
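On the application side, the forwarded headers set by this proxy configuration can be read back to recover the original client address. The following is a minimal sketch, assuming the only direct peer is the local Nginx instance; `TRUSTED_PROXIES` and `clientIp` are hypothetical names, not part of any library.

```javascript
// Sketch: recover the original client address behind the reverse proxy.
// Assumption: Nginx runs on the same host, so loopback peers are trusted.
const TRUSTED_PROXIES = new Set(['127.0.0.1', '::1', '::ffff:127.0.0.1']);

function clientIp(headers, peerAddress) {
  // Headers sent by an untrusted peer can be forged, so ignore them.
  if (!TRUSTED_PROXIES.has(peerAddress)) return peerAddress;

  // X-Real-IP is set by the proxy to the address it saw ($remote_addr).
  if (headers['x-real-ip']) return headers['x-real-ip'];

  // $proxy_add_x_forwarded_for appends the proxy's view of the client,
  // so the last entry is the one written by the trusted proxy.
  const forwarded = headers['x-forwarded-for'];
  if (forwarded) return forwarded.split(',').pop().trim();

  return peerAddress;
}
```

In a Node.js request handler this would typically be called as `clientIp(req.headers, req.socket.remoteAddress)`; Node.js lowercases incoming header names, which is why the lookups use lowercase keys.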

Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| processManager | string | 'pm2' | Process manager: pm2, systemd, or supervisor |
| clusterInstances | string | 'max' | Worker instances: max, a number, or 0 (fork). Note that PM2's own instances: 0 is equivalent to max |
| maxMemoryRestart | string | '500M' | Auto-restart threshold per worker |
| logRotation | string | 'daily' | Log rotation: daily, weekly, or size-based |
| monitoringInterval | number | 60 | Health check interval in seconds |
| sslProvider | string | 'letsencrypt' | SSL certificate: letsencrypt or custom |
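One way to apply these parameters is to merge user overrides onto the defaults and reject values outside the documented options. This is a hypothetical sketch: `DEFAULTS`, `ALLOWED`, and `resolveConfig` are illustrative names, and the skill itself may load its configuration differently.

```javascript
// Defaults copied from the configuration table above.
const DEFAULTS = Object.freeze({
  processManager: 'pm2',
  clusterInstances: 'max',
  maxMemoryRestart: '500M',
  logRotation: 'daily',
  monitoringInterval: 60,
  sslProvider: 'letsencrypt',
});

// Only parameters whose allowed values are spelled out in the table are
// validated; logRotation is left unchecked because the exact spelling of
// the size-based option is not documented.
const ALLOWED = {
  processManager: ['pm2', 'systemd', 'supervisor'],
  sslProvider: ['letsencrypt', 'custom'],
};

function resolveConfig(overrides = {}) {
  const config = { ...DEFAULTS, ...overrides };
  for (const [key, allowed] of Object.entries(ALLOWED)) {
    if (!allowed.includes(config[key])) {
      throw new Error(`${key} must be one of: ${allowed.join(', ')}`);
    }
  }
  if (!Number.isInteger(config.monitoringInterval) || config.monitoringInterval <= 0) {
    throw new Error('monitoringInterval must be a positive integer (seconds)');
  }
  return config;
}
```

Failing fast on an unknown value keeps a typo such as `processManager: 'forever'` from silently falling back to a default.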

Best Practices

  1. Run applications behind a reverse proxy, never directly exposed — Nginx or Caddy handles SSL termination, static file serving, rate limiting, and load balancing far more efficiently than application code. Direct exposure risks security vulnerabilities and performance issues.

  2. Use cluster mode to utilize all CPU cores — Node.js is single-threaded. PM2's cluster mode spawns one worker per core with automatic load balancing. Set max_memory_restart to prevent memory leaks from accumulating indefinitely.

  3. Implement structured logging with rotation — JSON-formatted logs enable automated parsing and alerting. Rotate logs daily and retain for 30-90 days. Ship logs to a centralized system (ELK, Loki) for cross-server analysis and long-term storage.

  4. Automate server hardening with reproducible scripts — Disable root SSH login, configure fail2ban, enable unattended security updates, and restrict firewall rules. Use Ansible or shell scripts to ensure every server has identical security configurations.

  5. Set up health checks that verify application logic, not just port availability — A health endpoint should verify database connectivity, cache availability, and critical service dependencies. A process listening on port 3000 with a broken database connection is not healthy.
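The health check practice above can be sketched with Node's built-in `http` module. This is a minimal illustration, not a definitive implementation: `checkHealth` and `createHealthServer` are hypothetical names, and each dependency check is assumed to be a function returning a promise that rejects when the dependency is unavailable.

```javascript
const http = require('http');

// Run every named check; a check passes if its promise resolves in time.
async function checkHealth(checks, timeoutMs = 2000) {
  const results = {};
  await Promise.all(Object.entries(checks).map(async ([name, check]) => {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('timed out')), timeoutMs);
    });
    try {
      await Promise.race([check(), timeout]);
      results[name] = 'ok';
    } catch (err) {
      results[name] = `failed: ${err.message}`;
    } finally {
      clearTimeout(timer); // do not keep the process alive after the check
    }
  }));
  const healthy = Object.values(results).every((r) => r === 'ok');
  return { healthy, checks: results };
}

// Serve GET /health: 200 when all checks pass, 503 otherwise.
function createHealthServer(checks) {
  return http.createServer(async (req, res) => {
    if (req.url !== '/health') {
      res.writeHead(404);
      res.end();
      return;
    }
    const report = await checkHealth(checks);
    res.writeHead(report.healthy ? 200 : 503, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(report));
  });
}
```

A caller would pass its own checks, for example `createHealthServer({ database: pingDatabase, cache: pingCache }).listen(3000)`, where `pingDatabase` and `pingCache` are placeholders for whatever client calls the application already has. Returning 503 lets a load balancer or monitor treat a broken dependency the same way as a dead process.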

Common Issues

Application restarts in a loop (crash loop) — Check PM2 logs with pm2 logs --err. Common causes: missing environment variables, unhandled promise rejections, port already in use. Set max_restarts: 10 and restart_delay: 5000 to prevent rapid restart loops.
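Two of the causes listed above can be surfaced early in the application itself. The sketch below checks for missing environment variables at startup and logs unhandled promise rejections before exiting; `REQUIRED_ENV`, `missingEnv`, and `assertEnv` are hypothetical names, and the variable list is an assumption for illustration.

```javascript
// Hypothetical list of variables this application needs at startup.
const REQUIRED_ENV = ['NODE_ENV', 'PORT'];

// Return the names that are unset or empty in the given environment.
function missingEnv(required, env = process.env) {
  return required.filter((name) => env[name] === undefined || env[name] === '');
}

// Throw one clear error instead of failing later in an obscure place.
function assertEnv(required, env = process.env) {
  const missing = missingEnv(required, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}

// Log the reason for an unhandled rejection so it shows up in the error log.
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled promise rejection:', reason);
  process.exit(1);
});

// Restart limits from the paragraph above, as they would appear inside an
// app entry in ecosystem.config.js.
const restartPolicy = {
  max_restarts: 10,
  restart_delay: 5000, // milliseconds to wait between restarts
};
```

Calling `assertEnv(REQUIRED_ENV)` as the first statement of the entry point means a misconfigured deployment produces a single readable line in `pm2 logs --err` rather than a stack trace from deep inside the application.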

Server runs out of disk space — Unrotated logs and temporary files fill disks gradually. Set up logrotate for all log files, clean /tmp periodically, and monitor disk usage with alerts at 80% capacity. Use du -sh /var/log/* to find the largest consumers.
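The 80% alert threshold can be checked from Node.js as well as from shell tools. This is a rough sketch assuming Node.js 18.15 or later, where `fs.statfsSync` is available; `usedPercent` and `checkDisk` are hypothetical helper names.

```javascript
const fs = require('fs');

// Percentage of usable space consumed, computed the way df does:
// used / (used + space available to unprivileged users).
function usedPercent({ blocks, bfree, bavail }) {
  const used = blocks - bfree;
  const usable = used + bavail;
  return usable === 0 ? 0 : (used * 100) / usable;
}

// Report usage for a mount point and flag it when it crosses the threshold.
function checkDisk(path = '/', thresholdPercent = 80) {
  const percent = usedPercent(fs.statfsSync(path));
  return { path, percent, alert: percent >= thresholdPercent };
}
```

A monitoring job could call `checkDisk('/')` on each interval and send a notification when `alert` is true; how the notification is delivered is left to the surrounding system.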

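High memory usage with no apparent leak — V8 defers aggressive garbage collection until the old-generation heap approaches its limit, so memory can climb without an actual leak. Set --max-old-space-size (in MB) to cap the heap. Track process.memoryUsage() over time to see the trend, and take heap snapshots (for example with v8.writeHeapSnapshot() or the Chrome DevTools inspector) at intervals and compare them to identify actual leaks.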
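As a rough illustration of comparing memory over time: `process.memoryUsage()` returns usage statistics, while `v8.writeHeapSnapshot()` writes a snapshot file that can be loaded into Chrome DevTools. The helper names `sampleMemory`, `heapKeepsGrowing`, and `dumpHeapSnapshot` below are hypothetical.

```javascript
const v8 = require('v8');

// Record one point-in-time reading of process memory (values in bytes).
function sampleMemory() {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  return { time: Date.now(), rss, heapUsed, heapTotal };
}

// True when heapUsed rose in every consecutive sample.
// This is a hint that a leak may exist, not proof of one.
function heapKeepsGrowing(samples) {
  if (samples.length < 2) return false;
  return samples.every((s, i) => i === 0 || s.heapUsed > samples[i - 1].heapUsed);
}

// Write a .heapsnapshot file to the working directory and return its name.
// Generating a snapshot blocks the event loop and needs extra memory,
// so trigger it sparingly on a production process.
function dumpHeapSnapshot() {
  return v8.writeHeapSnapshot();
}
```

Sampling once a minute and calling `dumpHeapSnapshot()` only when `heapKeepsGrowing` stays true across many samples keeps the overhead low while still producing snapshots that can be diffed.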
