
Setup Monitoring Observability Processor

A command for setting up comprehensive monitoring and observability. Includes structured workflows, validation checks, and reusable setup patterns.


Deploy a complete observability stack with metrics collection, centralized logging, distributed tracing, alerting, and dashboards using Prometheus, Grafana, ELK, or cloud-native solutions.

When to Use This Command

Run this command when:

  • You need to set up application metrics, infrastructure monitoring, and custom dashboards for your production services
  • You want centralized logging with structured log formats, aggregation, and search capabilities
  • You need distributed tracing across microservices to identify performance bottlenecks and request flow issues
  • You want an alerting system with smart thresholds, escalation policies, and notification channels
  • You are building an observability stack using Prometheus, Grafana, Jaeger, or cloud-native alternatives

Quick Start

# .claude/commands/setup-monitoring-observability-processor.yaml
name: Setup Monitoring Observability Processor
description: Deploy complete observability stack with metrics, logs, and traces
inputs:
  - name: focus
    description: "metrics, logging, tracing, alerting, or all"
    default: "all"

# Setup complete observability stack
claude "setup-monitoring-observability --focus all"

# Setup metrics with Prometheus + Grafana
claude "setup-monitoring-observability --focus metrics --platform prometheus"

# Setup centralized logging
claude "setup-monitoring-observability --focus logging --platform elk"
Output:
  [detect] Application: Express.js with 12 API endpoints
  [metrics] Prometheus exporter configured (/metrics)
  [metrics] Grafana dashboard with 8 panels created
  [logging] Winston structured logging configured
  [logging] Log aggregation via Loki
  [tracing] OpenTelemetry SDK integrated
  [alerting] 5 alert rules configured (error rate, latency, CPU)
  Done. Observability stack ready. Access Grafana at :3001

Core Concepts

  • Metrics Collection -- Prometheus-compatible metrics: request rate, error rate, latency percentiles, custom business KPIs
  • Centralized Logging -- Structured JSON logs aggregated via Loki, ELK, or CloudWatch with search and filtering
  • Distributed Tracing -- OpenTelemetry spans across service boundaries for request flow visualization and bottleneck detection
  • Smart Alerting -- Threshold-based and anomaly-detection alerts with escalation policies and silence windows
  • Dashboards -- Pre-built Grafana dashboards for RED metrics (Rate, Errors, Duration) and system health
Observability Stack:
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚           Application                β”‚
  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
  β”‚  β”‚ Metrics β”‚ Logging  β”‚  Tracing  β”‚  β”‚
  β”‚  β”‚ prom-   β”‚ winston/ β”‚ OpenTele- β”‚  β”‚
  β”‚  β”‚ client  β”‚ pino     β”‚ metry SDK β”‚  β”‚
  β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”˜
          β–Ό         β–Ό           β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚Prometheusβ”‚ β”‚  Loki / β”‚ β”‚ Jaeger /β”‚
  β”‚          β”‚ β”‚   ELK   β”‚ β”‚ Tempo   β”‚
  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β–Ό
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚ Grafana  β”‚
              β”‚Dashboard β”‚
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
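The RED metrics in the table above can be sketched without any library at all. This is a minimal, illustrative aggregator, not the command's actual instrumentation; a real setup would use `prom-client` counters and histograms and expose them at `/metrics`.

```javascript
// Minimal RED-metrics sketch: record per-request outcomes, then
// derive Rate, Errors, and Duration (P95) from the raw samples.
const samples = []; // { durationMs, isError } per completed request

function recordRequest(durationMs, statusCode) {
  samples.push({ durationMs, isError: statusCode >= 500 });
}

function redSnapshot(windowSeconds) {
  const total = samples.length;
  const errors = samples.filter((s) => s.isError).length;
  const sorted = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95));
  return {
    ratePerSec: total / windowSeconds,        // Rate
    errorRatio: total ? errors / total : 0,   // Errors
    p95LatencyMs: sorted[idx] ?? 0,           // Duration
  };
}

// Example: 4 requests observed over a 2-second window
recordRequest(120, 200);
recordRequest(80, 200);
recordRequest(300, 500);
recordRequest(95, 200);
console.log(redSnapshot(2));
// -> { ratePerSec: 2, errorRatio: 0.25, p95LatencyMs: 300 }
```

In production the same three numbers come from a counter (rate), a counter ratio (errors), and a histogram quantile (duration) queried via PromQL.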

Configuration

  • focus (string, default "all") -- Component: metrics, logging, tracing, alerting, or all
  • platform (string, default "prometheus") -- Stack: prometheus (+ Grafana), elk, datadog, cloudwatch, or newrelic
  • retention (string, default "30d") -- Data retention period for metrics and logs
  • alerts (boolean, default true) -- Configure alerting rules and notification channels
  • docker (boolean, default true) -- Include Docker Compose files for the monitoring infrastructure
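When `docker` is enabled, the generated infrastructure looks roughly like the Compose sketch below. The image tags, volume layout, and the Grafana port mapping (host 3001 to container 3000) are illustrative assumptions, not the command's exact output.

```yaml
# docker-compose.monitoring.yml -- illustrative sketch
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml  # scrape config
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"   # Grafana listens on 3000 inside the container
    depends_on:
      - prometheus
```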

Best Practices

  1. Instrument RED metrics first -- Rate, Errors, and Duration cover 90% of monitoring needs. Start with these three metrics per service before adding custom business KPIs.
  2. Use structured logging from day one -- JSON-formatted logs are searchable and parseable. Switching from unstructured to structured logging in a running system is painful and error-prone.
  3. Set alert thresholds based on SLOs -- Define Service Level Objectives first (e.g., 99.9% uptime, P95 latency < 200ms), then create alerts that fire when SLO budgets are at risk.
  4. Add trace context to logs -- Include trace IDs in log entries so you can correlate logs with distributed traces for a complete picture of request handling.
  5. Review dashboards weekly -- Unused dashboards and stale alerts accumulate noise. Prune panels that nobody looks at and adjust alert thresholds that cause false positives.
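Practices 2 and 4 combine naturally: emit one JSON object per log line and carry the trace ID in every entry. The sketch below assumes the trace ID is supplied by your tracing SDK (e.g. from OpenTelemetry's active span); the logger itself is a hand-rolled stand-in for winston or pino.

```javascript
// Structured, trace-aware logger sketch: one JSON object per line,
// with a traceId field for correlating logs with distributed traces.
function makeLogger(service) {
  return {
    info(message, traceId, fields = {}) {
      const entry = {
        level: 'info',
        service,
        message,
        traceId,                         // ties this line to a trace
        timestamp: new Date().toISOString(),
        ...fields,
      };
      console.log(JSON.stringify(entry)); // newline-delimited JSON
      return entry;
    },
  };
}

const log = makeLogger('checkout');
log.info('payment authorized', '4bf92f3577b34da6', { orderId: 'A-1001' });
```

Because every entry is valid JSON, Loki or ELK can index `traceId` directly, so jumping from a log line to the corresponding trace is a single query.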

Common Issues

  1. High cardinality metric labels -- Using user IDs or request paths as metric labels creates millions of time series and crashes Prometheus. Use bounded label values like status codes or endpoint groups.
  2. Log volume overwhelming storage -- Debug-level logging in production generates massive volumes. Set log level to info in production and use debug only in development or when actively investigating issues.
  3. Trace sampling rate too high -- Tracing every request adds overhead and storage cost. Start with 10% sampling and increase for specific services or error scenarios.
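For issue 1, the usual fix is to normalize raw request paths into a bounded set of route templates before using them as labels. The heuristics below (numeric and hex-like segments become `:id`) are an assumption for illustration; Express route patterns, when available, are a better source of truth.

```javascript
// Collapse unbounded path values into a fixed set of endpoint groups,
// keeping metric label cardinality bounded.
function endpointGroup(path) {
  return path
    .split('?')[0]  // drop query strings
    .split('/')
    .map((seg) =>
      // numeric IDs or long hex/UUID-like segments become a placeholder
      /^\d+$/.test(seg) || /^[0-9a-f-]{8,}$/i.test(seg) ? ':id' : seg
    )
    .join('/');
}

console.log(endpointGroup('/users/12345/orders'));          // -> /users/:id/orders
console.log(endpointGroup('/orders/550e8400-e29b-41d4'));   // -> /orders/:id
```

With this in place, a million distinct URLs label only as many time series as you have route templates.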