# Setup Monitoring Observability Processor
Deploy a complete observability stack with metrics collection, centralized logging, distributed tracing, alerting, and dashboards using Prometheus, Grafana, ELK, or cloud-native solutions.
## When to Use This Command

Run this command when:
- You need to set up application metrics, infrastructure monitoring, and custom dashboards for your production services
- You want centralized logging with structured log formats, aggregation, and search capabilities
- You need distributed tracing across microservices to identify performance bottlenecks and request flow issues
- You want an alerting system with smart thresholds, escalation policies, and notification channels
- You are building an observability stack using Prometheus, Grafana, Jaeger, or cloud-native alternatives
## Quick Start

```yaml
# .claude/commands/setup-monitoring-observability-processor.yaml
name: Setup Monitoring Observability Processor
description: Deploy complete observability stack with metrics, logs, and traces
inputs:
  - name: focus
    description: "metrics, logging, tracing, alerting, or all"
    default: "all"
```

```bash
# Set up the complete observability stack
claude "setup-monitoring-observability --focus all"

# Set up metrics with Prometheus + Grafana
claude "setup-monitoring-observability --focus metrics --platform prometheus"

# Set up centralized logging
claude "setup-monitoring-observability --focus logging --platform elk"
```
Output:

```text
[detect] Application: Express.js with 12 API endpoints
[metrics] Prometheus exporter configured (/metrics)
[metrics] Grafana dashboard with 8 panels created
[logging] Winston structured logging configured
[logging] Log aggregation via Loki
[tracing] OpenTelemetry SDK integrated
[alerting] 5 alert rules configured (error rate, latency, CPU)
Done. Observability stack ready. Access Grafana at :3001
```
## Core Concepts
| Concept | Description |
|---|---|
| Metrics Collection | Prometheus-compatible metrics: request rate, error rate, latency percentiles, custom business KPIs |
| Centralized Logging | Structured JSON logs aggregated via Loki, ELK, or CloudWatch with search and filtering |
| Distributed Tracing | OpenTelemetry spans across service boundaries for request flow visualization and bottleneck detection |
| Smart Alerting | Threshold-based and anomaly-detection alerts with escalation policies and silence windows |
| Dashboards | Pre-built Grafana dashboards for RED metrics (Rate, Errors, Duration) and system health |
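The RED metrics mentioned in the table can be computed from a sliding window of request samples. A minimal sketch in plain JavaScript (the `redMetrics` helper and the sample shape are illustrative, not part of the generated stack; in practice prom-client histograms track durations and Prometheus computes percentiles server-side):

```javascript
// Compute Rate, Errors, and Duration (P95) from a window of request
// samples. Each sample: { status: httpStatusCode, durationMs: number }.
function redMetrics(samples, windowSeconds) {
  const errors = samples.filter((s) => s.status >= 500).length;
  const sorted = samples.map((s) => s.durationMs).sort((a, b) => a - b);
  // Nearest-rank P95: element at index ceil(0.95 * n) - 1.
  const p95 = sorted.length
    ? sorted[Math.ceil(0.95 * sorted.length) - 1]
    : 0;
  return {
    rate: samples.length / windowSeconds, // requests per second
    errorRate: samples.length ? errors / samples.length : 0,
    p95DurationMs: p95,
  };
}
```

In a real deployment you would record each request into a prom-client `Histogram` and let Prometheus derive rates and quantiles with `rate()` and `histogram_quantile()`.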
Observability Stack:

```text
┌──────────────────────────────────────┐
│             Application              │
│  ┌─────────┬──────────┬───────────┐  │
│  │ Metrics │ Logging  │ Tracing   │  │
│  │ prom-   │ winston/ │ OpenTele- │  │
│  │ client  │ pino     │ metry SDK │  │
│  └────┬────┴────┬─────┴─────┬─────┘  │
└───────┼─────────┼───────────┼────────┘
        ▼         ▼           ▼
 ┌──────────┐ ┌─────────┐ ┌──────────┐
 │Prometheus│ │ Loki /  │ │ Jaeger / │
 │          │ │  ELK    │ │  Tempo   │
 └────┬─────┘ └────┬────┘ └────┬─────┘
      └────────────┼───────────┘
                   ▼
             ┌──────────┐
             │ Grafana  │
             │Dashboard │
             └──────────┘
```
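The stack in the diagram can be run locally with Docker Compose, along the lines of what the `docker: true` option is meant to generate. This is an illustrative sketch, not the command's actual output; image tags and port mappings are assumptions:

```yaml
# docker-compose.yml (illustrative)
services:
  prometheus:
    image: prom/prometheus:latest
    ports: ["9090:9090"]
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  loki:
    image: grafana/loki:latest
    ports: ["3100:3100"]
  tempo:
    image: grafana/tempo:latest
    ports: ["3200:3200"]
  grafana:
    image: grafana/grafana:latest
    ports: ["3001:3000"] # matches "Access Grafana at :3001" above
    depends_on: [prometheus, loki, tempo]
```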
## Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `focus` | string | `"all"` | Component: metrics, logging, tracing, alerting, or all |
| `platform` | string | `"prometheus"` | Stack: prometheus (+ Grafana), elk, datadog, cloudwatch, or newrelic |
| `retention` | string | `"30d"` | Data retention period for metrics and logs |
| `alerts` | boolean | `true` | Configure alerting rules and notification channels |
| `docker` | boolean | `true` | Include Docker Compose files for the monitoring infrastructure |
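With `alerts: true`, rules like the five listed in the Quick Start output are generated. A hedged sketch of what an error-rate rule might look like in Prometheus alerting syntax (the metric name `http_requests_total` and the thresholds are conventional choices, not guaranteed by the command):

```yaml
# alert-rules.yml (illustrative)
groups:
  - name: service-slo
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Error rate above 1% for 5 minutes"
```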
## Best Practices
- Instrument RED metrics first -- Rate, Errors, and Duration cover 90% of monitoring needs. Start with these three metrics per service before adding custom business KPIs.
- Use structured logging from day one -- JSON-formatted logs are searchable and parseable. Switching from unstructured to structured logging in a running system is painful and error-prone.
- Set alert thresholds based on SLOs -- Define Service Level Objectives first (e.g., 99.9% uptime, P95 latency < 200ms), then create alerts that fire when SLO budgets are at risk.
- Add trace context to logs -- Include trace IDs in log entries so you can correlate logs with distributed traces for a complete picture of request handling.
- Review dashboards weekly -- Unused dashboards and stale alerts accumulate noise. Prune panels that nobody looks at and adjust alert thresholds that cause false positives.
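The structured-logging and trace-context practices above combine naturally. A minimal sketch in plain JavaScript (in a real setup winston or pino produces the JSON line, and the trace ID comes from the active OpenTelemetry span rather than a function parameter):

```javascript
// Emit one JSON log line per event, carrying the trace ID so log
// entries can be joined against distributed traces in Jaeger/Tempo.
function logEvent(level, message, traceId, fields = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    traceId, // correlates this log line with a distributed trace
    ...fields,
  };
  return JSON.stringify(entry); // in production, write to stdout
}
```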
## Common Issues
- High cardinality metric labels -- Using user IDs or request paths as metric labels creates millions of time series and crashes Prometheus. Use bounded label values like status codes or endpoint groups.
- Log volume overwhelming storage -- Debug-level logging in production generates massive volumes. Set the log level to `info` in production and use `debug` only in development or when actively investigating issues.
- Trace sampling rate too high -- Tracing every request adds overhead and storage cost. Start with 10% sampling and increase it for specific services or error scenarios.
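The high-cardinality issue is usually addressed by normalizing label values before recording metrics. A sketch in plain JavaScript (the ID patterns are illustrative; frameworks like Express can supply the route template, e.g. `/users/:id`, directly):

```javascript
// Collapse unbounded path segments (numeric IDs, UUIDs) into a
// placeholder so the metric label set stays small and bounded.
function normalizeRoute(path) {
  return path
    .split("/")
    .map((seg) =>
      /^\d+$/.test(seg) ||
      /^[0-9a-f]{8}-[0-9a-f-]{27}$/i.test(seg)
        ? ":id"
        : seg
    )
    .join("/");
}
```

Labeling requests with `normalizeRoute(req.path)` keeps the time-series count proportional to the number of endpoints rather than the number of users.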