SLO Implement Command
Define and implement Service Level Objectives (SLOs) with corresponding SLIs, error budgets, and alerting rules. Generates monitoring configurations for Prometheus, Grafana, Datadog, or custom metrics systems to track service reliability.
Command
/slo-implement
Description
Helps you define meaningful SLOs for your services, generates the monitoring configuration to track them, and sets up error budget alerts. Follows the SLO, error budget, and burn-rate alerting practices described in Google's Site Reliability Engineering books.
Behavior
- Analyze the service to understand user-facing interactions
- Propose SLIs (Service Level Indicators) aligned with user experience
- Set SLO targets with justification
- Calculate error budgets and burn-rate alert thresholds
- Generate monitoring configuration (Prometheus rules, Grafana dashboards)
SLI Types
| Type | Measures | Example |
|---|---|---|
| Availability | Successful requests / total requests | 99.9% of requests return non-5xx |
| Latency | Requests faster than threshold / total requests | 95% of requests complete in under 200 ms |
| Correctness | Correct results / total results | 99.99% of calculations are accurate |
| Freshness | Records updated within threshold / total records | 99% of data updated within 1 minute |
| Throughput | Processed items / expected items | 99.5% of queue items processed |
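Every SLI type in the table reduces to a ratio of good events to total events. The minimal Python sketch below computes the availability and latency SLIs that way. It assumes hypothetical per-request records: the `Request` class and its fields are illustrative and not part of the command's output. The latency check is inclusive (`<=`), matching Prometheus `le` bucket semantics.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical per-request record used only for this illustration."""
    status: int          # HTTP status code
    duration_ms: float   # server-side latency in milliseconds


def availability_sli(requests: list[Request]) -> float:
    """Good events / total events, where 'good' means not a 5xx status."""
    if not requests:
        return 1.0  # no traffic: treated as meeting the SLO (a policy choice)
    good = sum(1 for r in requests if not 500 <= r.status <= 599)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float) -> float:
    """Good events / total events, where 'good' means at or under threshold_ms."""
    if not requests:
        return 1.0  # same no-traffic policy as above
    good = sum(1 for r in requests if r.duration_ms <= threshold_ms)
    return good / len(requests)


if __name__ == "__main__":
    sample = [Request(200, 120), Request(200, 180), Request(503, 40), Request(404, 250)]
    print(availability_sli(sample))    # 0.75 (the 404 counts as good: only 5xx is bad)
    print(latency_sli(sample, 200))    # 0.75 (only the 250 ms request is too slow)
```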
Output Format
SLO Definition Document
```yaml
service: payment-api
slos:
  - name: availability
    description: "Payment API returns successful responses"
    sli:
      type: availability
      good_events: "http_requests_total{status!~'5..'}"
      total_events: "http_requests_total"
    target: 99.95%
    window: 30d
    error_budget: 0.05%  # 21.6 minutes of full outage per 30-day window
    consequences:
      budget_exhausted: "Freeze deployments, focus on reliability"
      50_percent_remaining: "Increase monitoring, limit risky changes"
  - name: latency
    description: "Payment API responds quickly"
    sli:
      type: latency
      threshold: 500ms
      # Cumulative histogram: requests that took <= 0.5 s vs. all observed requests
      good_events: "http_request_duration_seconds_bucket{le='0.5'}"
      total_events: "http_request_duration_seconds_count"
    target: 99%
    window: 30d
```
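The error budget is `100% - target`. The 21.6-minute figure in the availability SLO converts that budget into full-outage time over the window: 0.0005 × 30 × 24 × 60 = 21.6. Below is a small sketch of the arithmetic. The function names are hypothetical, and the request-count variant assumes a request-based SLI.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_fraction(target_percent: float) -> float:
    """Fraction of events allowed to be bad, e.g. 99.95 -> 0.0005."""
    return (100.0 - target_percent) / 100.0


def error_budget_minutes(target_percent: float, window_days: int) -> float:
    """Budget expressed as minutes of total outage over the SLO window."""
    return error_budget_fraction(target_percent) * window_days * MINUTES_PER_DAY


def error_budget_requests(target_percent: float, total_requests: int) -> float:
    """Budget expressed as a number of failed requests for a given traffic volume."""
    return error_budget_fraction(target_percent) * total_requests


if __name__ == "__main__":
    print(round(error_budget_minutes(99.95, 30), 1))            # 21.6
    print(round(error_budget_requests(99.95, 10_000_000)))      # 5000
```

Results are rounded for display because decimal percentages like 99.95 are not exact in binary floating point.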
Prometheus Recording Rules
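```yaml
groups:
  - name: slo_payment_api
    interval: 30s
    rules:
      # Availability SLI: share of non-5xx requests over 5 minutes
      - record: slo:payment_api:availability:ratio_rate5m
        expr: |
          sum(rate(http_requests_total{service="payment-api",status!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="payment-api"}[5m]))
      # Error ratios over the windows used by the burn-rate alerts
      - record: slo:payment_api:availability:error_rate5m
        expr: sum(rate(http_requests_total{service="payment-api",status=~"5.."}[5m])) / sum(rate(http_requests_total{service="payment-api"}[5m]))
      - record: slo:payment_api:availability:error_rate1h
        expr: sum(rate(http_requests_total{service="payment-api",status=~"5.."}[1h])) / sum(rate(http_requests_total{service="payment-api"}[1h]))
      - record: slo:payment_api:availability:error_rate6h
        expr: sum(rate(http_requests_total{service="payment-api",status=~"5.."}[6h])) / sum(rate(http_requests_total{service="payment-api"}[6h]))
      # Availability SLI over the full 30-day SLO window
      - record: slo:payment_api:availability:ratio_rate30d
        expr: sum(rate(http_requests_total{service="payment-api",status!~"5.."}[30d])) / sum(rate(http_requests_total{service="payment-api"}[30d]))
      # Error budget remaining (30-day window): 1 = untouched, 0 or below = exhausted
      - record: slo:payment_api:availability:error_budget_remaining
        expr: |
          1 - (
            (1 - slo:payment_api:availability:ratio_rate30d)
            / (1 - 0.9995)
          )
```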
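The `error_budget_remaining` recording rule divides the observed error ratio by the allowed one and subtracts the result from 1. The same formula in Python makes the boundary values easier to see. This is an illustrative sketch, and the function name is hypothetical.

```python
def error_budget_remaining(observed_ratio: float, target_ratio: float) -> float:
    """Python mirror of: 1 - ((1 - observed) / (1 - target)).

    1.0 means the budget is untouched, 0.0 means it is exactly used up,
    and a negative value means the SLO is already breached for the window.
    """
    allowed_bad = 1 - target_ratio
    if allowed_bad <= 0:
        raise ValueError("target_ratio must be below 1.0; a 100% SLO has no error budget")
    observed_bad = 1 - observed_ratio
    return 1 - observed_bad / allowed_bad


if __name__ == "__main__":
    print(round(error_budget_remaining(0.9997, 0.9995), 3))  # 0.4
    print(round(error_budget_remaining(0.9990, 0.9995), 3))  # -1.0
```

Suppose 30-day availability is 99.97% against the 99.95% target. Then 40% of the budget remains. At 99.9% the value goes negative: twice the allowed errors have occurred.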
Burn Rate Alerts
```yaml
groups:
  - name: slo_alerts_payment_api
    rules:
      # Fast burn: at 14.4x the sustainable rate, 2% of the 30-day budget is spent
      # per hour and the whole budget is gone in about 2 days. Pages.
      - alert: PaymentAPIHighErrorBurnRate
        expr: |
          slo:payment_api:availability:error_rate5m > (14.4 * 0.0005)
          and
          slo:payment_api:availability:error_rate1h > (14.4 * 0.0005)
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Payment API burning error budget 14.4x faster than sustainable"
          # $value is the left-hand operand of `and`: the 5-minute error rate
          error_rate_5m: "{{ $value | humanizePercentage }}"
      # Slow burn: at 3x the sustainable rate the 30-day budget is gone in 10 days. Tickets.
      - alert: PaymentAPIElevatedErrorRate
        expr: |
          slo:payment_api:availability:error_rate6h > (3 * 0.0005)
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Payment API burning error budget 3x faster than sustainable"
```
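The alert thresholds come from burn-rate arithmetic. A burn rate of 1 spends exactly the whole budget over the 30-day window, so burn rate × hours ÷ 720 gives the fraction of the budget consumed. The sketch below assumes a 30-day window and uses hypothetical helper names. It reproduces the numbers behind both alerts and the two-window condition of the fast-burn rule.

```python
SLO_WINDOW_HOURS = 30 * 24  # 30-day window, matching the SLO definition


def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than sustainable the budget is being spent."""
    return error_rate / (1 - slo_target)


def budget_consumed(rate: float, hours: float) -> float:
    """Fraction of the whole window's budget spent burning at `rate` for `hours`."""
    return rate * hours / SLO_WINDOW_HOURS


def hours_to_exhaustion(rate: float) -> float:
    """Hours until a full budget is gone at a constant burn rate."""
    return SLO_WINDOW_HOURS / rate


def should_page(err_5m: float, err_1h: float, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Two-window check mirroring PaymentAPIHighErrorBurnRate.

    The 1h window ensures a significant share of budget is being burned;
    the 5m window lets the alert clear soon after errors stop.
    """
    limit = threshold * (1 - slo_target)
    return err_5m > limit and err_1h > limit


if __name__ == "__main__":
    print(budget_consumed(14.4, 1))     # ~0.02: 2% of the 30-day budget in one hour
    print(hours_to_exhaustion(14.4))    # 50.0 hours, about 2 days
    print(hours_to_exhaustion(3))       # 240.0 hours, i.e. 10 days
    print(should_page(0.01, 0.008, 0.9995))  # True: both windows above 0.72%
```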
Rules
- SLOs should reflect user experience, not system internals
- Start with conservative (looser) targets: it is easier to tighten an SLO than to loosen one
- Every SLO needs an error budget policy defining actions when budget is low
- Limit each service to 3-5 SLOs to keep attention on what matters most to users
- Review SLOs quarterly and adjust based on actual performance
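An error budget policy can be as simple as a lookup from remaining budget to action. This sketch takes its thresholds and actions from the `consequences` block of the payment-api definition. The function name and the `"Normal operations"` fallback for a healthy budget are assumptions.

```python
def policy_action(budget_remaining: float) -> str:
    """Map remaining error budget (1.0 = full, <= 0 = exhausted) to an action.

    Thresholds and action strings follow the payment-api `consequences`;
    the healthy-budget fallback is an illustrative assumption.
    """
    if budget_remaining <= 0:
        return "Freeze deployments, focus on reliability"
    if budget_remaining <= 0.5:
        return "Increase monitoring, limit risky changes"
    return "Normal operations"


if __name__ == "__main__":
    for remaining in (0.9, 0.4, -0.1):
        print(remaining, "->", policy_action(remaining))
```

Each branch should name a concrete, pre-agreed action so the policy can be applied without debate when the budget runs low.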
Examples
```
# Define SLOs interactively for a service
/slo-implement payment-api

# Generate Prometheus rules
/slo-implement payment-api --format prometheus

# Generate Datadog monitors
/slo-implement payment-api --format datadog
```
Similar Templates
Git Commit Message Generator
Generates well-structured conventional commit messages by analyzing staged changes. Follows Conventional Commits spec with scope detection.
React Component Scaffolder
Scaffolds a complete React component with TypeScript types, Tailwind styles, Storybook stories, and unit tests. Follows project conventions automatically.
CI/CD Pipeline Generator
Generates GitHub Actions workflows for CI/CD including linting, testing, building, and deploying. Detects project stack automatically.