SLO Implement Command

Command

/slo-implement

Description

Helps you define meaningful SLOs (Service Level Objectives) for your services, generates the monitoring configuration to track them, and sets up error budget alerts. Follows Google SRE best practices for reliability engineering.

Behavior

Analyze the service to understand user-facing interactions
Propose SLIs (Service Level Indicators) aligned with user experience
Set SLO targets with justification
Calculate error budgets and burn rate alert thresholds
Generate monitoring configuration (Prometheus rules, Grafana dashboards)

SLI Types

Type	Measures	Example
Availability	Successful requests / total requests	99.9% of requests return non-5xx
Latency	Requests faster than threshold / total requests	95% of requests complete < 200ms
Correctness	Correct results / total results	99.99% of calculations are accurate
Freshness	Data updated within threshold	99% of data updated within 1 minute
Throughput	Processed items / expected items	99.5% of queue items processed
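Every SLI in the table is a ratio of good events to total events. A minimal Python sketch for the availability and latency rows, assuming a hypothetical `Request` record (status code and duration); neither the record nor the helper functions are part of the command's output. The latency helper counts a request as good when it finishes at or under the threshold, which matches the "less than or equal" semantics of a Prometheus histogram `le` bucket.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record: HTTP status and duration in seconds."""
    status: int
    duration_s: float


def availability_sli(requests: list[Request]) -> float:
    """Good events / total events, where 'good' means a non-5xx status."""
    if not requests:
        return 1.0  # assumption: no traffic counts as fully available
    good = sum(1 for r in requests if not 500 <= r.status <= 599)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_s: float) -> float:
    """Fraction of requests completing at or under the threshold (like `le`)."""
    if not requests:
        return 1.0  # same no-traffic assumption as above
    fast = sum(1 for r in requests if r.duration_s <= threshold_s)
    return fast / len(requests)


sample = [Request(200, 0.12), Request(200, 0.35), Request(503, 0.05), Request(200, 0.25)]
print(availability_sli(sample))  # 0.75
print(latency_sli(sample, 0.2))  # 0.5
```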

Output Format

SLO Definition Document


```yaml
service: payment-api
slos:
  - name: availability
    description: "Payment API returns successful responses"
    sli:
      type: availability
      good_events: "http_requests_total{status!~'5..'}"
      total_events: "http_requests_total"
    target: 99.95%
    window: 30d
    error_budget: 0.05%  # ~21.6 minutes per 30-day window
    consequences:
      budget_exhausted: "Freeze deployments, focus on reliability"
      50_percent_remaining: "Increase monitoring, limit risky changes"

  - name: latency
    description: "Payment API responds quickly"
    sli:
      type: latency
      threshold: 500ms
      good_events: "http_request_duration_seconds_bucket{le='0.5'}"
      # Count totals from the same histogram so both sides of the ratio agree
      total_events: "http_request_duration_seconds_count"
    target: 99%
    window: 30d
```
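The `error_budget` value is simply `(1 - target)` of the window. A minimal Python check of the "~21.6 minutes" figure, using a hypothetical `error_budget` helper. It treats the budget as time, which assumes roughly uniform traffic; for a request-based SLI the budget is really an allowed number of failed requests.

```python
from datetime import timedelta


def error_budget(target_pct: float, window: timedelta) -> timedelta:
    """Allowed 'bad' time in the window: (1 - target) * window length.

    Hypothetical helper. Treating the budget as time assumes roughly
    uniform traffic; for request-based SLIs it is a count of failures.
    """
    allowed_fraction = 1 - target_pct / 100
    return window * allowed_fraction


# 99.95% over 30 days (values from the payment-api availability SLO)
budget = error_budget(99.95, timedelta(days=30))
print(round(budget.total_seconds() / 60, 1))  # 21.6
```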

Prometheus Recording Rules


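```yaml
groups:
  - name: slo_payment_api
    interval: 30s
    rules:
      # Availability SLI over 5 minutes
      - record: slo:payment_api:availability:ratio_rate5m
        expr: |
          sum(rate(http_requests_total{service="payment-api",status!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="payment-api"}[5m]))

      # Error ratios read by the burn rate alerts (5m, 1h, 6h windows)
      - record: slo:payment_api:availability:error_rate5m
        expr: 1 - slo:payment_api:availability:ratio_rate5m
      - record: slo:payment_api:availability:error_rate1h
        expr: |
          sum(rate(http_requests_total{service="payment-api",status=~"5.."}[1h]))
          / sum(rate(http_requests_total{service="payment-api"}[1h]))
      - record: slo:payment_api:availability:error_rate6h
        expr: |
          sum(rate(http_requests_total{service="payment-api",status=~"5.."}[6h]))
          / sum(rate(http_requests_total{service="payment-api"}[6h]))

      # Availability over the full 30-day window (a 30d range is costly to evaluate)
      - record: slo:payment_api:availability:ratio_rate30d
        expr: |
          sum(rate(http_requests_total{service="payment-api",status!~"5.."}[30d]))
          / sum(rate(http_requests_total{service="payment-api"}[30d]))

      # Error budget remaining (30-day window): 1 = untouched, <= 0 = exhausted
      - record: slo:payment_api:availability:error_budget_remaining
        expr: |
          1 - (
            (1 - slo:payment_api:availability:ratio_rate30d)
            / (1 - 0.9995)
          )
```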
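The `error_budget_remaining` rule computes `1 - (actual error ratio / allowed error ratio)`. The same arithmetic in Python, as a sketch with a hypothetical `error_budget_remaining` helper that works from raw 30-day request counts instead of a Prometheus series:

```python
def error_budget_remaining(good: int, total: int, target: float) -> float:
    """Fraction of the error budget left: 1 - (actual error / allowed error).

    Hypothetical helper. 1.0 means no failures yet, 0.0 means the budget is
    exactly spent, and a negative value means the SLO has been violated.
    """
    if total == 0:
        return 1.0  # assumption: no traffic consumes no budget
    sli = good / total
    return 1 - (1 - sli) / (1 - target)


# 1,000,000 requests with 250 failures against a 99.95% target:
# the budget allows 500 failures, so half of it remains.
print(round(error_budget_remaining(999_750, 1_000_000, 0.9995), 3))  # 0.5
```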

Burn Rate Alerts


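```yaml
groups:
  - name: slo_alerts_payment_api
    rules:
      # Fast burn: 14.4x the sustainable error rate. Sustained for 1 hour this
      # consumes 2% of the 30-day budget; the full budget is gone in ~2 days.
      - alert: PaymentAPIHighErrorBurnRate
        expr: |
          slo:payment_api:availability:error_rate5m > (14.4 * 0.0005)
          and
          slo:payment_api:availability:error_rate1h > (14.4 * 0.0005)
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Payment API burning error budget 14.4x faster than sustainable"
          # $value is the left-hand side of `and`: the 5m error ratio
          error_rate_5m: "{{ $value | humanizePercentage }}"

      # Slow burn: 3x the sustainable error rate. Sustained for 6 hours this
      # consumes 2.5% of the 30-day budget; the full budget is gone in ~10 days.
      - alert: PaymentAPIElevatedErrorRate
        expr: |
          slo:payment_api:availability:error_rate6h > (3 * 0.0005)
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Payment API burning error budget 3x faster than sustainable"
          error_rate_6h: "{{ $value | humanizePercentage }}"
```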
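The burn rate multipliers translate into alert thresholds and time horizons with a little arithmetic: a burn rate of 1 spends exactly the whole budget over the 30-day window, and a burn rate of N spends it N times faster. The helpers below are hypothetical and assume the 30-day window and 99.95% target used in this example.

```python
SLO_WINDOW_HOURS = 30 * 24        # 30-day SLO window (assumed, as above)
ALLOWED_ERROR_RATIO = 1 - 0.9995  # 99.95% availability target


def alert_threshold(burn_rate: float) -> float:
    """Error ratio that corresponds to burning the budget `burn_rate` times too fast."""
    return burn_rate * ALLOWED_ERROR_RATIO


def budget_consumed(burn_rate: float, hours: float) -> float:
    """Fraction of the 30-day budget spent if `burn_rate` holds for `hours`."""
    return burn_rate * hours / SLO_WINDOW_HOURS


def hours_to_exhaustion(burn_rate: float) -> float:
    """Hours until a full budget is gone at a constant burn rate."""
    return SLO_WINDOW_HOURS / burn_rate


# Long windows match the 1h (fast) and 6h (slow) alert expressions
for name, rate, window_h in [("fast", 14.4, 1), ("slow", 3, 6)]:
    print(
        f"{name}: threshold={alert_threshold(rate):.4f} "
        f"consumed_in_window={budget_consumed(rate, window_h):.1%} "
        f"exhausted_in={hours_to_exhaustion(rate) / 24:.1f} days"
    )
# fast: threshold=0.0072 consumed_in_window=2.0% exhausted_in=2.1 days
# slow: threshold=0.0015 consumed_in_window=2.5% exhausted_in=10.0 days
```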

Rules

SLOs should reflect user experience, not system internals
Start with achievable targets: it's easier to tighten an SLO than to loosen one
Every SLO needs an error budget policy defining the actions to take when the budget runs low
Keep to 3-5 SLOs per service to maintain focus
Review SLOs quarterly and adjust based on actual performance
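An error budget policy can be as simple as a lookup from remaining budget to action. This sketch reuses the two `consequences` from the payment-api SLO definition; the function name, the reading of `50_percent_remaining` as "50% or less remaining", and the "Normal operations" fallback are assumptions for illustration.

```python
def budget_policy_action(budget_remaining: float) -> str:
    """Map remaining error budget (1.0 = untouched, <= 0 = exhausted) to an action.

    Hypothetical helper. The two actions come from the payment-api SLO
    `consequences`; the <= 0.5 boundary and the fallback are assumptions.
    """
    if budget_remaining <= 0:
        return "Freeze deployments, focus on reliability"
    if budget_remaining <= 0.5:
        return "Increase monitoring, limit risky changes"
    return "Normal operations"  # assumed default when budget is healthy


for remaining in (0.8, 0.4, -0.1):
    print(remaining, "->", budget_policy_action(remaining))
```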

Examples


```
# Define SLOs interactively for a service
/slo-implement payment-api

# Generate Prometheus rules
/slo-implement payment-api --format prometheus

# Generate Datadog monitors
/slo-implement payment-api --format datadog
```

Similar Templates

Git Commit Message Generator

React Component Scaffolder

CI/CD Pipeline Generator