DevOps Expert Consultant

A DevOps expert agent that applies the DevOps Infinity Loop to drive continuous integration, delivery, and improvement across the software development lifecycle, with expertise in CI/CD, infrastructure automation, monitoring, and team practices.

When to Use This Agent

Choose DevOps Expert Consultant when:

Implementing CI/CD pipelines and deployment automation
Setting up infrastructure as code and configuration management
Designing monitoring, logging, and alerting strategies
Improving development workflow efficiency and release frequency
Implementing DevOps practices (GitOps, ChatOps, InnerSource)

Consider alternatives when:

Designing cloud architecture (use a cloud architect agent)
Writing application code (use a development agent)
Managing specific cloud services (use the cloud-provider-specific agent)

Quick Start


```yaml
# .claude/agents/devops-expert-consultant.yml
name: DevOps Expert Consultant
description: DevOps practices, CI/CD, and infrastructure automation
model: claude-sonnet
tools:
  - Read
  - Write
  - Edit
  - Bash
  - Glob
  - Grep
  - WebSearch
```

Example invocation:

```bash
claude "Assess our current DevOps maturity, identify the top 5 improvements, and create an implementation roadmap for the next quarter"
```

Core Concepts

DevOps Infinity Loop

| Phase | Focus | Key Practices |
|-------|-------|---------------|
| Plan | Requirements, backlog | Sprint planning, value stream mapping |
| Code | Development, review | Trunk-based dev, pair programming |
| Build | Compilation, packaging | CI, automated builds, artifact management |
| Test | Quality verification | Automated testing, shift-left testing |
| Release | Deployment preparation | Feature flags, release branches |
| Deploy | Production delivery | CD, canary, blue-green, GitOps |
| Operate | Infrastructure management | IaC, auto-scaling, self-healing |
| Monitor | Observability | APM, logging, alerting, SLOs |

DevOps Maturity Assessment


```markdown
## DevOps Maturity: Current State

| Practice | Level | Target | Gap |
|----------|-------|--------|-----|
| Version Control | 4/5 (Git, branching strategy) | 5 | Trunk-based dev |
| CI | 3/5 (builds on PR) | 5 | Faster feedback, caching |
| CD | 2/5 (manual deploys) | 4 | Automated staging + prod |
| IaC | 1/5 (manual infra) | 4 | Terraform/Bicep for all |
| Testing | 3/5 (unit + some E2E) | 4 | Integration + contract |
| Monitoring | 2/5 (basic metrics) | 4 | APM, distributed tracing |
| Incident Mgmt | 2/5 (ad-hoc response) | 4 | Runbooks, post-mortems |
| Security | 2/5 (manual reviews) | 4 | SAST, DAST, dependency scan |

### Priority Improvements (Next Quarter)
1. Automate deployments to staging (CD)
2. Implement IaC for all environments
3. Add APM and structured logging
4. Implement automated security scanning
5. Establish incident response process
```

GitOps Workflow


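```yaml
# ArgoCD Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myapp-manifests
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: production
  syncPolicy:
    automated:
      prune: true        # delete cluster resources that were removed from Git
      selfHeal: true     # revert manual changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3           # retry a failed sync up to 3 times
      backoff:
        duration: 5s     # first wait; doubled each retry (5s, 10s, 20s)
        factor: 2
        maxDuration: 1m  # upper bound on any single wait
```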
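The manifest points Argo CD at `overlays/production` but does not show what lives there. Assuming the overlays are Kustomize overlays (the directory name suggests it; the page does not say), a minimal sketch of `overlays/production/kustomization.yaml` could look like this. The `base/` directory, the image name `ghcr.io/myorg/myapp`, the tag, and the replica count are all hypothetical.

```yaml
# overlays/production/kustomization.yaml (hypothetical contents)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base              # shared manifests (Deployment, Service, ...)
namespace: production
images:
  - name: ghcr.io/myorg/myapp
    newTag: "1.4.2"         # CI commits a new tag here; Argo CD syncs it
replicas:
  - name: myapp             # must match the Deployment name in base/
    count: 3
```

In this layout a release is a Git commit that changes `newTag`. Argo CD sees the new commit on `main` and syncs it, so CI needs write access to the manifests repository rather than credentials for the cluster.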

Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `ci_platform` | CI/CD platform (github-actions, gitlab, jenkins) | Auto-detect |
| `iac_tool` | Infrastructure as Code (terraform, bicep, pulumi) | Auto-detect |
| `gitops_tool` | GitOps platform (argocd, flux, none) | `none` |
| `monitoring_stack` | Observability tools (prometheus, datadog, newrelic) | Auto-detect |
| `container_runtime` | Container platform (docker, containerd, podman) | `docker` |
| `maturity_target` | Target maturity level (basic, intermediate, advanced) | `intermediate` |

Best Practices

Implement trunk-based development with short-lived feature branches. Long-lived branches cause merge conflicts, delay integration, and hide issues until late. Feature branches should live for one to two days at most and merge to main through pull requests gated by automated checks. Use feature flags to deploy incomplete features safely. Trunk-based development is what makes continuous integration possible: CI means integrating into the mainline at least daily, which long-lived branches rule out.
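The one-to-two-day limit is easier to keep when stale branches are visible. A minimal sketch, assuming GitHub Actions and a default branch named `main` (the file name, schedule, and two-day threshold are illustrative), that warns about remote branches whose latest commit is older than two days. The age of the last commit is only a rough proxy for how long a branch has lived.

```yaml
# .github/workflows/stale-branches.yml (hypothetical file name)
name: stale-branch-report
on:
  schedule:
    - cron: "0 8 * * 1-5"     # weekdays at 08:00 UTC
  workflow_dispatch: {}
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0      # fetch every branch, not just the default one
      - name: Warn about branches with no commits in two days
        run: |
          cutoff=$(( $(date +%s) - 2 * 24 * 60 * 60 ))
          # lstrip=3 turns refs/remotes/origin/feature/x into feature/x
          git for-each-ref --format='%(refname:lstrip=3) %(committerdate:unix)' refs/remotes/origin/ \
            | while read -r branch ts; do
                if [ "$branch" != "main" ] && [ "$branch" != "HEAD" ] && [ "$ts" -lt "$cutoff" ]; then
                  echo "::warning::Branch '$branch' has no commits in the last two days"
                fi
              done
```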
Automate everything that runs more than twice. If a deployment procedure, environment setup, or troubleshooting step is documented in a runbook, automate it. Manual procedures drift from documentation, introduce human error, and do not scale. Start with the most frequent manual tasks: environment provisioning, deployment, database migrations, and log collection. Each automation frees engineering time for higher-value work.
Shift security left by integrating scanning into CI. Run SAST (static analysis), dependency vulnerability scanning, and secret detection in every CI pipeline. Developers should see security findings in their pull request, not in a quarterly security audit. Tools like Trivy (containers), Snyk (dependencies), and Gitleaks (secrets) integrate into CI pipelines in minutes and catch issues before they reach production.
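A minimal sketch of the secret and dependency checks as a pull-request workflow, assuming GitHub Actions. It uses Gitleaks for secrets and Trivy's filesystem scan for dependency vulnerabilities in place of Snyk; a SAST job is omitted for brevity. The action versions shown are examples; pin to releases you have reviewed.

```yaml
# .github/workflows/security.yml (hypothetical file name)
name: security-scan
on:
  pull_request:
    branches: [main]
jobs:
  secrets:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0            # Gitleaks scans the commits in the pull request
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # Repositories owned by an organization also need GITLEAKS_LICENSE
  dependencies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aquasecurity/trivy-action@0.28.0
        with:
          scan-type: fs             # scan lockfiles in the checked-out repository
          scan-ref: .
          severity: CRITICAL,HIGH
          exit-code: "1"            # fail the check so findings block the merge
```

Because both jobs run on `pull_request`, findings appear as failed checks on the pull request itself, which is the feedback loop the paragraph above describes.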
Define and measure deployment frequency, lead time for changes, mean time to restore (MTTR), and change failure rate. These four DORA metrics measure DevOps effectiveness: the first two capture throughput, the last two stability. Track them from the start and set improvement targets. In DORA's 2021 benchmarks, elite teams deploy on demand (multiple times per day), have a lead time for changes under one hour, restore service in under one hour, and keep their change failure rate at or below 15%. Use these metrics to decide where improvement investment will pay off.
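The four metrics reduce to simple ratios over a measurement window. The formulas below are one common way to compute them; using the median for lead time is an assumption, not a DORA requirement.

```latex
\begin{aligned}
\text{Deployment frequency} &= \frac{N_{\text{deploys}}}{D_{\text{window}}} \\
\text{Lead time for changes} &= \operatorname{median}_i \left( t_i^{\text{prod}} - t_i^{\text{commit}} \right) \\
\text{Change failure rate} &= \frac{N_{\text{failed}}}{N_{\text{deploys}}} \\
\text{MTTR} &= \frac{1}{N_{\text{incidents}}} \sum_{j=1}^{N_{\text{incidents}}} \left( t_j^{\text{restored}} - t_j^{\text{start}} \right)
\end{aligned}
```

Here $D_{\text{window}}$ is the number of days measured, $t_i^{\text{commit}}$ and $t_i^{\text{prod}}$ are when change $i$ was committed and when it reached production, and $N_{\text{failed}}$ counts deployments that caused a production failure. For example, a hypothetical team with 40 production deployments over 20 working days, 3 of which caused a failure, deploys 40 / 20 = 2 times per working day and has a change failure rate of 3 / 40 = 7.5%.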
Treat infrastructure as cattle, not pets. Servers should be replaceable, not unique. Use immutable infrastructure: deploy new instances with updated configurations rather than modifying existing ones. Never SSH into production servers to make changes — if a change is needed, update the IaC code and redeploy. This makes infrastructure reproducible, scalable, and recoverable.
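On Kubernetes (the platform the GitOps example targets), immutability usually means shipping a new image tag and letting the controller replace pods rather than patching running containers. A minimal sketch, assuming a hypothetical image `ghcr.io/myorg/myapp` that serves a `/healthz` endpoint on port 8080:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # start one new pod at a time
      maxUnavailable: 0      # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: ghcr.io/myorg/myapp:1.4.2   # immutable tag; change it, never the running pod
          ports:
            - containerPort: 8080
          readinessProbe:                    # new pods take traffic only once healthy
            httpGet:
              path: /healthz
              port: 8080
```

Changing `image` makes Kubernetes create new pods and retire old ones one at a time; because `maxUnavailable` is 0, capacity never drops during the rollout, and rolling back means pointing at the previous tag again.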

Common Issues

CI pipeline is too slow, discouraging developers from running it. A CI pipeline that takes 30+ minutes causes developers to batch changes, skip checks, and push without waiting for results. Profile each step, cache dependencies aggressively, parallelize independent jobs, and use larger runners. Target: CI completion in under 10 minutes for the critical path (lint + unit tests + build). Run slow checks (E2E, security scan) asynchronously and report results later.
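A minimal sketch of those techniques in a GitHub Actions pull-request workflow, assuming a Node.js project whose `package.json` defines `lint`, `test`, and `build` scripts (all hypothetical here). The matrix runs the three critical-path tasks as parallel jobs, `setup-node` caches the npm download cache keyed on the lockfile, and `concurrency` cancels superseded runs when a branch is pushed again.

```yaml
# .github/workflows/ci.yml (hypothetical file name)
name: ci
on:
  pull_request:
    branches: [main]
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true          # a new push cancels the outdated run
jobs:
  critical-path:
    runs-on: ubuntu-latest
    timeout-minutes: 10             # fail loudly if the 10-minute target is exceeded
    strategy:
      fail-fast: false              # report every failing task, not just the first
      matrix:
        task: [lint, test, build]   # each entry runs as its own parallel job
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm                # restore the npm cache keyed on package-lock.json
      - run: npm ci
      - run: npm run ${{ matrix.task }}
```

Slow checks such as E2E suites can live in a separate workflow triggered on `push` to `main` or on a `schedule`, so they report without blocking the pull request.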

Infrastructure drift between environments causes "works in staging, fails in production." Manual changes made in one environment and not replicated to the others create invisible differences. Implement drift detection: periodically compare the actual infrastructure state against the IaC definitions. Run `terraform plan -detailed-exitcode` (exit code 2 means the real infrastructure differs from the configuration) or `az deployment group what-if` in scheduled CI jobs, and alert when drift is found.
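A minimal sketch of the scheduled Terraform check, assuming GitHub Actions, a hypothetical `infra/production` directory, and cloud credentials supplied to the job by means not shown here. `terraform plan -detailed-exitcode` exits with 0 when nothing differs, 1 on error, and 2 when changes are pending; the setup action's wrapper is disabled so the step sees Terraform's own exit code.

```yaml
# .github/workflows/drift-detection.yml (hypothetical file name)
name: drift-detection
on:
  schedule:
    - cron: "0 6 * * *"            # daily at 06:00 UTC
  workflow_dispatch: {}
jobs:
  terraform-drift:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/production   # hypothetical path
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false  # expose Terraform's real exit code
      - run: terraform init -input=false
      - name: Detect drift
        run: |
          set +e                    # capture the exit code instead of aborting
          terraform plan -detailed-exitcode -input=false -lock=false
          code=$?
          if [ "$code" -eq 2 ]; then
            echo "::error::Drift detected: real infrastructure differs from the Terraform configuration"
          fi
          exit "$code"              # 0 passes; 1 (error) and 2 (drift) fail the job
```

A failing scheduled run is the alert in this sketch; forwarding it to chat or paging is left out. Exit code 2 also fires for merged but unapplied configuration changes, not only manual edits; `terraform plan -refresh-only` reports only differences caused outside Terraform.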

Team adopts DevOps tools without changing practices. Installing Jenkins, Terraform, and Datadog does not create a DevOps culture. DevOps requires practices: shared ownership of production reliability, blameless post-mortems, measurement-driven improvement, and collaboration between development and operations. Start with practices (daily standups between dev and ops, shared on-call rotation, post-mortem reviews) and let tool choices follow from the practices.
