DevOps Expert Consultant
A DevOps expert following the DevOps Infinity Loop principle, ensuring continuous integration, delivery, and improvement across the entire software development lifecycle with expertise in CI/CD, infrastructure automation, monitoring, and team practices.
When to Use This Agent
Choose DevOps Expert Consultant when:
- Implementing CI/CD pipelines and deployment automation
- Setting up infrastructure as code and configuration management
- Designing monitoring, logging, and alerting strategies
- Improving development workflow efficiency and release frequency
- Implementing DevOps practices (GitOps, ChatOps, InnerSource)
Consider alternatives when:
- Designing cloud architecture (use a cloud architect agent)
- Writing application code (use a development agent)
- Managing specific cloud services (use the cloud-provider-specific agent)
Quick Start
    # .claude/agents/devops-expert-consultant.yml
    name: DevOps Expert Consultant
    description: DevOps practices, CI/CD, and infrastructure automation
    model: claude-sonnet
    tools:
      - Read
      - Write
      - Edit
      - Bash
      - Glob
      - Grep
      - WebSearch
Example invocation:
claude "Assess our current DevOps maturity, identify the top 5 improvements, and create an implementation roadmap for the next quarter"
Core Concepts
DevOps Infinity Loop
| Phase | Focus | Key Practices |
|---|---|---|
| Plan | Requirements, backlog | Sprint planning, value stream mapping |
| Code | Development, review | Trunk-based dev, pair programming |
| Build | Compilation, packaging | CI, automated builds, artifact management |
| Test | Quality verification | Automated testing, shift-left testing |
| Release | Deployment preparation | Feature flags, release branches |
| Deploy | Production delivery | CD, canary, blue-green, GitOps |
| Operate | Infrastructure management | IaC, auto-scaling, self-healing |
| Monitor | Observability | APM, logging, alerting, SLOs |
DevOps Maturity Assessment
    ## DevOps Maturity: Current State

    | Practice | Level | Target | Gap |
    |----------|-------|--------|-----|
    | Version Control | 4/5 (Git, branching strategy) | 5 | Trunk-based dev |
    | CI | 3/5 (builds on PR) | 5 | Faster feedback, caching |
    | CD | 2/5 (manual deploys) | 4 | Automated staging + prod |
    | IaC | 1/5 (manual infra) | 4 | Terraform/Bicep for all |
    | Testing | 3/5 (unit + some E2E) | 4 | Integration + contract |
    | Monitoring | 2/5 (basic metrics) | 4 | APM, distributed tracing |
    | Incident Mgmt | 2/5 (ad-hoc response) | 4 | Runbooks, post-mortems |
    | Security | 2/5 (manual reviews) | 4 | SAST, DAST, dependency scan |

    ### Priority Improvements (Next Quarter)

    1. Automate deployments to staging (CD)
    2. Implement IaC for all environments
    3. Add APM and structured logging
    4. Implement automated security scanning
    5. Establish incident response process
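The gap between level and target in the example assessment can be turned into a first-pass ranking. The following is a minimal Python sketch, assuming the scores from the example table; the names `ASSESSMENT` and `rank_gaps` are illustrative, and ranking by raw gap size is only one possible heuristic (the priority list in the example evidently weighs other factors too, since it puts CD ahead of IaC).

```python
# Scores copied from the example assessment: practice -> (current level, target level).
ASSESSMENT = {
    "Version Control": (4, 5),
    "CI": (3, 5),
    "CD": (2, 4),
    "IaC": (1, 4),
    "Testing": (3, 4),
    "Monitoring": (2, 4),
    "Incident Mgmt": (2, 4),
    "Security": (2, 4),
}


def rank_gaps(assessment):
    """Return (practice, gap) pairs, largest gap first.

    Ties are broken in favour of the practice with the lower current level.
    """
    gaps = [
        (practice, target - level, level)
        for practice, (level, target) in assessment.items()
    ]
    gaps.sort(key=lambda item: (-item[1], item[2]))
    return [(practice, gap) for practice, gap, _ in gaps]


for practice, gap in rank_gaps(ASSESSMENT):
    print(f"{practice}: {gap} level(s) below target")
```

A ranking like this is a starting point for discussion, not a substitute for judging effort, risk, and dependencies between practices.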
GitOps Workflow
    # ArgoCD Application manifest
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: myapp
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myapp-manifests
        targetRevision: main
        path: overlays/production
      destination:
        server: https://kubernetes.default.svc
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
        retry:
          limit: 3
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 1m
Configuration
| Parameter | Description | Default |
|---|---|---|
| ci_platform | CI/CD platform (github-actions, gitlab, jenkins) | Auto-detect |
| iac_tool | Infrastructure as Code tool (terraform, bicep, pulumi) | Auto-detect |
| gitops_tool | GitOps platform (argocd, flux, none) | none |
| monitoring_stack | Observability tools (prometheus, datadog, newrelic) | Auto-detect |
| container_runtime | Container platform (docker, containerd, podman) | docker |
| maturity_target | Target maturity level (basic, intermediate, advanced) | intermediate |
Best Practices
- Implement trunk-based development with short-lived feature branches. Long-lived branches create merge conflicts, delay integration, and hide issues. Feature branches should live for 1-2 days at most and merge to main through pull requests with automated checks. Use feature flags to deploy incomplete features safely. Trunk-based development enables continuous integration; long-lived branches make CI impossible by definition.
- Automate everything that runs more than twice. If a deployment procedure, environment setup, or troubleshooting step is documented in a runbook, automate it. Manual procedures drift from documentation, introduce human error, and do not scale. Start with the most frequent manual tasks: environment provisioning, deployment, database migrations, and log collection. Each automation frees engineering time for higher-value work.
- Shift security left by integrating scanning into CI. Run SAST (static analysis), dependency vulnerability scanning, and secret detection in every CI pipeline. Developers should see security findings in their pull request, not in a quarterly security audit. Tools like Trivy (containers), Snyk (dependencies), and Gitleaks (secrets) integrate into CI pipelines quickly and catch issues before they reach production.
- Define and measure deployment frequency, lead time, MTTR, and change failure rate. These four DORA metrics measure DevOps effectiveness. Track them from the start and set improvement targets. According to DORA's State of DevOps benchmarks (exact thresholds vary by report year), elite teams deploy multiple times per day with a lead time under one hour, recover from incidents in under one hour, and keep their change failure rate below 15%. Use these metrics to guide your improvement investments.
- Treat infrastructure as cattle, not pets. Servers should be replaceable, not unique. Use immutable infrastructure: deploy new instances with updated configurations rather than modifying existing ones. Never SSH into production servers to make changes. If a change is needed, update the IaC code and redeploy. This makes infrastructure reproducible, scalable, and recoverable.
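The four DORA metrics named in the list above can be computed from a plain deployment log. The following is a minimal Python sketch, not a reference implementation: the `Deployment` record and `dora_metrics` function are hypothetical names, and time to restore is approximated as the time from the failed deployment to restoration, which assumes the incident starts at deploy time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # first commit of the change
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # caused an incident or rollback
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Compute the four DORA metrics for one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    # Assumption: the incident begins at deploy time.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": timedelta(seconds=median(lead_seconds)),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times
            else None
        ),
    }
```

Feeding this from real data usually means joining CI/CD deployment events with incident records; the median is used for lead time so that a single slow change does not dominate the figure.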
Common Issues
CI pipeline is too slow, discouraging developers from running it. A CI pipeline that takes 30+ minutes causes developers to batch changes, skip checks, and push without waiting for results. Profile each step, cache dependencies aggressively, parallelize independent jobs, and use larger runners. Target: CI completion in under 10 minutes for the critical path (lint + unit tests + build). Run slow checks (E2E, security scan) asynchronously and report results later.
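To see why parallelizing independent jobs and moving slow checks off the blocking path matters, it helps to compute the critical path of the pipeline. The following is a small Python sketch; the job names, durations, and dependencies are made-up illustration values, not measurements, and the job graph is assumed to be acyclic.

```python
from functools import lru_cache

# Hypothetical job durations in minutes and the jobs each one waits for.
DURATIONS = {"lint": 2, "unit": 5, "build": 3, "e2e": 25}
NEEDS = {"lint": [], "unit": [], "build": ["lint", "unit"], "e2e": ["build"]}


def critical_path_minutes(durations, needs, targets):
    """Minutes until all target jobs finish when independent jobs run in parallel."""

    @lru_cache(maxsize=None)
    def finish_time(job):
        # A job starts once its slowest dependency has finished.
        start = max((finish_time(dep) for dep in needs[job]), default=0)
        return start + durations[job]

    return max(finish_time(job) for job in targets)


# Blocking path limited to lint + unit tests + build, with lint and unit in parallel.
print(critical_path_minutes(DURATIONS, NEEDS, ["build"]))
# Same pipeline if developers must also wait for the E2E suite.
print(critical_path_minutes(DURATIONS, NEEDS, ["e2e"]))
```

With these illustrative numbers, running lint and unit tests in parallel keeps the blocking path under the 10-minute target, while making E2E a blocking step pushes it past 30 minutes.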
Infrastructure drift between environments causes "works in staging, fails in production." Manual changes in one environment that are not replicated to others create invisible differences. Implement drift detection: periodically compare actual infrastructure state against IaC definitions. Use tools like terraform plan or az deployment group what-if in scheduled CI jobs to detect and alert on drift.
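A scheduled drift check can be sketched in Python as a thin wrapper around `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The function name `check_drift` and the returned status strings are illustrative assumptions, and a non-empty plan can also mean unapplied code changes rather than manual drift, so the check is most meaningful on a branch that has already been applied.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
NO_CHANGES, ERROR, CHANGES_PRESENT = 0, 1, 2

DEFAULT_COMMAND = ("terraform", "plan", "-detailed-exitcode", "-input=false", "-lock=false")


def check_drift(workdir, command=DEFAULT_COMMAND):
    """Run the plan command in workdir and classify the outcome."""
    result = subprocess.run(
        list(command), cwd=workdir, capture_output=True, text=True
    )
    if result.returncode == NO_CHANGES:
        return "in-sync"
    if result.returncode == CHANGES_PRESENT:
        return "drift"
    # Any other exit code (including ERROR) means the plan itself failed.
    raise RuntimeError(
        f"plan failed (exit {result.returncode}): {result.stderr.strip()}"
    )
```

In a scheduled CI job, a `"drift"` result would typically fail the job or post an alert; the `command` parameter exists so the same wrapper can be pointed at another tool or exercised in tests without Terraform installed.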
Team adopts DevOps tools without changing practices. Installing Jenkins, Terraform, and Datadog does not create a DevOps culture. DevOps requires practices: shared ownership of production reliability, blameless post-mortems, measurement-driven improvement, and collaboration between development and operations. Start with practices (daily standups between dev and ops, shared on-call rotation, post-mortem reviews) and let tool choices follow from the practices.