Azure Principal Mentor

Enterprise-grade agent that provides expert Azure principal-architect guidance. Includes structured workflows, validation checks, and reusable patterns for DevOps infrastructure.

Agent | Cliptics | devops infrastructure | v1.0.0 | MIT

An Azure Principal Architect agent providing expert Azure architecture guidance using the Well-Architected Framework (WAF), helping you design enterprise-grade cloud solutions that balance reliability, security, cost optimization, operational excellence, and performance efficiency.

When to Use This Agent

Choose Azure Principal Mentor when:

  • Designing enterprise Azure architectures that must meet WAF pillar requirements
  • Conducting Azure Well-Architected Reviews (WAR) on existing workloads
  • Making architectural decisions for mission-critical Azure workloads
  • Planning Azure migrations with proper landing zone design
  • Evaluating cloud architecture patterns against Azure best practices

Consider alternatives when:

  • Writing Bicep or Terraform templates (use a Bicep/IaC specialist)
  • Building Azure Logic Apps workflows (use Expert Azure Bot)
  • Troubleshooting specific Azure service issues (use Azure Infra Engineer Partner)

Quick Start

    # .claude/agents/azure-principal-mentor.yml
    name: Azure Principal Mentor
    description: Enterprise Azure architecture guidance using WAF
    model: claude-sonnet
    tools:
      - Read
      - Write
      - Glob
      - Grep
      - WebSearch

Example invocation:

claude "Review our Azure architecture against the Well-Architected Framework and provide recommendations for each pillar with priority rankings"

Core Concepts

Well-Architected Framework Pillars

| Pillar | Focus | Key Metrics |
|---|---|---|
| Reliability | Resiliency, availability, recovery | SLA %, RPO, RTO |
| Security | Identity, data protection, network | Compliance score, threat model |
| Cost Optimization | Resource efficiency, scaling | Monthly spend, cost per user |
| Operational Excellence | Automation, monitoring, deployment | MTTR, deployment frequency |
| Performance Efficiency | Scaling, caching, optimization | Latency P95, throughput |

Architecture Assessment Template

    ## Well-Architected Review: E-Commerce Platform

    ### Reliability (Score: 3/5)
    Strengths:
    - Zone-redundant App Service deployment
    - Azure SQL with geo-replication

    Gaps:
    - No automated failover testing
    - Missing health endpoints for dependencies
    - No chaos engineering practices

    **Recommendation:** Implement health probes that check all dependencies. Configure Traffic Manager automatic failover. Schedule quarterly failover drills.

    ### Security (Score: 4/5)
    Strengths:
    - Managed Identity for all service-to-service auth
    - Private Endpoints for data services
    - Web Application Firewall on Application Gateway
    - Key Vault for all secrets

    Gaps:
    - No regular penetration testing schedule

    **Recommendation:** Implement quarterly pen tests. Enable Microsoft Defender for Cloud on all subscriptions. Add Just-In-Time VM access for management.
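The per-pillar scores in a review like this are easier to act on when they are captured as data. The following is a minimal sketch, assuming a 1-5 score per pillar as in the template; the `PillarAssessment` class and `weakest_first` helper are hypothetical names introduced here for illustration, not part of the agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PillarAssessment:
    """One pillar's section of a review (hypothetical structure)."""
    pillar: str
    score: int  # 1 (weak) to 5 (strong), matching the template's "Score: n/5"
    strengths: list[str] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.score <= 5:
            raise ValueError(f"score must be between 1 and 5, got {self.score}")


def weakest_first(assessments: list[PillarAssessment]) -> list[PillarAssessment]:
    """Order pillars by ascending score; ties go to the pillar with more gaps."""
    return sorted(assessments, key=lambda a: (a.score, -len(a.gaps)))


# Values taken from the example review above.
review = [
    PillarAssessment(
        "Security", 4,
        strengths=["Managed Identity", "Private Endpoints",
                   "Web Application Firewall", "Key Vault"],
        gaps=["No regular penetration testing schedule"],
    ),
    PillarAssessment(
        "Reliability", 3,
        strengths=["Zone-redundant App Service", "Azure SQL geo-replication"],
        gaps=["No automated failover testing",
              "Missing health endpoints for dependencies",
              "No chaos engineering practices"],
    ),
]

for a in weakest_first(review):
    print(f"{a.pillar}: {a.score}/5, {len(a.gaps)} open gap(s)")
```

Sorting by score first keeps the output aligned with how the review is read: the pillar that needs the most attention appears at the top.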

Reference Architecture Patterns

    ## Mission-Critical Web Application

    ### Components
    - Azure Front Door (global load balancing, web application firewall, CDN)
    - App Service (zone-redundant, auto-scale 2-20 instances)
    - Azure SQL (Business Critical tier, zone-redundant, geo-replica)
    - Azure Cache for Redis (Premium tier, zone-redundant)
    - Key Vault (premium, soft-delete enabled)
    - Log Analytics (centralized monitoring)
    - Application Insights (APM)

    ### Availability Target: 99.95% (21.9 min/month downtime)
    ### RTO: 15 minutes | RPO: 5 minutes

    ### Cost Optimization
    - Reserved Instances for predictable compute (3-year term; savings of up to roughly 60%, depending on service and region)
    - Auto-scale rules based on CPU and request count
    - Dev/Test pricing for non-production environments
    - Azure Hybrid Benefit for SQL Server licenses
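The downtime figure next to the availability target follows directly from the percentage. Here is a small sketch of that arithmetic, assuming an average calendar month of 365.25 / 12 days; the function names and the component values passed to `composite_availability` are illustrative assumptions, not published SLAs.

```python
def downtime_budget_minutes(sla_percent: float, days: float = 365.25 / 12) -> float:
    """Allowed downtime in minutes for an availability target over a period.

    The default period is an average calendar month (365.25 / 12 days).
    """
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in the range (0, 100]")
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def composite_availability(*component_slas_percent: float) -> float:
    """Availability of components chained in series.

    Assumes failures are independent, which is a simplification.
    """
    result = 1.0
    for sla in component_slas_percent:
        result *= sla / 100
    return result * 100


print(round(downtime_budget_minutes(99.95), 1))  # 21.9
print(round(downtime_budget_minutes(99.9), 1))   # 43.8

# Hypothetical component availabilities, for illustration only.
print(round(composite_availability(99.95, 99.99, 99.9), 3))
```

Chaining components in series always yields a composite figure lower than the weakest component, which is why a 99.95% end-to-end target usually requires redundancy rather than simply picking services with a 99.95% SLA each.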

Configuration

| Parameter | Description | Default |
|---|---|---|
| waf_version | Well-Architected Framework version | latest |
| assessment_depth | Review depth (quick, standard, comprehensive) | standard |
| compliance_frameworks | Regulatory requirements (soc2, hipaa, pci) | None |
| cost_model | Pricing model (payg, reserved, savings-plan) | mixed |
| criticality | Workload criticality (low, medium, high, mission-critical) | high |
| target_sla | Target availability SLA percentage | 99.9 |
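One way to catch typos in these parameters before a review runs is to validate them up front. The sketch below is an assumption about how such a check could look; the `AgentConfig` class is hypothetical and the agent itself does not necessarily expose one. Note that `mixed` is accepted for `cost_model` because it is the documented default, even though it is not among the listed options.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ASSESSMENT_DEPTHS = {"quick", "standard", "comprehensive"}
COMPLIANCE_FRAMEWORKS = {"soc2", "hipaa", "pci"}
# "mixed" is the documented default, so it is treated as valid here.
COST_MODELS = {"payg", "reserved", "savings-plan", "mixed"}
CRITICALITIES = {"low", "medium", "high", "mission-critical"}


@dataclass
class AgentConfig:
    """Hypothetical container mirroring the configuration table's defaults."""
    waf_version: str = "latest"
    assessment_depth: str = "standard"
    compliance_frameworks: list[str] = field(default_factory=list)
    cost_model: str = "mixed"
    criticality: str = "high"
    target_sla: float = 99.9

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the config is valid."""
        problems = []
        if self.assessment_depth not in ASSESSMENT_DEPTHS:
            problems.append(f"unknown assessment_depth: {self.assessment_depth!r}")
        unknown = sorted(set(self.compliance_frameworks) - COMPLIANCE_FRAMEWORKS)
        if unknown:
            problems.append(f"unknown compliance_frameworks: {unknown}")
        if self.cost_model not in COST_MODELS:
            problems.append(f"unknown cost_model: {self.cost_model!r}")
        if self.criticality not in CRITICALITIES:
            problems.append(f"unknown criticality: {self.criticality!r}")
        if not 0 < self.target_sla < 100:
            problems.append(f"target_sla must be between 0 and 100: {self.target_sla}")
        return problems


print(AgentConfig().validate())
print(AgentConfig(assessment_depth="deep", target_sla=120).validate())
```

Returning a list of problems instead of raising on the first one lets a caller report every misconfigured parameter in a single pass.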

Best Practices

  1. Design for failure at every layer of the architecture. Assume that any component can fail at any time. Use Availability Zones for within-region redundancy, geo-replication for cross-region resilience, and circuit breakers for inter-service communication. Test failure scenarios regularly with chaos engineering. The question is not whether components will fail, but whether the system gracefully handles failures when they occur.

  2. Apply the principle of least privilege across all identities. Every managed identity, service principal, and user account should have the minimum Azure RBAC roles needed for its function. Use custom roles when built-in roles are too broad. Implement Privileged Identity Management (PIM) for just-in-time elevation. Review role assignments quarterly and remove stale access. Overly permissive access is among the most common findings in Azure security reviews.

  3. Right-size resources based on actual usage data, not estimates. Deploy to production with monitored resources, then analyze actual CPU, memory, and I/O utilization after 2-4 weeks. Workloads sized from estimates are often over-provisioned, in some cases by 40-60%. Use Azure Advisor recommendations, scale down under-utilized resources, and implement auto-scaling for variable workloads. Reserved Instances only make sense after you know the actual baseline.

  4. Implement defense-in-depth with network segmentation. Layer security controls: NSG rules at the subnet level, private endpoints for data services, Application Gateway with Web Application Firewall for web traffic, DDoS Protection for public endpoints. Each layer catches threats that other layers miss. A single perimeter firewall is insufficient; modern architectures need security at every network boundary.

  5. Automate everything that can be automated, especially deployments. Manual Azure portal changes are unrepeatable, unauditable, and error-prone. Use Bicep or Terraform for infrastructure, GitHub Actions for CI/CD, and Azure Policy for governance. Every change should flow through code review and automated deployment. The portal should be read-only for production subscriptions.
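The circuit breakers mentioned in practice 1 can be illustrated with a short sketch. This is a simplified, single-threaded version written for this document, not a production library: the class name, thresholds, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Fail fast after repeated failures, then allow a trial call after a timeout."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # Trip at the threshold, or immediately if a half-open trial call fails.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def flaky() -> str:
    """Stand-in for a call to a failing downstream service."""
    raise ConnectionError("downstream unavailable")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except (ConnectionError, CircuitOpenError) as exc:
        print(type(exc).__name__, "->", breaker.state)
```

Once the breaker opens, callers get an immediate `CircuitOpenError` instead of waiting on a dependency that is already known to be failing, which keeps one unhealthy service from exhausting threads or connections in the services that call it.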

Common Issues

Architecture assessment identifies too many issues to address at once. A comprehensive WAF review often surfaces 30+ recommendations across all pillars. Prioritize by risk: address security and reliability findings first (they prevent incidents), then operational excellence (it makes future changes easier), then cost and performance (they improve efficiency). Create a quarterly roadmap with 5-8 items per quarter rather than attempting everything at once.
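The ordering described above can be turned into a quarterly roadmap mechanically. The following sketch assumes a hypothetical `Finding` record with a 1-3 severity scale (not something the WAF review itself defines) and a default of six items per quarter, inside the suggested 5-8 range.

```python
from __future__ import annotations

from dataclasses import dataclass

# Lower number = address earlier, following the ordering in the text:
# security and reliability, then operational excellence, then cost and performance.
PILLAR_PRIORITY = {
    "security": 0,
    "reliability": 0,
    "operational excellence": 1,
    "cost optimization": 2,
    "performance efficiency": 2,
}


@dataclass(frozen=True)
class Finding:
    title: str
    pillar: str
    severity: int = 1  # hypothetical 1-3 scale, higher = more urgent


def build_roadmap(findings: list[Finding], per_quarter: int = 6) -> list[list[Finding]]:
    """Sort findings by pillar priority, then severity, and split into quarters."""
    if per_quarter < 1:
        raise ValueError("per_quarter must be at least 1")
    ordered = sorted(
        findings,
        key=lambda f: (PILLAR_PRIORITY[f.pillar.lower()], -f.severity),
    )
    return [ordered[i:i + per_quarter] for i in range(0, len(ordered), per_quarter)]


findings = [
    Finding("Right-size App Service plan", "Cost Optimization"),
    Finding("Enable PIM for admin roles", "Security", severity=3),
    Finding("Move deployments to IaC pipeline", "Operational Excellence"),
    Finding("Schedule failover drill", "Reliability", severity=2),
]

for number, quarter in enumerate(build_roadmap(findings, per_quarter=2), start=1):
    print(f"Q{number}:", [f.title for f in quarter])
```

Because Python's sort is stable, findings with the same pillar priority and severity keep their original order, so the reviewer's own ordering acts as the final tie-breaker.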

Cost optimization recommendations conflict with reliability requirements. Reducing costs by removing redundancy, shrinking instance sizes, or using cheaper tiers directly impacts availability and performance. Frame cost decisions in terms of risk: "Removing geo-replication saves $500/month but increases RTO from 15 minutes to 4 hours." Let business stakeholders make informed trade-offs rather than making cost-driven technical decisions in isolation.
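A trade-off statement like the one quoted above becomes more useful when the saving is set against the expected cost of the added risk. The sketch below shows the arithmetic only: the outage cost per hour and expected outage frequency are made-up inputs that stakeholders would need to supply, and treating the RTO difference as extra downtime per outage is a simplifying assumption.

```python
def cost_vs_risk(monthly_saving_usd: float,
                 rto_before_min: float,
                 rto_after_min: float,
                 outage_cost_per_hour_usd: float,
                 expected_outages_per_year: float) -> tuple[float, float]:
    """Return (annual saving, expected annual cost of the longer recovery time).

    Treats the RTO increase as additional downtime per outage.
    """
    annual_saving = monthly_saving_usd * 12
    added_hours_per_outage = (rto_after_min - rto_before_min) / 60
    expected_added_cost = (
        added_hours_per_outage * expected_outages_per_year * outage_cost_per_hour_usd
    )
    return annual_saving, expected_added_cost


# $500/month saving and RTO 15 min -> 4 h come from the example in the text.
# The outage cost and frequency below are hypothetical placeholders.
saving, risk = cost_vs_risk(
    monthly_saving_usd=500,
    rto_before_min=15,
    rto_after_min=240,
    outage_cost_per_hour_usd=10_000,
    expected_outages_per_year=0.5,
)
print(f"Annual saving: ${saving:,.0f}")                # Annual saving: $6,000
print(f"Expected added outage cost: ${risk:,.0f}")     # Expected added outage cost: $18,750
```

With these placeholder inputs the expected cost of slower recovery exceeds the saving, but the conclusion flips for a workload where an hour of downtime is cheap, which is exactly the judgment that belongs with business stakeholders.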

Well-Architected assessment becomes a checkbox exercise. Teams complete the assessment, record the scores, and file the report without implementing changes. Convert each finding into a tracked engineering ticket with a clear definition of done, assigned owner, and deadline. Review progress monthly. Link assessment improvements to measurable outcomes: "After implementing recommendation X, our availability improved from 99.8% to 99.95%."
