Azure Principal Mentor

An Azure Principal Architect agent providing expert Azure architecture guidance using the Well-Architected Framework (WAF), helping you design enterprise-grade cloud solutions that balance reliability, security, cost optimization, operational excellence, and performance efficiency.

When to Use This Agent

Choose Azure Principal Mentor when:

Designing enterprise Azure architectures that must meet WAF pillar requirements
Conducting Azure Well-Architected Reviews (WAR) on existing workloads
Making architectural decisions for mission-critical Azure workloads
Planning Azure migrations with proper landing zone design
Evaluating cloud architecture patterns against Azure best practices

Consider alternatives when:

Writing Bicep or Terraform templates (use a Bicep/IaC specialist)
Building Azure Logic Apps workflows (use Expert Azure Bot)
Troubleshooting specific Azure service issues (use Azure Infra Engineer Partner)

Quick Start


# .claude/agents/azure-principal-mentor.yml
name: Azure Principal Mentor
description: Enterprise Azure architecture guidance using WAF
model: claude-sonnet
tools:
  - Read
  - Write
  - Glob
  - Grep
  - WebSearch

Example invocation:

claude "Review our Azure architecture against the Well-Architected Framework and provide recommendations for each pillar with priority rankings"

Core Concepts

Well-Architected Framework Pillars

Pillar	Focus	Key Metrics
Reliability	Resiliency, availability, recovery	SLA %, RPO, RTO
Security	Identity, data protection, network	Compliance score, threat model
Cost Optimization	Resource efficiency, scaling	Monthly spend, cost per user
Operational Excellence	Automation, monitoring, deployment	MTTR, deployment frequency
Performance Efficiency	Scaling, caching, optimization	Latency P95, throughput

Architecture Assessment Template


## Well-Architected Review: E-Commerce Platform

### Reliability (Score: 3/5)
 Zone-redundant App Service deployment
 Azure SQL with geo-replication
 No automated failover testing
 Missing health endpoints for dependencies
 No chaos engineering practices

**Recommendation:** Implement health probes that check all
dependencies. Configure Traffic Manager automatic failover.
Schedule quarterly failover drills.

### Security (Score: 4/5)
 Managed Identity for all service-to-service auth
 Private Endpoints for data services
 WAF on Application Gateway
 Key Vault for all secrets
 No regular penetration testing schedule

**Recommendation:** Implement quarterly pen tests. Enable
Microsoft Defender for Cloud on all subscriptions. Add
Just-In-Time VM access for management.

Reference Architecture Patterns


## Mission-Critical Web Application

### Components
- Azure Front Door (global load balancing, WAF, CDN)
- App Service (zone-redundant, auto-scale 2-20 instances)
- Azure SQL (Business Critical tier, zone-redundant, geo-replica)
- Redis Cache (Premium tier, zone-redundant)
- Key Vault (premium, soft-delete enabled)
- Log Analytics (centralized monitoring)
- Application Insights (APM)

### Availability Target: 99.95% (21.9 min/month downtime)
### RTO: 15 minutes | RPO: 5 minutes

### Cost Optimization
- Reserved Instances for predictable compute (3-year = 60% savings)
- Auto-scale rules based on CPU and request count
- Dev/Test pricing for non-production environments
- Azure Hybrid Benefit for SQL Server licenses

Configuration

Parameter	Description	Default
`waf_version`	Well-Architected Framework version	`latest`
`assessment_depth`	Review depth (quick, standard, comprehensive)	`standard`
`compliance_frameworks`	Regulatory requirements (soc2, hipaa, pci)	None
`cost_model`	Pricing model (payg, reserved, savings-plan)	`mixed`
`criticality`	Workload criticality (low, medium, high, mission-critical)	`high`
`target_sla`	Target availability SLA percentage	`99.9`

Best Practices

Design for failure at every layer of the architecture. Assume that any component can fail at any time. Use Availability Zones for within-region redundancy, geo-replication for cross-region resilience, and circuit breakers for inter-service communication. Test failure scenarios regularly with chaos engineering. The question is not whether components will fail, but whether the system gracefully handles failures when they occur.
Apply the principle of least privilege across all identities. Every managed identity, service principal, and user account should have the minimum Azure RBAC roles needed for their function. Use custom roles when built-in roles are too broad. Implement Privileged Identity Management (PIM) for just-in-time elevation. Review role assignments quarterly and remove stale access. Overly permissive access is the most common Azure security finding.
Right-size resources based on actual usage data, not estimates. Deploy to production with monitored resources, then analyze actual CPU, memory, and I/O utilization after 2-4 weeks. Most Azure workloads are over-provisioned by 40-60%. Use Azure Advisor recommendations, scale down under-utilized resources, and implement auto-scaling for variable workloads. Reserved Instances only make sense after you know the actual baseline.
Implement defense-in-depth with network segmentation. Layer security controls: NSG rules at the subnet level, private endpoints for data services, Application Gateway with WAF for web traffic, DDoS Protection for public endpoints. Each layer catches threats that other layers miss. A single perimeter firewall is insufficient — modern architectures need security at every network boundary.
Automate everything that can be automated, especially deployments. Manual Azure portal changes are unrepeatable, unauditable, and error-prone. Use Bicep or Terraform for infrastructure, GitHub Actions for CI/CD, and Azure Policy for governance. Every change should flow through code review and automated deployment. The portal should be read-only for production subscriptions.

Common Issues

Architecture assessment identifies too many issues to address at once. A comprehensive WAF review often surfaces 30+ recommendations across all pillars. Prioritize by risk: address security and reliability findings first (they prevent incidents), then operational excellence (it makes future changes easier), then cost and performance (they improve efficiency). Create a quarterly roadmap with 5-8 items per quarter rather than attempting everything at once.

Cost optimization recommendations conflict with reliability requirements. Reducing costs by removing redundancy, shrinking instance sizes, or using cheaper tiers directly impacts availability and performance. Frame cost decisions in terms of risk: "Removing geo-replication saves $500/month but increases RTO from 15 minutes to 4 hours." Let business stakeholders make informed trade-offs rather than making cost-driven technical decisions in isolation.

Well-Architected assessment becomes a checkbox exercise. Teams complete the assessment, record the scores, and file the report without implementing changes. Convert each finding into a tracked engineering ticket with a clear definition of done, assigned owner, and deadline. Review progress monthly. Link assessment improvements to measurable outcomes: "After implementing recommendation X, our availability improved from 99.8% to 99.95%."

⚠️ Loading Issue

Azure Principal Mentor

Azure Principal Mentor

When to Use This Agent

Quick Start

Core Concepts

Well-Architected Framework Pillars

Architecture Assessment Template

Reference Architecture Patterns

Configuration

Best Practices

Common Issues

Reviews

Write a review

Similar Templates

API Endpoint Builder

Documentation Auto-Generator

Ai Ethics Advisor Partner