Master Senior Suite

A comprehensive skill for senior software engineers covering code ownership, technical leadership, mentoring, production incident management, and engineering best practices. Bridges the gap between writing code and leading technical initiatives.

When to Use This Skill

Choose this skill when:

Taking ownership of a critical system or service as a senior engineer
Leading technical initiatives across multiple teams or projects
Mentoring junior developers on code quality and engineering practices
Managing production incidents and establishing post-mortem processes
Defining engineering standards, style guides, and review processes

Consider alternatives when:

Need specific architectural patterns → use an architect skill
Working on a specific technology → use that technology's skill
Managing project timelines → use a project management tool
Building team processes from scratch → use an engineering manager skill

Quick Start


# System Ownership Checklist

## Week 1: Understanding
- [ ] Read all existing documentation and ADRs
- [ ] Map service dependencies (upstream and downstream)
- [ ] Review recent incident reports and post-mortems
- [ ] Identify on-call runbook gaps

## Week 2: Observability
- [ ] Verify monitoring covers all critical paths
- [ ] Set up alerts for SLO violations
- [ ] Create or update dashboards for key metrics
- [ ] Ensure structured logging with correlation IDs

## Week 3: Reliability
- [ ] Review error budgets and SLO compliance
- [ ] Identify single points of failure
- [ ] Verify backup and recovery procedures
- [ ] Test failover scenarios

## Week 4: Improvement
- [ ] Prioritize tech debt backlog
- [ ] Propose architectural improvements
- [ ] Document operational runbooks
- [ ] Share knowledge with team
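
The Week 3 item "Review error budgets and SLO compliance" comes down to simple arithmetic. The sketch below assumes an availability SLO expressed as a fraction over a rolling window; the function names are illustrative and not part of this skill.

```typescript
/** Minutes of allowed unavailability for an availability SLO over a window. */
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError('sloTarget must be a fraction strictly between 0 and 1');
  }
  const windowMinutes = windowDays * 24 * 60;
  return (1 - sloTarget) * windowMinutes;
}

/** Fraction of the budget still unspent; a negative value means the SLO is violated. */
function budgetRemaining(
  sloTarget: number,
  windowDays: number,
  downtimeMinutes: number,
): number {
  const budget = errorBudgetMinutes(sloTarget, windowDays);
  return (budget - downtimeMinutes) / budget;
}
```

For a 99.9% target over 30 days, `errorBudgetMinutes(0.999, 30)` comes to about 43.2 minutes, so 21.6 minutes of downtime leaves roughly half the budget.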

Core Concepts

Senior Engineer Responsibility Matrix

| Area | Responsibility | Artifacts |
|------|----------------|-----------|
| Code Quality | Set standards, review PRs, refactor | Style guides, linter configs |
| System Design | Lead design reviews, write ADRs | Design docs, ADRs, diagrams |
| Reliability | Own SLOs, reduce incidents | Runbooks, dashboards, alerts |
| Mentoring | Pair programming, code reviews | 1:1 notes, growth plans |
| Technical Debt | Track, prioritize, advocate | Tech debt register, proposals |
| Incident Response | Lead incidents, write post-mortems | Post-mortems, action items |

Incident Management Framework


// Incident severity classification
const severityLevels = {
  SEV1: {
    description: 'Complete service outage or data loss',
    responseTime: '15 minutes',
    communication: 'All-hands, exec notification, status page',
    commanderRequired: true,
  },
  SEV2: {
    description: 'Major feature degraded, workaround exists',
    responseTime: '30 minutes',
    communication: 'Team channel, status page',
    commanderRequired: true,
  },
  SEV3: {
    description: 'Minor feature impacted, limited users',
    responseTime: '4 hours',
    communication: 'Team channel',
    commanderRequired: false,
  },
  SEV4: {
    description: 'Cosmetic issue or minor inconvenience',
    responseTime: 'Next business day',
    communication: 'Ticket',
    commanderRequired: false,
  },
};

// Post-mortem template structure
interface PostMortem {
  title: string;
  severity: string;
  duration: string;
  impact: string;
  timeline: { time: string; event: string }[];
  rootCause: string;
  contributing: string[];
  actionItems: { action: string; owner: string; deadline: string }[];
  lessonsLearned: string[];
}
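
One way to put the template to work is a completeness check run before a post-mortem is circulated. This is a minimal sketch: `findPostMortemGaps` is a hypothetical helper, the interface is a local copy of the template's fields so the block stands alone, and it assumes deadlines are written as ISO dates such as `2024-07-01` (the template itself only says `string`).

```typescript
// Local copy of the template's shape so this block is self-contained.
interface PostMortem {
  title: string;
  severity: string;
  duration: string;
  impact: string;
  timeline: { time: string; event: string }[];
  rootCause: string;
  contributing: string[];
  actionItems: { action: string; owner: string; deadline: string }[];
  lessonsLearned: string[];
}

/** Returns a list of human-readable gaps; an empty list means nothing obvious is missing. */
function findPostMortemGaps(pm: PostMortem): string[] {
  const gaps: string[] = [];
  if (pm.timeline.length === 0) gaps.push('timeline is empty');
  if (pm.rootCause.trim() === '') gaps.push('rootCause is missing');
  if (pm.actionItems.length === 0) gaps.push('no action items recorded');
  pm.actionItems.forEach((item, i) => {
    if (item.owner.trim() === '') gaps.push(`actionItems[${i}] has no owner`);
    // Assumption: deadlines are ISO date strings that Date.parse understands.
    if (isNaN(Date.parse(item.deadline))) {
      gaps.push(`actionItems[${i}] deadline is not a parseable date`);
    }
  });
  return gaps;
}
```

The check only covers structure. Whether the root cause analysis is any good still needs a human reviewer.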

Technical Debt Tracking


# Tech Debt Register

| ID | Title | Impact | Effort | Priority | Owner |
|----|-------|--------|--------|----------|-------|
| TD-001 | Migrate from callbacks to async/await | Reduces bugs, improves readability | M (2 sprints) | High | @alice |
| TD-002 | Replace hand-rolled auth with Passport | Security risk, maintenance burden | L (4 sprints) | Critical | @bob |
| TD-003 | Add integration tests for payment flow | Production bugs undetected | S (3 days) | High | @carol |
| TD-004 | Upgrade Node.js from 16 to 20 | EOL runtime, missing features | M (2 sprints) | Medium | @dave |
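
If the register is kept as data rather than a hand-edited table, ordering it for planning is a small sort. The sketch below is one possible approach, not a prescribed one: the `TechDebtItem` type and `sortRegister` are hypothetical, the `Low` priority level is an assumption (the example register only uses Critical, High, and Medium), and ties within a priority are broken by smallest effort first.

```typescript
type Priority = 'Critical' | 'High' | 'Medium' | 'Low'; // 'Low' is assumed
type Effort = 'S' | 'M' | 'L';

interface TechDebtItem {
  id: string;
  title: string;
  effort: Effort;
  priority: Priority;
  owner: string;
}

const priorityRank: Record<Priority, number> = { Critical: 0, High: 1, Medium: 2, Low: 3 };
const effortRank: Record<Effort, number> = { S: 0, M: 1, L: 2 };

/** Returns a new array: highest priority first, then cheapest effort first. */
function sortRegister(items: TechDebtItem[]): TechDebtItem[] {
  return [...items].sort(
    (a, b) =>
      priorityRank[a.priority] - priorityRank[b.priority] ||
      effortRank[a.effort] - effortRank[b.effort],
  );
}

// The four example entries from the register above.
const register: TechDebtItem[] = [
  { id: 'TD-001', title: 'Migrate from callbacks to async/await', effort: 'M', priority: 'High', owner: '@alice' },
  { id: 'TD-002', title: 'Replace hand-rolled auth with Passport', effort: 'L', priority: 'Critical', owner: '@bob' },
  { id: 'TD-003', title: 'Add integration tests for payment flow', effort: 'S', priority: 'High', owner: '@carol' },
  { id: 'TD-004', title: 'Upgrade Node.js from 16 to 20', effort: 'M', priority: 'Medium', owner: '@dave' },
];

const orderedIds = sortRegister(register).map((item) => item.id);
// orderedIds is ['TD-002', 'TD-003', 'TD-001', 'TD-004']
```

Putting small, high-priority items ahead of larger ones is a design choice that favors quick wins. Teams that prefer risk-first ordering could drop the effort tiebreak.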

Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `incidentSeverityLevels` | number | `4` | Number of severity levels (3-5) |
| `postMortemRequired` | string | `'SEV1,SEV2'` | Comma-separated severities requiring formal post-mortems |
| `techDebtBudget` | number | `20` | Percentage of sprint capacity for tech debt |
| `codeReviewSLA` | number | `4` | Hours to first review response |
| `onCallRotation` | string | `'weekly'` | On-call rotation: daily, weekly, or biweekly |
| `documentationReview` | string | `'quarterly'` | Documentation freshness review cadence |
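
The source does not say how these parameters are loaded, so the following is only a sketch of how they might be typed and validated. `SuiteConfig`, `resolveConfig`, and `needsPostMortem` are hypothetical names; the defaults and the 3-5 range come from the table above, while the other range checks are assumptions.

```typescript
type Rotation = 'daily' | 'weekly' | 'biweekly';

interface SuiteConfig {
  incidentSeverityLevels: number;
  postMortemRequired: string; // comma-separated, e.g. 'SEV1,SEV2'
  techDebtBudget: number; // percent of sprint capacity
  codeReviewSLA: number; // hours to first review response
  onCallRotation: Rotation;
  documentationReview: string;
}

const defaultConfig: SuiteConfig = {
  incidentSeverityLevels: 4,
  postMortemRequired: 'SEV1,SEV2',
  techDebtBudget: 20,
  codeReviewSLA: 4,
  onCallRotation: 'weekly',
  documentationReview: 'quarterly',
};

/** Merges overrides onto the defaults and rejects out-of-range values. */
function resolveConfig(overrides: Partial<SuiteConfig> = {}): SuiteConfig {
  const config: SuiteConfig = { ...defaultConfig, ...overrides };
  const levels = config.incidentSeverityLevels;
  if (Math.floor(levels) !== levels || levels < 3 || levels > 5) {
    throw new RangeError('incidentSeverityLevels must be an integer from 3 to 5');
  }
  if (config.techDebtBudget < 0 || config.techDebtBudget > 100) {
    throw new RangeError('techDebtBudget must be a percentage from 0 to 100');
  }
  if (config.codeReviewSLA <= 0) {
    throw new RangeError('codeReviewSLA must be a positive number of hours');
  }
  return config;
}

/** True when the given severity appears in postMortemRequired. */
function needsPostMortem(severity: string, config: SuiteConfig): boolean {
  return config.postMortemRequired
    .split(',')
    .some((entry) => entry.trim() === severity);
}
```

With the defaults, `needsPostMortem('SEV2', resolveConfig())` is true and `needsPostMortem('SEV3', resolveConfig())` is false.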

Best Practices

Own the system, not just the code: Senior engineers are responsible for reliability, performance, security, and operability, not only feature code. Know your system's SLOs, error budgets, dependencies, and failure modes.
Write blameless post-mortems focused on systems, not people: Incidents happen because systems allow them, not because individuals are careless. Focus post-mortems on systemic improvements such as better alerts, safer deploy processes, and more thorough testing.
Allocate about 20% of sprint capacity to technical debt (the `techDebtBudget` default): Without dedicated time, tech debt compounds and delivery slows with it. Track it explicitly, prioritize by risk and impact, and make progress visible to stakeholders.
Mentor through pairing and reviews, not lectures: The most effective mentoring happens during real work, such as pair programming on production code, code review comments that explain the reasoning behind a request, and collaborative design sessions.
Make knowledge sharing systematic, not heroic: Write runbooks for every operational procedure, document decisions in ADRs, and rotate on-call responsibilities. If only one person can handle a production issue, that person is a single point of failure.
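
The single-point-of-failure test in the last practice can be checked mechanically if you keep a record of who can handle production issues on each system. A minimal sketch, assuming such a record exists; the `Coverage` shape, the function name, and the sample data are all hypothetical.

```typescript
// Maps each system to the people able to handle its production issues.
type Coverage = Record<string, string[]>;

/** Systems that zero or one person can support, sorted by name. */
function singlePointsOfKnowledge(coverage: Coverage): string[] {
  return Object.keys(coverage)
    .filter((system) => coverage[system].length <= 1)
    .sort();
}

// Hypothetical sample data for illustration only.
const coverage: Coverage = {
  'payments-api': ['alice'],
  search: ['bob', 'carol'],
  auth: ['bob', 'alice', 'dave'],
  'billing-cron': [],
};

const atRisk = singlePointsOfKnowledge(coverage);
// atRisk is ['billing-cron', 'payments-api']
```

Systems on the resulting list are natural candidates for pairing sessions, runbook writing, and on-call shadowing.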

Common Issues

On-call fatigue from noisy alerts: Too many alerts desensitize responders, and real pages get ignored. Audit alerts quarterly: if an alert fires more than once without requiring action, tune it or delete it. Every alert should be actionable and map to a runbook.
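
The audit rule above (more than one firing that needed no action) is easy to apply to an export of paging history. The sketch below assumes each firing has been annotated with whether anyone had to act; the `AlertFiring` shape and `noisyAlerts` are hypothetical and not tied to any particular alerting tool.

```typescript
interface AlertFiring {
  alert: string; // alert name
  actionTaken: boolean; // did a responder need to do anything?
}

/** Alerts that fired more than once without requiring action. */
function noisyAlerts(firings: AlertFiring[]): string[] {
  const noActionCounts: Record<string, number> = {};
  for (const firing of firings) {
    if (!firing.actionTaken) {
      noActionCounts[firing.alert] = (noActionCounts[firing.alert] ?? 0) + 1;
    }
  }
  return Object.keys(noActionCounts)
    .filter((alert) => noActionCounts[alert] > 1)
    .sort();
}

// Hypothetical quarter of paging history.
const history: AlertFiring[] = [
  { alert: 'disk-usage-80', actionTaken: false },
  { alert: 'disk-usage-80', actionTaken: false },
  { alert: 'checkout-5xx', actionTaken: true },
  { alert: 'checkout-5xx', actionTaken: false },
  { alert: 'cert-expiry', actionTaken: true },
];

const toTuneOrDelete = noisyAlerts(history);
// toTuneOrDelete is ['disk-usage-80']
```

Here `checkout-5xx` stays off the list because only one of its firings needed no action.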

Tech debt proposals consistently deprioritized: Frame tech debt in business terms, for example "This migration reduces incident frequency by 40%, saving 20 engineering hours per month." Attach tech debt to feature work where possible, for example "While adding search, we'll also fix the query performance issues."
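
A business-terms claim like the one quoted above is more convincing when the arithmetic behind it is visible. A minimal sketch with made-up inputs: the function name and the figures of 10 incidents per month at 5 hours each are assumptions chosen only to reproduce the example's 40% and 20 hours.

```typescript
/** Engineering hours saved per month if incident frequency drops by `reduction`. */
function monthlyHoursSaved(
  incidentsPerMonth: number,
  hoursPerIncident: number,
  reduction: number, // fraction, e.g. 0.4 for 40%
): number {
  if (reduction < 0 || reduction > 1) {
    throw new RangeError('reduction must be a fraction from 0 to 1');
  }
  return incidentsPerMonth * hoursPerIncident * reduction;
}

// Hypothetical inputs: 10 incidents a month, 5 engineering hours each, 40% fewer incidents.
const hoursSaved = monthlyHoursSaved(10, 5, 0.4);
```

Stating the inputs alongside the result lets stakeholders challenge the assumptions rather than the conclusion.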

Knowledge silos around senior engineers: Being the only person who understands a system makes you indispensable, but it also makes you a single point of failure. Actively pair with others on your systems, write thorough documentation, and delegate ownership of subsystems.
