Master Senior Suite
Battle-tested skill for comprehensive backend development. Includes structured workflows, validation checks, and reusable patterns for development.
A comprehensive skill for senior software engineers covering code ownership, technical leadership, mentoring, production incident management, and engineering best practices. Bridges the gap between writing code and leading technical initiatives.
When to Use This Skill
Choose this skill when:
- Taking ownership of a critical system or service as a senior engineer
- Leading technical initiatives across multiple teams or projects
- Mentoring junior developers on code quality and engineering practices
- Managing production incidents and establishing post-mortem processes
- Defining engineering standards, style guides, and review processes
Consider alternatives when:
- Needing specific architectural patterns → use an architect skill
- Working on a specific technology → use that technology's skill
- Managing project timelines → use a project management tool
- Building team processes from scratch → use an engineering manager skill
Quick Start
    # System Ownership Checklist

    ## Week 1: Understanding
    - [ ] Read all existing documentation and ADRs
    - [ ] Map service dependencies (upstream and downstream)
    - [ ] Review recent incident reports and post-mortems
    - [ ] Identify on-call runbook gaps

    ## Week 2: Observability
    - [ ] Verify monitoring covers all critical paths
    - [ ] Set up alerts for SLO violations
    - [ ] Create or update dashboards for key metrics
    - [ ] Ensure structured logging with correlation IDs

    ## Week 3: Reliability
    - [ ] Review error budgets and SLO compliance
    - [ ] Identify single points of failure
    - [ ] Verify backup and recovery procedures
    - [ ] Test failover scenarios

    ## Week 4: Improvement
    - [ ] Prioritize tech debt backlog
    - [ ] Propose architectural improvements
    - [ ] Document operational runbooks
    - [ ] Share knowledge with team
Core Concepts
Senior Engineer Responsibility Matrix
| Area | Responsibility | Artifacts |
|---|---|---|
| Code Quality | Set standards, review PRs, refactor | Style guides, linter configs |
| System Design | Lead design reviews, write ADRs | Design docs, ADRs, diagrams |
| Reliability | Own SLOs, reduce incidents | Runbooks, dashboards, alerts |
| Mentoring | Pair programming, code reviews | 1:1 notes, growth plans |
| Technical Debt | Track, prioritize, advocate | Tech debt register, proposals |
| Incident Response | Lead incidents, write post-mortems | Post-mortems, action items |
Incident Management Framework
    // Incident severity classification
    const severityLevels = {
      SEV1: {
        description: 'Complete service outage or data loss',
        responseTime: '15 minutes',
        communication: 'All-hands, exec notification, status page',
        commanderRequired: true,
      },
      SEV2: {
        description: 'Major feature degraded, workaround exists',
        responseTime: '30 minutes',
        communication: 'Team channel, status page',
        commanderRequired: true,
      },
      SEV3: {
        description: 'Minor feature impacted, limited users',
        responseTime: '4 hours',
        communication: 'Team channel',
        commanderRequired: false,
      },
      SEV4: {
        description: 'Cosmetic issue or minor inconvenience',
        responseTime: 'Next business day',
        communication: 'Ticket',
        commanderRequired: false,
      },
    };

    // Post-mortem template structure
    interface PostMortem {
      title: string;
      severity: string;
      duration: string;
      impact: string;
      timeline: { time: string; event: string }[];
      rootCause: string;
      contributing: string[];
      actionItems: { action: string; owner: string; deadline: string }[];
      lessonsLearned: string[];
    }
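A post-mortem is only useful if its action items can be followed up. The sketch below shows one way a draft could be checked before review. It is illustrative only: `PostMortemDraft` is a trimmed local copy of the structure above, `findPostMortemGaps` is a hypothetical helper, and deadlines are assumed to be date strings that `Date.parse` understands.

```typescript
// Trimmed local copy of the post-mortem shape so the sketch is self-contained.
interface PostMortemDraft {
  title: string;
  rootCause: string;
  timeline: { time: string; event: string }[];
  actionItems: { action: string; owner: string; deadline: string }[];
}

// Hypothetical helper: returns human-readable gaps; an empty array means
// the draft passes these basic checks.
function findPostMortemGaps(draft: PostMortemDraft): string[] {
  const gaps: string[] = [];
  if (draft.rootCause.trim() === '') gaps.push('rootCause is empty');
  if (draft.timeline.length === 0) gaps.push('timeline has no events');
  if (draft.actionItems.length === 0) gaps.push('no action items recorded');
  draft.actionItems.forEach((item, index) => {
    if (item.owner.trim() === '') {
      gaps.push(`action item ${index + 1} has no owner`);
    }
    // Assumption: deadlines are parseable date strings such as '2024-07-01'.
    if (isNaN(Date.parse(item.deadline))) {
      gaps.push(`action item ${index + 1} has no valid deadline`);
    }
  });
  return gaps;
}

// Example draft with made-up content.
const exampleDraft: PostMortemDraft = {
  title: 'Checkout latency spike',
  rootCause: 'Connection pool exhausted under load',
  timeline: [{ time: '10:02', event: 'Latency alert fired' }],
  actionItems: [
    { action: 'Add pool saturation alert', owner: '', deadline: '2024-07-01' },
  ],
};
const exampleGaps = findPostMortemGaps(exampleDraft);
```

A check like this could run in CI or as a review step so that every action item leaves the post-mortem with an owner and a date.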
Technical Debt Tracking
    # Tech Debt Register

    | ID | Title | Impact | Effort | Priority | Owner |
    |----|-------|--------|--------|----------|-------|
    | TD-001 | Migrate from callbacks to async/await | Reduces bugs, improves readability | M (2 sprints) | High | @alice |
    | TD-002 | Replace hand-rolled auth with Passport | Security risk, maintenance burden | L (4 sprints) | Critical | @bob |
    | TD-003 | Add integration tests for payment flow | Production bugs undetected | S (3 days) | High | @carol |
    | TD-004 | Upgrade Node.js from 16 to 20 | EOL runtime, missing features | M (2 sprints) | Medium | @dave |
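To turn a register like this into a work order, one simple option is to sort by priority and break ties by effort, smallest first. The sketch below assumes that ordering rule; the `Low` priority level and the names `DebtItem` and `orderDebtItems` are illustrative and not part of the register itself.

```typescript
type DebtPriority = 'Critical' | 'High' | 'Medium' | 'Low'; // 'Low' is assumed
type DebtEffort = 'S' | 'M' | 'L';

interface DebtItem {
  id: string;
  priority: DebtPriority;
  effort: DebtEffort;
}

// Lower rank sorts first.
const debtPriorityRank: Record<DebtPriority, number> = { Critical: 0, High: 1, Medium: 2, Low: 3 };
const debtEffortRank: Record<DebtEffort, number> = { S: 0, M: 1, L: 2 };

// Returns a new array ordered by priority, then by effort (cheapest first).
function orderDebtItems(items: DebtItem[]): DebtItem[] {
  return items.slice().sort(
    (a, b) =>
      debtPriorityRank[a.priority] - debtPriorityRank[b.priority] ||
      debtEffortRank[a.effort] - debtEffortRank[b.effort],
  );
}

// The four entries from the register above.
const debtRegister: DebtItem[] = [
  { id: 'TD-001', priority: 'High', effort: 'M' },
  { id: 'TD-002', priority: 'Critical', effort: 'L' },
  { id: 'TD-003', priority: 'High', effort: 'S' },
  { id: 'TD-004', priority: 'Medium', effort: 'M' },
];

const orderedDebtIds = orderDebtItems(debtRegister).map((item) => item.id);
// orderedDebtIds: ['TD-002', 'TD-003', 'TD-001', 'TD-004']
```

Breaking ties by effort surfaces quick wins such as TD-003 ahead of larger items at the same priority; teams that weigh risk differently would swap in their own comparator.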
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| incidentSeverityLevels | number | 4 | Number of severity levels (3-5) |
| postMortemRequired | string | 'SEV1,SEV2' | Severities requiring formal post-mortems |
| techDebtBudget | number | 20 | Percentage of sprint capacity for tech debt |
| codeReviewSLA | number | 4 | Hours to first review response |
| onCallRotation | string | 'weekly' | On-call rotation: daily, weekly, or biweekly |
| documentationReview | string | 'quarterly' | Documentation freshness review cadence |
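The table does not say how these parameters are loaded, so the following is only a sketch of how they might be typed, defaulted, and validated. `SuiteConfig`, `resolveSuiteConfig`, and `requiresPostMortem` are hypothetical names; the 0-100 bound on `techDebtBudget` and the positive bound on `codeReviewSLA` are assumptions drawn from the descriptions.

```typescript
interface SuiteConfig {
  incidentSeverityLevels: number;
  postMortemRequired: string; // comma-separated severities
  techDebtBudget: number; // percent of sprint capacity
  codeReviewSLA: number; // hours to first review response
  onCallRotation: 'daily' | 'weekly' | 'biweekly';
  documentationReview: string;
}

// Defaults copied from the table above.
const defaultSuiteConfig: SuiteConfig = {
  incidentSeverityLevels: 4,
  postMortemRequired: 'SEV1,SEV2',
  techDebtBudget: 20,
  codeReviewSLA: 4,
  onCallRotation: 'weekly',
  documentationReview: 'quarterly',
};

// Merges overrides onto the defaults and rejects out-of-range values.
function resolveSuiteConfig(overrides: Partial<SuiteConfig> = {}): SuiteConfig {
  const config: SuiteConfig = { ...defaultSuiteConfig, ...overrides };
  const levels = config.incidentSeverityLevels;
  if (Math.floor(levels) !== levels || levels < 3 || levels > 5) {
    throw new RangeError('incidentSeverityLevels must be an integer from 3 to 5');
  }
  if (config.techDebtBudget < 0 || config.techDebtBudget > 100) {
    throw new RangeError('techDebtBudget must be a percentage from 0 to 100');
  }
  if (config.codeReviewSLA <= 0) {
    throw new RangeError('codeReviewSLA must be a positive number of hours');
  }
  return config;
}

// True when the given severity is listed in postMortemRequired.
function requiresPostMortem(config: SuiteConfig, severity: string): boolean {
  const listed = config.postMortemRequired.split(',').map((s) => s.trim());
  return listed.indexOf(severity) !== -1;
}
```

With the defaults, `requiresPostMortem(resolveSuiteConfig(), 'SEV2')` is `true` and the same call with `'SEV3'` is `false`.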
Best Practices
- Own the system, not just the code. Senior engineers are responsible for reliability, performance, security, and operability, not just feature code. Know your system's SLOs, error budgets, dependencies, and failure modes.
- Write blameless post-mortems focused on systems, not people. Incidents happen because systems allow them, not because individuals are careless. Focus post-mortems on systemic improvements: better alerts, safer deploy processes, more thorough testing.
- Allocate 20% of capacity to technical debt. Without dedicated time, tech debt compounds until velocity drops to near zero. Track it explicitly, prioritize by risk and impact, and make progress visible to stakeholders.
- Mentor through pairing and reviews, not lectures. The most effective mentoring happens during real work: pair programming on production code, detailed code review comments that explain reasoning, and collaborative design sessions.
- Make knowledge sharing systematic, not heroic. Write runbooks for every operational procedure, document decisions in ADRs, and rotate on-call responsibilities. If only one person can handle a production issue, that's a single point of failure.
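Knowing your error budget comes down to simple arithmetic. As a rough sketch, assuming a time-based availability SLO over a rolling window (function names here are illustrative), the budget is the fraction of the window the SLO allows you to be down:

```typescript
// Allowed downtime, in minutes, for an availability SLO over a rolling window.
// sloTarget is a fraction, e.g. 0.999 for "99.9% available".
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  if (sloTarget <= 0 || sloTarget >= 1) {
    throw new RangeError('sloTarget must be strictly between 0 and 1');
  }
  return (1 - sloTarget) * windowDays * 24 * 60;
}

// Fraction of the budget still unspent; a negative result means the SLO is breached.
function errorBudgetRemaining(
  sloTarget: number,
  windowDays: number,
  downtimeMinutes: number,
): number {
  const budget = errorBudgetMinutes(sloTarget, windowDays);
  return (budget - downtimeMinutes) / budget;
}

// A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
const monthlyBudget = errorBudgetMinutes(0.999, 30);
```

Request-based SLOs follow the same shape with failed requests in place of downtime minutes.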
Common Issues
On-call fatigue from noisy alerts. Too many alerts lead to ignored pages. Audit alerts quarterly: if an alert fires more than once without requiring action, tune it or delete it. Every alert should be actionable and map to a runbook.
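The audit rule above can be sketched as a small filter over firing history. This assumes each firing has been labelled with whether it needed action, for example from on-call handoff notes; `AlertFiring` and `findNoisyAlerts` are hypothetical names.

```typescript
interface AlertFiring {
  alertName: string;
  requiredAction: boolean; // assumed to be recorded by the responder
}

// Returns alerts that fired more than once without requiring action:
// candidates for tuning or deletion.
function findNoisyAlerts(firings: AlertFiring[]): string[] {
  const nonActionableCounts: Record<string, number> = {};
  for (const firing of firings) {
    if (!firing.requiredAction) {
      nonActionableCounts[firing.alertName] =
        (nonActionableCounts[firing.alertName] || 0) + 1;
    }
  }
  return Object.keys(nonActionableCounts).filter(
    (alertName) => nonActionableCounts[alertName] > 1,
  );
}

// Made-up firing history for one audit window.
const quarterFirings: AlertFiring[] = [
  { alertName: 'disk-usage-80', requiredAction: false },
  { alertName: 'disk-usage-80', requiredAction: false },
  { alertName: 'api-5xx-rate', requiredAction: true },
  { alertName: 'api-5xx-rate', requiredAction: false },
];
const noisyAlerts = findNoisyAlerts(quarterFirings);
// noisyAlerts: ['disk-usage-80']
```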
Tech debt proposals consistently deprioritized. Frame tech debt in business terms, for example: "This migration reduces incident frequency by 40%, saving 20 engineering hours per month." Attach tech debt to feature work where possible: "While adding search, we'll also fix the query performance issues."
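One way to put a business number on a proposal is a payback period: hours invested divided by hours saved per month. The sketch below uses the 20 hours per month from the example quote and an assumed, purely illustrative effort of 160 hours; `paybackMonths` is a hypothetical helper.

```typescript
// Months until the engineering hours saved repay the hours invested.
function paybackMonths(effortHours: number, hoursSavedPerMonth: number): number {
  if (effortHours < 0 || hoursSavedPerMonth <= 0) {
    throw new RangeError('effortHours must be >= 0 and hoursSavedPerMonth must be > 0');
  }
  return effortHours / hoursSavedPerMonth;
}

// Illustrative numbers: a 160-hour migration saving 20 hours per month
// pays for itself in 8 months.
const migrationPayback = paybackMonths(160, 20);
```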
Knowledge silos around senior engineers. Being the only person who understands a system makes you indispensable, but it also makes you a single point of failure for the team. Actively pair with others on your systems, write thorough documentation, and delegate ownership of subsystems.
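A lightweight way to spot silos is to list, per system, who could handle an incident alone and flag anything below a minimum headcount. This is a sketch under the assumption that such a list is maintained by hand; `SystemKnowledge`, `findKnowledgeSilos`, and the sample data are hypothetical.

```typescript
interface SystemKnowledge {
  system: string;
  canHandleIncidents: string[]; // people able to respond without help
}

// Returns systems with fewer than minPeople capable responders.
function findKnowledgeSilos(
  systems: SystemKnowledge[],
  minPeople: number = 2,
): SystemKnowledge[] {
  return systems.filter((entry) => entry.canHandleIncidents.length < minPeople);
}

// Made-up team data.
const teamKnowledge: SystemKnowledge[] = [
  { system: 'billing', canHandleIncidents: ['@alice'] },
  { system: 'search', canHandleIncidents: ['@bob', '@carol'] },
];
const siloedSystems = findKnowledgeSilos(teamKnowledge).map((entry) => entry.system);
// siloedSystems: ['billing']
```

Flagged systems are natural candidates for the next pairing session or on-call shadowing rotation.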