Master Senior Suite

Battle-tested skill for comprehensive backend development. Includes structured workflows, validation checks, and reusable patterns.

Skill · Cliptics · development · v1.0.0 · MIT

A comprehensive skill for senior software engineers covering code ownership, technical leadership, mentoring, production incident management, and engineering best practices. Bridges the gap between writing code and leading technical initiatives.

When to Use This Skill

Choose this skill when:

  • Taking ownership of a critical system or service as a senior engineer
  • Leading technical initiatives across multiple teams or projects
  • Mentoring junior developers on code quality and engineering practices
  • Managing production incidents and establishing post-mortem processes
  • Defining engineering standards, style guides, and review processes

Consider alternatives when:

  • Need specific architectural patterns → use an architect skill
  • Working on a specific technology → use that technology's skill
  • Managing project timelines → use a project management tool
  • Building team processes from scratch → use an engineering manager skill

Quick Start

    # System Ownership Checklist

    ## Week 1: Understanding
    - [ ] Read all existing documentation and ADRs
    - [ ] Map service dependencies (upstream and downstream)
    - [ ] Review recent incident reports and post-mortems
    - [ ] Identify on-call runbook gaps

    ## Week 2: Observability
    - [ ] Verify monitoring covers all critical paths
    - [ ] Set up alerts for SLO violations
    - [ ] Create or update dashboards for key metrics
    - [ ] Ensure structured logging with correlation IDs

    ## Week 3: Reliability
    - [ ] Review error budgets and SLO compliance
    - [ ] Identify single points of failure
    - [ ] Verify backup and recovery procedures
    - [ ] Test failover scenarios

    ## Week 4: Improvement
    - [ ] Prioritize tech debt backlog
    - [ ] Propose architectural improvements
    - [ ] Document operational runbooks
    - [ ] Share knowledge with team
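The Week 3 item on error budgets can be made concrete with a little arithmetic. The sketch below assumes an availability SLO measured over a rolling window. `errorBudgetMinutes` and `budgetRemaining` are hypothetical helper names for illustration, not part of the skill.

```typescript
// Hypothetical helpers: the names and the rolling-window model are assumptions.

/** Minutes of allowed unavailability for an availability SLO over a window. */
function errorBudgetMinutes(sloPercent: number, windowDays: number): number {
  if (sloPercent <= 0 || sloPercent >= 100) {
    throw new RangeError('sloPercent must be strictly between 0 and 100');
  }
  const windowMinutes = windowDays * 24 * 60;
  return ((100 - sloPercent) / 100) * windowMinutes;
}

/** Fraction of the budget still unspent; a negative value means the SLO is breached. */
function budgetRemaining(budgetMinutes: number, downtimeMinutes: number): number {
  return (budgetMinutes - downtimeMinutes) / budgetMinutes;
}

// A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
const budget = errorBudgetMinutes(99.9, 30);
// After 10 minutes of downtime, about 77% of that budget is left.
const remaining = budgetRemaining(budget, 10);
```

A remaining fraction near zero is a reasonable trigger for pausing risky changes, though the exact policy is a team decision.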

Core Concepts

Senior Engineer Responsibility Matrix

| Area | Responsibility | Artifacts |
|------|----------------|-----------|
| Code Quality | Set standards, review PRs, refactor | Style guides, linter configs |
| System Design | Lead design reviews, write ADRs | Design docs, ADRs, diagrams |
| Reliability | Own SLOs, reduce incidents | Runbooks, dashboards, alerts |
| Mentoring | Pair programming, code reviews | 1:1 notes, growth plans |
| Technical Debt | Track, prioritize, advocate | Tech debt register, proposals |
| Incident Response | Lead incidents, write post-mortems | Post-mortems, action items |

Incident Management Framework

    // Incident severity classification
    const severityLevels = {
      SEV1: {
        description: 'Complete service outage or data loss',
        responseTime: '15 minutes',
        communication: 'All-hands, exec notification, status page',
        commanderRequired: true,
      },
      SEV2: {
        description: 'Major feature degraded, workaround exists',
        responseTime: '30 minutes',
        communication: 'Team channel, status page',
        commanderRequired: true,
      },
      SEV3: {
        description: 'Minor feature impacted, limited users',
        responseTime: '4 hours',
        communication: 'Team channel',
        commanderRequired: false,
      },
      SEV4: {
        description: 'Cosmetic issue or minor inconvenience',
        responseTime: 'Next business day',
        communication: 'Ticket',
        commanderRequired: false,
      },
    };

    // Post-mortem template structure
    interface PostMortem {
      title: string;
      severity: string;
      duration: string;
      impact: string;
      timeline: { time: string; event: string }[];
      rootCause: string;
      contributing: string[];
      actionItems: { action: string; owner: string; deadline: string }[];
      lessonsLearned: string[];
    }
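As a rough illustration of how the severity table might drive decisions at the start of an incident, the sketch below turns a severity into a small response plan. `responsePlan`, `policies`, and `postMortemSeverities` are hypothetical names. The set of severities needing a post-mortem is an assumption taken from the `postMortemRequired` default of 'SEV1,SEV2'.

```typescript
// Hypothetical helper; only a subset of the severity fields is restated here
// so that the sketch is self-contained.
type Severity = 'SEV1' | 'SEV2' | 'SEV3' | 'SEV4';

interface SeverityPolicy {
  responseTime: string;
  commanderRequired: boolean;
}

const policies: Record<Severity, SeverityPolicy> = {
  SEV1: { responseTime: '15 minutes', commanderRequired: true },
  SEV2: { responseTime: '30 minutes', commanderRequired: true },
  SEV3: { responseTime: '4 hours', commanderRequired: false },
  SEV4: { responseTime: 'Next business day', commanderRequired: false },
};

// Assumption: mirrors the postMortemRequired default ('SEV1,SEV2').
const postMortemSeverities: ReadonlySet<Severity> = new Set<Severity>(['SEV1', 'SEV2']);

interface ResponsePlan {
  severity: Severity;
  respondWithin: string;
  needsCommander: boolean;
  needsPostMortem: boolean;
}

/** Summarize what a given severity obliges the responding team to do. */
function responsePlan(severity: Severity): ResponsePlan {
  const policy = policies[severity];
  return {
    severity,
    respondWithin: policy.responseTime,
    needsCommander: policy.commanderRequired,
    needsPostMortem: postMortemSeverities.has(severity),
  };
}
```

Typing the severity as a union rather than a plain `string` lets the compiler reject values such as 'SEV5' before they reach an on-call tool.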

Technical Debt Tracking

    # Tech Debt Register

    | ID | Title | Impact | Effort | Priority | Owner |
    |----|-------|--------|--------|----------|-------|
    | TD-001 | Migrate from callbacks to async/await | Reduces bugs, improves readability | M (2 sprints) | High | @alice |
    | TD-002 | Replace hand-rolled auth with Passport | Security risk, maintenance burden | L (4 sprints) | Critical | @bob |
    | TD-003 | Add integration tests for payment flow | Production bugs undetected | S (3 days) | High | @carol |
    | TD-004 | Upgrade Node.js from 16 to 20 | EOL runtime, missing features | M (2 sprints) | Medium | @dave |
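One way to turn the register into a work queue is to sort it. The sketch below is a minimal example under an assumed rule: highest priority first, then smallest effort, so quick wins surface within each priority band. `sortRegister`, the rank tables, and the 'Low' priority level are illustrative assumptions, not something the register prescribes.

```typescript
// Hypothetical ordering rule: priority first, then effort (smaller first).
type Priority = 'Critical' | 'High' | 'Medium' | 'Low';
type Effort = 'S' | 'M' | 'L';

interface DebtItem {
  id: string;
  title: string;
  effort: Effort;
  priority: Priority;
  owner: string;
}

const priorityRank: Record<Priority, number> = { Critical: 0, High: 1, Medium: 2, Low: 3 };
const effortRank: Record<Effort, number> = { S: 0, M: 1, L: 2 };

/** Return a sorted copy of the register; the input array is left untouched. */
function sortRegister(items: readonly DebtItem[]): DebtItem[] {
  return [...items].sort(
    (a, b) =>
      priorityRank[a.priority] - priorityRank[b.priority] ||
      effortRank[a.effort] - effortRank[b.effort],
  );
}

// The four entries from the register above.
const register: DebtItem[] = [
  { id: 'TD-001', title: 'Migrate from callbacks to async/await', effort: 'M', priority: 'High', owner: '@alice' },
  { id: 'TD-002', title: 'Replace hand-rolled auth with Passport', effort: 'L', priority: 'Critical', owner: '@bob' },
  { id: 'TD-003', title: 'Add integration tests for payment flow', effort: 'S', priority: 'High', owner: '@carol' },
  { id: 'TD-004', title: 'Upgrade Node.js from 16 to 20', effort: 'M', priority: 'Medium', owner: '@dave' },
];

const queue = sortRegister(register).map((item) => item.id);
// queue is ['TD-002', 'TD-003', 'TD-001', 'TD-004']
```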

Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| incidentSeverityLevels | number | 4 | Number of severity levels (3-5) |
| postMortemRequired | string | 'SEV1,SEV2' | Severities requiring formal post-mortems |
| techDebtBudget | number | 20 | Percentage of sprint capacity for tech debt |
| codeReviewSLA | number | 4 | Hours to first review response |
| onCallRotation | string | 'weekly' | On-call rotation: daily, weekly, or biweekly |
| documentationReview | string | 'quarterly' | Documentation freshness review cadence |
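The parameters above could be expressed as a typed object with a small validator. How the skill actually loads its configuration is not described, so treat the following as a sketch: `SuiteConfig`, `defaultConfig`, and `validateConfig` are hypothetical names, and the 0-100 range for `techDebtBudget` and the positive-hours rule for `codeReviewSLA` are assumptions.

```typescript
// Hypothetical typed view of the configuration table; defaults copied from it.
interface SuiteConfig {
  incidentSeverityLevels: number;
  postMortemRequired: string;
  techDebtBudget: number;
  codeReviewSLA: number;
  onCallRotation: 'daily' | 'weekly' | 'biweekly';
  documentationReview: string;
}

const defaultConfig: SuiteConfig = {
  incidentSeverityLevels: 4,
  postMortemRequired: 'SEV1,SEV2',
  techDebtBudget: 20,
  codeReviewSLA: 4,
  onCallRotation: 'weekly',
  documentationReview: 'quarterly',
};

/** Return a list of human-readable problems; an empty list means the config is valid. */
function validateConfig(config: SuiteConfig): string[] {
  const errors: string[] = [];
  const levels = config.incidentSeverityLevels;
  if (!Number.isInteger(levels) || levels < 3 || levels > 5) {
    errors.push('incidentSeverityLevels must be an integer from 3 to 5');
  }
  // Assumed range: a percentage of sprint capacity.
  if (config.techDebtBudget < 0 || config.techDebtBudget > 100) {
    errors.push('techDebtBudget must be between 0 and 100');
  }
  // Assumed rule: an SLA of zero or fewer hours is meaningless.
  if (config.codeReviewSLA <= 0) {
    errors.push('codeReviewSLA must be a positive number of hours');
  }
  return errors;
}

// Overriding a single field while keeping the remaining defaults.
const strictReviews: SuiteConfig = { ...defaultConfig, codeReviewSLA: 2 };
```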

Best Practices

  1. Own the system, not just the code — Senior engineers are responsible for reliability, performance, security, and operability — not just feature code. Know your system's SLOs, error budgets, dependencies, and failure modes.

  2. Write blameless post-mortems focused on systems, not people — Incidents happen because systems allow them, not because individuals are careless. Focus post-mortems on systemic improvements: better alerts, safer deploy processes, more thorough testing.

  3. Allocate 20% of capacity to technical debt — Without dedicated time, tech debt compounds until velocity drops to near zero. Track it explicitly, prioritize by risk and impact, and make progress visible to stakeholders.

  4. Mentor through pairing and reviews, not lectures — The most effective mentoring happens during real work: pair programming on production code, detailed code review comments that explain reasoning, and collaborative design sessions.

  5. Make knowledge sharing systematic, not heroic — Write runbooks for every operational procedure, document decisions in ADRs, and rotate on-call responsibilities. If only one person can handle a production issue, that's a single point of failure.
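The 20% allocation in item 3 is easier to defend when it is expressed in hours. This is a minimal sketch. `techDebtHours` and `debtBudgetStatus` are hypothetical helpers, and the team size and focus hours in the example are made-up illustration values.

```typescript
// Hypothetical helpers for making the tech debt allocation visible.

/** Hours of a sprint reserved for tech debt, given total capacity and a percentage. */
function techDebtHours(sprintCapacityHours: number, budgetPercent: number = 20): number {
  return (sprintCapacityHours * budgetPercent) / 100;
}

/** Compare hours actually spent on tech debt with the reserved budget. */
function debtBudgetStatus(spentHours: number, budgetHours: number): 'under' | 'on-track' | 'over' {
  if (spentHours > budgetHours) return 'over';
  // Assumption: spending at least 80% of the reserved time counts as on track.
  return spentHours >= budgetHours * 0.8 ? 'on-track' : 'under';
}

// Illustration only: 5 engineers with 60 focus hours each per sprint.
const capacityHours = 5 * 60;
const reservedHours = techDebtHours(capacityHours); // 60 hours at the default 20%
const sprintStatus = debtBudgetStatus(30, reservedHours); // 'under'
```

Reporting the status each sprint is one way to keep progress visible to stakeholders without a separate tracking process.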

Common Issues

On-call fatigue from noisy alerts: too many alerts lead to alert fatigue and ignored pages. Audit alerts quarterly: if an alert fires more than once without requiring action, tune it or delete it. Every alert should be actionable and map to a runbook.
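The audit rule above can be sketched as a small filter over alert history. The `AlertFiring` shape and the `noisyAlerts` function are assumptions for illustration; a real paging tool will export its history in its own format.

```typescript
// Hypothetical record of one alert firing, as exported from a paging tool.
interface AlertFiring {
  alertName: string;
  actionRequired: boolean; // did the responder have to do anything?
}

/** Names of alerts that fired more than once without requiring action, sorted. */
function noisyAlerts(firings: readonly AlertFiring[]): string[] {
  const nonActionable = new Map<string, number>();
  for (const firing of firings) {
    if (!firing.actionRequired) {
      nonActionable.set(firing.alertName, (nonActionable.get(firing.alertName) ?? 0) + 1);
    }
  }
  return Array.from(nonActionable.entries())
    .filter(([, count]) => count > 1)
    .map(([alertName]) => alertName)
    .sort();
}

// Made-up history for one quarter.
const quarterHistory: AlertFiring[] = [
  { alertName: 'disk-usage-warning', actionRequired: false },
  { alertName: 'disk-usage-warning', actionRequired: false },
  { alertName: 'api-error-rate', actionRequired: true },
  { alertName: 'queue-depth', actionRequired: false },
];

const candidates = noisyAlerts(quarterHistory);
// candidates is ['disk-usage-warning']: tune or delete these.
```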

Tech debt proposals consistently deprioritized — Frame tech debt in business terms: "This migration reduces incident frequency by 40%, saving 20 engineering hours per month." Attach tech debt to feature work where possible: "While adding search, we'll also fix the query performance issues."

Knowledge silos around senior engineers — Paradoxically, being the only person who understands a system makes you indispensable but creates risk. Actively pair with others on your systems, write thorough documentation, and delegate ownership of subsystems.
