Easy Troubleshooting Executor

Boost productivity with this systematic troubleshooting command. Includes structured workflows, validation checks, and reusable diagnostic patterns.


Systematically diagnose and resolve application issues using guided troubleshooting workflows and automated checks.

When to Use This Command

Run this command when you need to:

  • Diagnose a production issue by running automated health checks across all system components
  • Follow a structured troubleshooting workflow that checks common failure modes systematically
  • Generate a diagnostic report with root cause analysis and recommended fixes

Consider alternatives when:

  • You already know the root cause and just need to apply a specific fix
  • The issue requires real-time debugging with breakpoints and interactive inspection

Quick Start

Configuration

name: easy-troubleshooting-executor
type: command
category: documentation

Example Invocation

claude command:run easy-troubleshooting-executor --symptom "api-timeout" --service backend

Example Output

Symptom: API requests timing out
Service: backend

Diagnostic Sequence:
  [1/8] Network connectivity........PASS (latency: 2ms)
  [2/8] DNS resolution..............PASS (resolved in 12ms)
  [3/8] Service health endpoint.....PASS (200 OK, 45ms)
  [4/8] Database connectivity.......FAIL (connection refused)
  [5/8] Redis connectivity..........PASS (PONG in 1ms)
  [6/8] Memory usage................WARN (78% utilized)
  [7/8] CPU usage...................PASS (23% utilized)
  [8/8] Disk space..................PASS (62% available)

Root Cause Identified:
  Database server at db.internal:5432 is not accepting connections.
  Connection pool exhausted due to max_connections limit reached.

Recommended Fix:
  1. Increase max_connections from 100 to 200 in postgresql.conf
  2. Restart PostgreSQL service
  3. Verify application reconnects successfully

Workaround (immediate): Restart the backend service to reset connection pool.

Core Concepts

Troubleshooting System Overview

  Aspect                Details
  Symptom Mapping       Maps user-reported symptoms to diagnostic check sequences
  Health Probes         HTTP, TCP, DNS, and process-level health checks
  Resource Monitoring   CPU, memory, disk, and network utilization assessment
  Dependency Tracing    Tests connectivity to databases, caches, queues, and APIs
  Root Cause Analysis   Correlates failures across checks to identify the root cause
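
The symptom mapping above could be sketched as a simple lookup table. This is an illustrative sketch, not the command's actual implementation; the check names and `CHECK_SUITES` structure are assumptions based on the checks shown in the example output.

```python
# Hypothetical symptom-to-check-suite mapping; check names are illustrative.
CHECK_SUITES = {
    "api-timeout": ["network", "dns", "service_health", "database", "memory", "cpu"],
    "high-error-rate": ["service_health", "database", "dependency_apis"],
    "slow-response": ["cpu", "memory", "database", "cache"],
    "crash": ["memory", "disk", "process_health"],
}

def checks_for(symptom: str) -> list[str]:
    """Return the ordered diagnostic checks to run for a reported symptom."""
    try:
        return CHECK_SUITES[symptom]
    except KeyError:
        raise ValueError(f"Unknown symptom: {symptom!r}")
```

Keeping the mapping declarative makes it easy to add a new symptom without touching the diagnostic engine itself.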

Diagnostic Workflow

  Symptom Reported
        |
        v
  +--------------------+
  | Map to Check Suite |---> Timeout -> network + DB + resources
  +--------------------+
        |
        v
  +--------------------+
  | Run Checks         |---> Sequential diagnostic probes
  +--------------------+
        |
        v
  +--------------------+
  | Correlate Failures |---> Which failures explain symptom?
  +--------------------+
        |
        v
  +--------------------+
  | Identify Root Cause|---> DB down -> pool exhausted -> timeout
  +--------------------+
        |
        v
  +--------------------+
  | Recommend Fix      |---> Steps to resolve + workaround
  +--------------------+

Configuration

  Parameter     Type      Default    Description
  symptom       string    required   The observed problem: api-timeout, high-error-rate, slow-response, crash
  service       string    all        Service to diagnose: backend, frontend, database, cache, all
  depth         string    standard   Diagnostic depth: quick (critical only), standard, deep (all checks)
  output        string    terminal   Output format: terminal, json, markdown
  timeout_sec   integer   10         Timeout in seconds for each individual diagnostic check
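
Validation of these parameters might look like the sketch below. The allowed-value sets mirror the table, but `build_config` itself is a hypothetical helper, not part of the command.

```python
# Hypothetical parameter validation; value sets taken from the table above.
VALID_SYMPTOMS = {"api-timeout", "high-error-rate", "slow-response", "crash"}
VALID_SERVICES = {"backend", "frontend", "database", "cache", "all"}
VALID_DEPTHS = {"quick", "standard", "deep"}
VALID_OUTPUTS = {"terminal", "json", "markdown"}

def build_config(symptom: str, service: str = "all", depth: str = "standard",
                 output: str = "terminal", timeout_sec: int = 10) -> dict:
    """Validate each parameter against its allowed values and apply defaults."""
    for value, allowed, name in [(symptom, VALID_SYMPTOMS, "symptom"),
                                 (service, VALID_SERVICES, "service"),
                                 (depth, VALID_DEPTHS, "depth"),
                                 (output, VALID_OUTPUTS, "output")]:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    if timeout_sec <= 0:
        raise ValueError("timeout_sec must be a positive integer")
    return {"symptom": symptom, "service": service, "depth": depth,
            "output": output, "timeout_sec": timeout_sec}
```

Only `symptom` has no default, matching its "required" entry in the table.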

Best Practices

  1. Start Broad Then Narrow - Run all diagnostic checks first to get a complete picture. Jumping to a suspected root cause without checking other components misses correlated failures.

  2. Check Dependencies Bottom-Up - Verify infrastructure (network, DNS, disk) before application-level checks. An application timeout caused by a full disk is easy to miss if you only check application logs.

  3. Save Diagnostic Reports - Store troubleshooting reports with timestamps so you can compare the current state to previous incidents. Patterns in diagnostic history reveal recurring issues that need permanent fixes.

  4. Include Workarounds With Fixes - Always provide an immediate workaround alongside the proper fix. The workaround gets the service running while the team implements the real solution.

  5. Automate Recurring Diagnostics - If the same symptom appears more than twice, automate the diagnostic checks into a monitoring alert. Proactive detection is always faster than reactive troubleshooting.
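
Practice 3 (saving timestamped reports) can be sketched in a few lines. The `save_report` helper and its file layout are assumptions for illustration only.

```python
# Hypothetical helper for persisting timestamped diagnostic reports (Practice 3).
import json
import time
from pathlib import Path

def save_report(results: list[dict], directory: str = "diagnostics") -> Path:
    """Write a timestamped JSON report so incidents can be compared later."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    path = out / f"report-{stamp}.json"
    path.write_text(json.dumps({"timestamp": stamp, "results": results}, indent=2))
    return path
```

Diffing two saved reports from separate incidents is a quick way to spot the recurring failures that Practice 5 suggests promoting into monitoring alerts.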

Common Issues

  1. Diagnostic Check Hangs - A TCP check to an unresponsive host hangs indefinitely. Set strict timeouts on every check and treat a timeout as a diagnostic signal (the service is likely unreachable or overloaded).

  2. False Root Cause Identification - A warning (78% memory) is flagged as root cause when the real issue is a database outage. Prioritize FAIL results over WARN results and correlate multiple failures before concluding.

  3. Insufficient Permissions for Checks - The diagnostic tool cannot query system metrics or database stats due to missing permissions. Document required permissions and verify access before running deep diagnostics.
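
Issue 1 above (a hanging TCP check) is avoided by putting a strict timeout on every probe and treating the timeout itself as a signal. A minimal sketch, assuming a plain TCP connectivity probe rather than the command's actual check implementation:

```python
# Hypothetical TCP probe with a strict timeout; a timeout is itself a signal.
import socket

def tcp_check(host: str, port: int, timeout_sec: float = 10.0) -> str:
    """Probe a TCP endpoint, never blocking longer than timeout_sec."""
    try:
        with socket.create_connection((host, port), timeout=timeout_sec):
            return "PASS"
    except socket.timeout:
        return "FAIL (timed out: host likely unreachable or overloaded)"
    except OSError as exc:
        return f"FAIL ({exc.strerror or exc})"
```

`socket.create_connection` raises `socket.timeout` when the host never answers and `ConnectionRefusedError` (an `OSError`) when it answers but refuses, so the two failure modes stay distinguishable in the report.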
