Comprehensive Systematic Debugging

A methodical skill for diagnosing and resolving software bugs through structured root cause analysis. Covers hypothesis-driven debugging, binary search techniques, logging strategies, and common bug patterns across different technology stacks.

When to Use This Skill

Choose this skill when:

Facing a bug that resists quick fixes and needs systematic investigation
Debugging issues that span multiple components or services
Dealing with intermittent bugs that don't reproduce consistently
Analyzing production issues from logs, traces, and metrics
Teaching or establishing debugging processes for a team

Consider alternatives when:

Writing tests to prevent bugs → use a testing patterns skill
Optimizing performance → use a performance profiling skill
Debugging a specific framework → use that framework's skill
Setting up monitoring → use a DevOps/observability skill

Quick Start


# Systematic Debugging Process

## Step 1: REPRODUCE
- Can you trigger the bug reliably?
- What are the exact steps to reproduce?
- What environment (OS, browser, version)?
- Does it happen in all environments?

## Step 2: CHARACTERIZE
- When did it start? (git bisect)
- What changed recently? (commits, configs, dependencies)
- What are the exact error messages?
- What's the expected vs actual behavior?

## Step 3: ISOLATE
- Which component/layer is responsible?
- Is it frontend, backend, database, or network?
- Can you reproduce with minimal code?
- Does it happen with hardcoded values?

## Step 4: HYPOTHESIZE
- What are the possible causes?
- Which is most likely given the evidence?
- What experiment would confirm or refute?

## Step 5: FIX
- Make the minimal change that fixes the issue
- Verify the fix addresses the root cause
- Add a regression test
- Check for related bugs with the same root cause

## Step 6: PREVENT
- Why wasn't this caught earlier?
- What test/check would catch it?
- Are there similar patterns in the codebase?
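
One way to keep Step 4 honest is to write hypotheses down and test the most likely one first. The sketch below is only an illustration: `Hypothesis`, `nextToTest`, and `shouldReassess` are hypothetical names, the likelihood score is a rough personal estimate, and the default limit of 5 mirrors the `hypothesisLimit` parameter listed under Configuration.

```typescript
// Hypothetical hypothesis log for Step 4; names and scoring are illustrative.
type HypothesisStatus = 'untested' | 'confirmed' | 'refuted';

interface Hypothesis {
  claim: string;       // what might be causing the bug
  likelihood: number;  // rough estimate in [0, 1] given the current evidence
  experiment: string;  // the check that would confirm or refute the claim
  status: HypothesisStatus;
}

// Pick the most likely hypothesis that has not been tested yet.
function nextToTest(entries: Hypothesis[]): Hypothesis | undefined {
  return entries
    .filter((h) => h.status === 'untested')
    .sort((a, b) => b.likelihood - a.likelihood)[0];
}

// True once `limit` hypotheses have been refuted without any confirmation,
// which is a signal to go back and gather more evidence.
function shouldReassess(entries: Hypothesis[], limit = 5): boolean {
  const confirmed = entries.some((h) => h.status === 'confirmed');
  const refuted = entries.filter((h) => h.status === 'refuted').length;
  return !confirmed && refuted >= limit;
}
```

Recording the experiment next to each claim makes it harder to "test" a hypothesis with a check that cannot actually refute it.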

Core Concepts

Bug Pattern Recognition

Symptom	Likely Cause	Investigation
Works locally, fails in prod	Env vars, secrets, configs	Diff environment configs
Intermittent failures	Race conditions, timing	Add correlation logging
Slow degradation over time	Memory leak, connection leak	Profile memory, check pool sizes
Works for some users	Data-dependent, permissions	Compare user data and roles
Fails after deploy	New code regression	git bisect to find breaking commit
UI renders but data is wrong	API/transform bug	Inspect network responses

Binary Search Debugging


// Git bisect to find the breaking commit
// 1. Find a known good commit and the current bad commit
// 2. Git bisect automates binary search

// git bisect start
// git bisect bad HEAD
// git bisect good v2.1.0
// # Git checks out middle commit — test it
// git bisect good  # or  git bisect bad
// # Repeat until the first bad commit is found

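// Automated bisect with a test script
// (exit code 0 = good, 1-127 except 125 = bad, 125 = skip this commit):
// git bisect run npm test

// When finished, return to the commit you started from:
// git bisect reset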

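// Code-level binary search for data-dependent crashes.
// Assumes the crash is deterministic and that processData (defined elsewhere)
// throws on bad input and is safe to re-run.
function debugBinarySearch(data: any[]) {
  // Suspect: one item in `data` causes the crash
  const mid = Math.floor(data.length / 2);
  const firstHalf = data.slice(0, mid);
  const secondHalf = data.slice(mid);

  // Catch each failure so a crash in the first half does not skip the second test
  console.log('Testing first half:', firstHalf.length, 'items');
  try {
    processData(firstHalf);
  } catch (err) {
    console.log('First half crashed → bug is in first half', err);
  }

  console.log('Testing second half:', secondHalf.length, 'items');
  try {
    processData(secondHalf);
  } catch (err) {
    console.log('Second half crashed → bug is in second half', err);
  }
  // Repeat with the crashing half until you find the single bad item
}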
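
The halving step above can also be repeated automatically. The following is a minimal sketch, not a general-purpose tool: `findFailingItem` and its `run` callback are hypothetical names, and it assumes the failure is deterministic, is caused by exactly one item, and that `run` is safe to call many times.

```typescript
// Sketch: find the single item that makes `run` throw.
// Assumes one deterministic bad item and a side-effect-free `run`.
function findFailingItem<T>(items: T[], run: (batch: T[]) => void): T | undefined {
  const fails = (batch: T[]): boolean => {
    try {
      run(batch);
      return false;
    } catch {
      return true;
    }
  };

  // If the full input does not fail, there is nothing to isolate.
  if (!fails(items)) return undefined;

  let suspects = items;
  while (suspects.length > 1) {
    const mid = Math.floor(suspects.length / 2);
    const firstHalf = suspects.slice(0, mid);
    // If the first half fails, the bad item is there; otherwise it is in the rest.
    suspects = fails(firstHalf) ? firstHalf : suspects.slice(mid);
  }
  return suspects[0];
}
```

Each pass halves the suspect list, so isolating one bad record among n takes roughly log2(n) runs. If the crash depends on a combination of items, this approach can settle on the wrong item and a more general reduction technique is needed.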

Structured Logging for Debugging


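// Add contextual logging for debugging production issues.
// Assumes Express-style handlers and a pino-style `logger`
// (child(), info(obj, msg), error(obj, msg)) defined elsewhere.
import * as crypto from 'node:crypto';
import { performance } from 'node:perf_hooks';
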
function debuggableRequest(handler: RequestHandler): RequestHandler {
  return async (req, res, next) => {
    const debugContext = {
      requestId: crypto.randomUUID(),
      userId: req.user?.id,
      method: req.method,
      path: req.path,
      timestamp: Date.now(),
    };

    req.log = logger.child(debugContext);
    req.log.info('Request started');

    const start = performance.now();
    try {
      await handler(req, res, next);
      req.log.info({
        status: res.statusCode,
        duration: (performance.now() - start).toFixed(2),
      }, 'Request completed');
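    } catch (err) {
      // `err` is `unknown` under strict TypeScript, so normalize it before reading fields
      const error = err instanceof Error ? err : new Error(String(err));
      req.log.error({
        error: error.message,
        stack: error.stack,
        duration: (performance.now() - start).toFixed(2),
      }, 'Request failed');
      // Rethrow for upstream error handling. If this runs on Express 4, which does
      // not forward rejected promises to error middleware, call next(err) instead.
      throw err;
    }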
  };
}
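
The wrapper above always generates a fresh `requestId`. When a request crosses several services, it is often more useful to reuse an ID that an upstream caller already attached. The sketch below shows one way to do that, assuming header names have been lowercased (as Node's HTTP server does) and using the `x-request-id` default from the Configuration table. `resolveCorrelationId` and `HeaderMap` are hypothetical names, and the ID generator is passed in so the helper has no runtime dependencies.

```typescript
// Sketch: reuse an incoming correlation ID if present, otherwise create one.
// Assumes header keys are already lowercased.
type HeaderMap = Record<string, string | string[] | undefined>;

function resolveCorrelationId(
  headers: HeaderMap,
  generate: () => string,
  headerName = 'x-request-id',
): string {
  const raw = headers[headerName.toLowerCase()];
  // A header can appear more than once; take the first value in that case.
  const value = Array.isArray(raw) ? raw[0] : raw;
  const trimmed = value ? value.trim() : '';
  return trimmed !== '' ? trimmed : generate();
}
```

In the wrapper, this could replace the direct call, for example `requestId: resolveCorrelationId(req.headers, () => crypto.randomUUID())`.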

Configuration

Parameter	Type	Default	Description
`loggingLevel`	string	`'debug'`	Debugging log level: debug, trace, or verbose
`bisectStrategy`	string	`'git-bisect'`	Bisect: git-bisect, code-bisect, or manual
`reproductionTimeout`	number	`300`	Max seconds to attempt reproduction
`hypothesisLimit`	number	`5`	Max hypotheses to test before reassessing
`stackTraceDepth`	number	`20`	Stack trace lines to capture
`correlationIdHeader`	string	`'x-request-id'`	Header for request correlation
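
The source does not say how these parameters are consumed, so the following is only a hypothetical typed view of the table: the interface, the constant, and `resolveDebugConfig` are illustrative names, while the keys, allowed values, and defaults are copied from the rows above.

```typescript
// Hypothetical typed view of the configuration table; names are illustrative.
interface DebugConfig {
  loggingLevel: 'debug' | 'trace' | 'verbose';
  bisectStrategy: 'git-bisect' | 'code-bisect' | 'manual';
  reproductionTimeout: number; // seconds
  hypothesisLimit: number;     // hypotheses to test before reassessing
  stackTraceDepth: number;     // stack trace lines to capture
  correlationIdHeader: string;
}

const DEFAULT_DEBUG_CONFIG: DebugConfig = {
  loggingLevel: 'debug',
  bisectStrategy: 'git-bisect',
  reproductionTimeout: 300,
  hypothesisLimit: 5,
  stackTraceDepth: 20,
  correlationIdHeader: 'x-request-id',
};

// Merge caller overrides onto the defaults without mutating either object.
function resolveDebugConfig(overrides: Partial<DebugConfig> = {}): DebugConfig {
  return { ...DEFAULT_DEBUG_CONFIG, ...overrides };
}
```

Using string literal unions for `loggingLevel` and `bisectStrategy` lets the compiler reject values the table does not list.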

Best Practices

Never fix without understanding the root cause — A fix that makes the symptom disappear without explaining why it works will eventually fail. If you can't explain the bug to a colleague, you haven't found the root cause yet.
Reproduce first, hypothesize second — Until you can reliably trigger the bug, any hypothesis is guessing. Write a test case that demonstrates the failure. If you can't reproduce it, add logging and wait for it to happen again.
Change one thing at a time when testing hypotheses — If you change three things and the bug disappears, you don't know which change fixed it. Revert two of the three changes and verify the fix still works.
Use binary search to narrow the problem space — Whether it's git bisect for commits, commenting out half the code, or testing with half the data, binary search is the fastest way to isolate the responsible component.
Document the bug, root cause, and fix — Future developers (including yourself) will encounter similar bugs. Document what the symptoms were, why the bug existed, how you found it, and how you fixed it. This turns debugging into organizational knowledge.
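
The "change one thing at a time" practice can be sketched as a small harness that applies each candidate change to the same baseline and checks whether the bug still reproduces. Everything here is hypothetical (`Change`, `isolateFix`, `bugReproduces`), and it assumes each change can be applied to a copy of the baseline state rather than mutating it.

```typescript
// Sketch: test candidate changes in isolation, each against the same baseline.
interface Change<S> {
  name: string;
  apply: (baseline: S) => S; // must return a new state, not mutate the baseline
}

// Returns the names of the changes that, applied alone, make the bug go away.
function isolateFix<S>(
  baseline: S,
  changes: Change<S>[],
  bugReproduces: (state: S) => boolean,
): string[] {
  return changes
    .filter((change) => !bugReproduces(change.apply(baseline)))
    .map((change) => change.name);
}
```

If this returns more than one name, the changes probably touch the same underlying cause; if it returns none, the fix may require a combination, which is worth noting in the bug write-up.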

Common Issues

Bug disappears when adding debug logging — This is a classic Heisenbug caused by timing changes. The logging adds enough delay to avoid a race condition. Investigate thread synchronization, shared state access, and timing-dependent logic rather than relying on the logging "fix."
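
For illustration, here is a deliberately simplified lost-update race and one way to serialize access. It is deterministic, so it shows the shape of the problem rather than the timing sensitivity itself; in real code the `await` would be I/O of varying duration, which is why extra logging can hide the bug. All names are hypothetical.

```typescript
// Sketch of a lost-update race on shared state, plus a serialized alternative.
let counter = 0;

// Unsafe: reads, yields to the event loop, then writes back a stale value.
async function unsafeIncrement(): Promise<void> {
  const current = counter;
  await Promise.resolve(); // stands in for I/O; other tasks can run here
  counter = current + 1;
}

// Safer: chain each update onto the previous one so they never interleave.
let updateQueue: Promise<void> = Promise.resolve();
function safeIncrement(): Promise<void> {
  updateQueue = updateQueue.then(unsafeIncrement);
  return updateQueue;
}

async function runRaceDemo(): Promise<{ unsafe: number; safe: number }> {
  counter = 0;
  await Promise.all([unsafeIncrement(), unsafeIncrement()]);
  const unsafe = counter; // both calls read 0, so one update is lost

  counter = 0;
  await Promise.all([safeIncrement(), safeIncrement()]);
  const safe = counter; // updates ran one after another
  return { unsafe, safe };
}
```

The fix is to remove the interleaving, not to rely on delays: any approach that guarantees the read and the write happen as one unit (a queue, a lock, or an atomic database operation) addresses the root cause.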

Stack trace points to framework code, not application code — Read the stack trace bottom-up to find where your code enters the framework. The bug is usually in the last application frame before the framework call. Check what arguments you passed to the framework function.
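
Finding that frame can be partly automated. The sketch below assumes V8-style stack strings (most recent call listed first, frame lines beginning with `at`), that third-party code lives under `node_modules`, and that Node internals appear as `node:` paths. `firstApplicationFrame` is a hypothetical helper name.

```typescript
// Sketch: return the most recent stack frame that belongs to application code.
// Assumes V8-style stacks and dependencies installed under node_modules.
function firstApplicationFrame(stack: string): string | undefined {
  const frames = stack
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.indexOf('at ') === 0);

  for (const frame of frames) {
    const isDependency = frame.indexOf('node_modules') !== -1;
    const isNodeInternal =
      frame.indexOf('(node:') !== -1 || frame.indexOf('at node:') === 0;
    if (!isDependency && !isNodeInternal) return frame;
  }
  return undefined;
}
```

Called with `err.stack`, this points at the application function that made the framework call, which is usually where the arguments should be inspected first.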

Fix works in development but not in production — Environment differences cause this: different database data, stricter security headers, different Node.js versions, missing environment variables, or different file permissions. Compare environment configurations systematically.
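
A systematic comparison can start with the environment variables themselves. The sketch below reports only which keys differ, not their values, so secrets are not echoed into logs or tickets. `EnvMap`, `EnvDiff`, and `diffEnv` are hypothetical names, and the inputs are assumed to be plain key/value dumps of each environment.

```typescript
// Sketch: report which keys differ between two environments, without values.
type EnvMap = Record<string, string | undefined>;

interface EnvDiff {
  onlyInFirst: string[];
  onlyInSecond: string[];
  different: string[];
}

function diffEnv(first: EnvMap, second: EnvMap): EnvDiff {
  const diff: EnvDiff = { onlyInFirst: [], onlyInSecond: [], different: [] };
  const keys = Object.keys({ ...first, ...second }).sort();

  for (const key of keys) {
    const a = first[key];
    const b = second[key];
    if (a === b) continue; // same value, or unset on both sides
    if (a === undefined) diff.onlyInSecond.push(key);
    else if (b === undefined) diff.onlyInFirst.push(key);
    else diff.different.push(key);
  }
  return diff;
}
```

The same idea extends to dependency versions and feature flags: dump both sides into the same shape, then diff the keys before reading any code.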
