
Advanced Root Cause Tracing Kit

Streamline your workflow with this skill for tracing errors back to their original triggers and causes. Built for Claude Code with best practices and real-world patterns.

Skill · Community · debugging · v1.0.0 · MIT

Root Cause Tracing Kit

Systematic root cause analysis toolkit that traces bugs, failures, and regressions back to their origin through structured investigation techniques and causal chain mapping.

When to Use This Skill

Choose Root Cause Tracing when:

  • A bug keeps recurring despite surface-level fixes
  • Production incidents need thorough post-mortem investigation
  • Test failures have unclear or non-obvious causes
  • Performance degradations appear without clear code changes
  • You need to document causal chains for team knowledge sharing

Consider alternatives when:

  • The bug is immediately obvious from the error message
  • You need quick hotfixes rather than deep analysis
  • Issues are in third-party libraries you cannot modify

Quick Start

```shell
# Activate the root cause tracing skill
claude skill activate advanced-root-cause-tracing-kit

# Investigate a failing test
claude "Trace the root cause of the failing payment integration test"

# Analyze a production incident
claude "Root cause analysis: users seeing 500 errors on checkout since deploy v2.4.1"
```

Example Investigation Flow

```typescript
// Symptom: OrderService.createOrder() throws NullPointerException

async createOrder(userId: string, items: CartItem[]) {
  const user = await this.userRepo.findById(userId);

  // Step 1: Identify the failing line
  // const address = user.defaultAddress; // NPE here - user.defaultAddress is null

  // Step 2: Trace why defaultAddress is null
  // -> User was created via SSO flow
  // -> SSO registration skips address collection step
  // -> Root cause: SSO onboarding flow missing address prompt

  // Fix: Add address check with fallback
  const address = user.defaultAddress ?? await this.promptForAddress(userId);
  // ...
}
```

Core Concepts

Investigation Methodology

| Phase | Action | Output |
| --- | --- | --- |
| Symptom Collection | Gather error logs, stack traces, user reports | Symptom map |
| Timeline Construction | Identify when the issue first appeared | Change window |
| Hypothesis Formation | List possible causes ranked by likelihood | Hypothesis tree |
| Evidence Gathering | Test each hypothesis with data | Confirmed/eliminated causes |
| Causal Chain Mapping | Trace confirmed cause to its origin | Root cause document |
| Fix Verification | Confirm fix addresses root cause, not symptom | Regression test |
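The phase outputs in the table above can be sketched as a minimal data model. The type names and field shapes below are illustrative assumptions, not part of the skill's actual API:

```typescript
// Hypothetical data model mirroring the investigation phases:
// symptom map -> change window -> hypothesis tree -> causal chain -> regression test.
type Hypothesis = {
  description: string;
  likelihood: "high" | "medium" | "low";
  status: "open" | "confirmed" | "eliminated";
};

type Investigation = {
  symptoms: string[];                           // Symptom Collection
  changeWindow: { start: string; end: string }; // Timeline Construction
  hypotheses: Hypothesis[];                     // Hypothesis Formation + Evidence Gathering
  causalChain: string[];                        // Causal Chain Mapping, ordered immediate -> root
  regressionTest?: string;                      // Fix Verification
};

// The root cause is the deepest confirmed link in the causal chain.
function rootCause(inv: Investigation): string | undefined {
  return inv.causalChain[inv.causalChain.length - 1];
}
```

Keeping the chain ordered from immediate cause to root cause makes the final entry the fix target and everything before it a candidate for secondary mitigations.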

Tracing Techniques

| Technique | Best For | Approach |
| --- | --- | --- |
| Binary Search (git bisect) | Regressions with known good state | Bisect commits to find breaking change |
| Dependency Tracing | Failures after library updates | Compare dependency trees before/after |
| Data Flow Analysis | Incorrect output values | Trace variable values through execution path |
| Log Correlation | Distributed system failures | Correlate timestamps across service logs |
| Fault Tree Analysis | Complex system failures | Top-down decomposition of failure modes |
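As a concrete illustration of the log correlation technique, the sketch below merges entries from several services into one timeline and keeps only those near the failure timestamp. The entry shape and window size are assumptions for the example:

```typescript
type LogEntry = { service: string; ts: number; message: string };

// Merge per-service logs into a single timeline, then keep only entries
// within `windowMs` of the failure timestamp, sorted chronologically.
function correlate(
  logs: LogEntry[][],
  failureTs: number,
  windowMs: number
): LogEntry[] {
  return logs
    .flat()
    .filter((e) => Math.abs(e.ts - failureTs) <= windowMs)
    .sort((a, b) => a.ts - b.ts);
}
```

In practice the timestamps would come from structured logs with clock-skew correction, but even this naive merge often reveals which service misbehaved first.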

Causal Chain Template

```markdown
## Incident: [Title]

**Symptom**: Users cannot complete checkout
**Impact**: 15% of orders failing since 2024-03-10 14:00 UTC

### Causal Chain

1. **Immediate cause**: PaymentGateway.charge() returns timeout error
2. **Contributing factor**: Gateway connection pool exhausted (max 10)
3. **Underlying cause**: New retry logic holds connections during backoff
4. **Root cause**: Retry implementation uses synchronous sleep instead of releasing connection
5. **Systemic factor**: No connection pool monitoring or alerting

### Fix

- Primary: Use async retry with connection release between attempts
- Secondary: Add connection pool utilization alerts at 80% threshold
- Preventive: Add integration test for concurrent payment processing
```

Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| max_depth | Maximum causal chain depth to investigate | 7 |
| include_git_history | Search git history for related changes | true |
| log_window | Time window for log analysis | 48h |
| hypothesis_limit | Maximum hypotheses to evaluate in parallel | 5 |
| include_dependencies | Check dependency changes in analysis | true |
| output_format | Report format: markdown, json, or jira | markdown |
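A typed view of these parameters might look like the following. The interface name is an assumption; the fields and defaults simply mirror the table above, not a published configuration schema:

```typescript
// Illustrative typed config; names and defaults taken from the table above.
interface TracingConfig {
  max_depth: number;
  include_git_history: boolean;
  log_window: string; // e.g. "48h"
  hypothesis_limit: number;
  include_dependencies: boolean;
  output_format: "markdown" | "json" | "jira";
}

const defaults: TracingConfig = {
  max_depth: 7,
  include_git_history: true,
  log_window: "48h",
  hypothesis_limit: 5,
  include_dependencies: true,
  output_format: "markdown",
};
```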

Best Practices

  1. Never fix symptoms without understanding causes — Patching the immediate error without tracing the root cause leads to recurring failures and growing technical debt.

  2. Use the "5 Whys" technique systematically — When you find a cause, ask "why did this happen?" at least five times. Each answer peels back a layer closer to the true root cause.

  3. Preserve evidence before fixing — Capture logs, database states, heap dumps, and reproduction steps before applying any fix. Evidence disappears quickly in production systems.

  4. Build a timeline of changes — Compare the failure onset time against deployment logs, config changes, dependency updates, and infrastructure events to narrow the investigation window.

  5. Document every root cause analysis — Even minor investigations produce institutional knowledge. Maintain an RCA database that teams can search to identify patterns across incidents.
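The "5 Whys" practice from the list above can be captured as a simple chain. The example answers below echo the checkout incident from the causal chain template and are purely illustrative:

```typescript
// Each entry answers "why?" for the one before it; the last entry is
// the candidate root cause.
const fiveWhys: string[] = [
  "Checkout returns 500 errors",                                            // symptom
  "Why? PaymentGateway.charge() times out",
  "Why? Gateway connection pool is exhausted",
  "Why? Retry logic holds connections during backoff",
  "Why? Retry uses synchronous sleep instead of releasing the connection",  // root cause
];

function deepestWhy(chain: string[]): string {
  return chain[chain.length - 1];
}
```

Stopping at the first or second "why" is what produces symptom-level patches; the fixable root cause usually sits several layers down.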

Common Issues

**Investigation leads to dead ends with no clear root cause.** Expand the investigation scope beyond code. Check infrastructure changes, DNS updates, certificate rotations, third-party API modifications, and data migration scripts. Many "code bugs" are actually environment or configuration issues that don't appear in git history.

**Root cause fix introduces new regressions.** Always write a regression test that reproduces the original failure before implementing the fix. Run the full test suite after fixing and specifically test adjacent functionality that shares the same code paths or data structures.
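A regression test for the NullPointerException from the earlier example investigation might look like the sketch below. The `resolveAddress` helper and the stand-in prompt function are assumptions introduced for illustration, not part of the original service:

```typescript
// Reproduces the SSO-user scenario: a user created without a default
// address. The fallback mirrors the earlier fix (defaultAddress ?? prompt).
type User = { id: string; defaultAddress?: string };

async function resolveAddress(
  user: User,
  promptForAddress: (id: string) => Promise<string>
): Promise<string> {
  return user.defaultAddress ?? (await promptForAddress(user.id));
}

// Regression check: an SSO user with no address must not crash and
// must fall back to the prompted address.
async function regressionTest(): Promise<boolean> {
  const ssoUser: User = { id: "u1" }; // no defaultAddress, as in the original bug
  const addr = await resolveAddress(ssoUser, async () => "123 Main St");
  return addr === "123 Main St";
}
```

Because this test encodes the exact triggering condition (a missing `defaultAddress`), it will fail loudly if a future refactor reintroduces the unguarded access.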

**Team disagrees on what constitutes the "root cause" versus a contributing factor.** Use the causal chain format to separate immediate triggers from underlying causes and systemic factors. The root cause is the deepest fixable point in the chain — going deeper hits organizational or architectural constraints that require separate planning.
