技能 编程开发 假设驱动的调试与根因分析

假设驱动的调试与根因分析

v20260407
hypothesis-testing
本技能指导用户应用科学方法进行软件调试。它摒弃盲目尝试,通过观察症状、提出可证伪假设、预测结果、设计实验和分析结果五个步骤,系统地定位故障源头,从而高效、准确地找出技术问题的根本原因。
获取技能
291 次下载
概览

Hypothesis-Driven Debugging

You are applying the scientific method to debugging. Form clear hypotheses, design tests that can definitively confirm or reject them, and systematically narrow down to the truth.

Core Principle

Every debugging action should test a specific hypothesis. Random changes are not debugging.

The Scientific Debugging Method

1. Observe - Gather Facts

Before forming hypotheses, collect observations:

  • What exactly happens? (specific symptoms)
  • When does it happen? (timing, frequency)
  • Where does it happen? (environment, component)
  • What changed recently? (code, config, data)

Write down observations objectively:

Observations:
- API returns 500 error on POST /orders
- Happens only when cart has > 10 items
- Started after deployment on 2024-01-15
- Works fine in staging environment
- Error logs show "connection refused" to inventory service

2. Hypothesize - Form Testable Theories

Examples (bad → good):

  • "Something is wrong with the network" → "The inventory service connection pool is exhausted when processing orders with >10 items"
  • "There might be a race condition" → "The order processing timeout (5s) is insufficient for large orders"

3. Predict - Define Expected Results

For each hypothesis, define what you expect to observe if it is true versus false:

Hypothesis: Connection pool exhausted for large orders

If TRUE:
- Active connections should hit max (20) during large orders
- Small orders should still work during this time
- Increasing pool size should fix the issue

If FALSE:
- Connection count stays well below max
- Small orders also fail during the issue
- Pool size change has no effect

4. Test - Experiment Systematically

Design tests that definitively confirm or reject:

Test Plan for Connection Pool Hypothesis:

1. Add connection pool monitoring
   - Log active connections before/after each request
   - Expected if true: Count reaches 20 during failures

2. Artificial stress test
   - Send 5 large orders simultaneously
   - Expected if true: Failures start when pool exhausted

3. Increase pool size to 50
   - Repeat stress test
   - Expected if true: Failures stop or threshold moves

4. Control test with small orders
   - Send 20 small orders simultaneously
   - Expected if true: No failures (faster processing)

5. Analyze - Interpret Results

After testing:

  • Did results match predictions for TRUE or FALSE?
  • Are results conclusive or ambiguous?
  • Do results suggest a different hypothesis?
Results:
- Connection count reached 20/20 during failures ✓
- Small orders succeeded during same period ✓
- Pool size increase to 50 → failures stopped ✓

Conclusion: Hypothesis CONFIRMED
Connection pool exhaustion is the proximate cause.

New question: Why do large orders exhaust the pool?
New hypothesis: Large orders make multiple inventory calls per item

Hypothesis Tracking Template

## Bug: [Description]

### Hypothesis 1: [Theory]
**Status:** Testing | Confirmed | Rejected
**Probability:** High | Medium | Low

**Evidence For:**
- [Evidence 1]
- [Evidence 2]

**Evidence Against:**
- [Evidence 1]

**Test Plan:**
1. [Test 1] - Expected result if true
2. [Test 2] - Expected result if false

**Test Results:**
- [Result 1]: [Supports/Contradicts]
- [Result 2]: [Supports/Contradicts]

**Conclusion:** [Confirmed/Rejected] because [reasoning]

---

### Hypothesis 2: [Next Theory]
...

Testing Techniques by Hypothesis Type

Testing Timing Hypotheses

// Add timing instrumentation
const start = performance.now();
await suspectedSlowOperation();
const duration = performance.now() - start;
console.log(`Operation took ${duration}ms`);
// Hypothesis confirmed if duration > expected

Testing Data Hypotheses

// Validate data at key points
function processWithValidation(data) {
  console.assert(data.id != null, 'Missing id');
  console.assert(data.items?.length > 0, 'Empty items');
  console.assert(typeof data.total === 'number', 'Invalid total');
  // If assertions fail, data hypothesis likely true
}

Testing State Hypotheses

// Snapshot state before and after
const stateBefore = JSON.stringify(currentState);
suspectedStateMutation();
const stateAfter = JSON.stringify(currentState);
if (stateBefore !== stateAfter) {
  console.log('State changed:', diff(stateBefore, stateAfter));
}

Decision Tree

Is the hypothesis testable?
├── NO → Refine it to be more specific
└── YES → Can I test it without side effects?
    ├── NO → Design a safe test (staging, logs-only)
    └── YES → Run the test
        └── Results conclusive?
            ├── NO → Design a better test
            └── YES → Hypothesis confirmed or rejected?
                ├── CONFIRMED → Root cause found?
                │   ├── YES → Fix and verify
                │   └── NO → Form next hypothesis (why?)
                └── REJECTED → Form next hypothesis

Integration with Other Skills

  • root-cause-analysis: Hypothesis testing is a key technique within RCA
  • trace-and-isolate: Use tracing to gather evidence for hypotheses
  • testing/red-green-refactor: Write test that confirms the bug before fixing
信息
Category 编程开发
Name hypothesis-testing
版本 v20260407
大小 6.15KB
更新时间 2026-04-12
语言