技能 编程开发 AI协作过程证据分析器

AI协作过程证据分析器

v20260612
collab-proof
本工具用于深入分析开发者与AI模型(如Claude)的协作过程,自动捕捉并结构化那些未被开发者主动记录的关键证据。它通过分析版本控制差异(git diff)和对话上下文,评估技术复杂度、识别决策分歧点,并量化AI的贡献,为项目回顾和知识沉淀提供系统化的依据。
获取技能
280 次下载
概览

collab-proof

Surfaces AI collaboration evidence the developer didn't consciously record. Vela 3-layer pipeline × ADHD 4-frame reasoning — prompt-native, zero dependencies.


Layer 01 — Signal detection

Run git log --oneline -10 and git diff --stat HEAD~3..HEAD first.

Classify signal level using this rubric (pick the highest that matches):

HIGH → full artifacts (DECISIONS.md + session-history + WORKLOG + HTML)

  • New file created, OR
  • 4+ files modified, OR
  • Explicit option comparison in conversation ("vs", "instead of", "chose X over Y"), OR
  • Design discussion lasted 15+ exchanges, OR
  • Bug with root cause diagnosis — conversation contains WHY the bug happened (not just "fixed X" but "the bug was caused by Y because Z")

BUG_FIXING special rule — override file count: Even if only 1 file changed, classify as HIGH if the conversation contains:

  • Root cause explanation ("the bug was...", "this happened because...", "the issue is...")
  • Diagnosis process ("I checked...", "turned out...", "the problem was...")
  • Fix rationale ("chose this approach because...", "instead of X, used Y because...") File count doesn't matter for bugs — a well-diagnosed single-file fix is more valuable than a 10-file feature with no discussion.

MEDIUM → WORKLOG only

  • 1–3 files modified with no root cause discussion, OR
  • Minor feature added, no tradeoffs discussed

LOW → silence, tell user "Routine session — nothing recorded."

  • No code changes, only planning/discussion, OR
  • Single trivial change with no context ("change this text", "fix typo", "rename variable")

Show the user: Signal: HIGH / MEDIUM / LOW — [one-line reason]


Layer 02 — WorkIntentClassifier

Run all four frames simultaneously against conversation context + git diff. Score each frame 0.0–1.0 using the rubric below. Then apply pruning and classification rules.

Frame scoring rubric

Frame A — Technical (code churn complexity)

  • 1.0 New module/file created, complex logic added (state machine, Lua script, novel algorithm)
  • 0.5 Existing function logic modified, simple API endpoint added
  • 0.1 Typo fix, comment change, plain text edit

Frame B — Uncertainty (developer doubt signals)

  • 1.0 Code written then fully rolled back, explicit doubt expressed ("이게 맞나?", "동작 안 하네"), git revert
  • 0.5 Advice sought from Claude mid-implementation, 2+ revision requests on same area
  • 0.0 Uninterrupted directive execution — developer knew exactly what to build

Frame C — Fork (decision branch presence)

  • 1.0 Two or more alternatives explicitly compared in conversation (A vs B)
  • 0.5 No explicit comparison but tradeoff mentioned (performance vs readability)
  • 0.0 Single standard approach applied, no alternatives considered

Frame D — AI contribution (Claude's actual impact)

  • 1.0 Claude identified a bug/edge case the developer hadn't noticed and proposed the fix
  • 0.6 Claude generated structural boilerplate/skeleton that significantly accelerated execution
  • 0.2 Claude reformatted or transcribed developer-directed code without independent contribution

Pruning rule

Prune any frame scoring < 0.4.

Exception — High-Speed Execution Guard: If Frame A >= 0.8 AND Frame D >= 0.6, do NOT prune and do NOT silence the session, even if Frame B = 0.0 and Frame C = 0.0. This is a boilerplate-heavy FEATURE_BUILDING session. Classify immediately as FEATURE_BUILDING with HIGH signal. Rationale: zero uncertainty in a fast-moving session is a feature, not a reason to discard it.


Intent classification

Surviving frames Dominant intent Meaning
A high + D mid-high (B, C low) FEATURE_BUILDING High-velocity feature generation, Claude scaffolding
B high + A/D high BUG_FIXING or STUCK Active debugging or unresolved looping
C high + A high REFACTORING or EXPLORING Architecture exploration, weighing alternatives
All frames < 0.4 FLOW_STATE or LOW Routine typing, silence unless Layer 01 was HIGH

If multiple intents tie, pick the one with the highest combined frame score. Record the runner-up — it belongs in the session narrative.


Internal output format

Before proceeding to Layer 03, resolve to this structure (show it to the user):

{
  "frames": {
    "technical": 0.0,
    "uncertainty": 0.0,
    "fork": 0.0,
    "ai_contribution": 0.0
  },
  "pruned": ["list of pruned frame names"],
  "intent": "FEATURE_BUILDING",
  "signal": "HIGH",
  "calibration_note": "one sentence explaining any exception rule applied"
}

Layer 03 — Output

If HIGH signal

Append to DECISIONS.md — one entry per real fork (Frame C must confirm alternatives existed):

## [YYYY-MM-DD] <title>

**Context**: [Frame A — what forced this choice]
**Decision**: what was chosen
**Alternatives considered**: [Frame C — road not taken]
**Reasoning**: why — prefix "inferred:" if reconstructed from context
**AI contribution**:
  - Identified: [Frame D — something developer missed]
  - Suggested: [Frame D — approach or alternative]
  - Developer-driven: [what the developer decided independently]
**Intent class**: [from Layer 02]
**Signal score**: HIGH
**Outcome**: implemented | pending | reversed

If no real fork existed → write nothing. Never fabricate decisions.

BUG_FIXING intent: use this format instead:

## [YYYY-MM-DD] <bug title>

**Root cause**: what actually caused the bug — the WHY, not just the what
**Symptom**: what the developer observed
**Fix**: what was changed
**Why this fix**: rationale — inferred if not stated explicitly
**Alternative fixes considered**: other approaches discussed (if any)
**AI contribution**:
  - Identified: [Frame D — did Claude spot the root cause?]
  - Suggested: [Frame D — fix approach or diagnostic step]
  - Developer-driven: [what the developer diagnosed/decided independently]
**Intent class**: BUG_FIXING
**Signal score**: HIGH
**Outcome**: fixed | workaround | deferred

Create session-history/YYYY-MM-DD-HHMM.md:

# Session [YYYY-MM-DD HH:MM]

**Intent**: [class] (runner-up: [class if any])
**Signal**: HIGH
**Frames active**: A ([score]) / B ([score]) / C ([score]) / D ([score])

## What shipped
[grounded in git log]

## What was figured out
[Frame B + C — the reasoning, tradeoffs, debugging — what developers forget]

## Decisions made this session
[refs to DECISIONS.md entries]

## Where it got hard
[Frame B findings — uncertainty, reverts, EXPLORING/STUCK signals]

## AI contribution summary
[Frame D synthesis — one honest paragraph, calibrated]

## Next steps inferred
[what's obviously incomplete]

Append to WORKLOG.md:

YYYY-MM-DD HH:MM | [intent] | HIGH | D:[score] | cache:[hit%]% | tok:[total] | <verb phrase> — <why it mattered>

Fields:

  • D:[score] — Frame D AI contribution score (0.0–1.0)
  • cache:[hit%]% — cache hit rate from token analysis (or cache:n/a if no data)
  • tok:[total] — total tokens this session (input + cache_read + cache_create + output, in K e.g. 45K)
  • verb phrase — what shipped, grounded in git log

Collect token usage (bash — run this and capture output):

python3 -c "
import json, sys
from pathlib import Path

projects = Path.home() / '.claude/projects'
files = sorted(projects.rglob('*.jsonl'), key=lambda f: f.stat().st_mtime, reverse=True)
if not files:
    print('no_data'); sys.exit()

with open(files[0]) as fp:
    lines = [json.loads(l) for l in fp if l.strip()]

ti = to = cr = cc = 0
turns = []
for i, line in enumerate(lines):
    if line.get('type') == 'assistant':
        u = line.get('message', {}).get('usage', {})
        if not u: continue
        inp = u.get('input_tokens', 0)
        ti += inp; to += u.get('output_tokens', 0)
        cr += u.get('cache_read_input_tokens', 0)
        cc += u.get('cache_creation_input_tokens', 0)
        prompt = ''
        for j in range(i-1, -1, -1):
            if lines[j].get('type') == 'user':
                c = lines[j].get('message', {}).get('content', '')
                prompt = (c if isinstance(c, str) else next((x.get('text','') for x in c if isinstance(x,dict) and x.get('type')=='text'), ''))[:80]
                break
        turns.append((inp, prompt))

total = ti + cr + cc
hit = cr / total * 100 if total else 0
print(f'input={ti} output={to} cache_read={cr} cache_create={cc} hit={hit:.0f} turns={len(turns)}')
turns.sort(reverse=True)
for idx, (tok, p) in enumerate(turns[:3]):
    print(f'top{idx+1}={tok}|{p}')
"

Parse the output and include token stats in the session narrative. Then:

Generate session-history/YYYY-MM-DD-HHMM-proof.html — write a self-contained HTML file. Structure and class names are fixed — do not rename or reorder sections.

Fixed CSS tokens (use exactly):

  • Background: #0d1117, Card: #161b22, Border: #30363d
  • Font: font-family: 'Courier New', monospace
  • Frame score colors: high#3fb950, low#f85149, pruned → #8b949e
  • AI line colors: ai-identified#a371f7, ai-suggested#d29922, ai-developer#3fb950

Fixed HTML structure (class names must match exactly):

<div class="header">
  <div class="header-top">
    <div class="project-name">
    <span class="badge">                    <!-- intent class -->
  <div class="meta-row">                    <!-- date, branch, signal level text -->
  <div class="signal-container">
    <div class="signal-label">
    <div class="signal-track">
      <div class="signal-fill">             <!-- width % driven by signal score -->

<div class="section">                       <!-- frames -->
  <div class="section-title"> ... <span class="count">Layer 02 · ADHD tree-of-thought</span>
  <div class="frames-grid">
    <div class="frame-card">               <!-- pruned: class="frame-card pruned" -->
      <div class="frame-label">            <!-- Frame A / B / C / D -->
      <div class="frame-name">
      <div class="frame-score high|low">   <!-- score value -->

<div class="section">                       <!-- decisions — skip section if none -->
  <div class="section-title"> ... <span class="count">N recorded</span>
  <div class="decision-card">              <!-- one per DECISIONS.md entry -->
    <div class="decision-header">
      <div class="decision-title">
      <div class="decision-date">
    <div class="decision-fields">
      <div class="field-row">
        <div class="field-label">          <!-- Context / Decision / Alternatives / Reasoning -->
        <div class="field-value">
      <div class="field-row">              <!-- AI contribution row -->
        <div class="field-label">AI contribution</div>
        <div class="field-value">
          <div class="ai-block">
            <div class="ai-line ai-identified|ai-suggested|ai-developer">
              <span class="tag">IDENTIFIED|SUGGESTED|DEV-DRIVEN</span>
      <div class="field-row">              <!-- Outcome row -->
        <div class="field-label">Outcome</div>
        <div class="field-value">
          <span class="outcome-badge outcome-implemented|outcome-pending|outcome-reversed">

<div class="section">                       <!-- session narrative -->
  <div class="section-title">Session narrative</div>
  <div class="narrative-grid">
    <div class="narrative-card">           <!-- What shipped -->
    <div class="narrative-card">           <!-- What was figured out -->
    <div class="narrative-card">           <!-- Where it got hard -->
    <div class="narrative-card">           <!-- Next steps inferred -->

<div class="section">                       <!-- AI contribution summary -->
  <div class="section-title">AI contribution summary</div>
  <div class="narrative-card">             <!-- Frame D synthesis paragraph -->

<div class="section">                       <!-- token usage -->
  <div class="section-title">Token usage</div>
  <div class="narrative-card">             <!-- cache hit rate bar + top turns + optimization note -->

<div class="section">                       <!-- worklog tail -->
  <div class="section-title"> ... <span class="count">last N entries</span>
  <div class="worklog-entry">              <!-- one per recent WORKLOG line -->

<div class="footer">                        <!-- last commit hash · "Generated by collab-proof · timestamp" -->

Write the HTML using bash:

cat > session-history/YYYY-MM-DD-HHMM-proof.html << 'HTMLEOF'
<!DOCTYPE html>
... (full HTML with inline CSS, no external resources)
HTMLEOF

After writing, show: open session-history/YYYY-MM-DD-HHMM-proof.html


If MEDIUM signal

Append one line to WORKLOG.md only:

YYYY-MM-DD HH:MM | [intent] | MEDIUM | D:[score] | cache:[hit%]% | tok:[total] | <verb phrase>

If LOW signal

Tell user: "Signal: LOW — Routine session, nothing recorded."


Honesty rules

  • Never invent decisions not in the conversation or implied by the diff
  • "inferred:" prefix when reasoning is reconstructed
  • Frame D must be calibrated — neither overclaim nor dismiss
  • If all frames score < 0.4 → write nothing

PreCompact snapshot (context compaction defence)

When context compaction is about to happen (triggered by the PreCompact hook), run a lightweight mid-session checkpoint before context is lost:

  1. Compute current Layer 01 signal level from available context
  2. Score all four frames against what's visible now
  3. Write a snapshot to session-history/.tmp-TIMESTAMP.json:
{
  "timestamp": "YYYY-MM-DD HH:MM:SS",
  "trigger": "pre-compact",
  "signal": "HIGH / MEDIUM / LOW",
  "frames": { "technical": 0.0, "uncertainty": 0.0, "fork": 0.0, "ai_contribution": 0.0 },
  "intent": "FEATURE_BUILDING",
  "key_moments": [
    "one-line description of the most important decision or finding so far"
  ]
}

When /collab-proof runs at session end:

  • Read all session-history/.tmp-*.json files
  • Merge frame scores (take max per frame across all snapshots)
  • Combine key_moments arrays — these preserve tradeoff discussions that were compacted away
  • Delete .tmp-*.json files after merging
信息
Category 编程开发
Name collab-proof
版本 v20260612
大小 13.03KB
更新时间 2026-06-13
语言