Run a Semgrep scan with automatic language detection, parallel execution via Task subagents, and merged SARIF output.
- --metrics=off: Semgrep sends telemetry by default; --config auto also phones home. Every semgrep command must include --metrics=off to prevent data leakage during security audits.
- semgrep-rule-creator skill
- semgrep-rule-variant-creator skill

All scan results, SARIF files, and temporary data are stored in a single output directory.
If the user specifies an output directory, use it as OUTPUT_DIR. Otherwise, default to ./static_analysis_semgrep_1; if that already exists, increment to _2, _3, etc. In both cases, always create the directory with mkdir -p before writing any files.
# Resolve output directory
if [ -n "$USER_SPECIFIED_DIR" ]; then
  OUTPUT_DIR="$USER_SPECIFIED_DIR"
else
  BASE="static_analysis_semgrep"
  N=1
  while [ -e "${BASE}_${N}" ]; do
    N=$((N + 1))
  done
  OUTPUT_DIR="${BASE}_${N}"
fi
mkdir -p "$OUTPUT_DIR/raw" "$OUTPUT_DIR/results"
The output directory is resolved once at the start of Step 1 and used throughout all subsequent steps.
$OUTPUT_DIR/
├── rulesets.txt          # Approved rulesets (logged after Step 3)
├── raw/                  # Per-scan raw output (unfiltered)
│   ├── python-python.json
│   ├── python-python.sarif
│   ├── python-django.json
│   ├── python-django.sarif
│   └── ...
└── results/              # Final merged output
    └── results.sarif
Required: Semgrep CLI (semgrep --version). If not installed, see Semgrep installation docs.
Optional: Semgrep Pro — enables cross-file taint tracking, inter-procedural analysis, and additional languages (Apex, C#, Elixir). Check with:
semgrep --metrics=off --pro --validate --config p/default 2>/dev/null && echo "Pro available" || echo "OSS only"
Limitations: OSS mode cannot track data flow across files. Pro mode uses -j 1 for cross-file analysis (slower per ruleset, but parallel rulesets compensate).
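The flag rules so far (telemetry always off, no `--config auto`, `-j 1` under Pro) can be collected into one helper so that no invocation is assembled without them. This is a minimal sketch, not part of the skill's own scripts: `build_semgrep_cmd` and its parameters are hypothetical names, and it assumes only the standard Semgrep CLI flags `--metrics`, `--config`, `--pro`, `-j`, `--sarif`, and `--output`.

```python
from typing import List, Sequence


def build_semgrep_cmd(
    target: str,
    rulesets: Sequence[str],
    sarif_path: str,
    pro: bool = False,
) -> List[str]:
    """Assemble a semgrep argv list (hypothetical helper, illustration only)."""
    if not rulesets:
        raise ValueError("at least one approved ruleset is required")
    # Telemetry is disabled on every invocation.
    cmd = ["semgrep", "--metrics=off"]
    for ruleset in rulesets:
        # Only explicit rulesets are accepted; "auto" is rejected outright.
        if ruleset == "auto":
            raise ValueError("--config auto is not allowed")
        cmd += ["--config", ruleset]
    if pro:
        # Pro cross-file analysis runs with a single job.
        cmd += ["--pro", "-j", "1"]
    cmd += ["--sarif", "--output", sarif_path, target]
    return cmd
```

Returning a list rather than a shell string keeps paths with spaces intact when the result is handed to `subprocess.run`.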
Select mode in Step 2 of the workflow. Mode affects both scanner flags and post-processing.
| Mode | Coverage | Findings Reported |
|---|---|---|
| Run all | All rulesets, all severity levels | Everything |
| Important only | All rulesets, pre- and post-filtered | Security vulns only, medium-high confidence/impact |
Important only applies two filter layers:
1. Pre-filter: --severity MEDIUM --severity HIGH --severity CRITICAL (CLI flag)
2. Post-filter: category=security, confidence∈{MEDIUM,HIGH}, impact∈{MEDIUM,HIGH}
See scan-modes.md for metadata criteria and jq filter commands.
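The post-filter layer can be pictured as a predicate over each finding's rule metadata. The sketch below is an illustration under stated assumptions, not the jq filter from scan-modes.md: it assumes Semgrep's JSON output shape (`results[].extra.metadata`) and that rules carry `category`, `confidence`, and `impact` metadata keys. `keep_finding` and `post_filter` are hypothetical names.

```python
from typing import Any, Dict

ALLOWED_LEVELS = {"MEDIUM", "HIGH"}


def keep_finding(finding: Dict[str, Any]) -> bool:
    """Return True if a finding passes the "Important only" post-filter."""
    metadata = finding.get("extra", {}).get("metadata", {})
    return (
        metadata.get("category") == "security"
        and str(metadata.get("confidence", "")).upper() in ALLOWED_LEVELS
        and str(metadata.get("impact", "")).upper() in ALLOWED_LEVELS
    )


def post_filter(report: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of a Semgrep JSON report with filtered results."""
    filtered = dict(report)
    filtered["results"] = [f for f in report.get("results", []) if keep_finding(f)]
    return filtered
```

In this sketch a finding with missing metadata is dropped rather than kept, and the input report is left unmodified so the raw output stays available.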
┌──────────────────────────────────────────────────────────────────┐
│ MAIN AGENT (this skill)                                          │
│ Step 1: Detect languages + check Pro availability                │
│ Step 2: Select scan mode + rulesets (ref: rulesets.md)           │
│ Step 3: Present plan + rulesets, get approval [⛔ HARD GATE]     │
│ Step 4: Spawn parallel scan Tasks (approved rulesets + mode)     │
│ Step 5: Merge results and report                                 │
└──────────────────────────────────────────────────────────────────┘
                                 │ Step 4
                                 ▼
                        ┌─────────────────┐
                        │ Scan Tasks      │
                        │ (parallel)      │
                        ├─────────────────┤
                        │ Python scanner  │
                        │ JS/TS scanner   │
                        │ Go scanner      │
                        │ Docker scanner  │
                        └─────────────────┘
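Step 1's language detection amounts to mapping file patterns to the scanner categories shown in the diagram. A rough sketch follows, assuming a hypothetical pattern table (`LANGUAGE_PATTERNS`) and helper name (`detect_languages`); the skill itself performs this step with the Glob tool, so the code only illustrates the logic.

```python
from pathlib import Path
from typing import Dict, List, Set

# Hypothetical pattern table; the real mapping is defined by the skill's references.
LANGUAGE_PATTERNS: Dict[str, List[str]] = {
    "python": ["*.py"],
    "js-ts": ["*.js", "*.jsx", "*.ts", "*.tsx"],
    "go": ["*.go"],
    "docker": ["Dockerfile", "Dockerfile.*", "*.dockerfile"],
}


def detect_languages(root: str) -> Set[str]:
    """Return the scanner categories that have at least one matching file."""
    base = Path(root)
    found: Set[str] = set()
    for language, patterns in LANGUAGE_PATTERNS.items():
        for pattern in patterns:
            # rglob searches recursively; one match is enough per category.
            if next(base.rglob(pattern), None) is not None:
                found.add(language)
                break
    return found
```

Each detected category would then correspond to one parallel scan Task in Step 4.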
Follow the detailed workflow in scan-workflow.md. Summary:
| Step | Action | Gate | Key Reference |
|---|---|---|---|
| 1 | Resolve output dir, detect languages + Pro availability | — | Use Glob, not Bash |
| 2 | Select scan mode + rulesets | — | rulesets.md |
| 3 | Present plan, get explicit approval | ⛔ HARD | AskUserQuestion |
| 4 | Spawn parallel scan Tasks | — | scanner-task-prompt.md |
| 5 | Merge results and report | — | Merge script (below) |
Task enforcement: On invocation, create 5 tasks with blockedBy dependencies (each step is blocked by the previous one). Step 3 is a HARD GATE: mark it complete ONLY after the user explicitly approves.
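The dependency chain can be modelled as a set of tasks where each one names the step that must finish first. The sketch below uses a hypothetical in-memory representation (the `Task` dataclass, `build_tasks`, and `can_complete` are illustrative names, not the agent's task API) to show why Step 4 cannot start until Step 3 has real approval.

```python
from dataclasses import dataclass, field
from typing import Dict, List

STEP_TITLES = [
    "Resolve output dir, detect languages + Pro availability",
    "Select scan mode + rulesets",
    "Present plan, get explicit approval",
    "Spawn parallel scan Tasks",
    "Merge results and report",
]


@dataclass
class Task:
    step: int
    title: str
    blocked_by: List[int] = field(default_factory=list)
    hard_gate: bool = False
    done: bool = False


def build_tasks() -> Dict[int, Task]:
    """Create the five tasks, each blocked by the step before it."""
    tasks: Dict[int, Task] = {}
    for step, title in enumerate(STEP_TITLES, start=1):
        blocked_by = [step - 1] if step > 1 else []
        tasks[step] = Task(step, title, blocked_by, hard_gate=(step == 3))
    return tasks


def can_complete(tasks: Dict[int, Task], step: int, user_approved: bool = False) -> bool:
    """A task may complete only if its blockers are done and any gate is approved."""
    task = tasks[step]
    if any(not tasks[dep].done for dep in task.blocked_by):
        return False
    if task.hard_gate and not user_approved:
        return False
    return True
```

With this model, marking Step 3 complete without `user_approved=True` is rejected, which is the behaviour the hard gate is meant to enforce.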
Merge command (Step 5):
uv run {baseDir}/scripts/merge_sarif.py "$OUTPUT_DIR/raw" "$OUTPUT_DIR/results/results.sarif"
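For orientation, merging SARIF files generally means collecting the `runs` arrays from each input log into one output log. The sketch below is an assumption about what such a merge could look like, not the contents of merge_sarif.py; `merge_sarif_dir` is a hypothetical name, and the real script may deduplicate or filter in ways not shown here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def merge_sarif_dir(raw_dir: str, out_path: str) -> Dict[str, Any]:
    """Concatenate the runs of every *.sarif file in raw_dir into one log."""
    merged: Dict[str, Any] = {"version": "2.1.0", "runs": []}
    # Sort the inputs so the run order is deterministic across invocations.
    for sarif_file in sorted(Path(raw_dir).glob("*.sarif")):
        with sarif_file.open(encoding="utf-8") as handle:
            log = json.load(handle)
        merged["runs"].extend(log.get("runs", []))
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Only the `.sarif` files are read; the `.json` files in raw/ are left untouched.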
| Agent | Tools | Purpose |
|---|---|---|
| static-analysis:semgrep-scanner | Bash | Executes parallel semgrep scans for a language category |
Use subagent_type: static-analysis:semgrep-scanner in Step 4 when spawning Task subagents.
| Shortcut | Why It's Wrong |
|---|---|
| "User asked for scan, that's approval" | Original request ≠ plan approval. Present plan, use AskUserQuestion, await explicit "yes" |
| "Step 3 task is blocking, just mark complete" | Lying about task status defeats enforcement. Only mark complete after real approval |
| "I already know what they want" | Assumptions cause scanning wrong directories/rulesets. Present plan for verification |
| "Just use default rulesets" | User must see and approve exact rulesets before scan |
| "Add extra rulesets without asking" | Modifying approved list without consent breaks trust |
| "Third-party rulesets are optional" | Trail of Bits, 0xdea, Decurity catch vulnerabilities not in official registry — REQUIRED |
| "Use --config auto" | Sends metrics; less control over rulesets |
| "One Task at a time" | Defeats parallelism; spawn all Tasks together |
| "Pro is too slow, skip --pro" | Cross-file analysis catches 250% more true positives; worth the time |
| "Semgrep handles GitHub URLs natively" | URL handling fails on repos with non-standard YAML; always clone first |
| "Cleanup is optional" | Cloned repos pollute the user's workspace and accumulate across runs |
| "Use . or relative path as target" | Subagents need absolute paths to avoid ambiguity |
| "Let the user pick an output dir later" | Output directory must be resolved at Step 1, before any files are created |
| File | Content |
|---|---|
| rulesets.md | Complete ruleset catalog and selection algorithm |
| scan-modes.md | Pre/post-filter criteria and jq commands |
| scanner-task-prompt.md | Template for spawning scanner subagents |
| Workflow | Purpose |
|---|---|
| scan-workflow.md | Complete 5-step scan execution process |
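- Output directory resolved as $OUTPUT_DIR before any files are created
- Every semgrep command used --metrics=off
- Approved rulesets logged to $OUTPUT_DIR/rulesets.txt
- Per-scan raw output stored in $OUTPUT_DIR/raw/
- results.sarif exists in $OUTPUT_DIR/results/ and is valid JSON
- Unfiltered results preserved in raw/
- Cloned repos cleaned up from $OUTPUT_DIR/repos/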
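The file-level checks lend themselves to a small post-run verification. A minimal sketch, assuming the output layout described earlier; `verify_outputs` is a hypothetical helper, the repos/ check assumes that directory should be empty or absent after cleanup, and nothing here can confirm whether each command actually used `--metrics=off`.

```python
import json
from pathlib import Path
from typing import List


def verify_outputs(output_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    root = Path(output_dir)
    problems: List[str] = []
    if not (root / "rulesets.txt").is_file():
        problems.append("rulesets.txt is missing")
    if not (root / "raw").is_dir():
        problems.append("raw/ is missing")
    sarif = root / "results" / "results.sarif"
    if not sarif.is_file():
        problems.append("results/results.sarif is missing")
    else:
        try:
            json.loads(sarif.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("results/results.sarif is not valid JSON")
    repos = root / "repos"
    # Assumption: leftover clones count as a failed cleanup.
    if repos.is_dir() and any(repos.iterdir()):
        problems.append("repos/ still contains cloned repositories")
    return problems
```

Returning the problems as a list, rather than raising on the first one, lets the final report show every unmet criterion at once.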