Supported languages: Python, JavaScript/TypeScript, Go, Java/Kotlin, C/C++, C#, Ruby, Swift.
Skill resources: Reference files and templates are located at {baseDir}/references/ and {baseDir}/workflows/.
Database quality is non-negotiable. A database that builds is not automatically good. Always run quality assessment (file counts, baseline LoC, extractor errors) and compare against expected source files. A cached build produces zero useful extraction.
Data extensions catch what CodeQL misses. Even projects using standard frameworks (Django, Spring, Express) have custom wrappers around database calls, request parsing, or shell execution. Skipping the create-data-extensions workflow means missing vulnerabilities in project-specific code paths.
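Custom wrappers like these are modeled with data extension YAML. A minimal sketch, using an invented Java wrapper — the package, class, method, and kind names here are all illustrative, and the column layout differs per language (see references/extension-yaml-format.md for the authoritative format):

```yaml
# Hypothetical sink model: treats the first argument of
# com.example.db.QueryRunner.run(String) as a SQL-injection sink.
# Every name in the data row is invented for illustration.
extensions:
  - addsTo:
      pack: codeql/java-all
      extensible: sinkModel
    data:
      - ["com.example.db", "QueryRunner", True, "run", "(String)", "", "Argument[0]", "sql-injection", "manual"]
```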
Explicit suite references prevent silent query dropping. Never pass pack names directly to codeql database analyze — each pack's defaultSuiteFile applies hidden filters that can produce zero results. Always generate a custom .qls suite file.
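A custom suite can be only a few lines. A hedged sketch — the pack name and filter below are illustrative, and the real templates live in references/important-only-suite.md and references/run-all-suite.md:

```yaml
# Minimal .qls sketch: take every query from one pack, keep only
# security-tagged queries. The pack name is an example, not a requirement.
- queries: .
  from: codeql/python-queries
- include:
    tags contain: security
```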
Zero findings needs investigation, not celebration. Zero results can indicate poor database quality, missing models, wrong query packs, or silent suite filtering. Investigate before reporting clean.
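One quick sanity check, sketched here as a small helper (jq is assumed to be available, as in the discovery commands later in this skill): a SARIF file with zero findings but also a near-zero count of executed rules points at silent suite filtering, not a clean codebase.

```shell
# Count findings and executed rules in a SARIF file; zero findings alongside
# a near-zero rule count suggests suite filtering, not a clean result.
sarif_summary() {
  local findings rules
  findings=$(jq '[.runs[].results[]] | length' "$1")
  rules=$(jq '[.runs[].tool.driver.rules[]?] | length' "$1")
  echo "findings=$findings rules=$rules"
}
# Usage: sarif_summary "$OUTPUT_DIR/raw/results.sarif"
```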
macOS Apple Silicon requires workarounds for compiled languages. Exit code 137 is arm64e/arm64 mismatch, not a build failure. Try Homebrew arm64 tools or Rosetta before falling back to build-mode=none.
Follow workflows step by step. Once a workflow is selected, execute it step by step without skipping phases. Each phase gates the next — skipping quality assessment or data extensions leads to incomplete analysis.
All generated files (database, build logs, diagnostics, extensions, results) are stored in a single output directory.
OUTPUT_DIR defaults to ./static_analysis_codeql_1. If that already exists, increment to _2, _3, etc. In both cases, always create the directory with mkdir -p before writing any files.
```bash
# Resolve output directory
if [ -n "$USER_SPECIFIED_DIR" ]; then
  OUTPUT_DIR="$USER_SPECIFIED_DIR"
else
  BASE="static_analysis_codeql"
  N=1
  while [ -e "${BASE}_${N}" ]; do
    N=$((N + 1))
  done
  OUTPUT_DIR="${BASE}_${N}"
fi
mkdir -p "$OUTPUT_DIR"
```
The output directory is resolved once at the start before any workflow executes. All workflows receive $OUTPUT_DIR and store their artifacts there:
```text
$OUTPUT_DIR/
├── rulesets.txt          # Selected query packs (logged after Step 3)
├── codeql.db/            # CodeQL database (dir containing codeql-database.yml)
├── build.log             # Build log
├── codeql-config.yml     # Exclusion config (interpreted languages)
├── diagnostics/          # Diagnostic queries and CSVs
├── extensions/           # Data extension YAMLs
├── raw/                  # Unfiltered analysis output
│   ├── results.sarif
│   └── <mode>.qls
└── results/              # Final results (filtered for important-only, copied for run-all)
    └── results.sarif
```
A CodeQL database is identified by the presence of a codeql-database.yml marker file inside its directory. When searching for existing databases, always collect all matches — there may be multiple databases from previous runs or for different languages.
Discovery command:
```bash
# Find ALL CodeQL databases (top-level and one subdirectory deep)
find . -maxdepth 3 -name "codeql-database.yml" -not -path "*/\.*" 2>/dev/null \
  | while read -r yml; do dirname "$yml"; done
```
To search only inside $OUTPUT_DIR: find "$OUTPUT_DIR" -maxdepth 2 -name "codeql-database.yml"
find . -maxdepth 3 -name "codeql-database.yml" covers databases at the project top level (./db-name/) and one subdirectory deep (./subdir/db-name/); it does not search deeper. Never assume a database is named codeql.db — discover it by its marker file.
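The marker-file approach is easy to verify in isolation. A self-contained sketch using throwaway directories (the directory names are made up):

```shell
# Databases are found by their marker file, whatever the directory is called.
TMP=$(mktemp -d)
mkdir -p "$TMP/my-scan-db" "$TMP/old/py-db"
touch "$TMP/my-scan-db/codeql-database.yml" "$TMP/old/py-db/codeql-database.yml"
# Reports both directories, even though neither is named codeql.db
find "$TMP" -maxdepth 3 -name "codeql-database.yml" | while read -r yml; do
  dirname "$yml"
done
```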
When multiple databases are found:
For each discovered database, collect metadata to help the user choose:
```bash
# For each database, extract language and creation time
# (FOUND_DBS is the bash array populated by the discovery loop)
for db in "${FOUND_DBS[@]}"; do
  CODEQL_LANG=$(codeql resolve database --format=json -- "$db" 2>/dev/null | jq -r '.languages[0]')
  CREATED=$(grep '^creationMetadata:' -A5 "$db/codeql-database.yml" 2>/dev/null | grep 'creationTime' | awk '{print $2}')
  echo "$db — language: $CODEQL_LANG, created: $CREATED"
done
```
Then use AskUserQuestion to let the user select which database to use, or to build a new one. Skip AskUserQuestion if the user explicitly stated which database to use or to build a new one in their prompt.
For the common case ("scan this codebase for vulnerabilities"):
```bash
# 1. Verify CodeQL is installed
if ! command -v codeql >/dev/null 2>&1; then
  echo "NOT INSTALLED: codeql binary not found on PATH"
else
  codeql --version || echo "ERROR: codeql found but --version failed (check installation)"
fi

# 2. Resolve output directory
BASE="static_analysis_codeql"; N=1
while [ -e "${BASE}_${N}" ]; do N=$((N + 1)); done
OUTPUT_DIR="${BASE}_${N}"; mkdir -p "$OUTPUT_DIR"
```
Then execute the full pipeline: build database → create data extensions → run analysis using the workflows below.
These shortcuts lead to missed findings. Do not accept them:
- Running only the default query suite — findings that security-extended would catch are missed entirely.
- Treating exit code 137 on macOS as a build failure — it is an arm64e/arm64 mismatch, not a fundamental build failure. See macos-arm64e-workaround.md.
- Passing pack names directly to codeql database analyze — each pack's defaultSuiteFile applies hidden filters and can produce zero results. Always use an explicit suite reference.
- Writing files outside $OUTPUT_DIR — scattering files in the working directory makes cleanup impossible and risks overwriting previous runs.

This skill has three workflows. Once a workflow is selected, execute it step by step without skipping phases.
| Workflow | Purpose |
|---|---|
| build-database | Create CodeQL database using build methods in sequence |
| create-data-extensions | Detect or generate data extension models for project APIs |
| run-analysis | Select rulesets, execute queries, process results |
If user explicitly specifies what to do (e.g., "build a database", "run analysis on ./my-db"), execute that workflow directly. Do NOT call AskUserQuestion for database selection if the user's prompt already makes their intent clear — e.g., "build a new database", "analyze the codeql database in static_analysis_codeql_2", "run a full scan from scratch".
Default pipeline for "test", "scan", "analyze", or similar: Discover existing databases first, then decide.
```bash
# Find ALL CodeQL databases by looking for the codeql-database.yml marker file
# Search top-level dirs and one subdirectory deep
FOUND_DBS=()
while IFS= read -r yml; do
  db_dir=$(dirname "$yml")
  codeql resolve database -- "$db_dir" >/dev/null 2>&1 && FOUND_DBS+=("$db_dir")
done < <(find . -maxdepth 3 -name "codeql-database.yml" -not -path "*/\.*" 2>/dev/null)
echo "Found ${#FOUND_DBS[@]} existing database(s)"
```
| Condition | Action |
|---|---|
| No databases found | Resolve new $OUTPUT_DIR, execute build → extensions → analysis (full pipeline) |
| One database found | Use AskUserQuestion: reuse it or build new? |
| Multiple databases found | Use AskUserQuestion: list all with metadata, let user pick one or build new |
| User explicitly stated intent | Skip AskUserQuestion, act on their instructions directly |
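The table above reduces to a small dispatch, sketched here with invented function and action names:

```shell
# Map (number of databases found, explicit user intent) to the next action.
decide_action() {
  local n_found="$1" user_explicit="$2"   # user_explicit: "yes" or "no"
  if [ "$user_explicit" = "yes" ]; then
    echo "follow-user-instruction"
  elif [ "$n_found" -eq 0 ]; then
    echo "full-pipeline"      # build -> extensions -> analysis
  else
    echo "ask-user"           # AskUserQuestion with the found databases
  fi
}
```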
When existing databases are found and the user did not explicitly specify which to use, present via AskUserQuestion:
```yaml
header: "Existing CodeQL Databases"
question: "I found existing CodeQL database(s). What would you like to do?"
options:
  - label: "<db_path_1> (language: python, created: 2026-02-24)"
    description: "Reuse this database"
  - label: "<db_path_2> (language: cpp, created: 2026-02-23)"
    description: "Reuse this database"
  - label: "Build a new database"
    description: "Create a fresh database in a new output directory"
```
After selection:
- If an existing database was selected: set $OUTPUT_DIR to its parent directory (or the directory containing it), set $DB_NAME to the selected path, then proceed to extensions → analysis.
- If "Build a new database" was selected: resolve a new $OUTPUT_DIR, then execute build → extensions → analysis.

If the user's intent is ambiguous (neither database selection nor workflow is clear), ask:
I can help with CodeQL analysis. What would you like to do?
1. **Full scan (Recommended)** - Build database, create extensions, then run analysis
2. **Build database** - Create a new CodeQL database from this codebase
3. **Create data extensions** - Generate custom source/sink models for project APIs
4. **Run analysis** - Run security queries on existing database
[If databases found: "I found N existing database(s): <list paths with language>"]
[Show output directory: "Output will be stored in <OUTPUT_DIR>"]
| File | Content |
|---|---|
| Workflows | |
| workflows/build-database.md | Database creation with build method sequence |
| workflows/create-data-extensions.md | Data extension generation pipeline |
| workflows/run-analysis.md | Query execution and result processing |
| References | |
| references/macos-arm64e-workaround.md | Apple Silicon build tracing workarounds |
| references/build-fixes.md | Build failure fix catalog |
| references/quality-assessment.md | Database quality metrics and improvements |
| references/extension-yaml-format.md | Data extension YAML column definitions and examples |
| references/sarif-processing.md | jq commands for SARIF output processing |
| references/diagnostic-query-templates.md | QL queries for source/sink enumeration |
| references/important-only-suite.md | Important-only suite template and generation |
| references/run-all-suite.md | Run-all suite template |
| references/ruleset-catalog.md | Available query packs by language |
| references/threat-models.md | Threat model configuration |
| references/language-details.md | Language-specific build and extraction details |
| references/performance-tuning.md | Memory, threading, and timeout configuration |
A complete CodeQL analysis run should satisfy:
- All artifacts stored in a single $OUTPUT_DIR
- A valid database (directory containing the codeql-database.yml marker) with quality assessment passed (baseline LoC > 0, errors < 5%)
- Data extensions present in $OUTPUT_DIR/extensions/ or explicitly skipped with justification
- Selected query packs logged in $OUTPUT_DIR/rulesets.txt
- Raw analysis output at $OUTPUT_DIR/raw/results.sarif
- Final results at $OUTPUT_DIR/results/results.sarif (filtered for important-only, copied for run-all)
- $OUTPUT_DIR/build.log with all commands, fixes, and quality assessments
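The file-existence side of these criteria can be spot-checked mechanically. A sketch, with paths taken from the output layout above and an invented function name:

```shell
# Print what's missing and return non-zero if a run left artifacts incomplete.
check_run_complete() {
  local out="$1" missing=0 f
  for f in rulesets.txt build.log raw/results.sarif results/results.sarif \
           codeql.db/codeql-database.yml; do
    if [ ! -e "$out/$f" ]; then
      echo "MISSING: $out/$f"
      missing=1
    fi
  done
  return "$missing"
}
# Usage: check_run_complete "$OUTPUT_DIR" || echo "run incomplete"
```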