Legal and Authorized-Use Notice: This skill is for authorized AI security testing and educational purposes only. Probe only models, API keys, and endpoints you own or have explicit written permission to test. Automated probing of third-party LLM APIs may violate their terms of service and consume billable tokens. Unauthorized probing of systems you do not control may be illegal.
garak (Generative AI Red-teaming and Assessment Kit) is an open-source LLM vulnerability scanner maintained by NVIDIA. It plays the role that a network vulnerability scanner like Nessus plays for hosts, but for large language models: it sends thousands of adversarial prompts ("probes") at a target model, captures the generations, and runs automated "detectors" over the responses to decide whether each attempt succeeded. Probe families cover prompt injection (promptinject, latentinjection), jailbreaks (dan), training-data and system-prompt leakage (leakreplay), malware generation (malwaregen), cross-site-scripting payload emission (xss), encoding-based bypasses (encoding), toxicity, and more. garak is described in the paper "garak: A Framework for Security Probing Large Language Models" (arXiv:2406.11036) and is distributed from the NVIDIA/garak GitHub repository.
The scanner is generator-agnostic. It can target Hugging Face models loaded locally, OpenAI-compatible APIs, AWS Bedrock, Replicate, Cohere, NIM endpoints, GGUF/llama.cpp models, and arbitrary REST endpoints via a JSON generator spec. After a run, garak emits a .report.jsonl line-delimited log of every attempt and detector verdict, a human-readable .report.html, a garak.log debug log, and a hit log of confirmed vulnerabilities. The terminal output prints a per-probe, per-detector pass/fail summary with a hit rate (for example dan.Dan_11_0 jailbreak: FAIL ok on 38/40), which is the primary artifact you interpret.
This skill maps to the MITRE ATLAS techniques AML.T0051 (LLM Prompt Injection) and AML.T0054 (LLM Jailbreak) because garak operationalizes both: it crafts prompt-injection and jailbreak inputs at scale and measures whether the target's guardrails hold. It supports the NIST AI RMF MEASURE-2.7 subcategory by providing repeatable, quantitative security/resilience measurement of a deployed AI system.
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
python -m pip install -U garak
garak --version
python -m pip install -U git+https://github.com/NVIDIA/garak.git@main
export OPENAI_API_KEY="sk-...")..report.jsonl.This skill uses MITRE ATLAS (the adversarial-ML companion to ATT&CK) technique IDs.
| ID | Tactic | Official Name | Relevance |
|---|---|---|---|
| AML.T0051 | ML Attack Staging / Impact | LLM Prompt Injection | garak's promptinject and latentinjection probes craft malicious prompts that subvert intended model behavior. |
| AML.T0054 | Privilege Escalation / Defense Evasion | LLM Jailbreak | garak's dan and related probes attempt to bypass safety guardrails so the model produces restricted content. |
garak --list_probes
garak --list_detectors
garak --list_generators
promptinject — PromptInject-framework direct injection.latentinjection — instructions hidden in documents/encoded text (indirect injection).dan — "Do Anything Now" and related jailbreaks (e.g. dan.Dan_11_0).leakreplay — coax the model into reproducing memorized/training or hidden-prompt text.encoding — base64/ROT13/etc. injection bypasses.xss — emit cross-site-scripting payloads (markdown/HTML exfil).malwaregen — request AV-evading or malicious code.python -m garak --target_type huggingface --target_name gpt2 --probes dan.Dan_11_0
python -m garak \
--target_type huggingface \
--target_name meta-llama/Llama-3.2-1B-Instruct \
--probes promptinject,dan,leakreplay \
--report_prefix llama32_baseline
export OPENAI_API_KEY="sk-..."
python -m garak \
--target_type openai \
--target_name gpt-4o-mini \
--probes promptinject,latentinjection,leakreplay \
--generations 5 \
--parallel_attempts 8 \
--report_prefix gpt4omini_injection
--generations controls how many completions per prompt (more = more statistical confidence, more cost).--parallel_attempts raises throughput for remote APIs.rest.json:
{
"rest": {
"RestGenerator": {
"name": "my-llm-gateway",
"uri": "https://llm.internal.example/v1/chat",
"method": "post",
"headers": { "Authorization": "Bearer $ENV_TOKEN", "Content-Type": "application/json" },
"req_template_json_object": { "model": "internal-bot", "prompt": "$INPUT" },
"response_json": true,
"response_json_field": "$.output"
}
}
}
export ENV_TOKEN="..."
python -m garak \
--target_type rest \
-G rest.json \
--probes promptinject,dan \
--report_prefix internal_gateway
--config:
python -m garak --config assessment.yaml
# assessment.yaml
plugins:
model_type: openai
model_name: gpt-4o-mini
probe_spec: promptinject,latentinjection,dan,leakreplay,xss,malwaregen
run:
generations: 5
parallel_attempts: 8
reporting:
report_prefix: quarterly_llm_assessment
--probes entirely.probe.Class detector: PASS|FAIL ok on N/M. A FAIL with a low ok fraction means the model frequently produced the unsafe behavior — a high-severity finding.# Every attempt with detector verdicts is one JSON line
jq -r 'select(.entry_type=="eval") | "\(.probe)\t\(.detector)\t\(.passed)/\(.total)"' \
garak.<timestamp>.report.jsonl | sort
.report.html in a browser for the formatted scorecard and per-probe breakdown.--report_prefix.| Resource | Purpose | Link |
|---|---|---|
| NVIDIA/garak | Source, probe list, issues | https://github.com/NVIDIA/garak |
| garak documentation | CLI reference, generator configs | https://docs.garak.ai/ and https://reference.garak.ai/ |
| garak paper (arXiv:2406.11036) | Methodology and design | https://arxiv.org/abs/2406.11036 |
| OWASP Top 10 for LLM Applications | Risk taxonomy probes map to | https://genai.owasp.org/ |
| MITRE ATLAS | AML technique definitions | https://atlas.mitre.org/ |
| Probe family | Targets | OWASP LLM mapping |
|---|---|---|
promptinject |
Direct prompt injection | LLM01 |
latentinjection |
Indirect / hidden-context injection | LLM01 |
dan |
Jailbreak / guardrail bypass | LLM01 / safety |
leakreplay |
Training-data & prompt leakage | LLM02 / LLM07 |
encoding |
Encoding-based filter bypass | LLM01 |
xss |
Markdown/HTML exfiltration payloads | LLM02 |
malwaregen |
Malicious code generation | misuse |
garak --version succeeds.--list_probes / --list_detectors.--report_prefix set..report.jsonl, .report.html, and garak.log produced and located.