Authorized use only: These tests interact with vector stores and embedding models in RAG systems you own or are authorized to assess. Embedding inversion and cross-tenant probing against systems you do not control may expose third-party data and is prohibited without authorization.
Retrieval-Augmented Generation (RAG) systems convert documents into embedding vectors stored in a vector database (Pinecone, Qdrant, Weaviate, Chroma, pgvector, FAISS) and retrieve the nearest vectors to ground LLM responses. OWASP LLM08:2025 Vector and Embedding Weaknesses covers the security risks unique to this layer:
The parent technique is AML.T0024 — Exfiltration via ML Inference API: an attacker uses legitimate inference/query access to exfiltrate data (source text via inversion, membership, or model extraction). This skill provides a repeatable assessment of all five weakness classes.
# Vector DB clients + embeddings + similarity tooling
python -m pip install numpy scikit-learn sentence-transformers
python -m pip install qdrant-client chromadb pinecone-client weaviate-client
# (optional) text-embedding inversion research baseline
python -m pip install vec2text
| ID | Tactic | Official Technique Name | Role in this skill |
|---|---|---|---|
| AML.T0024 | ATLAS: Exfiltration | Exfiltration via ML Inference API | Using query/embedding access to exfiltrate source data |
| AML.T0024.000 | ATLAS: Exfiltration | Infer Training Data Membership | Membership-inference probe against the corpus |
| AML.T0024.001 | ATLAS: Exfiltration | Invert ML Model | Embedding-inversion reconstruction of source text |
| AML.T0020 | ATLAS: Resource Development | Poison Training Data | Knowledge-base poisoning of the corpus |
| AML.T0051.001 | ATLAS: Initial Access | LLM Prompt Injection: Indirect | Injection payloads embedded in retrieved chunks |
Document the embedding model + dimensions, the vector store and its tenancy model, the chunking strategy, retrieval top_k and similarity metric (cosine/dot/L2), and any metadata filters applied at query time.
# Example: inspect a Qdrant collection
from qdrant_client import QdrantClient
client = QdrantClient(url="http://localhost:6333")
info = client.get_collection("docs")
print(info.config.params.vectors) # size + distance metric
print(client.count("docs")) # corpus size
Embeddings of similar text are close; an attacker with the embedding endpoint can iteratively reconstruct text whose embedding matches a target vector. Measure how much a nearest-neighbour-in-embedding-space recovers, using cosine similarity between candidate reconstructions and the target.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
secret = "Patient John Doe, MRN 553120, diagnosed with hypertension."
target_vec = model.encode([secret])
# Attacker has only target_vec and the embedding endpoint. Hill-climb candidate text.
candidates = [
"Patient name and medical record number with a diagnosis.",
"John Doe medical record hypertension diagnosis",
"Patient John Doe MRN diagnosed hypertension",
]
cand_vecs = model.encode(candidates)
sims = cosine_similarity(target_vec, cand_vecs)[0]
for c, s in sorted(zip(candidates, sims), key=lambda x: -x[1]):
print(f"{s:.3f} {c}")
# High similarity for a near-verbatim guess => inversion risk is real for this model.
For a research-grade reconstruction baseline, vec2text can be used against compatible embedding models to demonstrate full-text recovery.
Determine whether a specific document is in the corpus by measuring the top-1 retrieval similarity for an exact-quote query: in-corpus items return a markedly higher max similarity than out-of-corpus controls.
def membership_score(client, collection, embed, text):
vec = embed([text])[0].tolist()
hits = client.search(collection_name=collection, query_vector=vec, limit=1)
return hits[0].score if hits else 0.0
in_corpus = membership_score(client, "docs", model.encode, "<exact quote from a known chunk>")
control = membership_score(client, "docs", model.encode, "An unrelated random sentence.")
print(f"in-corpus={in_corpus:.3f} control={control:.3f} delta={in_corpus-control:.3f}")
# A large positive delta indicates the item is in the corpus (membership leak).
Confirm that tenant B cannot retrieve tenant A's chunks. Issue tenant-B-authenticated queries that should be filtered, and verify no tenant-A tenant_id appears in results.
# Query as tenant B; expect ONLY tenant_id == "B" results.
from qdrant_client.models import Filter, FieldCondition, MatchValue
vec = model.encode(["confidential salary information"])[0].tolist()
hits = client.search(
collection_name="docs",
query_vector=vec,
limit=10,
query_filter=Filter(must=[FieldCondition(key="tenant_id", match=MatchValue(value="B"))]),
)
leaked = [h for h in hits if h.payload.get("tenant_id") != "B"]
print("CROSS-TENANT LEAK" if leaked else "isolation OK", "->", len(leaked), "foreign rows")
# Critical test: repeat WITHOUT the filter to confirm the server, not the client,
# enforces isolation. If unfiltered queries return tenant A data, isolation is client-side only.
hits_nofilter = client.search(collection_name="docs", query_vector=vec, limit=10)
print("server-side isolation FAILS" if any(h.payload.get("tenant_id") != "B" for h in hits_nofilter) else "OK")
Insert a benign poisoned chunk crafted to be retrieved for many unrelated queries, then measure how often it appears in top_k.
from qdrant_client.models import PointStruct
# Benign marker payload (no real injection) to measure retrieval dominance.
poison = "POISON-CANARY. " + " ".join(
["password reset billing refund account login support error help"] * 8
)
client.upsert("docs_test", points=[
PointStruct(id=999999, vector=model.encode([poison])[0].tolist(),
payload={"tenant_id": "B", "source": "poison-test"})
])
queries = ["how do I get a refund", "reset my password", "what is the weather"]
for q in queries:
hits = client.search("docs_test", model.encode([q])[0].tolist(), limit=5)
dominated = any(h.payload.get("source") == "poison-test" for h in hits)
print(f"{'POISON in top5' if dominated else 'clean'}: {q}")
Scan retrieved chunk text for injection markers before it is concatenated into the prompt.
import re
INJECTION_PATTERNS = [
r"ignore (all|previous|the above) instructions",
r"system prompt", r"you are now", r"disregard", r"</?(system|instructions)>",
]
def chunk_is_injection(text):
low = text.lower()
return [p for p in INJECTION_PATTERNS if re.search(p, low)]
for hit in client.search("docs", model.encode(["help"])[0].tolist(), limit=10):
flags = chunk_is_injection(hit.payload.get("text", ""))
if flags:
print("INDIRECT INJECTION in chunk", hit.id, flags)
defending-llms-with-guardrails).| Tool | Purpose | Primary Source |
|---|---|---|
| OWASP LLM08 | Vector and Embedding Weaknesses guidance | https://genai.owasp.org/llmrisk/llm082025-vector-and-embedding-weaknesses/ |
| sentence-transformers | Embedding generation for testing | https://www.sbert.net/ |
| Qdrant client | Vector store + filtered search | https://qdrant.tech/documentation/ |
| Chroma / Weaviate / Pinecone | Alternative vector stores | https://docs.trychroma.com/ |
| vec2text | Embedding-inversion research baseline | https://github.com/jxmorris12/vec2text |
| MITRE ATLAS | AML.T0024 Exfiltration via ML Inference API | https://atlas.mitre.org/ |