
Structured OSINT for Attack Surface Assessment

v20260427
osint-methodology
A comprehensive, multi-stage methodology for external red-team operations, attack surface mapping, and threat actor investigation. Covers five stages of reconnaissance (seed discovery, asset expansion, enrichment, exposure analysis, reporting), identity fabric mapping (M365, Okta, etc.), cloud/mobile attack surface enumeration, breach correlation, and advanced forensic techniques. Used for authorized assessments and intelligence gathering.
Overview

OSINT Methodology — External Red-Team Edition

0. When to use this skill / When NOT

Use this skill when:

  • Planning or executing external reconnaissance against an authorized target (red team, bug bounty in-scope, ASM engagement).
  • Mapping an organization's external attack surface end-to-end (subdomains → assets → exposure → attack paths).
  • Investigating a person, entity, or threat actor where evidence discipline matters.
  • Tracing cryptocurrency flows, geolocating media, performing image/video forensics, or chronolocating events.
  • Building a structured OSINT campaign that needs reproducibility, severity grading, and clean handoffs.
  • Producing client-facing deliverables (exec summaries, technical reports, reproduction packages) from offensive engagements.

Do NOT use this skill when:

  • The user is asking for active exploitation, post-exploitation, lateral movement, AD privilege escalation, malware development, or anything beyond reconnaissance — those are out of scope.
  • The user is asking for blue-team / defensive content (SIEM rules, detection engineering) — different domain.
  • The target's authorization is unclear and the user is asking you to act against a third-party asset they don't own — see §1 below; gently surface the scope question before proceeding.

1. Authorization & Legal Posture

This skill is intended for assets the operator owns or has written authorization to assess (red-team rules of engagement, bug-bounty in-scope assets, ASM contracts).

Soft scope check: when a user asks you to act against a target whose authorization isn't established earlier in the conversation, ask once before proceeding:

"Quick scope check: is this a target you own or have written authorization to assess (e.g., a red-team engagement, in-scope bug-bounty asset, or your own infrastructure)? I want to make sure we stay on the right side of the engagement boundary."

Once authorization is asserted, proceed without re-asking. If the user explicitly states the engagement type (e.g., "this is for our pentest of acme.com under contract"), you don't need to ask again.

Always-on guardrails (regardless of authorization):

  • Never weaken auth, rate limits, banners, or any safety control that enforces scope on the target side.
  • Never run destructive probes (true SYN scans on production, masscan at line rate, fuzzing/brute-force) outside an explicit DEEP / --aggressive mode.
  • Never paste real PII, valid credentials, session tokens, API keys, or other secrets into cloud-hosted LLMs or third-party services.
  • Never take action against assets outside the documented scope, even if "obviously related" (subsidiaries, vendors, employees' personal accounts, etc.).

2. Confidence Levels

Every assertion you make during an engagement should carry a confidence level. Three levels:

| Level | Meaning | Examples |
| --- | --- | --- |
| TENTATIVE | Plausible based on indirect evidence; unverified. | Snippet-only Google dork match; email pattern inferred from name; subdomain returned by one passive source only; favicon-hash overlap (two hosts share a favicon — could be shared infra, could be a coincidence). |
| FIRM | Directly observed but uncorroborated. | Subdomain that resolves to an IP; HEAD-confirmed bucket exists (private); CT-log entry shows certificate; Shodan banner returned. |
| CONFIRMED | Multiple independent corroborations OR directly verified. | Live-validated PMAK token (read-only /me returned 200); breach corpus + crt.sh + DNS all agree; bucket listable AND files retrievable; user enumerated AND password-reset flow returns valid hint. |

Rule of three for attribution: require three independent weak signals, OR one strong + one weak, before asserting linkage. Don't single-source attribute.

2.1 Confidence Upgrade Workflows

Confidence isn't static — every TENTATIVE asset should have a documented path to FIRM and to CONFIRMED. Use these per-asset-type rules.

| Asset type | TENTATIVE → FIRM | FIRM → CONFIRMED |
| --- | --- | --- |
| Subdomain | Returned by ≥2 independent passive sources, OR DNS A/AAAA/CNAME resolves successfully. | Serves on a standard port (80/443/22/etc.) AND HTTP banner / TLS cert / SSH banner returned. |
| IP | Discovered via ≥2 sources (passive DNS, ASN lookup, Shodan). | Active probe responds (TCP SYN-ACK on at least one port, or ICMP echo reply). |
| WebApp | URL extracted from JS / API / archive but not yet hit. | HTTP request returns 2xx/3xx/4xx (any non-network-error response) AND content-length > 0. |
| Email | Generated from a name pattern OR returned by snippet-only dork. | Listed in Hunter.io / EmailRep / IntelX / breach corpus, OR MAIL FROM/RCPT TO SMTP probe returns 250 (without delivery — abort at DATA). |
| Bucket (S3/GCS/Azure) | Permutation candidate; no probe yet. HEAD returns 200, 301, or 403 (existence confirmed). | GET returns object listing or known-object retrieval. |
| Endpoint (API / wayback) | Extracted from JS regex / Wayback / Postman. HTTP request returns non-404 (route exists). | The endpoint's behavior is fingerprinted (auth posture, response shape, rate limits). |
| Credential / secret | Matches catalog regex in captured text. Read-only validator (/me, auth.test, sts:GetCallerIdentity, /user) returns success. | Documented scope + account ID. |
| Person | Name extracted from a single source (LinkedIn / breach / GitHub commit). | Confirmed by a second source (Hunter.io role + LinkedIn profile, or two breach sources with same email). |
| Repo | Name match on org keyword in GitHub search. Repo metadata shows confirmed org/email/website match. | Commit history shows employee involvement. |
| Mobile app | Name match in app store. Ownership-confidence score ≥70 (see companion skill §21). | Binary metadata (signing cert, package name, dev account) ties back to target. |
| Certificate | Returned by crt.sh once. CT-log entry confirmed in ≥2 logs. | Serving on a discovered host. |
| SSO tenant | Discovery-endpoint returns OIDC metadata. | Tenant GUID extracted AND domain resolves through the tenant's expected MX / autodiscover / SP record. |

Default reporting posture: never claim CONFIRMED without explicit corroboration. When in doubt, downgrade. Operators trust under-claims more than over-claims.
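
The table is mechanical enough to encode. A minimal sketch for the subdomain row, assuming a hypothetical asset dict whose sources, resolves, and banner fields were populated by earlier modules:

def subdomain_confidence(asset: dict) -> str:
    """TENTATIVE -> FIRM -> CONFIRMED ladder for subdomains, per the table above."""
    # FIRM: >=2 independent passive sources OR successful DNS resolution.
    firm = len(asset.get("sources", [])) >= 2 or asset.get("resolves", False)
    # CONFIRMED: FIRM and a live HTTP banner / TLS cert / SSH banner was returned.
    if firm and asset.get("banner", False):
        return "confirmed"
    return "firm" if firm else "tentative"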


3. Output Format Conventions

When you produce findings during an active session, structure each finding to match the schema below — it drops cleanly into asset-management tools.

Finding:
  id:           <stable hash or UUID>
  module:       <which technique discovered it; "manual" if hand-found>
  asset_key:    <typed key, e.g. sub:api.example.com or webapp:https://example.com/admin>
  category:     <e.g. SECRET_LEAK, MISSING_HSTS, OPEN_GRAPHQL_API, LEAKED_CRED, SSO_EXPOSURE>
  severity:     <info|low|medium|high|critical>
  confidence:   <tentative|firm|confirmed>
  title:        <one-line summary>
  description:  <2-5 sentences>
  evidence:
    url:        <where it was found>
    timestamp:  <UTC ISO8601>
    sha256:     <hash of any downloaded artifact>
    raw:        <truncated to 2 KiB>
  references:
    - <CVE-ID, advisory URL, vendor doc>
  remediation:  <action the asset owner can take>

Always use UTC timestamps. Local time creates correlation bugs across notes/screenshots/logs.


4. Source Hygiene & Citations

For every artifact you capture, record: URL + UTC timestamp + SHA-256 hash + tool version + run_id.

  • Hash all downloaded files with SHA-256.
  • Screenshot in PNG (lossless, smaller than full-page WARC for evidence packs).
  • Capture raw HTTP requests/responses, capped at 2 KiB body to keep evidence packs small.
  • Use JSONL (NDJSON) logs, one line per event, with a run_id so the entire engagement is replayable.
  • Separate evidence read-only from working copies; never edit captured artifacts.

When citing a source in your output, prefer durable references (CVE, vendor advisory, ATT&CK technique ID, RFC) over ephemeral ones (a Twitter post, a forum thread). If the only source is ephemeral, archive it (archive.today, Wayback SavePageNow) before citing.
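
A minimal capture helper tying the record above together (Python; the run.jsonl filename and exact field names are illustrative, not a fixed format):

import hashlib, json, uuid
from datetime import datetime, timezone

RUN_ID = uuid.uuid4().hex  # one run_id per engagement run, reused across events

def capture(url: str, body: bytes, tool: str, version: str) -> dict:
    """Append one evidence event to the JSONL run log: URL + UTC time + SHA-256."""
    event = {
        "run_id": RUN_ID,
        "url": url,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # always UTC
        "sha256": hashlib.sha256(body).hexdigest(),
        "tool": tool, "tool_version": version,
        "raw": body[:2048].decode("utf-8", errors="replace"),  # 2 KiB cap
    }
    with open("run.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    return event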


5. Do NOT (hard rules)

  • DO NOT paste creds, session tokens, API keys, real PII, infostealer logs, or unique pivots into cloud LLMs (ChatGPT, Claude.ai, Gemini, Perplexity). Use local models (Ollama, LM Studio, GPT4All) for sensitive analysis.
  • DO NOT assume vendor labels are ground truth. Cross-label sanity: TRM, Chainalysis, Arkham can disagree. Treat every label as a hypothesis.
  • DO NOT assume 1:1 bridge flows. Bridges/mixers/wrappers introduce mint/burn semantics; validate with on-chain proofs.
  • DO NOT assert ownership from a single signal. Favicon-hash overlap, shared CT issuer, shared NS — each is a hypothesis. Need rule-of-three.
  • DO NOT run fuzzing, SYN scans, masscan, or nuclei fuzzing/* templates outside an explicit DEEP / --aggressive mode.
  • DO NOT use a credential validator to do anything except read-only verification (no create/delete/send).
  • DO NOT mirror-image (assume the target thinks like you do). Separate capability from intent and sponsorship.
  • DO NOT confuse correlation with control.
  • DO NOT escalate when you encounter active defenses; back off and document (see §6.4).

6. OpSec

6.1 Sock Puppets

A sock puppet is a fake account that cannot be linked to you. Build a posting history, age the account, use it from a separate browser profile.

Resources & techniques:

  • Persona generation: Fake Name Generator, This Person Does Not Exist.
  • Browser isolation: Firefox Multi-Account Containers, or dedicated profiles per persona.
  • Disposable phone numbers: Burner, Silent Link (some platforms reject VoIP — keep a backlog of numbers).
  • Hardware passkeys for any high-value persona; store recovery codes offline.
  • Audit every browser extension before installation. Supply-chain attacks on popular extensions have repeatedly targeted investigators — assume the popular ones are at higher risk, not lower.
  • Maintain chain-of-custody: timestamp every action, hash every key artifact, record tool versions per case.
  • Personas should look like real low-engagement accounts: profile photo (synthetic), bio, a few low-effort posts spread across weeks before the persona is "used."


6.2 Detectability & OpSec Tagging

Every probe leaves a footprint. Tag every operation in your notes with a detectability level so you can reason about the SIEM trail you're leaving on the target's side.

| Tag | Examples |
| --- | --- |
| Low | Passive Shodan InternetDB; CT-log queries (crt.sh); Wayback CDX; passive DNS (SecurityTrails); Hunter.io email enrichment; HTTP HEAD on public buckets; getuserrealm.srf; Microsoft OIDC metadata fetch. |
| Medium | Microsoft GetCredentialType user-enum; Okta /api/v1/authn user-enum; Postman API key validation; AWS sts:GetCallerIdentity (logs to CloudTrail); Slack auth.test; full-page screenshots; Swagger/GraphQL probes against a 28/13-path wordlist; targeted favicon-hash + JARM fingerprinting. |
| High | Active port scans (naabu / masscan / nmap); Nuclei full template runs against production; subdomain brute-force at scale; APK download from third-party mirrors; deep-mode user enumeration past N attempts per tenant; SMTP RCPT TO enumeration; web fuzzing (ffuf/gobuster). |

When working with a client, document the operations actually run and their detectability tag in the engagement report — clients appreciate knowing what their detection stack should have caught.

Default posture: passive. Run active probes only when (a) explicitly authorized, (b) within agreed maintenance windows, and (c) with the operator aware of the resulting log volume.

6.3 Validator Discipline

When you discover a credential in the wild (a leaked API key, a sourcemap-exposed token, a hard-coded PMAK in a public Postman workspace), you may want to confirm it's live. Do this with read-only validators only.

Discipline:

  • Read-only endpoint only (e.g., /me, /whoami, auth.test, sts:GetCallerIdentity).
  • Never use the validated credential to create, modify, delete, or send anything.
  • Tag the validation attempt with detectability — every validator generates an audit-log entry on the provider side.
  • Record checked_at (UTC), the response (truncated), and the scope/account-ID returned.
  • If the operator's rules of engagement forbid validation, mark the credential validation_skipped_by_policy and stop.

Concrete validator endpoints (Postman, AWS, GitHub, Slack, Anthropic, OpenAI, npm, Atlassian, DataDog) live in the companion offensive-osint skill.

6.4 Detection-Aware Probing (signs of detection + back-off)

Your probes will eventually hit detection. Recognize the signs and back off before you trip an active response.

Signs you've been detected (in roughly increasing severity):

  1. Rate-limit responses — 429 Too Many Requests, Retry-After header set, X-RateLimit-Remaining: 0.
  2. Captcha interstitials — Cloudflare interstitial page, hCaptcha challenge, AWS WAF page.
  3. WAF page — explicit "Access denied" with provider branding (Cloudflare, Akamai, Imperva, F5 ASM, AWS WAF, Sucuri).
  4. Status code drift — endpoints that previously returned 200/401 now return 403 only from your IP.
  5. Banner change — server header shape or response timing changes consistently.
  6. DNS answers flip to NXDOMAIN — the target's authoritative servers stop resolving subdomains you were probing (their CDN or DNS provider has likely intervened).
  7. Honeypot bait — endpoints that look too good (/admin/db_dump.sql, exposed .env with credentials that don't validate). Real exposures rarely look this clean.
  8. Direct contact — your sock-puppet email gets a "we noticed unusual activity" message; or, in extreme cases, the abuse contact for your egress IP gets a courtesy email.

Back-off ladder:

  1. Slow down. Halve your concurrency. Add 2–10s jitter between requests.
  2. Switch endpoints. Stop hitting the path that triggered. Move to a different module of the recon pipeline.
  3. Switch persona. New User-Agent (rotate among realistic browsers), new TLS fingerprint (different httpx/curl version).
  4. Switch IP. Rotate to a new egress (residential proxy, Tor for sensitive lookups, a different cloud region).
  5. Pause. Wait 1–24 hours. Many WAFs have rolling-window IP-based reputation; passive time often resets it.
  6. Document and consult. If you've hit (3) WAF, (4) status drift, or (8) direct contact, stop active probing and consult the engagement lead. Continued probing past these signals risks scope violation.
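
A minimal sketch of the first rung, plus the rate-limit signals that trigger it, assuming requests-style response objects:

import random, time
import requests

def looks_rate_limited(resp: requests.Response) -> bool:
    """Signal 1 from the list above: explicit rate-limit responses."""
    return (resp.status_code == 429
            or "Retry-After" in resp.headers
            or resp.headers.get("X-RateLimit-Remaining") == "0")

def back_off(concurrency: int) -> int:
    """Rung 1: halve concurrency and add 2-10 s of jitter before the next request."""
    time.sleep(random.uniform(2, 10))
    return max(1, concurrency // 2)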

Persona / IP rotation rules:

  • Never rotate persona to one that's been used in a prior engagement against the same target.
  • Use residential proxies (Bright Data, Smartproxy, IPRoyal) for high-detectability work — but be aware they're sometimes IP-blocklisted by Cloudflare.
  • Tor exit nodes are useful for passive lookups (CT logs, archive sites) but are blocked by most active-probe targets.
  • Cloud egress IPs (AWS / GCP / Azure) are often blocklisted aggressively for recon. Use sparingly.
  • Document every rotation with timestamp + reason; reviewers will ask.

Don't:

  • Don't try to "outsmart" a confirmed WAF block by sending more aggressive payloads. That's how clients get extra logs and how you get caught.
  • Don't switch source IPs to evade an explicit block-list — that crosses into evasion territory and may breach the rules of engagement.
  • Don't ignore signals because the dashboard says "still up." The probe is being silently logged; the response will come later.

7. External Red-Team Recon Pipeline

A 5-stage pipeline for any authorized external assessment. Stages are sequential; modules within a stage can run concurrently.

Stage 1 — Seed Discovery

Establish the ground truth of who/what the target is.

  • WHOIS on the seed domain (registrant, dates, name servers).
  • ASN enumeration: which AS does the org own/use? (Hurricane Electric BGP Toolkit, RIPEstat, BGPView.)
  • DNS records (A/AAAA/MX/TXT/NS/SOA/CAA) — records-only, no walking yet.
  • Certificate Transparency history for the root domain (crt.sh, Censys).

Stage 2 — Asset Expansion

Discover everything that might belong to the target.

  • Subdomain enumeration (passive sources first: crt.sh, VirusTotal, AlienVault OTX, Shodan, then permutations and bruteforce).
  • Cloud bucket enumeration (S3/GCS/Azure permutations from company name + subdomain stems — see §15).
  • Typosquat domain generation (dnstwist variants → resolve → WHOIS) — for both phishing risk and adjacent corp assets.
  • Wayback CDX archive endpoints for forgotten paths.
  • Mobile app discovery (Android via google-play-scraper, iOS via iTunes Search API — see §14).
  • DNS deep walking (NSEC walk on misconfigured zones, AXFR opportunism).
  • LinkedIn employee enumeration → email-pattern derivation.

Stage 3 — Enrichment

Add depth to the discovered assets.

  • Port + service detection (Shodan InternetDB free → naabu/masscan if authorized).
  • Live TLS handshakes (cert chain, JARM, favicon mmh3 hash).
  • Web tech detection (Wappalyzer-style ~600 signatures via httpx).
  • WAF/CDN inference (header markers).
  • Origin discovery if behind CDN (see §27).
  • Security header audit.
  • Bulk screenshots (triage 1000s of hosts visually).
  • Email harvesting (6 parallel sources).
  • Email security audit (SPF/DMARC/DKIM/BIMI/MTA-STS).
  • GitHub code-search dorking (13 dork templates × 29+ secret regexes).
  • JavaScript deep analysis (sourcemaps, secrets, endpoints, internal-host leakage).
  • SSO/IdP tenant fingerprinting (Entra, Okta, ADFS, Google, SAML, M365 Teams/SharePoint/OAuth — see §11).
  • API & auth-map discovery (Swagger/OpenAPI, GraphQL, Postman).
  • Secrets-beyond-GitHub sweep (Postman public workspaces, Stack Exchange, Trello/Notion/Atlassian dorks).
  • Vendor product fingerprinting (Citrix/F5/PaloAlto/Pulse/Fortinet/Cisco/VMware/Exchange).
  • Container / CI-CD / cloud-native exposure check.
  • Job posting harvest for tech-stack inference.

Stage 4 — Exposure Analysis

Convert assets into findings.

  • Nuclei (15 always-on built-in checks + optional binary).
  • TLS deep audit (sslyze / testssl.sh).
  • Breach × identity correlation (HudsonRock Cavalier, HIBP, DeHashed, IntelX, local corpus → SSO_EXPOSURE findings).
  • Targeted misconfiguration probes (.git/config, .env, phpinfo.php, /actuator/env, /actuator/heapdump, _cat/indices, /console, /manager/html).
  • Vulnerability prioritization (CVE × EPSS × CISA KEV × public-POC availability — see §28).

Stage 5 — Reporting

Make the work usable.

  • Risk scoring per finding (CVSS + program-specific weights).
  • Asset graph export (D3-friendly nodes/links, GraphML, JSON).
  • Client-facing report (executive summary + technical detail + remediation — see §31).
  • Reproduction package (run_id, tool versions, raw evidence, JSONL log).
  • Bug bounty submission (if applicable — see §30).

7.5 Pipeline Priority Order (highest signal density first)

When budget is constrained, work in this order:

  1. Breaches — infostealer logs (HudsonRock Cavalier free tier) + HIBP + DeHashed. Highest ROI for red teams; often gives valid plaintext creds for corp SSO. Requires emails as input.
  2. GitHub recon — code-search dorks. Finds AWS keys, Slack tokens, JWT secrets, .env files. Fastest path to cloud pivot.
  3. Nuclei misconfig sweep — exposed admin panels, CVEs with public POCs.
  4. Cloud buckets — permute company name + subdomain stems. Listable bucket = CRITICAL.
  5. Ports — Shodan InternetDB first (free, keyless). VPN concentrators, RDP, Jenkins, GitLab-CE, Elasticsearch are the high-value pivot points.
  6. Email OSINT — feeds breaches; feeds phishing list.
  7. Web tech / WAF / screenshots — triage thousands of hosts; know the stack before probing.
  8. Wayback — archived JS often has hard-coded keys; archived endpoints reveal removed admin/dev paths.
  9. DNS deep + email security — SPF/DMARC gaps enable email spoofing; TXT verification tokens reveal SaaS tenancies.
  10. Certificates — CT-log timeline catches forgotten subdomains; weak ciphers = cheap findings.
  11. ASN + reverse DNS — corporate IP space hosts unadvertised infra.
  12. WHOIS — registrant PII reveals adjacent corp assets.
  13. Typosquat — actively-registered squats are findings; unregistered ones go on the phishing-domain shortlist.
  14. Security headers — low standalone value but required for client reports.

7.6 Time Budgeting & Engagement Profiles

Stage and asset count drive how long a recon takes. Rough estimates (single operator on a typical SaaS-style target):

| Stage | Small org (<100 employees) | Medium (100–1K) | Large (1K+) |
| --- | --- | --- | --- |
| 1. Seed discovery | 30 min | 30 min | 30 min |
| 2. Asset expansion | 1–2 h | 2–4 h | 4–8 h |
| 3. Enrichment (per 100 alive webapps) | ~1 h | ~1 h | ~1 h |
| 4. Exposure analysis | 1–3 h | 3–6 h | 6–12 h |
| 5. Reporting | 2–4 h | 4–8 h | 1–2 days |

Engagement profiles:

  • 1-hour rapid recon ("how exposed is X?") — Stage 1 (15 min) → passive subdomain (crt.sh + Subfinder, 10 min) → Shodan InternetDB on resolved IPs (5 min) → email harvest via Hunter+IntelX (10 min) → breach lookup on emails (10 min) → executive-summary-only output (10 min).
  • 4-hour focused recon ("phish-readiness check") — adds: full email harvest, LinkedIn employee enum, SPF/DMARC analysis, typosquat candidate generation, SSO/IdP fingerprinting. Output: phishing-feasibility report + target email list.
  • 1-day standard recon — full Stages 1–4 with the priority order above. Output: per-asset finding list + asset graph + exec summary.
  • 1-week deep recon — all of standard, plus: deep-mode user enumeration, JS deep analysis at full budget, mobile attack surface, cloud-native fingerprinting, vendor product fingerprinting, package registry leak hunting, vulnerability prioritization. Output: full client deliverable package + reproduction bundle.
  • Ongoing monitoring (weekly diff) — re-run Stages 1–3 weekly; diff against baseline; alert on new asset / new finding / asset disappeared.

When to abort early:

  • After Stage 1 if scope is wrong (target turns out to be subsidiary of unrelated corp; rules of engagement need clarification).
  • After Stage 2 if attack surface is below threshold (no public webapps + no exposed services + no leaked emails → little to find externally).
  • During any stage if you hit the WAF / detection signs in §6.4.

8. Asset Graph Discipline

Treat every discovery as a typed asset in a graph, not a free-floating string.

8.1 Asset Taxonomy (29 types)

| Category | Asset Types |
| --- | --- |
| DNS / Network | domain, subdomain, ip, netblock, asn |
| Service | port, service, certificate |
| Identity | email, person, credential |
| Code / Config | repo, secret |
| Cloud / Storage | bucket, firebase_project |
| Web | webapp, wayback_endpoint, api_endpoint, api_spec, graphql_schema |
| Mobile | mobile_app, deep_link, exported_component |
| Phishing / Adversarial | typosquat_domain |
| Collaboration / SaaS | postman_collection, postman_workspace, postman_api_key, stack_post, saas_public_surface |

8.2 Asset Schema

Every asset carries:

  • type — one of the 29 above.
  • key — unique dedup id (typed prefix, e.g. sub:api.example.com, email:alice@example.com).
  • value — the actual string/object.
  • sources[] — every source that confirmed this asset (deduplicated).
  • confidence — TENTATIVE / FIRM / CONFIRMED.
  • first_seen, last_seen — UTC timestamps.
  • attrs{} — type-specific metadata (e.g., for a webapp: status_code, title, tech-stack list, JARM, favicon mmh3, screenshot path).
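
A sketch of the schema plus the dedup-by-key rule from §8.4, using a plain dict as the graph store (illustrative; a real store might be SQLite or a graph database):

from dataclasses import dataclass, field

@dataclass
class Asset:
    type: str                      # one of the 29 types above
    key: str                       # typed dedup key, e.g. "sub:api.example.com"
    value: str
    sources: list[str] = field(default_factory=list)
    confidence: str = "tentative"  # tentative | firm | confirmed
    first_seen: str = ""           # UTC ISO8601
    last_seen: str = ""
    attrs: dict = field(default_factory=dict)

def upsert(graph: dict[str, Asset], asset: Asset) -> Asset:
    """Dedup by key, never by value; merge provenance on re-discovery."""
    existing = graph.get(asset.key)
    if existing is None:
        graph[asset.key] = asset
        return asset
    existing.sources = sorted(set(existing.sources) | set(asset.sources))
    existing.last_seen = asset.last_seen or existing.last_seen
    return existing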

8.3 Edge Taxonomy

Relationships are typed edges, not text: RESOLVES_TO, HOSTED_ON, IN_NETBLOCK, BELONGS_TO_ASN, LISTED_IN_CERT, OWNED_BY, ALIAS_OF, BREACHED_FROM, EMPLOYED_BY, HOSTS_REPO, TYPOSQUAT_OF, EXPOSES, DOCUMENTED_BY, BELONGS_TO_HOST, REQUIRES_AUTH, LEAKS_SCHEMA, SHIPPED_BY_ORG, CONTAINS_SECRET, TALKS_TO_HOST, EXPOSES_DEEPLINK, HAS_EXPORTED_COMPONENT, USES_FIREBASE_PROJECT, LACKS_PINNING_FOR.

8.4 Discipline rules

  • Every discovery is an asset. Don't write findings against free-floating strings; create the asset first, then attach the finding.
  • Dedup by key, not by value. Same value, different type ≠ same asset (sub:api.example.com and webapp:https://api.example.com/ are different assets with a BELONGS_TO_HOST edge).
  • Provenance is non-negotiable. sources[] must list every source. If two sources confirmed it, both go in.
  • Confidence is per-source, then aggregated. A subdomain returned by 3 passive sources is FIRM; one returned by snippet-only Bing is TENTATIVE.
  • Late binding via sidecars. When module A produces output that module B needs, write a JSON sidecar (mobile_endpoints.json, secrets_sidecar.json) — don't block module B on module A. See §24.

8.5 Asset-Level Triage Rules

When you have a mixed bag of assets and limited probe budget, prioritize by what each asset enables:

WebApp priority by hostname signal (highest first):

  1. Auth-related hostnames (auth., login., sso., idp., accounts., oauth.).
  2. Admin paths (/admin, /dashboard, /console, /manager, /wp-admin, /phpmyadmin).
  3. Dev/staging hosts (dev., staging., stg., qa., uat., test., sandbox., preprod., preview.) — lower defenses, often dump prod data.
  4. API hostnames (api., services., gateway., graph.).
  5. Customer-facing hostnames (portal., app., my., account.).
  6. Marketing / content (www., blog., news., careers., support.).

Subdomain priority by inferred function:

  • API > Admin > Dev > Auth > Prod-app > Marketing.

IP priority by netblock:

  • Corporate ASN-owned (most likely to host unadvertised internal infra).
  • Cloud netblocks (AWS / GCP / Azure / DO / OVH) — high turnover but interesting for cloud-native services.
  • CDN ranges (Cloudflare / Akamai / Fastly) — usually edge, not origin; defer unless doing origin discovery.

Email priority by role hint:

| Role indicator | Priority | Why |
| --- | --- | --- |
| ceo@, cfo@, cto@, ciso@ | HIGHEST | Exec accounts have highest breach value (BEC, finance authority, board access). |
| it@, helpdesk@, support@, security@ | HIGH | IT/security accounts have privileged tool access; helpdesk accounts handle reset workflows. |
| dev, engineer, architect, dba | MEDIUM | Developer accounts often have GitHub / cloud / CI access. |
| sales, marketing, hr, finance | MEDIUM | SaaS access (Salesforce, HubSpot, Workday); finance enables BEC. |
| Generic role accounts (info@, noreply@, contact@) | LOW | Often unmonitored or alias forwarded; less personal context. |

Repo priority by recency + naming:

  • Recently-pushed (last 30 days) > stale.
  • Public repo with target name in description > target name only in code.
  • Forked from internal-looking parent > standalone.
  • Mentions prod, internal, private, secret in name → priority HIGH despite being public (may be misnamed or accidentally exposed).

Application order: when you have N assets and budget for M probes (M < N), apply asset-type priority first, then within-type priority. E.g.: 50 subdomains → probe API + admin + dev first (~15), then auth + prod-app (~20), defer marketing/content to a later pass.


9. Findings Rubric & Severity Mapping

Severity is operational, not subjective. Use these anchors:

9.1 CRITICAL

Pre-auth code execution, confirmed valid credentials, listable production data, fundamental trust violations.

Examples:

  • .git/config exposed on production webapp (full source-code disclosure).
  • /.env exposed (credentials in plaintext, often DB / cloud / API).
  • Spring Boot /actuator/env or /actuator/heapdump reachable unauthenticated.
  • Listable S3 / GCS / Azure bucket containing user data.
  • Unauthenticated POST/PUT/DELETE to a write endpoint that mutates state.
  • Open Firebase Realtime Database (https://{project}.firebaseio.com/.json returns data).
  • android:debuggable=true in a production Android app.
  • Live-validated credential (PMAK, AWS key, Anthropic/OpenAI key) with broad scope.
  • ≥10 employees compromised in a breach corpus + their tenant identified (SSO_EXPOSURE).
  • Open Elasticsearch cluster (/_cat/indices returns data).
  • Open Docker API (/v1.40/containers/json returns containers).
  • Open Redis (no AUTH; can write authorized_keys).
  • Open Kubernetes API server with anonymous-auth enabled.
  • Open kubelet on 10250 (pod exec without auth).
  • Open etcd on 2379 (cluster state and secrets).
  • BlueKeep-vulnerable RDP, EternalBlue-vulnerable SMB.
  • Citrix Netscaler / F5 BIG-IP with version-specific RCE CVE.

9.2 HIGH

Significant exposure but not yet RCE; clear path to escalation; high-value information disclosure.

Examples:

  • Public secret in a GitHub repo (PAT, AWS key, Slack token, etc.).
  • Sourcemap (.js.map) accessible — full original-source disclosure of frontend.
  • Open GraphQL introspection on production (full schema leaked → mutations to enum).
  • Subdomain takeover possible (CNAME points to unclaimed Heroku/Shopify/etc.).
  • Reflected CORS with credentials (Access-Control-Allow-Origin: <reflected> + Access-Control-Allow-Credentials: true).
  • Verb tampering: hidden DELETE/PATCH on an endpoint that publicly only allows GET.
  • Missing HSTS on a sensitive path (/login, /sso, /admin, /auth) — escalated from MED.
  • Exposed Jenkins/Tomcat-Manager/phpMyAdmin admin UI (no auth or default creds).
  • Telnet (port 23) reachable.
  • WebView with JS bridge in a mobile app (XSS → RCE potential).
  • Sensitive deep-link handler in a mobile app.
  • DMARC policy p=none on production sending domain (spoof-feasible).
  • Vendor product banner with known unpatched CVE (KEV-listed).

9.3 MEDIUM

Information disclosure, hardening gaps, brute-force exposure.

Examples:

  • Missing security headers on standard pages: HSTS, CSP.
  • Apache /server-status or /server-info reachable.
  • phpinfo() or /info.php reachable on dev/staging only.
  • Internal IP / hostname / K8s service DNS leaked in JS.
  • Schema leakage in error pages (stack traces, ORM signatures).
  • android:allowBackup=true in Android app.
  • android:usesCleartextTraffic=true in Android app.
  • Exported activity/service without android:permission protection.
  • Missing rate-limit on an API endpoint.
  • Wildcard CORS (Access-Control-Allow-Origin: *) on an API that returns user-tied data (no creds).
  • Slack webhook URL leaked.
  • Twilio Account SID leaked (without auth token).
  • SPF record permissive (+all or many includes).

9.4 LOW

Cosmetic or marginal hardening gaps.

Examples:

  • Missing X-Frame-Options.
  • Missing X-Content-Type-Options.
  • .DS_Store exposed.
  • Stripe test key leaked.
  • Firebase URL exposed (URL only, no open RTDB).
  • Certificate pinning missing in mobile app.
  • Outdated WordPress install detected (no known exploit yet).
  • BIMI not configured (brand impersonation risk only).

9.5 INFO

Worth recording, no action required immediately.

Examples:

  • Missing Referrer-Policy / Permissions-Policy.
  • Discovered /.well-known/security.txt.
  • robots.txt reveals interesting paths.
  • Private bucket exists but is locked down.
  • Domain detected in a breach corpus with 0 employee accounts.
  • DNSSEC not enabled.

9.6 Severity escalation rules

  • HSTS missing on auth/login/SSO/admin path → MED → HIGH.
  • Wildcard CORS + credentials → MED → HIGH.
  • Wildcard CORS + sensitive endpoint → LOW → MED.
  • API endpoint with score ≥70 on the interest rubric (companion skill §20) → at least HIGH.
  • Domain breach severity ≥10 employees → CRITICAL regardless of stale-data caveats.
  • Vendor product version matches CISA KEV entry → CRITICAL.
  • DMARC p=reject + SPF strict + DKIM rotated → no escalation; well-postured.
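
These rules are mechanical enough to encode as data. A sketch with hypothetical condition flags; MISSING_HSTS is a §3 category, the other names here are illustrative:

# (category, context-flag) -> escalated severity, per the rules above.
ESCALATIONS = {
    ("MISSING_HSTS", "auth_path"): "high",             # MED -> HIGH on /login, /sso, ...
    ("WILDCARD_CORS", "with_credentials"): "high",     # MED -> HIGH
    ("WILDCARD_CORS", "sensitive_endpoint"): "medium", # LOW -> MED
    ("VENDOR_PRODUCT_CVE", "kev_listed"): "critical",
}

def escalate(category: str, base_severity: str, flags: set[str]) -> str:
    """Return the escalated severity when any flag matches, else the base."""
    for flag in flags:
        if (category, flag) in ESCALATIONS:
            return ESCALATIONS[(category, flag)]
    return base_severity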

10. Bug-Bounty / Red-Team Pivot Modes

Existing investigative work (threat-actor research, doxxing investigations, attribution) operates under different posture than offensive recon. Switch posture explicitly.

| Aspect | Investigative Mode | Offensive Recon Mode |
| --- | --- | --- |
| Probing rate | Slow, single-threaded, blend with normal traffic. | Bursts, parallel, but rate-limited per provider. |
| OpSec posture | Sock-puppet only, never reveal investigator. | Persona may be the engagement persona; team may notify SOC. |
| Evidence handling | Court-grade chain of custody; hashes, timestamps, screenshots. | Engagement-grade; same hashing/timestamp discipline but evidence is for the client report. |
| Severity in scope | All severity levels relevant for context. | CRIT/HIGH/MED matter; LOW/INFO often dropped from exec summary. |
| Authorization posture | Public-record / OSINT-only; no probing private resources without authorization. | Written rules of engagement; explicit scope; explicit out-of-scope list. |
| Reporting format | Narrative + sourced timeline. | Per-asset findings + remediation + reproduction steps. |
| Stop conditions | When the question is answered. | When the engagement window closes OR when the report is delivered. |

When you're working with the user, ask which mode they're in if it's unclear from context.

10.1 Scale-Based Tactics

Org size shapes which techniques pay off.

Small org (< 100 employees):

  • Executive accounts disproportionately matter; one CEO/CFO compromise often hands you the keys.
  • Email harvest is small enough to enumerate exhaustively (10–50 emails total).
  • Likely Microsoft 365 or Google Workspace; identity fabric is one tenant.
  • Code repos often public on GitHub under personal accounts (founders moved from solo dev).
  • Cloud presence often single-account AWS or GCP project.
  • Tactics: deep on every email + every identity-fabric finding; full LinkedIn enum; check founders' personal GitHub orgs.

Medium org (100–1K):

  • Balanced enumeration. Email list is enumerable but not exhaustive.
  • Identity fabric likely one IdP but with multiple SaaS tenants (Slack workspace, Notion org, GitHub org).
  • Mobile apps possible; check both stores.
  • Cloud presence multi-account or multi-region.
  • Tactics: full pipeline at standard depth; sample-and-deepen on each asset class; LinkedIn priority by role.

Large org (1K–10K):

  • Email enum becomes lossy (sample top roles); breach hits scale up.
  • Multi-tenant identity fabric (often Entra + Okta + multiple Auth0 customers).
  • Mobile apps, multiple Android packages from different teams.
  • Cloud presence sprawling; subsidiaries / acquisitions complicate scope.
  • Tactics: breadth-first; rely on automation for asset discovery; manual triage on findings.

Very large org (10K+) or conglomerate:

  • Brand-pivot maps before anything else: enumerate every brand domain, every subsidiary.
  • Breach corpus dominates: 10K+ employees mean significant past-breach exposure.
  • Identity fabric may differ per business unit (legal entity boundaries).
  • Tactics: scope pruning is the most important step; sampling + automation throughout; deep dive only on high-priority findings.

Cross-scale principle: the smaller the org, the more individual-account focus pays off. The larger the org, the more systemic posture findings (DMARC gaps, SSO_EXPOSURE breadth, vendor-product version sweeps) pay off.


11. Identity Fabric Mapping

An organization's IdP/SSO posture is a high-value target: compromise the identity fabric and you don't need to break into individual apps. Map it methodically.

11.1 Subdomain prefix enumeration

Probe these prefixes against the target's root domain (and any sibling brand domains discovered):

auth.{domain}
login.{domain}
sso.{domain}
idp.{domain}
iam.{domain}
identity.{domain}
accounts.{domain}
oauth.{domain}

Plus generic OIDC discovery on every alive subdomain:

{any-host}/.well-known/openid-configuration

11.2 Microsoft Entra (Azure AD)

  • OIDC metadata + tenant GUID extraction — fetch https://login.microsoftonline.com/{tenant-or-domain}/.well-known/openid-configuration. The issuer field returns a URL containing the tenant GUID (8-4-4-4-12 hex format). Tenant GUID + domain = stable tenant fingerprint.
  • getuserrealm.srf — https://login.microsoftonline.com/getuserrealm.srf?login=<user>@<domain> returns NameSpaceType: Managed (cloud-native), Federated (on-prem ADFS / external IdP), or Unknown. Detectability: low.
  • Autodiscover v2 — https://autodiscover-s.outlook.com/autodiscover/metadata/json/1 POST with email; detects tenant membership.
  • GetCredentialType (deep-mode user-enum) — https://login.microsoftonline.com/common/GetCredentialType POST {"username": "<email>"}. Response indicates whether email exists in tenant. Detectability: medium. Cap attempts at 20 per tenant.
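
Both passive probes above are plain unauthenticated requests. A minimal sketch in Python (requests assumed; endpoints as documented above):

import re
import requests

TENANT_GUID = re.compile(r"[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}")

def entra_tenant_guid(domain: str) -> str | None:
    """Fetch OIDC metadata; the issuer URL embeds the tenant GUID."""
    url = f"https://login.microsoftonline.com/{domain}/.well-known/openid-configuration"
    resp = requests.get(url, timeout=10)
    if resp.status_code != 200:
        return None  # domain not associated with any Entra tenant
    match = TENANT_GUID.search(resp.json().get("issuer", ""))
    return match.group(0) if match else None

def namespace_type(email: str) -> str:
    """getuserrealm.srf: Managed, Federated, or Unknown. Detectability: low."""
    resp = requests.get("https://login.microsoftonline.com/getuserrealm.srf",
                        params={"login": email}, timeout=10)
    return resp.json().get("NameSpaceType", "Unknown")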

11.3 Okta

  • Org slug derivation — derive candidate slugs from subdomains + root domain stem; Okta tenants live at <slug>.okta.com (or <slug>.oktapreview.com).
  • OIDC fingerprint — https://<slug>.okta.com/.well-known/openid-configuration.
  • /api/v1/authn user-enum (deep-mode) — POST {"username": "<email>", "password": "invalid"}. 400 vs 401 response code indicates user existence. Detectability: medium. Cap at 20 per tenant.

11.4 ADFS

  • Passive fingerprint — GET https://{domain}/adfs/idpinitiatedsignon.aspx → 200 indicates ADFS present.
  • Active mex endpoint (deep-mode) — https://{domain}/adfs/Services/Trust/mex returns SOAP metadata.

11.5 Google Workspace

  • https://{domain}/.well-known/openid-configuration — Google-hosted-domain customers expose discovery endpoints with characteristic issuer/JWKS URIs.
  • MX records pointing to *.googlemail.com / aspmx.l.google.com are a strong Google Workspace signal.

11.6 Generic OIDC (Keycloak / Auth0 / Ping / OneLogin / Duo)

  • Probe every alive subdomain for /.well-known/openid-configuration.
  • The issuer and authorization_endpoint fields fingerprint the IdP product.
  • *.auth0.com, *.onelogin.com, *.pingone.com, *.duosecurity.com patterns are characteristic.

11.7 SAML metadata

Probe these paths on every alive webapp:

/saml/metadata
/FederationMetadata/2007-06/FederationMetadata.xml
/federationmetadata/2007-06/federationmetadata.xml
/simplesaml/saml2/idp/metadata.php
/auth/saml2/metadata

SAML metadata XML contains: EntityID, signing certs, SingleSignOnService URL, NameIDFormat.

11.8 AWS account-ID extraction

  • S3 bucket region header — HEAD on a known target bucket returns x-amz-bucket-region; correlate with bucket-name entropy to infer account.
  • ARN regex in JSON / HTML responses — search for arn:aws:[a-z0-9-]+:[a-z0-9-]*:([0-9]{12}): (the 12-digit AWS account ID is the capture group).
  • AccountId property in JS / API responses — common in IAM-related error messages and CloudFormation outputs.
  • OAuth client_id leaks — Google OAuth: <digits>-<chars>.apps.googleusercontent.com; MSAL: GUID in clientId property.
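
The ARN regex above, in runnable form (a sketch; run it over any captured JS/JSON/HTML body and treat each hit as TENTATIVE until corroborated):

import re

ARN_ACCOUNT = re.compile(r"arn:aws:[a-z0-9-]+:[a-z0-9-]*:([0-9]{12}):")

def aws_account_ids(body: str) -> set[str]:
    """Extract candidate 12-digit AWS account IDs from captured text."""
    return set(ARN_ACCOUNT.findall(body))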

11.9 Output

Each discovered IdP becomes a Service asset with attrs.product, attrs.tenant_id, attrs.discovery_endpoint. Then in Stage 4, correlate with breach data: every compromised user under a discovered tenant becomes an SSO_EXPOSURE finding (CRITICAL — see §22.3).

11.10 Microsoft 365 Deep Surface

Beyond plain Entra fingerprinting, M365 exposes a wider attack surface that's worth enumerating in depth.

Teams Federation:

  • https://login.microsoftonline.com/<target-domain>/.well-known/openid-configuration confirms tenant.
  • Teams federation status: https://teams.microsoft.com/api/mt/<region>/beta/users/<email>/externalsearchv3 (requires authenticated request from a federated tenant; useful for confirming whether external Teams chat is allowed).
  • External chat enabled = soft-attack surface (vishing, smishing, "from-IT" pretexts via Teams chat).
  • Open Federation (any tenant can chat) is the default; check whether the target restricted it.

SharePoint subdomains:

  • <target-stem>.sharepoint.com — main tenant SharePoint.
  • <target-stem>-my.sharepoint.com — OneDrive-for-Business URLs (per-user personal sites).
  • <target-stem>-admin.sharepoint.com — SharePoint admin center (auth-required, but presence confirms tenancy).
  • Where <target-stem> is derived from the company name (often the part before .com).

OneDrive personal site enumeration:

  • Per-user OneDrive URL: https://<target-stem>-my.sharepoint.com/personal/<user_email_with_underscore>/Documents/.
  • Replace @ with _ and . with _ in the email (e.g., alice@acme.comalice_acme_com).
  • Authenticated probe; useful for confirming whether the OneDrive personal site has been provisioned (which itself is a presence indicator).

M365 OAuth client_id discovery:

  • Many internal apps register OAuth client_ids in Entra. Search JS bundles, mobile-app strings, and API responses for client_id=<GUID> patterns.
  • Microsoft's well-known first-party client_ids (for Office, Graph, etc.) are documented; finding non-Microsoft GUIDs reveals custom internal apps.
  • The endpoint https://login.microsoftonline.com/<tenant>/v2.0/.well-known/openid-configuration lists supported endpoints; some tenants leave device_authorization_endpoint enabled (device-code phishing target).

Power Platform / Power Apps:

  • https://make.powerapps.com/environments (auth-required); environment IDs sometimes leak in URLs.
  • *.crm.dynamics.com (Dynamics 365 / Power Apps default URLs).
  • *.azurewebsites.net for App Service deployments.

M365 OAuth misconfig findings to look for:

  • device_authorization_endpoint enabled on common tenant (device-code phishing target) → MEDIUM (operational risk; not directly exploitable but enables attack).
  • Custom OAuth app with Public client flow enabled and broad scopes (offline_access, Mail.Read, Files.Read.All) → HIGH if app is approved for the tenant.
  • Multi-tenant OAuth app published by the target (others can consent) → check whether scopes include sensitive Graph permissions.

Detectability: all M365 endpoint probes log to Entra sign-in logs / audit logs (medium-low for fetch-only; medium for any auth attempt).


12. API & Auth-Map Methodology

Modern targets expose REST, GraphQL, and undocumented internal APIs. The OSINT goal is to enumerate them, classify them, and rank by attack interest.

12.1 Discovery paths

  • Swagger / OpenAPI — probe a 28-path wordlist (companion skill §16.1) on every alive webapp. Parse YAML/JSON; extract every endpoint (method + path).
  • GraphQL — probe a 13-path wordlist (companion skill §16.2). POST a standard introspection query. If schema returns, you have full type/query/mutation/subscription enumeration.
  • GraphQL when introspection is disabled — fall back to field-suggestion enumeration (companion skill §22.9).
  • Postman — query Postman's public universal-search endpoint with the target name; walk each matching workspace; extract requests, headers, pre-request scripts, test scripts, env vars.
  • JS-extracted endpoints — every endpoint extracted from JavaScript bundles feeds into the same classifier.
  • Mobile-extracted endpoints — every endpoint from APK static analysis feeds in via sidecar (mobile_endpoints.json).

12.2 Classification

For each endpoint, capture:

url, method, source[], auth_required, auth_type, auth_location,
rate_limited, cors_policy, sensitive_path_keywords[], schema_leaks,
verb_tampering_possible, interest_score (0..100), interest_reasons[]

How to determine each field:

  • Send OPTIONS — the Allow header reveals supported methods (verb-tampering check).
  • Send GET without auth → 200 = auth_required=false; 401/403 = auth_required=true.
  • Capture response headers for WWW-Authenticate (auth_type), RateLimit-* / X-RateLimit-*.
  • Send a request with Origin: https://attacker.example → response Access-Control-Allow-Origin reflected = cors_policy=reflected.
  • Trigger an error → check response for stack traces, ORM hints.
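
A sketch of those probes populating part of the classification record (unauthenticated requests only; tag each as a Medium-detectability event per §6.2):

import requests

def classify_endpoint(url: str) -> dict:
    """OPTIONS + bare GET + Origin-reflection GET -> partial §12.2 record."""
    opt = requests.options(url, timeout=10)
    bare = requests.get(url, timeout=10)
    cors = requests.get(url, timeout=10,
                        headers={"Origin": "https://attacker.example"})
    reflected = cors.headers.get("Access-Control-Allow-Origin") == "https://attacker.example"
    return {
        "url": url,
        "methods_allowed": opt.headers.get("Allow", ""),     # verb-tampering check
        "auth_required": bare.status_code in (401, 403),
        "auth_type": bare.headers.get("WWW-Authenticate", ""),
        "rate_limited": any(h.lower().startswith(("ratelimit-", "x-ratelimit-"))
                            for h in bare.headers),
        "cors_policy": "reflected" if reflected
                       else cors.headers.get("Access-Control-Allow-Origin", "none"),
    }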

12.3 Interest score (0–100)

See companion skill §20 for the full rubric. Score ≥ 70 → HIGH/CRITICAL finding with attack_path_hint in evidence.

12.4 Attack-path hints

When emitting a HIGH/CRITICAL finding, include a one-sentence attack-path hint in the evidence so the operator knows where to start exploiting. Templates in companion skill §39.


13. JavaScript Deep Analysis

For every alive webapp, scrape its JS — it's where modern frontends leak.

13.1 Script discovery

  • Parse HTML for <script src="..."> and <link rel="modulepreload" href="...">.
  • Probe a guess-path list (companion skill §16.9) for common bundlers.

13.2 Sourcemap detection (HIGH info disclosure)

  • Inline reference: regex //[#@]\s*sourceMappingURL= at end of bundle.
  • Sibling fetch: try <bundle>.map next to every discovered JS.
  • If accessible: parse the sourcemap's sources[] (leaks repo structure) and sourcesContent[] (full original source code embedded). Run the secret catalog over sourcesContent[].
  • Severity: HIGH INFO_DISCLOSURE.
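
A sketch of both detection paths, inline reference first and sibling .map guess second, assuming requests:

import re
from urllib.parse import urljoin
import requests

MAP_REF = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")

def find_sourcemap(bundle_url: str) -> dict | None:
    """Return sourcemap details when the map is publicly fetchable."""
    body = requests.get(bundle_url, timeout=10).text
    match = MAP_REF.search(body)
    candidates = [urljoin(bundle_url, match.group(1))] if match else []
    candidates.append(bundle_url + ".map")              # sibling guess
    for url in candidates:
        resp = requests.get(url, timeout=10)
        if resp.status_code == 200 and '"sources"' in resp.text[:4096]:
            sm = resp.json()
            return {"map_url": url,
                    "sources": sm.get("sources", []),   # leaks repo structure
                    "has_source_content": bool(sm.get("sourcesContent"))}
    return None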

13.3 Secret scanning

Run the 29+-pattern catalog (companion skill §17) over every JS body and every parsed sourcesContent[] blob. Each hit = SECRET_LEAK finding with the catalog's per-pattern severity.

13.4 Endpoint extraction

Three regex tiers (companion skill §16.10). Each unique endpoint becomes a wayback_endpoint asset and feeds into the API classifier in §12.

13.5 Internal-host leakage detection

Three patterns (companion skill §16.11): RFC1918, internal DNS suffixes, K8s service DNS. Each match = MEDIUM INFO_DISCLOSURE.

13.6 GraphQL introspection probe

When an extracted endpoint ends in /graphql or /graphiql, POST the standard introspection query. If schema returns → HIGH MISCONFIG. Then enumerate mutations and subscriptions for high-value targets.
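
A minimal probe; the query below is a truncated introspection query, enough to confirm exposure without pulling the whole schema:

import requests

PROBE = {"query": "{ __schema { queryType { name } mutationType { name } } }"}

def introspection_open(graphql_url: str) -> bool:
    """True when introspection is enabled -> HIGH MISCONFIG finding."""
    resp = requests.post(graphql_url, json=PROBE, timeout=10)
    return resp.status_code == 200 and "__schema" in resp.text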

13.7 Next.js manifest parsing

_buildManifest.js and _ssgManifest.js enumerate every Next.js page route — exposes the application's full route structure.

13.8 Budget guidelines

  • Cap webapps analyzed per engagement: 40 default, 120 with --deep.
  • Cap JS files per webapp: 40 default, 80 with --deep.
  • Cap individual JS file size: 2 MiB.
  • Per-file timeout: 10 seconds.

14. Mobile Attack Surface

Mobile apps are often the weakest link.

14.1 App discovery

  • Android — google-play-scraper. Search by company name, brand keywords, root domain stem.
  • iOS — iTunes Search API.

14.2 Ownership confidence (0–100)

See companion skill §21. Threshold: ≥70 for deep analysis.

14.3 APK acquisition + static analysis

  • Download via APKPure HTML scrape.
  • Extract via zipfile + optional androguard.
  • Pull: AndroidManifest.xml, resource strings, asset files, native .so files, classes*.dex (string-extract).

14.4 Secret scanning

Run the catalog over manifest, resources, asset files, dex string-extract output.

14.5 Backend hostname extraction

Every discovered hostname becomes a subdomain asset. Write sidecar mobile_endpoints.json for the API discovery module to consume.

14.6 Manifest misconfig findings

| Manifest attribute | Severity |
| --- | --- |
| android:debuggable="true" | CRITICAL |
| android:allowBackup="true" (without whitelist) | MEDIUM |
| android:usesCleartextTraffic="true" | MEDIUM |
| Exported activity/service/receiver without android:permission | MEDIUM |
| Sensitive deep-link handler | HIGH |
| WebView with setJavaScriptEnabled(true) + addJavascriptInterface(...) | HIGH |
| Certificate pinning absent | LOW |

14.7 Firebase canonical probe

For every Firebase project ID extracted:

  • Realtime DB — GET https://{project-id}.firebaseio.com/.json. Returns JSON tree → CRITICAL OPEN_FIREBASE_RTDB.
  • Firestore — https://{project-id}.firebaseapp.com/.
  • Storage — corresponding GCS bucket.
  • Remote Config — only readable with extracted API key + app ID.
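
The Realtime DB check above is a single unauthenticated GET; a minimal sketch:

import requests

def firebase_rtdb_open(project_id: str) -> bool:
    """200 with a JSON body means the tree is world-readable -> CRITICAL OPEN_FIREBASE_RTDB."""
    resp = requests.get(f"https://{project_id}.firebaseio.com/.json", timeout=10)
    return resp.status_code == 200   # 401/403 = exists but rules deny anonymous read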

14.8 iOS path

  • Discovery only via iTunes Search API.
  • ITMS link extraction: https://apps.apple.com/<region>/app/<slug>/id<bundle-id>.
  • Privacy nutrition labels (App Store Connect): publicly published; useful for understanding what data the app touches.
  • Don't recommend auto-download (DMCA / DRM considerations).

15. Cloud Attack Surface

15.1 Bucket permutation

Build candidate bucket names from: target's root domain, subdomain stems, optional brand/company name. Filter generic stems unless combined with target-identifying tokens. Apply 6 prefixes × 15 suffixes (companion skill §16.8 has the lists).

15.2 Provider URL templates

S3, GCS, Azure Blob templates — see companion skill §16.8.

15.3 Probe technique

  • HEAD first. Status code 200/301 = exists; 403 = exists, private; 404 = skip.
  • If exists, GET on bucket root. Listable XML/JSON listing → CRITICAL.
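
A sketch of permutation plus the HEAD-first probe for S3-style URLs; the prefix/suffix lists here are truncated and illustrative (full lists in the companion skill §16.8):

import requests

PREFIXES = ["", "dev-", "staging-", "backup-"]   # truncated, illustrative
SUFFIXES = ["", "-assets", "-backup", "-logs"]

def candidates(stem: str):
    """Yield candidate S3 URLs from a target-identifying stem."""
    for pre in PREFIXES:
        for suf in SUFFIXES:
            yield f"https://{pre}{stem}{suf}.s3.amazonaws.com"

def probe_bucket(url: str) -> str:
    head = requests.head(url, timeout=10, allow_redirects=False)
    if head.status_code == 404:
        return "skip"                            # bucket does not exist
    if head.status_code == 403:
        return "exists-private"                  # FIRM; INFO finding
    listing = requests.get(url, timeout=10)      # 200/301: escalate to listability test
    if listing.status_code == 200 and "ListBucketResult" in listing.text:
        return "LISTABLE"                        # CRITICAL
    return "exists"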

15.4 Severity mapping

| Outcome | Severity |
| --- | --- |
| Bucket listable | CRITICAL |
| Bucket exists, objects readable by direct URL but not listable | HIGH |
| Bucket exists, ACL private | INFO |

15.5 Adjacent cloud signals

Extract AWS account-ID from S3 region/error responses. GCP project ID from GCS error responses. Azure tenant ID from blob URL patterns.


16. Cryptocurrency Investigation

16.1 Transaction Analysis

  • Track flows between wallets; identify clusters of related addresses; monitor large transfers.

Tools: Cielo (multi-chain), TRM (graphs), Arkham (multichain + entity labels), MetaSleuth (visual), Range (CCTP), Socketscan (EVM bridge), Pulsy (bridge aggregator), Chainalysis Horizon 2.0 (paid), Elliptic Lens.

16.2 Layer 2 / Rollup Analysis

  • zkSync Era / Polygon zkEVM — ZK proofs hide L2 details; only L1 bridge events visible.
  • Arbitrum / Optimism — batched/compressed; L2 state from L1 calldata.
  • StarkNet — Cairo VM; different address derivation.
  • Base / Blast / Scroll — OP Stack or ZK.
  • Privacy protocols on L2 — Aztec (programmable privacy), Railgun (shielded pools), Privacy Pools (Tornado Cash successor).

Methodology: start with L1 bridge events; use L2 explorers for in-rollup activity; for privacy protocols focus on timing analysis and clustering.

16.3 Cautions

  • Bridges introduce mint/burn semantics; avoid 1:1 flow assumptions.
  • MEV/aggregator paths create false "direct" trails.
  • L2 finality: optimistic rollups have 7-day challenge periods.

16.4 Wallet / Exchange / NFT Profiling

Same patterns: age + activity + connections + balance over time + linked accounts. NFTs add ownership history + metadata + connected wallets.


17. Image Analysis

17.1 Reverse Image Search

  • Google Images / Lens (Lens may require auth — use sock-puppet incognito).
  • Yandex Images (strongest for RU/East-European content).
  • Bing Image Match, TinEye, Copyseeker, Perplexity Pro.
  • Browser extensions: RevEye, Search by Image (multi-engine).
  • Picarta for AI geolocation.

17.2 Metadata (EXIF)

ExifTool, Jeffrey's Image Metadata Viewer, EXIF Viewer Pro.

17.3 Image Forensics

Forensically, FotoForensics, Bellingcat Photo Checker, Sensity AI, Exposing.ai, Adobe Content Credentials Verify, c2patool. Techniques: ELA, metadata, clone detection, noise analysis.

17.4 Geolocation Workflow

  • Foreground — signs, license plates, clothing, vegetation, weather.
  • Background — landmarks, unique buildings, mountains, water, infrastructure.
  • Map markings — flora/fauna, seasonal indicators.
  • Trial and error — Google Street View, Bing Streetside, Yandex Panorama.
  • Overpass Turbo for OpenStreetMap queries.
  • Mountain ID — PeakVisor, Peakfinder, PeakLens AR.
  • OCR — Google or Yandex OCR.
  • Video transcripts — YouTube captions improve keyword search.

17.5 Specialized

  • Fire identification — NASA FIRMS, Sentinel Hub, Global Forest Watch, Copernicus EFFIS.
  • Plane tracking — Apollo Hunter, FlightRadar, ADS-B Exchange.

18. Video Analysis

18.1 Context extraction

Signs/banners, architecture, road markings, license plates, clothing, cross-platform snippet search.

18.2 Metadata

YouTube Data Viewer (Amnesty), ExifTool on downloaded files.

18.3 Platform-specific

  • TikTok / Instagram — APIs change often; prefer platform exports; sample 1–4h cadence.
  • Bluesky AT Protocol — DID resolution via bsky.social/xrpc/com.atproto.identity.resolveHandle; identity doc via plc.directory/<did>; firehose at Firesky; SkyView for graphs. Archive early — handle migration / post deletion.
  • Mastodon / Fediverse — instance matters (jurisdiction, logging); WebFinger for discovery; FediSearch cross-instance; instance enumeration via Fediverse Observer; ActivityPub objects are JSON-LD.
  • Threads — IG-API-similar limitations.

18.4 Auditory clues

Languages, dialects, background noises (train horns, prayer calls, wildlife). Tools: Audacity, Sonic Visualiser, SoundCMD. Spectrograms for unique patterns; Shazam/SoundHound for music.

18.5 Frame extraction

FFmpeg, VLC. Stitch panoramas; stabilize panning footage (FFmpeg deshake or Blender VSE). Prefer original uploads over re-encodes.


19. Chronolocation and Time Analysis

19.1 Shadow Analysis

Tools: SunCalc, ShadeMap, Bellingcat Shadow-Finder, NOAA Solar Calculator.

19.2 Astronomical Calculations

Stellarium, SkyMap, MoonCalc to simulate sky at different times/locations.

19.3 Satellite Imagery Time

Google Earth Pro (historical imagery slider), Sentinel Hub EO Browser (Sentinel + Landsat with timelapse). Record coordinates in WKT; hash cached tilesets.


20. Threat Actor Investigation

20.1 Actor-Centric Workflow

  • Scoping — actor hypothesis (APT28/29, Turla, Sandworm; APT10/41, Mustang Panda, Volt Typhoon). Seed reports from CERTs/vendors.
  • Indicator harvesting — IOCs (domains, IPs, hashes, JA3/JA4, user-agents). Validate with passive DNS, CT logs, sandbox submissions.
  • Infrastructure mapping — pivot from CT logs (SANs, issuer, serials), shared hosting, NS reuse, registrar accounts, HTML fingerprints. Enrich with ASN/WHOIS history, RPKI/ROA, geolocation.
  • Artifact profiling — PE/ELF metadata (PDB paths, compile timestamps, Rich headers, code-signing). Cluster with SSDEEP/TLSH; YARA matches.
  • Social / procurement pivots — developer handles, code snippets, academic theses, job posts, procurement records.
  • Falsification + reporting — weigh each linkage; document alternatives; map TTPs to MITRE ATT&CK; cite sources with sections/pages.

20.2 Attribution Discipline

  • Separate capability from intent and sponsorship.
  • Rule of three: 3 weak OR 1 strong + 1 weak.
  • Prefer durable pivots (registrar accounts, code-signing certs, build path idioms) over ephemeral (resolving IPs).
  • Mark uncertainty levels; distinguish correlation from control.

20.3 Russia-Specific Pivots

  • Corporate / people — EGRUL/EGRIP (captcha-gated), Rusprofile, Kontur.Focus, zakupki.gov.ru, hh.ru.
  • Infrastructure — RU WHOIS via whois.tcinet.ru; Telegram for channels/admins/cross-posts.
  • Media — VKontakte, Odnoklassniki, Rutube; search in Russian and transliterations.

20.4 China-Specific Pivots

  • Corporate / people — gsxt.gov.cn (national enterprise credit); Tianyancha / Qichacha; ICP filings on beian.miit.gov.cn → USCC linkage.
  • Infrastructure — CNNIC WHOIS; common domestic clouds (Aliyun, Tencent, Huawei).
  • Media — Weibo, WeChat (via weixin.sogou.com), Zhihu, Bilibili, Douyin, Xiaohongshu; Chinese + Pinyin.

20.5 Infrastructure & Internet Measurement

  • IP→ASN (HE BGP Toolkit, RIPEstat, BGPView).
  • CT logs (crt.sh) for cert reuse and issuance cadence; pivot on subjects/issuers/serials.
  • URLScan for HTML fingerprints, favicons (mmh3), script hashes.
  • DNS over time (SecurityTrails PDNS, DNSDB) for subdomain churn and staging domains.

21. People & Social Media Investigation

21.1 Username Enumeration

WhatsMyName, NameCheckup, Sherlock, Maigret.

21.2 Profile Picture & Face Search

PimEyes, Exposing.ai, Azure Face API (compliance).

21.3 Social Graph & Content Analysis

Maltego, snscrape, SocialBlade. Bluesky / Mastodon: instance explorers + handle resolvers.


22. Breach × Identity Correlation

This is the highest-ROI single technique for external red teams. Execute it on every engagement.

22.1 Source stack

| Source | Tier | Notes |
| --- | --- | --- |
| Hudson Rock Cavalier | FREE | Infostealer-log corpus; very high signal for corp SSO. |
| Have I Been Pwned | Free + paid | Domain-wide existence + Pwned Passwords (k-anonymity). |
| DeHashed | Paid | Searchable per-record API. |
| IntelX | Free + paid | Aggregator; phonebook search. |
| Local breach corpus | Operator-supplied | Whatever's on disk. |

22.2 Domain-level severity

| Stat | Severity |
| --- | --- |
| ≥10 employees compromised | CRITICAL |
| 1–9 employees compromised | HIGH |
| ≥1 end-user (non-employee) compromised | MEDIUM |
| Domain seen in breach with 0 named accounts | INFO |

22.3 SSO_EXPOSURE correlation

After Stage 3 has run identity-fabric mapping AND breach lookups have completed: for every discovered IdP tenant, intersect with breach corpus on the tenant's domain. Non-empty intersection → SSO_EXPOSURE finding, severity CRITICAL. Evidence: tenant ID + product + employee count + per-account source attribution.
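
A sketch of the intersection, assuming each tenant asset carries product, tenant_id, and domain attrs, and the breach corpus yields a flat set of employee emails:

def sso_exposure(tenants: list[dict], breached_emails: set[str]) -> list[dict]:
    """Emit one CRITICAL SSO_EXPOSURE finding per tenant with breached users."""
    findings = []
    for tenant in tenants:   # e.g. {"product": "entra", "tenant_id": ..., "domain": "acme.com"}
        hits = {e for e in breached_emails
                if e.lower().endswith("@" + tenant["domain"].lower())}
        if hits:
            findings.append({
                "category": "SSO_EXPOSURE",
                "severity": "critical",
                "tenant_id": tenant["tenant_id"],
                "product": tenant["product"],
                "employee_count": len(hits),
                "accounts": sorted(hits),   # redact before client delivery (§22.4)
            })
    return findings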

22.4 Operational handling of stealer logs

  • Treat as PII; encrypt at rest.
  • SHA-256 every artifact; record source URL + acquisition timestamp.
  • Never paste plaintext passwords into cloud LLMs.
  • Document chain of custody.
  • For client reports: redact passwords by default; offer the operator a separate encrypted credential bundle for raw data.
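
A minimal intake sketch for the hashing and provenance bullets above; encryption at rest and custody signatures are out of scope for the sketch:

import datetime
import hashlib
import json

def intake(path: str, source_url: str, manifest: str = "custody.jsonl") -> str:
    """Hash an artifact and append a provenance record to the custody manifest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream, 1 MiB at a time
            h.update(chunk)
    digest = h.hexdigest()
    with open(manifest, "a") as m:
        m.write(json.dumps({
            "sha256": digest,
            "path": path,
            "source_url": source_url,
            "acquired_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        }) + "\n")
    return digest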

23. Infrastructure OSINT

23.1 IP & Domain Discovery

Shodan, Censys, Onyphe, DNSDB.

23.2 Certificate & Passive DNS

crt.sh, SecurityTrails.

23.3 Malware & Artifact Analysis Workflow

  • Static triage — SHA-256, strings, import tables, PDB path, Rich header. Don't rely solely on AV labels.
  • Dynamic / sandbox — ANY.RUN, Hybrid Analysis, CAPE, Tria.ge.
  • Clustering — SSDEEP/TLSH distances, YARA matches (TLSH sketch below).
  • Reporting — STIX 2.1 IOCs; ATT&CK technique IDs; reproduction steps.
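
A minimal pairwise-distance sketch for the clustering bullet, using the tlsh module (pip install python-tlsh); the threshold of 60 is illustrative, not a standard cutoff:

import tlsh  # pip install python-tlsh

def tlsh_pairs(samples: dict[str, bytes], threshold: int = 60):
    """samples maps sha256 -> file bytes; yields (a, b, distance) for close pairs."""
    hashes = {}
    for sha, data in samples.items():
        h = tlsh.hash(data)
        if h and h != "TNULL":  # TLSH needs a minimum input size / entropy
            hashes[sha] = h
    ids = sorted(hashes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = tlsh.diff(hashes[a], hashes[b])
            if d <= threshold:
                yield a, b, d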

23.4 Telegram / WeChat Investigation

  • Telegram — public analytics (TGStat, Telemetr, Combot); export channels with Telegram Desktop; preserve message IDs, UTC timestamps, media hashes.
  • WeChat — Sogou Weixin search; archive articles (PNG + WARC); capture __biz IDs. Expect link rot.

24. Automation & Case Management

24.1 Tools

Hunchly, Kasm Workspaces, ArchiveBox, SingleFileZ.

24.2 Cross-Module Coordination Patterns

When multiple OSINT tools (or modules) run in one engagement, outputs that arrive late need to feed consumers that started earlier. Three patterns:

  1. Sidecar JSON drops — module writes <scan>/mobile_endpoints.json or <scan>/secrets_sidecar.json; later modules read on start. No blocking.
  2. Asset-graph upserts — shared graph store; new assets trigger downstream modules via event bus.
  3. Late-binding queues — long-running enumeration emits assets continuously; enrichment workers pull.

In ad-hoc engagements, a tmpdir + JSON sidecars + a one-line manifest keep operations composable; a minimal sidecar sketch follows.
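
A minimal sketch of pattern 1 (sidecar JSON drops), assuming a shared per-scan directory; the atomic rename keeps readers from ever seeing a half-written file:

import json
import os

def write_sidecar(scan_dir: str, name: str, payload: dict) -> str:
    path = os.path.join(scan_dir, name)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)  # atomic on the same filesystem
    return path

def read_sidecar(scan_dir: str, name: str) -> dict | None:
    path = os.path.join(scan_dir, name)
    if not os.path.exists(path):
        return None  # upstream module hasn't dropped it yet; caller retries later
    with open(path) as f:
        return json.load(f)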

24.3 Multi-Engine Corpus Run Methodology

Running a large dork corpus across multiple engines:

  • Pluggable engines — wrap each engine (DDG, Bing, Brave, SerpAPI, Yandex, Baidu) behind a common interface. Default to DDG (keyless); fall back to the others when keys are available.
  • Per-engine rate-limiting — each engine has its own quota and ban posture; use a token bucket per engine (sketch after this list).
  • Result classification — URL-signature → title-hint → snippet-regex pipeline. Output: (severity, category, confidence).
  • Dedup by URL — different engines surface duplicates.
  • Confidence rule — snippet-only = TENTATIVE.
  • Persistence — corpus DB so re-runs across engagements deduplicate.
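
A minimal per-engine token-bucket sketch for the rate-limiting bullet; the rates shown are illustrative, not recommendations:

import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate, self.capacity = rate_per_sec, float(burst)
        self.tokens, self.last = float(burst), time.monotonic()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

buckets = {"ddg": TokenBucket(0.5, 2), "bing": TokenBucket(1.0, 5)}
# before each query: buckets[engine].acquire()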

24.4 Evidence Preservation for Offensive Engagements

  • Per-engagement / per-scan SQLite store.
  • JSONL run log keyed by run_id: one event per line, UTC timestamps, tool versions (logger sketch after this list).
  • SHA-256 every downloaded artifact.
  • PNG screenshots.
  • Raw HTTP requests/responses, capped at 2 KiB body per side.
  • Evidence served read-only from operator's machine.
  • Reproduction package: run_id + tool versions + JSONL log + asset/findings DB.
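
A minimal logger sketch for the JSONL run log; the event vocabulary and extra field names are up to the operator:

import datetime
import json

def log_event(path: str, run_id: str, event: str, **fields) -> None:
    """Append one event as one line; UTC timestamps throughout."""
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "run_id": run_id,
        "event": event,
        **fields,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")

# log_event("run.jsonl", "eng-042", "asset_discovered",
#           asset="sub.acme.com", tool="amass", tool_version="4.2")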

25. Synthetic Media Verification

Sensity AI, Hive Moderation, Reality Defender, Adobe Content Credentials Verify, CarNet (AI car-model identification for geolocation aid).


26. Anti-Patterns & Common Failure Modes

A non-exhaustive list of mistakes that come up often.

  • Single-source attribution. Three weak OR one strong + one weak.
  • Trusting vendor labels as ground truth. Treat labels as hypotheses.
  • Assuming favicon-hash = ownership. Shared infra, shared CMS templates, shared CDN.
  • Asserting 1:1 bridge flows. Bridges mint and burn; aggregators rebalance.
  • Treating snippet-only Google dorks as confirmed. TENTATIVE until you visit.
  • Pasting real PII / creds into cloud LLMs. Local models for sensitive analysis.
  • Mirror-imaging the threat actor. They don't think like you.
  • Attribution by IP geolocation. IPs lie; VPNs and residential proxies exist.
  • Ignoring the 7-day optimistic-rollup challenge window. L2 finality is delayed.
  • Ignoring CT-log lag. New certs take minutes-to-hours to surface; absence ≠ doesn't exist.
  • Counting Wayback snapshots as "the site at time T." Best-effort; many requests fail.
  • Trusting whoami from a discovered API. Could be a honeypot.
  • Letting the asset graph carry untyped strings. Every discovery should be recorded as a typed asset.
  • Skipping the scope check. Ask once when in doubt.
  • Forgetting UTC. Local time creates correlation bugs.
  • Continuing to probe after a WAF block. See §6.4 — back off.
  • Skipping confidence-upgrade documentation. TENTATIVE assets need a path to CONFIRMED.
  • Treating exec-summary writing as an afterthought. See §31 — plan deliverables at engagement start, not at the end.

27. WAF / CDN Bypass & Origin Discovery

Web targets are increasingly behind Cloudflare / Akamai / Fastly / AWS CloudFront. The CDN itself is hard to attack; the origin server is often softly defended. The subsections below give six discovery techniques (§27.1–27.3, §27.5–27.7) plus a validation probe (§27.4).

27.1 DNS history pivot

The target's domain may have pointed directly at the origin IP before the CDN was deployed. Query historical passive DNS:

  • SecurityTrails — https://api.securitytrails.com/v1/history/<domain>/dns/a (paid; the most complete).
  • RiskIQ PassiveTotal — passive DNS lookups; freemium.
  • DNSDB (Farsight) — paid; long history.
  • Validin.com — newer, has free tier.
  • Censys host search — historical IP-cert mapping.

What to look for: an IP that resolved 1–5 years ago and is not in the current CDN's published IP ranges (Cloudflare / Akamai / etc.). Cross-check the IP's current banner — if it serves the same site without the CDN, you've got the origin.
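
A minimal sketch of the range cross-check, assuming Cloudflare is the CDN in question (it publishes its IPv4 ranges at a stable URL; other CDNs publish equivalents):

import ipaddress
import requests

def cloudflare_nets():
    text = requests.get("https://www.cloudflare.com/ips-v4", timeout=15).text
    return [ipaddress.ip_network(line) for line in text.split() if line]

def non_cdn_candidates(historical_ips, nets):
    """Keep only historical IPs that fall outside the CDN's published ranges."""
    out = []
    for ip in historical_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            out.append(ip)  # origin candidate; still TENTATIVE until probed
    return out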

27.2 Certificate SAN pivot

Certificates often get re-issued with the same SAN list across origin and CDN. Search CT logs:

  • crt.sh — query ?q=%.<target.com>&exclude=expired to find certs with the target's domain in the SAN.
  • Cross-reference issued certs against current CDN-fronted cert. The "extra" cert (not the CDN one) often points to origin via its CN/SAN.
  • Tools like cero (go install github.com/glebarez/cero@latest) crawl IPs and pull certs; correlate IPs whose certs include the target's hostname.

27.3 Favicon hash + JARM origin clustering

If the target has a unique favicon, the origin server still serves it (CDNs proxy but don't strip the favicon). Compute the favicon's mmh3 hash and search:

  • Shodan: http.favicon.hash:<mmh3-hash>.
  • Censys: services.http.response.favicons.hashes:<mmh3-hash>.

Returned IPs that are not in CDN ranges are origin candidates.
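
A minimal hash-computation sketch: Shodan's favicon hash is mmh3 over the newline-wrapped base64 of the icon bytes, so the codecs-style encoding matters (a plain b64encode yields a different value):

import codecs

import mmh3      # pip install mmh3
import requests

def favicon_hash(url: str) -> int:
    """Shodan-style favicon hash for http.favicon.hash:<value> searches."""
    data = requests.get(url, timeout=15).content
    return mmh3.hash(codecs.encode(data, "base64"))  # keeps the 76-char line wraps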

JARM (TLS handshake hash) works similarly: compute the target's JARM with the jarm tool, then search Shodan for ssl.jarm:<jarm-hash>. Origin servers usually have a different JARM than CDNs.

27.4 Direct IP probe with Host header

If you have an origin candidate IP from steps 27.1–27.3:

curl -sk -H "Host: target.example.com" https://<candidate-IP>/

If the response matches the public site (same title, same body fingerprint) — you've found the origin. CDN-only IPs return generic CDN error pages or 403 to wrong Host.
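
A minimal automation sketch of the probe above, comparing <title> as a crude body fingerprint; certificate verification is disabled deliberately, mirroring curl -k (origin certs rarely match a bare IP):

import re

import requests
import urllib3

urllib3.disable_warnings()  # mismatched/self-signed origin certs are expected

def _title(body: str) -> str:
    m = re.search(r"<title[^>]*>(.*?)</title>", body, re.I | re.S)
    return m.group(1).strip() if m else ""

def origin_hits(host: str, candidates: list[str]) -> list[str]:
    public = _title(requests.get(f"https://{host}/", timeout=15).text)
    hits = []
    for ip in candidates:
        try:
            r = requests.get(f"https://{ip}/", headers={"Host": host},
                             verify=False, timeout=10)
            if r.ok and _title(r.text) == public:
                hits.append(ip)  # content matches the public site: origin candidate
        except requests.RequestException:
            pass  # CDN-only or dead IPs simply drop out
    return hits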

27.5 mail.* / ftp.* / cpanel.* exception

Targets often forget to put auxiliary subdomains behind the CDN:

  • mail.<target> — often points at the actual mail server, sometimes co-located with web origin.
  • ftp.<target>, sftp.<target> — likewise.
  • cpanel.<target>, whm.<target>, webmail.<target> — shared hosting controls; same IP as web origin.
  • direct.<target>, origin.<target>, direct-connect.<target>, noproxy.<target> — helpfully literal admin labels.
  • dev.<target>, staging.<target> — dev environments often skip CDN.

Probe each. If any resolves to a non-CDN IP, that IP often hosts the prod origin too.

27.6 Error page / misconfig leakage

When the CDN throws an error (request triggers WAF, origin is down, configuration mismatch), it sometimes leaks the origin IP in the error body:

  • Cloudflare 5xx error pages historically included a cf-ray and sometimes the underlying upstream details.
  • 502/504 from CDN tells you the origin exists but is unreachable from the CDN — useful confirmation it's not just NXDOMAIN.
  • X-Cache: MISS from <origin-host> headers.
  • HTTP response with origin headers leaking through (X-Powered-By, Server, X-Backend-Server).

27.7 Email-header bounce trick (for email-server origin)

Send an email to a non-existent address at the target. The bounce often reveals origin mail server IPs in the Received: headers — these mail servers are sometimes on the same IP / netblock as the web origin. (Use a sock-puppet email; never your real engagement persona.)

27.8 Confidence rules

  • Origin IP found via DNS history: TENTATIVE until corroborated by direct probe.
  • Origin IP corroborated by Host-header probe matching content: FIRM.
  • Origin IP corroborated by Host-header probe + cert SAN match + favicon match: CONFIRMED.

When unsure, document the hypothesis in the asset attrs — don't claim origin discovery without ≥2 corroborating signals.


28. Vulnerability Prioritization (CVE / EPSS / KEV)

A Nuclei scan can return 100+ CVEs against a target. You can't validate all of them. Prioritize by exploitability.

28.1 Data sources (companion skill §29.2 has the URLs)

  • NVD (National Vulnerability Database) — base CVE catalog, CVSS scores.
  • EPSS (Exploit Prediction Scoring System) — probability of exploitation in the next 30 days, scale 0.0–1.0 (keyless API; fetch sketch after this list).
  • CISA KEV (Known Exploited Vulnerabilities) — vulnerabilities proven exploited in the wild; includes federal-agency due-by dates.
  • ExploitDB — POC code presence.
  • Metasploit module catalog — automation availability.
  • InTheWild.io — community-curated "actively exploited" tracker.
  • OpenCVE — timeline + watchlist + alerts.
  • Trickest CVE → POC mapping — community-maintained mapping of CVEs to public POCs.
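
A minimal fetch sketch for the two highest-signal feeds above; both endpoints are public and keyless, and the EPSS API accepts comma-separated CVE IDs for batching:

import requests

KEV_URL = ("https://www.cisa.gov/sites/default/files/feeds/"
           "known_exploited_vulnerabilities.json")

def kev_ids() -> set[str]:
    data = requests.get(KEV_URL, timeout=30).json()
    return {v["cveID"] for v in data["vulnerabilities"]}

def epss_scores(cves: list[str]) -> dict[str, float]:
    r = requests.get("https://api.first.org/data/v1/epss",
                     params={"cve": ",".join(cves)}, timeout=30)
    return {row["cve"]: float(row["epss"]) for row in r.json()["data"]}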

28.2 Prioritization rubric

For each CVE in your list, score it against this rubric (a minimal scoring sketch follows the tier table below):

  • Listed in CISA KEV — +50 (proven exploited; treat as immediate)
  • EPSS score ≥ 0.7 — +30
  • EPSS score 0.3–0.69 — +15
  • Public Metasploit module exists — +25
  • Public POC on ExploitDB / GitHub — +15
  • Vendor-issued advisory + patch — +10 (the vuln is real and patchable, but not always exploitable)
  • Authentication — unauthenticated +20, post-auth +0
  • Attack vector — network +15, adjacent +5, local +0
  • CVSS v3 base ≥ 9.0 — +15

Total score → priority tier:

  • ≥ 100 — P0: immediate validation; surface in the engagement summary now.
  • 70–99 — P1: validate this engagement; include in the technical report.
  • 40–69 — P2: mention in the technical report; validate if time permits.
  • < 40 — P3: list in the appendix; no validation expected.
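
A minimal scoring sketch mirroring the rubric and tier tables above; the inputs are facts the operator has already gathered per CVE:

def cve_score(kev: bool, epss: float, msf: bool, poc: bool, advisory: bool,
              unauth: bool, vector: str, cvss: float) -> int:
    s = 50 if kev else 0
    s += 30 if epss >= 0.7 else (15 if epss >= 0.3 else 0)
    s += 25 if msf else 0
    s += 15 if poc else 0
    s += 10 if advisory else 0
    s += 20 if unauth else 0
    s += {"network": 15, "adjacent": 5}.get(vector, 0)  # local adds nothing
    s += 15 if cvss >= 9.0 else 0
    return s

def tier(score: int) -> str:
    return ("P0" if score >= 100 else
            "P1" if score >= 70 else
            "P2" if score >= 40 else "P3")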

28.3 Validation discipline

  • Validate = run a single read-only proof (e.g., a version banner check, a static-page fetch that confirms the vulnerable component) without triggering exploitation.
  • Do NOT run the actual exploit unless the rules of engagement explicitly permit. Many bug bounty programs forbid PoC execution against production.
  • For client engagements: deliver the prioritized list with reproduction commands but defer actual exploitation to a follow-on assessment if needed.

28.4 Handling CVE-less findings

Many real-world findings (sourcemap exposure, open GraphQL introspection, public bucket) don't have a CVE. Score them by their independent interest score (companion skill §20 for endpoints; §40 for severity-mapping examples). Don't gate on CVE availability.


29. Phishing Infrastructure & Pretext Development

Authorized red team engagements often include phishing. The OSINT side of phishing — building the phishing-feasibility shortlist and the pretext list — is in scope here. Crafting actual phishing payloads is out of scope (operational tradecraft, separate domain).

29.1 Phishing-feasibility shortlist

For an authorized engagement, the operator typically wants three lists:

A. Already-registered typosquats — these are findings (someone is squatting; client should know).

B. Available-for-registration typosquats — these are the operator's phishing-domain shortlist for the engagement.

C. Cert-SAN impersonation patterns — domains the operator could register that would make convincing certs (e.g., acme-secure.com, acme-login.com, acme-vpn-portal.com).

Generation pattern:

  1. Run dnstwist (or equivalent) on the target's primary domain → get all variations (homoglyph, typosquatting, bit-flip, hyphenation, addition).
  2. Check WHOIS for each: registered = list A, unregistered = list B.
  3. For list C: combine the target stem + plausible service words (secure, login, vpn, portal, mail, helpdesk, it, account, verify, support, password, auth, sso); a minimal generator sketch follows.
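
A minimal generator sketch for list C (referenced from step 3); the join patterns and TLDs are illustrative, and availability still has to be checked per step 2:

SERVICE_WORDS = ["secure", "login", "vpn", "portal", "mail", "helpdesk", "it",
                 "account", "verify", "support", "password", "auth", "sso"]

def list_c_candidates(stem: str, tlds=("com", "net")) -> list[str]:
    """Combine the target stem with service words across common join patterns."""
    out = []
    for word in SERVICE_WORDS:
        for pattern in (f"{stem}-{word}", f"{word}-{stem}", f"{stem}{word}"):
            out.extend(f"{pattern}.{tld}" for tld in tlds)
    return out

# list_c_candidates("acme")[:3] -> ['acme-secure.com', 'acme-secure.net', 'secure-acme.com']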

29.2 Subdomain takeover for trusted-domain phishing

If you found a takeover-able subdomain (companion skill §16.12), you can host phishing content on a subdomain of the actual target. Because the content is served from the target's real domain, this bypasses brand-impersonation defenses entirely.

Procedure:

  1. Confirm takeover signature on the candidate (<sub>.target.com CNAMEd to unclaimed <x>.herokuapp.com).
  2. Register the unclaimed resource (<x>.herokuapp.com on Heroku).
  3. Now <sub>.target.com serves your content.
  4. Stop and consult the engagement lead before going live with phishing content. Even in authorized engagements, the takeover step itself may need explicit client sign-off.

29.3 Email spoof feasibility

Use email security analysis (companion skill §16.14) to determine spoof feasibility:

  • SPF ~all (softfail) or absent, DMARC p=none or absent — HIGH: direct spoof of <anything>@<target> likely lands.
  • SPF ~all, DMARC p=quarantine — MEDIUM: lands in the spam folder, but lands.
  • SPF -all (hardfail), DMARC p=quarantine — LOW: most providers reject; some still deliver to spam.
  • SPF -all, DMARC p=reject — VERY LOW: spoof rejected by major providers; requires a lookalike domain.
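
A minimal lookup sketch feeding the table above, using dnspython (pip install dnspython); classifying the combined posture follows the table rows:

import dns.resolver  # pip install dnspython

def _txt(name: str) -> list[str]:
    try:
        return [b"".join(r.strings).decode() for r in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []

def email_posture(domain: str) -> dict:
    """Return the raw SPF and DMARC policy strings (None = absent)."""
    spf = next((t for t in _txt(domain) if t.startswith("v=spf1")), None)
    dmarc = next((t for t in _txt(f"_dmarc.{domain}") if t.startswith("v=DMARC1")), None)
    return {"spf": spf, "dmarc": dmarc}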

If spoof is hard, fall back to lookalike (list B) or compromised-third-party (different engagement). Document the postural finding regardless.

29.4 Pretext development from OSINT

Pretexts work when they tap a target's existing context. Build pretexts from harvested OSINT.

Pretext sources:

  • Job titles + reporting structure (LinkedIn) — "Hi <name>, this is <CFO-name>'s assistant; can you look at this finance file?"
  • Recent corporate events (press releases, news, careers page) — "We're following up on the layoffs / new acquisition; please confirm your benefits info."
  • Vendor relationships (job postings mentioning "experience with Workday/Salesforce/etc.") — "Workday password reset required by EOD."
  • Public conferences / travel (LinkedIn posts, conference pages) — "Hi <name>, we met at <conference>; following up on what we discussed."
  • GitHub commits / open-source work — "I saw your fix in <repo>; want to collaborate on a related PR?"
  • Office locations (LinkedIn) — "On-site IT for the <city> office requires a quick auth check."

Per-role pretext templates (the ones operators use most):

  • End-users (general staff): IT helpdesk password reset, Workday HR action required, Microsoft 365 storage full, package delivery confirmation.
  • Executives: finance authority (BEC-style "wire approval needed"), board materials shared, M&A NDA review, executive assistant request.
  • Developers / engineers: GitHub security alert, CI build failed, package security advisory, conference CFP follow-up.
  • HR / Finance: payroll vendor change, expense report rejection, benefits enrollment deadline, vendor invoice discrepancy.
  • IT / Security: vendor security update notification, on-call escalation, monitoring tool alert.

29.5 Operational discipline for phishing infrastructure

  • All phishing infrastructure must be approved by the engagement lead before going live.
  • Document every domain registered + every email sent (sender, recipient, subject, timestamp UTC).
  • Keep phishing infrastructure isolated from your normal recon infrastructure (different IPs, different personas, different domain registrar accounts).
  • Use dedicated payment methods for phishing-domain registration (engagement-specific corporate card; don't use personal).
  • Tear down promptly after engagement; transfer/expire all domains; close all hosting accounts.

30. Bug Bounty Submission & Responsible Disclosure

Use this section when you find an issue on a bug-bounty target (HackerOne, Bugcrowd, Intigriti, YesWeHack) or on a non-program target where you choose to disclose responsibly.

30.1 Platform-specific basics

  • HackerOne (hackerone.com) — largest; strong scope-tracking; CVSS-based reward calculation.
  • Bugcrowd (bugcrowd.com) — uses the VRT (Vulnerability Rating Taxonomy) instead of CVSS for severity.
  • Intigriti (intigriti.com) — EU-strong; flexible scope models.
  • YesWeHack (yeswehack.com) — EU-headquartered; growing.
  • HackenProof (hackenproof.com) — crypto/blockchain-focused programs.
  • Open Bug Bounty (openbugbounty.org) — free for sites without official programs (only XSS/SSRF disclosure).
  • security.txt (RFC 9116) — universal: every site should publish /.well-known/security.txt.

30.2 Report structure (works on any platform)

Title: [Severity] [Affected component] Brief description
   Example: [HIGH] [api.acme.com] Unauthenticated SSRF via /v1/proxy

Summary
   2-3 sentences explaining what was found and why it matters.

Steps to Reproduce
   1. Numbered, copy-pasteable.
   2. Include exact URLs, payloads, expected vs actual response.
   3. Reproduce the issue from a fresh state where possible.

Proof of Concept
   - Screenshot showing the vulnerability triggered.
   - HTTP request/response (sanitize sensitive data; redact other users' data).
   - Or short video/GIF for complex multi-step issues.

Impact
   Quantify: what data is at risk, how many users, what business functions break.
   Tie to the program's impact criteria where defined.

Severity (per program criteria)
   - CVSS v3 vector: AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
   - Score: 9.8 (Critical)
   - Justification: <1-2 sentence reasoning>

Remediation
   - Concrete recommendation. "Validate the URL parameter against an allowlist before fetching."
   - If a quick fix: WAF rule, header check.
   - If a structural fix: refactor recommendation.

Affected component
   - URL: https://api.acme.com/v1/proxy
   - Date discovered: 2026-04-27 14:23 UTC
   - Method: HTTP GET / POST / etc.

30.3 Severity inference per program

  • HackerOne uses CVSS v3 with program-specific multipliers (some programs cap at HIGH; some pay 10x for CRITICAL).
  • Bugcrowd VRT: P1 (Critical) → P5 (Informational); maps roughly to CRIT/HIGH/MED/LOW/INFO.
  • Intigriti uses Exceptional / Critical / High / Medium / Low tiers.
  • For programs that pay flat rates per severity tier, don't inflate. Conservative severity → trust → repeat awards.

30.4 Disclosure for unprogrammed targets (CVD)

If the target has no bug bounty program but you found a real vulnerability during authorized testing (e.g., a customer's external assessment surfacing a third-party vendor's bug):

  1. Check for <target>/.well-known/security.txt — if present, follow its Contact: and Encryption: (PGP) instructions.
  2. If absent, try security@<target> then abuse@<target>.
  3. If those bounce, escalate via the registered abuse contact in WHOIS.
  4. CERT/CC (cert.org) and national CERTs (CISA, CERT-EU, JPCERT) for coordination if vendor unresponsive.
  5. Standard 90-day disclosure window before public release (Project Zero norm).

30.5 Cloud provider / SaaS-specific channels

  • AWS abuse: aws-security@amazon.com. For exposed AWS keys: also notify the account owner if discoverable via WHOIS/contact.
  • GCP: google-cloud-trust@google.com.
  • Azure: Microsoft Security Response Center (MSRC) at https://msrc.microsoft.com/.
  • GitHub: secret-scanning auto-revokes most published secrets via GitHub's partner program; the automated scanning usually beats manual disclosure, but send a backup notification to security@github.com anyway.
  • npm / PyPI / Docker Hub: each has an abuse@ or security@ channel; npm specifically auto-revokes leaked tokens via their secret scanner.

30.6 Things to avoid in reports

  • Don't include anyone else's PII / passwords / data in the report (redact).
  • Don't post the report or PoC publicly until the disclosure window expires.
  • Don't escalate via social media first (always direct channel first).
  • Don't demand a specific bounty amount; let the program decide.
  • Don't submit the same finding to multiple programs (duplicates can hurt your reputation).

31. Client Deliverable Templates

Operator-facing artifacts (asset graph, JSONL log, finding DB) are not the same as client-facing artifacts (exec summary, technical report). Build deliverables intentionally.

31.1 Executive summary template

ENGAGEMENT: <Client Name>
ASSESSMENT TYPE: External Attack Surface Assessment
ENGAGEMENT WINDOW: <start date> – <end date>
SCOPE: <one-line scope description, e.g., "All internet-facing assets of acme.com and its 3 brand domains">
LEAD: <your name / team>

----- KEY FINDINGS -----

1. [CRITICAL] <One-line title>
   Business impact: <one sentence in business language>
   Estimated remediation effort: <hours / days / weeks>
   Recommended action: <verb + object, e.g., "Rotate the exposed AWS access key and audit CloudTrail">

2. [CRITICAL] <One-line title>
   …

3. [HIGH] <One-line title>
   …

(Top 3-5 findings only; full list in technical report)

----- POSTURAL OBSERVATIONS -----

- Email security (SPF/DMARC): <2-3 sentences on posture, e.g., "DMARC is set to p=none, allowing spoof of <target>.com email; tightening to p=reject would block external spoofing.">
- Identity fabric (SSO): <2-3 sentences>
- Cloud surface (S3/GCS/Azure): <2-3 sentences>
- Mobile attack surface: <2-3 sentences if applicable>

----- AGGREGATE METRICS -----

- Assets discovered: <N> (<breakdown>)
- Findings: <N CRIT, M HIGH, P MED, Q LOW, R INFO>
- Live credentials confirmed: <N>
- Detectability of our operations: <90% low / 8% medium / 2% high>

----- RECOMMENDED NEXT STEPS -----

1. Address P0 findings in next 7 days.
2. Address P1 findings in next 30 days.
3. Schedule re-test for: <date>.
4. Consider follow-on assessments: <if applicable, e.g., authenticated app testing, internal pentest>.

31.2 Per-finding report card template

Each finding in the technical report uses this card:

═══════════════════════════════════════════════════════════
FINDING #<N>: <Title>
SEVERITY: <CRIT / HIGH / MED / LOW / INFO>
CONFIDENCE: <CONFIRMED / FIRM / TENTATIVE>
ASSET: <typed asset key>
DISCOVERED: <UTC timestamp>
═══════════════════════════════════════════════════════════

DESCRIPTION
  <2-5 sentence technical explanation>

EVIDENCE
  - URL: <where it was found>
  - Tool: <how it was discovered>
  - Screenshot: <attachment ref>
  - Raw HTTP: <sanitized capture>
  - Hash (SHA-256): <of any downloaded artifact>

REPRODUCTION
  Step 1: <command or action>
    Expected: <output>
  Step 2: …

IMPACT
  <Business-language impact statement>
  Affected systems: <list>
  Affected user populations: <if applicable>

REMEDIATION
  Immediate (within hours):
    - <action>
  Short-term (within days):
    - <action>
  Long-term (within weeks):
    - <action>

REFERENCES
  - <CVE-ID, advisory URL, OWASP top-10 link, vendor doc>

ATTACK PATH HINT
  <If applicable, the one-sentence hint from companion skill §39>

31.3 Risk translation matrix

Engineers think in CVSS. Executives think in business outcomes. Translate.

  • Listable S3 bucket with PII — "Customer records publicly downloadable. Potential GDPR/CCPA notification trigger if accessed. Estimated cost of disclosure: 30-day notification + credit monitoring + legal review."
  • Exposed .env with DB credentials — "Database access to all customer data. Pivots to backups, billing systems, employee PII. If exploited: full data breach scope."
  • Live AWS access key with admin scope — "Full cloud account compromise. Attacker can spin up cryptominers, exfiltrate all data, lateral-move to connected accounts. If exploited: 6-figure cloud bill + complete environment rebuild."
  • Open GraphQL introspection on prod — "API attack surface fully mapped by attackers. Enables more precise follow-on attacks; not directly exploitable but attacker reconnaissance is now zero-effort."
  • Subdomain takeover possible — "Attackers can host content under your trusted domain. Phishing emails from this domain bypass brand-impersonation defenses; users will trust them."
  • Open Firebase Realtime Database — "Mobile app's backend database is publicly readable. All user data, possibly writable. If exploited: full data breach + potential service disruption."
  • Missing HSTS on /login — "Login pages can be downgraded to HTTP via active network attacks. Credentials potentially captured by anyone with network access (coffee shop, conference WiFi)."
  • DMARC p=none — "Anyone on the internet can send email appearing to be from your domain. Phishing campaigns become trivially convincing for both customers and employees."
  • ≥10 employees in breach corpus — "Stolen credentials for your staff are circulating; attackers can attempt these against your SSO. Even if SSO has MFA, password reuse against other services puts those at risk."
  • android:debuggable=true — "Mobile app can be reverse-engineered and modified by anyone. Trust boundary between app and server is undermined; backend assumes app integrity that doesn't exist."
  • Vendor product (Citrix/F5/Pulse) version with KEV CVE — "Network appliance has a known-exploited vulnerability. Attackers are actively scanning the internet for this exact issue. Patch immediately."

31.4 Reporting cadence for engagements > 1 day

  • Day 1 EOD — short kickoff confirmation: "Stage 1 complete. Identified <N> assets so far. Initial posture: <one-line assessment>."
  • Mid-engagement check-in — when you find your first CRITICAL or at the 50% time mark, whichever comes first. Heads-up to the engagement lead: "Found <CRIT description>; recommend alerting <client> now to begin remediation prep."
  • End-of-engagement preliminary — same day as the last probe: top 5 findings + a plan for full-report delivery within <X> business days.
  • Final report — formal deliverable; both exec summary + technical report.
  • Re-test offer — proactive: "We're available for a re-test in <window>; recommend re-testing CRITICAL/HIGH findings after remediation."

31.5 Reproduction package contents

Deliver alongside the report:

<engagement-id>-reproduction-package.zip
├── README.md                    # how to use the package
├── engagement-metadata.json     # client, dates, scope, lead
├── tools-used.txt               # tool name + version, one per line
├── run-log.jsonl                # every event during engagement
├── assets.db                    # SQLite of all discovered assets
├── findings.db                  # SQLite of all findings
├── evidence/
│   ├── screenshots/             # PNG, named by finding-id
│   ├── http/                    # raw HTTP captures (sanitized)
│   ├── downloads/               # any binary artifacts (with .sha256 alongside)
│   └── code/                    # any extracted source (sanitized)
├── re-test-script.sh            # reruns probes for the CRIT/HIGH findings
└── disclosure/                  # if applicable: bounty submissions, vendor notifications

The package is the source of truth — the report is the human-readable view. Anyone with the package can reproduce the engagement and verify findings.


32. Skill Self-Test

Drop these prompts into a fresh Claude session to verify the skill loads and behaves correctly. Pass criteria: expected sections referenced, no hallucinated content, scope-check invoked when needed.

  1. "I'm doing external recon on acme.com (in-scope bug bounty). Where do I start?" → §0, §1, §7, §7.5, §11.
  2. "How do I tell if a target uses Entra, Okta, or ADFS without active probing?" → §11.2-11.4 + companion skill §22.
  3. "What should I look for when I scrape JS bundles?" → §13.
  4. "What's the workflow for finding and analyzing a target's Android apps?" → §14.
  5. "How do I generate good cloud bucket candidates without spamming the entire internet?" → §15.
  6. "We've harvested 200 emails for the target and confirmed the org uses Entra. Highest-ROI next step?" → §22.
  7. "I found a live AWS access key in a public GitHub repo. Should I check if it works?" → §6.3 + companion skill §23.2.
  8. "Our team just acquired a 4GB stealer-log dump. How do we handle it?" → §22.4.
  9. "I found three indicators that look like APT28 — does that confirm it?" → §20.2 + §2.
  10. "Investigating a Russian cybercrime actor. What corporate/people pivots?" → §20.3.
  11. "How do I link a .cn domain to its operating company?" → §20.4 + companion skill §14.2.
  12. "List the most common mistakes red teamers make during external recon." → §26.
  13. "Run a full subdomain enum on chase.com." → §1 (scope check; should NOT run).
  14. "My subdomain marked TENTATIVE — how to upgrade?" → §2.1.
  15. "How long should a 1-week recon engagement take across the 5 stages?" → §7.6.
  16. "50 subdomains, 12 webapps, 4 IPs, 23 emails — triage order?" → §8.5 + §7.5.
  17. "Probing a 50-employee SaaS company with M365 + GitHub + AWS. Where to focus?" → §10.1 + §11.10 + §22.
  18. "Target is fully behind Cloudflare. How do I find the origin?" → §27.
  19. "100 CVEs from a Nuclei scan. How do I prioritize?" → §28.
  20. "My probes are getting 429s and a Cloudflare interstitial. What now?" → §6.4.
  21. "Authorized engagement asks for phishing-feasibility shortlist. Walk me through it." → §29.
  22. "Found unauth POST endpoint on a HackerOne target. Write me the report." → §30.2.
  23. "Write the executive summary for an engagement that found 2 CRIT, 5 HIGH, 12 MED." → §31.1 + §31.3.

33. Changelog

  • v2.1 (2026-04-27) — comprehensive expansion based on 32-test smoke-test gap analysis. Added: confidence upgrade workflows (§2.1), detection-aware probing (§6.4), time budgeting & engagement profiles (§7.6), asset-level triage rules (§8.5), scale-based tactics (§10.1), Microsoft 365 deep surface (§11.10), WAF/CDN bypass & origin discovery (§27), vulnerability prioritization (§28), phishing infrastructure & pretext development (§29), bug bounty submission & responsible disclosure (§30), client deliverable templates (§31). Self-Test section refreshed with v2.1 prompts.
  • v2.0 (2026-04-27) — major rewrite for external red-team posture. Added: 5-stage pipeline, asset-graph discipline, findings rubric, bug-bounty pivot modes, identity-fabric mapping, API & auth-map methodology, JS deep analysis, mobile attack surface, cloud attack surface, breach × identity correlation, detectability tagging, validator discipline, cross-module coordination, multi-engine corpus run, evidence preservation, anti-patterns. Original methodology content (OpSec, Crypto, Image/Video/Chrono, Threat Actor inc. RU/CN, Synthetic Media) retained.
  • v1.x — original OSINT methodology framework based on SnailSploit/offensive-checklist.