API Inventory And Discovery

v20260317

performing-api-inventory-and-discovery

Performs API inventory and discovery via passive traffic analysis, active scans, DNS/JavaScript enumeration, and cloud inventories to catalog documented, undocumented, shadow, zombie, and deprecated endpoints ahead of security assessments or compliance checks.

API Inventory API Discovery Cybersecurity OWASP Security Testing Shadow APIs

Get Skill

490 downloads

Overview

Performing API Inventory and Discovery

When to Use

Mapping the complete API attack surface of an organization before a security assessment
Identifying shadow APIs deployed by development teams without security review
Discovering deprecated or zombie API versions that remain accessible but unmaintained
Finding undocumented API endpoints exposed through mobile applications, SPAs, or microservices
Building an API inventory for compliance requirements (PCI-DSS, SOC2, GDPR)

Do not use without written authorization. API discovery involves scanning network infrastructure and analyzing traffic.

Prerequisites

Written authorization specifying the target domains and network ranges
Passive traffic capture capability (network tap, proxy, or cloud traffic mirroring)
Active scanning tools: Amass, subfinder, httpx, and nuclei
JavaScript analysis tools: LinkFinder, JS-Miner, or custom parsers
Access to cloud console (AWS, Azure, GCP) for API gateway inventory
Burp Suite Professional for passive API endpoint discovery

Workflow

Step 1: Passive API Discovery from Traffic Analysis

import re
import json
from collections import defaultdict

# Parse HAR file from browser developer tools or proxy
def analyze_har_for_apis(har_file_path):
    """Extract API endpoints from HTTP Archive (HAR) file."""
    with open(har_file_path) as f:
        har = json.load(f)

    api_endpoints = defaultdict(lambda: {
        "methods": set(), "content_types": set(),
        "auth_types": set(), "count": 0
    })

    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        method = entry["request"]["method"]

        # Identify API patterns
        api_patterns = [
            r'/api/', r'/v\d+/', r'/graphql', r'/rest/',
            r'/ws/', r'/rpc/', r'/grpc', r'/json',
        ]

        if any(re.search(p, url) for p in api_patterns):
            # Normalize the URL (remove query params and IDs)
            normalized = re.sub(r'\?.*$', '', url)
            normalized = re.sub(r'/\d+(/|$)', '/{id}\\1', normalized)
            normalized = re.sub(
                r'/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
                '/{uuid}', normalized)

            ep = api_endpoints[normalized]
            ep["methods"].add(method)
            ep["count"] += 1

            # Detect authentication type
            for header in entry["request"]["headers"]:
                name = header["name"].lower()
                if name == "authorization":
                    if "bearer" in header["value"].lower():
                        ep["auth_types"].add("Bearer/JWT")
                    elif "basic" in header["value"].lower():
                        ep["auth_types"].add("Basic")
                elif name == "x-api-key":
                    ep["auth_types"].add("API Key")

            # Detect content type
            content_type = next(
                (h["value"] for h in entry["request"]["headers"]
                 if h["name"].lower() == "content-type"), None)
            if content_type:
                ep["content_types"].add(content_type.split(";")[0])

    print(f"Discovered {len(api_endpoints)} unique API endpoints:\n")
    for url, info in sorted(api_endpoints.items()):
        methods = ", ".join(sorted(info["methods"]))
        auth = ", ".join(info["auth_types"]) or "None"
        print(f"  [{methods}] {url}")
        print(f"    Auth: {auth} | Requests: {info['count']}")

    return api_endpoints

Step 2: Active API Endpoint Discovery

# DNS enumeration for API subdomains
amass enum -d example.com -o amass_results.txt
subfinder -d example.com -o subfinder_results.txt

# Filter for API-related subdomains
grep -iE '(api|rest|graphql|ws|gateway|backend|internal|staging|dev|v1|v2)' \
    amass_results.txt subfinder_results.txt | sort -u > api_subdomains.txt

# Check which subdomains are alive
cat api_subdomains.txt | httpx -status-code -content-length -title \
    -tech-detect -o live_apis.txt

# Probe common API paths on each live subdomain
cat api_subdomains.txt | while read domain; do
    for path in /api /api/v1 /api/v2 /graphql /swagger.json /openapi.json \
                /api-docs /docs /health /status /metrics /actuator; do
        curl -s -o /dev/null -w "%{http_code} %{url_effective}\n" \
            "https://${domain}${path}" 2>/dev/null | grep -v "^404"
    done
done

import requests
import concurrent.futures

def discover_api_endpoints(base_domains):
    """Actively probe for API endpoints across discovered domains."""

    # Common API paths to test
    API_PATHS = [
        "/api", "/api/v1", "/api/v2", "/api/v3",
        "/graphql", "/gql", "/query",
        "/rest", "/json", "/rpc",
        "/swagger.json", "/swagger/v1/swagger.json",
        "/openapi.json", "/openapi.yaml", "/api-docs",
        "/docs", "/redoc", "/explorer",
        "/.well-known/openid-configuration",
        "/health", "/healthz", "/ready",
        "/status", "/info", "/version",
        "/metrics", "/prometheus",
        "/actuator", "/actuator/health", "/actuator/info",
        "/admin", "/admin/api", "/internal",
        "/debug", "/debug/vars", "/debug/pprof",
        "/ws", "/websocket", "/socket.io",
        "/grpc", "/twirp",
    ]

    discovered = []

    def check_endpoint(domain, path):
        for scheme in ["https", "http"]:
            url = f"{scheme}://{domain}{path}"
            try:
                resp = requests.get(url, timeout=5, allow_redirects=False,
                                  verify=False)
                if resp.status_code not in (404, 502, 503):
                    return {
                        "url": url,
                        "status": resp.status_code,
                        "content_type": resp.headers.get("Content-Type", ""),
                        "server": resp.headers.get("Server", ""),
                        "size": len(resp.content),
                    }
            except requests.exceptions.RequestException:
                pass
        return None

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        futures = {}
        for domain in base_domains:
            for path in API_PATHS:
                future = executor.submit(check_endpoint, domain, path)
                futures[future] = (domain, path)

        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            if result:
                discovered.append(result)
                print(f"  [FOUND] {result['url']} -> {result['status']} ({result['content_type']})")

    return discovered

Step 3: JavaScript Source Analysis for API Endpoints

import re
import requests

def extract_apis_from_javascript(js_urls):
    """Extract API endpoints from JavaScript source files."""
    api_pattern = re.compile(
        r'''(?:['"`])((?:/api/|/v[0-9]+/|/graphql|/rest/)[^'"`\s<>{}]+)(?:['"`])''',
        re.IGNORECASE
    )
    url_pattern = re.compile(
        r'''(?:['"`])(https?://[a-zA-Z0-9._-]+(?:\.[a-zA-Z]{2,})+(?:/[^'"`\s<>{}]*)?)(?:['"`])'''
    )
    fetch_pattern = re.compile(
        r'''(?:fetch|axios|ajax|XMLHttpRequest|\.get|\.post|\.put|\.delete|\.patch)\s*\(\s*(?:['"`])([^'"`]+)'''
    )

    all_endpoints = set()

    for js_url in js_urls:
        try:
            resp = requests.get(js_url, timeout=10)
            content = resp.text

            # Extract relative API paths
            for match in api_pattern.findall(content):
                all_endpoints.add(("relative", match))

            # Extract absolute URLs
            for match in url_pattern.findall(content):
                if any(kw in match.lower() for kw in ["/api", "/v1", "/v2", "graphql"]):
                    all_endpoints.add(("absolute", match))

            # Extract from fetch/axios calls
            for match in fetch_pattern.findall(content):
                all_endpoints.add(("fetch", match))

        except requests.exceptions.RequestException:
            pass

    print(f"\nAPI endpoints discovered from JavaScript ({len(all_endpoints)}):")
    for source, endpoint in sorted(all_endpoints):
        print(f"  [{source}] {endpoint}")

    return all_endpoints

# Find JavaScript files from the target domain
def find_js_files(domain):
    """Discover JavaScript files from a web application."""
    resp = requests.get(f"https://{domain}", timeout=10)
    js_files = re.findall(r'src=["\']([^"\']+\.js[^"\']*)', resp.text)
    full_urls = []
    for js in js_files:
        if js.startswith("http"):
            full_urls.append(js)
        elif js.startswith("//"):
            full_urls.append(f"https:{js}")
        elif js.startswith("/"):
            full_urls.append(f"https://{domain}{js}")
    return full_urls

Step 4: Cloud API Gateway Inventory

import boto3

def inventory_aws_apis():
    """Inventory all APIs in AWS API Gateway."""
    apigw = boto3.client('apigateway')
    apigwv2 = boto3.client('apigatewayv2')

    apis = []

    # REST APIs (API Gateway v1)
    rest_apis = apigw.get_rest_apis()
    for api in rest_apis['items']:
        resources = apigw.get_resources(restApiId=api['id'])
        stages = apigw.get_stages(restApiId=api['id'])

        for stage in stages['item']:
            for resource in resources['items']:
                for method in resource.get('resourceMethods', {}).keys():
                    apis.append({
                        "type": "REST",
                        "name": api['name'],
                        "stage": stage['stageName'],
                        "path": resource['path'],
                        "method": method,
                        "url": f"https://{api['id']}.execute-api.{boto3.session.Session().region_name}.amazonaws.com/{stage['stageName']}{resource['path']}",
                        "created": str(api.get('createdDate', '')),
                    })

    # HTTP APIs (API Gateway v2)
    http_apis = apigwv2.get_apis()
    for api in http_apis['Items']:
        routes = apigwv2.get_routes(ApiId=api['ApiId'])
        stages = apigwv2.get_stages(ApiId=api['ApiId'])

        for route in routes['Items']:
            apis.append({
                "type": "HTTP",
                "name": api['Name'],
                "route": route['RouteKey'],
                "api_id": api['ApiId'],
                "protocol": api['ProtocolType'],
            })

    print(f"\nAWS API Inventory ({len(apis)} endpoints):")
    for api in apis:
        print(f"  [{api['type']}] {api.get('name')} - {api.get('method', '')} {api.get('path', api.get('route', ''))}")

    return apis

Step 5: API Version and Shadow API Detection

def detect_shadow_and_zombie_apis(discovered_endpoints, documented_endpoints):
    """Compare discovered APIs against documented inventory."""

    # Normalize endpoints for comparison
    def normalize(ep):
        ep = re.sub(r'/v\d+/', '/vX/', ep)
        ep = re.sub(r'/\d+', '/{id}', ep)
        return ep.lower().rstrip('/')

    documented_normalized = {normalize(ep) for ep in documented_endpoints}

    shadow_apis = []  # Discovered but not documented
    zombie_apis = []  # Old versions still accessible

    for ep in discovered_endpoints:
        normalized = normalize(ep["url"])

        if normalized not in documented_normalized:
            # Check if it is an old version of a documented API
            if re.search(r'/v[0-9]+/', ep["url"]):
                zombie_apis.append(ep)
            else:
                shadow_apis.append(ep)

    print(f"\nShadow APIs (undocumented): {len(shadow_apis)}")
    for api in shadow_apis:
        print(f"  [SHADOW] {api['url']} -> {api['status']}")

    print(f"\nZombie APIs (deprecated versions): {len(zombie_apis)}")
    for api in zombie_apis:
        print(f"  [ZOMBIE] {api['url']} -> {api['status']}")

    # Check if zombie APIs lack security controls
    for api in zombie_apis:
        resp = requests.get(api["url"], timeout=5)
        if resp.status_code not in (401, 403):
            print(f"  [CRITICAL] Zombie API accessible without auth: {api['url']}")

    return shadow_apis, zombie_apis

Key Concepts

Term	Definition
Shadow API	An API deployed by a development team without going through the official API management or security review process
Zombie API	A deprecated or old API version that remains accessible and running but is no longer maintained or monitored
API Inventory	A comprehensive catalog of all APIs in an organization including endpoint URLs, owners, versions, authentication methods, and data classifications
Improper Inventory Management	OWASP API9:2023 - failure to maintain an accurate API inventory, leading to unmonitored and unprotected API endpoints
Attack Surface	The total set of API endpoints, methods, and parameters that an attacker can potentially interact with
API Sprawl	The uncontrolled proliferation of APIs in an organization, often resulting from microservice adoption without centralized governance

Tools & Systems

Amass: OWASP tool for attack surface mapping through DNS enumeration, web scraping, and API discovery
httpx: Fast HTTP probing tool for validating discovered domains and identifying live API endpoints
nuclei: Template-based scanner for detecting exposed API documentation, debug endpoints, and misconfigured services
Swagger UI Detector: Tool for finding exposed Swagger/OpenAPI documentation endpoints across the organization
Akto: API security platform that discovers APIs through traffic analysis and maintains an automated inventory

Common Scenarios

Scenario: Enterprise API Attack Surface Assessment

Context: A large enterprise has 200+ development teams using microservices. The security team suspects many undocumented APIs are exposed to the internet. A comprehensive API inventory is needed for a security audit.

Approach:

DNS enumeration discovers 340 subdomains, 45 contain API-related keywords (api, rest, gateway, backend)
Active probing of all subdomains with API path wordlist discovers 127 live API endpoints
JavaScript analysis of the main web application reveals 34 API endpoints, 8 of which point to undocumented internal services
AWS API Gateway inventory shows 67 REST APIs and 23 HTTP APIs across 12 accounts
Cross-referencing against the official API catalog: 31 shadow APIs (undocumented), 14 zombie APIs (deprecated versions)
3 zombie APIs have no authentication, exposing customer data through endpoints that were supposed to be decommissioned
2 shadow APIs expose internal admin functions to the internet without authorization

Pitfalls:

Only checking documented API endpoints and missing shadow APIs deployed outside the API gateway
Not scanning JavaScript bundles where frontend applications hardcode API endpoint URLs
Missing APIs behind non-standard ports or subpaths
Not checking for multiple API versions where older versions may lack security controls
Assuming all APIs go through the API gateway when some may be directly exposed

Output Format

## API Inventory and Discovery Report

**Organization**: Example Corp
**Assessment Date**: 2024-12-15
**Domains Scanned**: 340

### Summary

| Category | Count |
|----------|-------|
| Total APIs Discovered | 127 |
| Documented APIs | 82 |
| Shadow APIs (undocumented) | 31 |
| Zombie APIs (deprecated) | 14 |
| APIs Without Authentication | 8 |
| APIs Exposing Sensitive Data | 5 |

### Critical Findings

1. **Zombie API**: api-v1.example.com/api/v1/users - Deprecated in 2022,
   still accessible, no authentication required, returns full user data
2. **Shadow API**: internal-tools.example.com/api/admin - Admin functions
   exposed to internet without authorization
3. **Exposed Documentation**: 12 Swagger UI instances accessible publicly,
   revealing full API schema and endpoint details

Info

Category Development

Name performing-api-inventory-and-discovery

Version v20260317

Size 14.35KB

Source mukul975/Anthropic-Cybersecurity-Skills

Updated At 2026-03-18