Skills Engineering Gather OCI Instance Debug Bundle

Gather OCI Instance Debug Bundle

v20260423
oraclecloud-debug-bundle
Collects comprehensive diagnostics from an unresponsive Oracle Cloud Infrastructure (OCI) compute instance. This tool gathers critical data—including serial console logs, cloud-init logs, VCN flow logs, and CPU metrics—into a single, self-contained tar archive. Use this when an instance is failing, stuck in provisioning, or showing underlying infrastructure errors, providing support teams with all necessary information for root-cause analysis.
Get Skill
476 downloads
Overview

Oracle Cloud Debug Bundle

Overview

Collect comprehensive diagnostics from an unresponsive OCI compute instance without touching the OCI Console. When an instance reports "unavailable due to an issue with the underlying infrastructure" or cloud-init failures, you need serial console output, cloud-init logs, instance metadata, and VCN flow logs — all gathered via CLI commands into a single tar archive for root-cause analysis or support ticket attachment.

Purpose: Generate a self-contained debug bundle (.tar.gz) with all the data OCI Support will ask for, collected in under 60 seconds.

Prerequisites

  • OCI CLI installed and configuredoci --version returns 3.x+, ~/.oci/config is valid (see oraclecloud-install-auth)
  • Python 3.8+ with the OCI SDK — pip install oci
  • Compartment OCID — the compartment containing the target instance
  • Instance OCID — format: ocid1.instance.oc1.{region}.aaaa...
  • IAM policies granting inspect instance-console-histories, read instances, read vcn-flow-logs in the target compartment

Instructions

Step 1: Set Target Variables

export INSTANCE_OCID="ocid1.instance.oc1.iad.YOUR_INSTANCE_OCID"
export COMPARTMENT_OCID="ocid1.compartment.oc1..YOUR_COMPARTMENT_OCID"
export BUNDLE_DIR="oci-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"

Step 2: Capture Instance Metadata

oci compute instance get \
  --instance-id "$INSTANCE_OCID" \
  --query 'data.{state:"lifecycle-state",shape:shape,ad:"availability-domain",created:"time-created",fault:"fault-domain"}' \
  --output json > "$BUNDLE_DIR/instance-metadata.json"

echo "Instance state: $(jq -r '.state' "$BUNDLE_DIR/instance-metadata.json")"

Step 3: Retrieve Serial Console History

The serial console captures kernel panics, boot failures, and cloud-init output — even when SSH is unreachable:

# Create a console history capture
CAPTURE_ID=$(oci compute instance-console-history capture \
  --instance-id "$INSTANCE_OCID" \
  --query 'data.id' --raw-output)

echo "Console history capture: $CAPTURE_ID"

# Wait for capture to complete, then download
sleep 10
oci compute instance-console-history get-content \
  --instance-console-history-id "$CAPTURE_ID" \
  > "$BUNDLE_DIR/serial-console.log"

echo "Serial console: $(wc -l < "$BUNDLE_DIR/serial-console.log") lines"

Step 4: Extract Cloud-Init Logs via Instance Agent

If the instance agent is running, pull cloud-init logs via the run-command plugin:

COMMAND_ID=$(oci instance-agent command create \
  --compartment-id "$COMPARTMENT_OCID" \
  --target '{"instanceId":"'"$INSTANCE_OCID"'"}' \
  --content '{"source":{"sourceType":"TEXT","text":"cat /var/log/cloud-init.log | tail -200"}}' \
  --timeout-in-seconds 30 \
  --query 'data.id' --raw-output)

sleep 15
oci instance-agent command get \
  --command-id "$COMMAND_ID" \
  --query 'data."delivery-state"' --raw-output

# Retrieve output
oci instance-agent command get-content \
  --command-id "$COMMAND_ID" \
  > "$BUNDLE_DIR/cloud-init.log"

Step 5: Collect VCN Flow Logs

oci logging search \
  --search-query "search \"$COMPARTMENT_OCID\" | where source = 'vcn' | sort by datetime desc" \
  --time-start "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
  --time-end "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --output json > "$BUNDLE_DIR/vcn-flow-logs.json" 2>&1 || echo "Flow logs unavailable" > "$BUNDLE_DIR/vcn-flow-logs.json"

Step 6: Query Instance Metrics

import oci
import datetime
import json

config = oci.config.from_file("~/.oci/config")
monitoring = oci.monitoring.MonitoringClient(config)

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=1)

response = monitoring.summarize_metrics_data(
    compartment_id="COMPARTMENT_OCID",
    summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace="oci_computeagent",
        query='CpuUtilization[5m]{resourceId = "INSTANCE_OCID"}.max()',
        start_time=start.isoformat() + "Z",
        end_time=end.isoformat() + "Z",
    ),
)

with open("oci-debug-bundle/instance-metrics.json", "w") as f:
    json.dump([item.__dict__ for item in response.data], f, indent=2, default=str)

Step 7: Package the Bundle

# Add a summary header
cat > "$BUNDLE_DIR/README.txt" << EOF
OCI Debug Bundle
Instance: $INSTANCE_OCID
Captured: $(date -u)
Contents:
  instance-metadata.json  - Lifecycle state, shape, AD, fault domain
  serial-console.log      - Serial console history (boot output, kernel msgs)
  cloud-init.log          - Cloud-init execution log (last 200 lines)
  vcn-flow-logs.json      - VCN flow log entries (last 1 hour)
  instance-metrics.json   - CPU utilization metrics (last 1 hour)
EOF

tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
echo "Bundle ready: $BUNDLE_DIR.tar.gz ($(du -h "$BUNDLE_DIR.tar.gz" | cut -f1))"

Output

Successful completion produces:

  • A oci-debug-YYYYMMDD-HHMMSS.tar.gz archive containing five diagnostic files
  • Instance metadata showing current lifecycle state, shape, and fault domain
  • Serial console log capturing boot output, kernel messages, and cloud-init status
  • Cloud-init execution logs (last 200 lines) retrieved via the instance agent
  • VCN flow logs for the last hour to identify network-level issues
  • CPU utilization metrics to detect resource exhaustion

Error Handling

Error Code Cause Solution
NotAuthorizedOrNotFound 404 Missing IAM policy or wrong OCID Add allow group <grp> to inspect instance-console-histories in compartment <comp>
NotAuthenticated 401 Expired or invalid API key Re-validate config: oci iam user get --user-id $(grep ^user ~/.oci/config | cut -d= -f2)
Instance agent unreachable Agent not running or instance stopped Fall back to serial console history only (Step 3)
TooManyRequests 429 Rate limited on console history API Wait 30 seconds and retry — OCI does not return a Retry-After header
InternalError 500 OCI service issue Retry after 60 seconds; check https://ocistatus.oraclecloud.com
Flow logs empty VCN flow logs not enabled Enable flow logs: VCN > Subnet > Flow Logs > Enable

Examples

Quick one-liner to check if instance is reachable:

oci compute instance get --instance-id "$INSTANCE_OCID" \
  --query 'data."lifecycle-state"' --raw-output
# Expected: RUNNING, STOPPED, or TERMINATED

Automated bundle script (saves as oci-debug.sh):

#!/bin/bash
set -euo pipefail
INSTANCE_OCID="${1:?Usage: oci-debug.sh <instance-ocid>}"
COMPARTMENT_OCID="${2:?Usage: oci-debug.sh <instance-ocid> <compartment-ocid>}"
BUNDLE="oci-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE"
oci compute instance get --instance-id "$INSTANCE_OCID" --output json > "$BUNDLE/metadata.json"
CAPTURE=$(oci compute instance-console-history capture --instance-id "$INSTANCE_OCID" --query 'data.id' --raw-output)
sleep 10
oci compute instance-console-history get-content --instance-console-history-id "$CAPTURE" > "$BUNDLE/serial.log"
tar -czf "$BUNDLE.tar.gz" "$BUNDLE" && rm -rf "$BUNDLE"
echo "Bundle: $BUNDLE.tar.gz"

Resources

Next Steps

After collecting the debug bundle, review oraclecloud-incident-runbook for guided triage and recovery actions, or oraclecloud-common-errors for decoding specific error codes in the serial console output.

Info
Category Engineering
Name oraclecloud-debug-bundle
Version v20260423
Size 4.67KB
Updated At 2026-04-28
Language