Collect comprehensive diagnostics from an unresponsive OCI compute instance without touching the OCI Console. When an instance reports "unavailable due to an issue with the underlying infrastructure" or cloud-init failures, you need serial console output, cloud-init logs, instance metadata, and VCN flow logs — all gathered via CLI commands into a single tar archive for root-cause analysis or support ticket attachment.
Purpose: Generate a self-contained debug bundle (.tar.gz) with all the data OCI Support will ask for, collected in under 60 seconds.
oci --version returns 3.x+, ~/.oci/config is valid (see oraclecloud-install-auth)pip install oci
ocid1.instance.oc1.{region}.aaaa...
inspect instance-console-histories, read instances, read vcn-flow-logs in the target compartmentexport INSTANCE_OCID="ocid1.instance.oc1.iad.YOUR_INSTANCE_OCID"
export COMPARTMENT_OCID="ocid1.compartment.oc1..YOUR_COMPARTMENT_OCID"
export BUNDLE_DIR="oci-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"
oci compute instance get \
--instance-id "$INSTANCE_OCID" \
--query 'data.{state:"lifecycle-state",shape:shape,ad:"availability-domain",created:"time-created",fault:"fault-domain"}' \
--output json > "$BUNDLE_DIR/instance-metadata.json"
echo "Instance state: $(jq -r '.state' "$BUNDLE_DIR/instance-metadata.json")"
The serial console captures kernel panics, boot failures, and cloud-init output — even when SSH is unreachable:
# Create a console history capture
CAPTURE_ID=$(oci compute instance-console-history capture \
--instance-id "$INSTANCE_OCID" \
--query 'data.id' --raw-output)
echo "Console history capture: $CAPTURE_ID"
# Wait for capture to complete, then download
sleep 10
oci compute instance-console-history get-content \
--instance-console-history-id "$CAPTURE_ID" \
> "$BUNDLE_DIR/serial-console.log"
echo "Serial console: $(wc -l < "$BUNDLE_DIR/serial-console.log") lines"
If the instance agent is running, pull cloud-init logs via the run-command plugin:
COMMAND_ID=$(oci instance-agent command create \
--compartment-id "$COMPARTMENT_OCID" \
--target '{"instanceId":"'"$INSTANCE_OCID"'"}' \
--content '{"source":{"sourceType":"TEXT","text":"cat /var/log/cloud-init.log | tail -200"}}' \
--timeout-in-seconds 30 \
--query 'data.id' --raw-output)
sleep 15
oci instance-agent command get \
--command-id "$COMMAND_ID" \
--query 'data."delivery-state"' --raw-output
# Retrieve output
oci instance-agent command get-content \
--command-id "$COMMAND_ID" \
> "$BUNDLE_DIR/cloud-init.log"
oci logging search \
--search-query "search \"$COMPARTMENT_OCID\" | where source = 'vcn' | sort by datetime desc" \
--time-start "$(date -u -d '-1 hour' +%Y-%m-%dT%H:%M:%SZ)" \
--time-end "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--output json > "$BUNDLE_DIR/vcn-flow-logs.json" 2>&1 || echo "Flow logs unavailable" > "$BUNDLE_DIR/vcn-flow-logs.json"
import oci
import datetime
import json
config = oci.config.from_file("~/.oci/config")
monitoring = oci.monitoring.MonitoringClient(config)
end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=1)
response = monitoring.summarize_metrics_data(
compartment_id="COMPARTMENT_OCID",
summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
namespace="oci_computeagent",
query='CpuUtilization[5m]{resourceId = "INSTANCE_OCID"}.max()',
start_time=start.isoformat() + "Z",
end_time=end.isoformat() + "Z",
),
)
with open("oci-debug-bundle/instance-metrics.json", "w") as f:
json.dump([item.__dict__ for item in response.data], f, indent=2, default=str)
# Add a summary header
cat > "$BUNDLE_DIR/README.txt" << EOF
OCI Debug Bundle
Instance: $INSTANCE_OCID
Captured: $(date -u)
Contents:
instance-metadata.json - Lifecycle state, shape, AD, fault domain
serial-console.log - Serial console history (boot output, kernel msgs)
cloud-init.log - Cloud-init execution log (last 200 lines)
vcn-flow-logs.json - VCN flow log entries (last 1 hour)
instance-metrics.json - CPU utilization metrics (last 1 hour)
EOF
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
echo "Bundle ready: $BUNDLE_DIR.tar.gz ($(du -h "$BUNDLE_DIR.tar.gz" | cut -f1))"
Successful completion produces:
oci-debug-YYYYMMDD-HHMMSS.tar.gz archive containing five diagnostic files| Error | Code | Cause | Solution |
|---|---|---|---|
| NotAuthorizedOrNotFound | 404 | Missing IAM policy or wrong OCID | Add allow group <grp> to inspect instance-console-histories in compartment <comp> |
| NotAuthenticated | 401 | Expired or invalid API key | Re-validate config: oci iam user get --user-id $(grep ^user ~/.oci/config | cut -d= -f2) |
| Instance agent unreachable | — | Agent not running or instance stopped | Fall back to serial console history only (Step 3) |
| TooManyRequests | 429 | Rate limited on console history API | Wait 30 seconds and retry — OCI does not return a Retry-After header |
| InternalError | 500 | OCI service issue | Retry after 60 seconds; check https://ocistatus.oraclecloud.com |
| Flow logs empty | — | VCN flow logs not enabled | Enable flow logs: VCN > Subnet > Flow Logs > Enable |
Quick one-liner to check if instance is reachable:
oci compute instance get --instance-id "$INSTANCE_OCID" \
--query 'data."lifecycle-state"' --raw-output
# Expected: RUNNING, STOPPED, or TERMINATED
Automated bundle script (saves as oci-debug.sh):
#!/bin/bash
set -euo pipefail
INSTANCE_OCID="${1:?Usage: oci-debug.sh <instance-ocid>}"
COMPARTMENT_OCID="${2:?Usage: oci-debug.sh <instance-ocid> <compartment-ocid>}"
BUNDLE="oci-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE"
oci compute instance get --instance-id "$INSTANCE_OCID" --output json > "$BUNDLE/metadata.json"
CAPTURE=$(oci compute instance-console-history capture --instance-id "$INSTANCE_OCID" --query 'data.id' --raw-output)
sleep 10
oci compute instance-console-history get-content --instance-console-history-id "$CAPTURE" > "$BUNDLE/serial.log"
tar -czf "$BUNDLE.tar.gz" "$BUNDLE" && rm -rf "$BUNDLE"
echo "Bundle: $BUNDLE.tar.gz"
After collecting the debug bundle, review oraclecloud-incident-runbook for guided triage and recovery actions, or oraclecloud-common-errors for decoding specific error codes in the serial console output.