
Navan Platform Incident Response Runbook

v20260423
navan-incident-runbook
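This runbook is a structured guide to diagnosing and resolving critical incidents on the Navan travel platform. It walks through severity classification (P1-P4) and uses the built-in AI assistant and API health checks (for example with curl and jq) to systematically handle booking, OAuth token, and expense sync failures. The goal is a standardized incident process that minimizes downtime.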

Navan Incident Runbook

Overview

Structured incident response for Navan travel platform disruptions. Navan uses raw REST APIs with OAuth 2.0 — there is no SDK and no sandbox. All diagnostic commands run against production.

Prerequisites

  • Access to Navan admin console (Admin > Travel admin)
  • OAuth credentials (client_id, client_secret) stored in your secret manager
  • Familiarity with Navan's Ava AI assistant (in-app chat, first-line support)
  • curl and jq for API health probing

Instructions

Step 1 — Classify Severity

| Severity | Condition | Response Time | Escalation |
|---|---|---|---|
| P1 (Critical) | API fully down, all bookings failing | Immediate | Navan support + Ava AI + internal exec |
| P2 (High) | Degraded performance, partial failures | 15 minutes | Navan support + internal travel admin |
| P3 (Medium) | Intermittent errors, expense sync delays | 1 hour | Internal triage, monitor |
| P4 (Low) | Cosmetic issues, non-blocking warnings | Next business day | Internal backlog |
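
The table above can be encoded as a small lookup so that scripts and alert hooks print the same response time and escalation targets. This is a minimal sketch; the function name `severity_policy` is hypothetical and not part of any Navan tooling.

```shell
# Hypothetical helper: prints "response time | escalation" for a severity level,
# mirroring the severity table in this runbook.
severity_policy() {
  case "$1" in
    P1) echo "Immediate | Navan support + Ava AI + internal exec" ;;
    P2) echo "15 minutes | Navan support + internal travel admin" ;;
    P3) echo "1 hour | Internal triage, monitor" ;;
    P4) echo "Next business day | Internal backlog" ;;
    *)  echo "Unknown severity: $1" >&2; return 1 ;;
  esac
}
```

Calling `severity_policy P2` prints the P2 row; any other label returns a non-zero status so a calling script can stop early.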

Step 2 — Triage with Ava AI

Before manual debugging, use Navan's built-in AI assistant:

  1. Open the Navan app or visit app.navan.com
  2. Click the Ava chat icon (bottom-right)
  3. Describe the issue — Ava can check booking status, rebook flights, and surface known outages
  4. If Ava cannot resolve, proceed to API health checks

Step 3 — API Health Check

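# Test OAuth authentication
# --data-urlencode keeps credentials containing &, + or = from corrupting the form body
AUTH_RESPONSE=$(curl -s -w "\n%{http_code}" \
  -X POST "https://api.navan.com/ta-auth/oauth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "grant_type=client_credentials" \
  --data-urlencode "client_id=$NAVAN_CLIENT_ID" \
  --data-urlencode "client_secret=$NAVAN_CLIENT_SECRET")

# The last line of the response is the status code; everything before it is the JSON body
HTTP_CODE=$(echo "$AUTH_RESPONSE" | tail -n 1)
BODY=$(echo "$AUTH_RESPONSE" | sed '$d')

echo "Auth endpoint: HTTP $HTTP_CODE"
echo "$BODY" | jq '{token_present: (.access_token != null), error: .error}' 2>/dev/null

# Test booking retrieval (requires valid token)
TOKEN=$(echo "$BODY" | jq -r '.access_token // empty' 2>/dev/null)
if [ -n "$TOKEN" ]; then
  # Discard the response body; only the status code matters for the health check
  curl -s -o /dev/null -w "Bookings endpoint: HTTP %{http_code}\n" \
    -H "Authorization: Bearer $TOKEN" \
    "https://api.navan.com/v1/bookings?page=0&size=50"
else
  echo "Bookings endpoint: skipped (no access token returned)"
fi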

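A single probe cannot tell an intermittent error (P3) from a full outage (P1). The sketch below repeats a probe and counts failures. `probe_repeat` and `bookings_probe` are hypothetical names, and `bookings_probe` assumes `TOKEN` was set by the health check above.

```shell
# Hypothetical helper: run a probe command several times and report how many
# attempts failed. The probe is any command that exits 0 on success.
# Usage: probe_repeat <attempts> <delay_seconds> <command> [args...]
probe_repeat() {
  attempts=$1; delay=$2; shift 2
  failures=0
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    # Pause between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "$failures/$attempts failed"
  # Exit status: 0 = all succeeded, 1 = some failed, 2 = all failed
  if [ "$failures" -eq 0 ]; then return 0
  elif [ "$failures" -lt "$attempts" ]; then return 1
  else return 2
  fi
}

# Example probe. curl -f turns HTTP errors (>= 400) into a non-zero exit status.
bookings_probe() {
  curl -sf -o /dev/null \
    -H "Authorization: Bearer $TOKEN" \
    "https://api.navan.com/v1/bookings?page=0&size=50"
}
```

For example, `probe_repeat 5 2 bookings_probe` makes five attempts two seconds apart. A mix of successes and failures points toward the intermittent row of the severity table, while all attempts failing points toward P1 or P2. Keep the attempt count low, since every probe runs against production.
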
Step 4 — Incident-Specific Playbooks

Booking API Failure (P1/P2):

  1. Confirm via API health check above — look for HTTP 500/503
  2. Check if the issue is flight-specific or hotel-specific by testing both trip types
  3. Direct travelers to Navan mobile app or phone support as fallback
  4. Queue failed booking requests for retry with exponential backoff
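
Retrying with exponential backoff, as in the last step above, can be sketched as a generic wrapper. The name `retry_with_backoff` is hypothetical, and the command it wraps stands in for however your integration resubmits a queued request.

```shell
# Hypothetical helper: retry a command, doubling the wait after each failure.
# Usage: retry_with_backoff <max_attempts> <base_delay_seconds> <command> [args...]
retry_with_backoff() {
  max_attempts=$1; delay=$2; shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))        # exponential growth: 2s, 4s, 8s, ...
    attempt=$((attempt + 1))
  done
}
```

A call such as `retry_with_backoff 5 2 submit_booking request.json` would wait 2, 4, 8, and 16 seconds between attempts, where `submit_booking` is a placeholder you supply. Only wrap requests that are safe to repeat, because a retry of a request that actually succeeded could create a duplicate booking.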

OAuth Token Failure (P1):

  1. Test with curl against /ta-auth/oauth/token — expect HTTP 200 with access_token field
  2. If HTTP 401: credentials may be rotated; check Admin > Integrations
  3. If HTTP 403: API access may be revoked; contact Navan admin
  4. Re-request a token via POST /ta-auth/oauth/token with grant_type=client_credentials

Expense Sync Failure (P2/P3):

  1. Check the Expense Transaction API status — this endpoint requires separate enablement
  2. Verify your Fivetran/Airbyte connector status if using a data pipeline
  3. Check TRANSACTION table freshness — incremental sync may be lagging
  4. Validate that expense categories map correctly to your ERP
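
The freshness check in step 3 comes down to comparing the newest synced timestamp against a lag threshold. The sketch below assumes you have already obtained that timestamp as Unix epoch seconds from your warehouse or connector. How you query it depends on your pipeline, and `check_sync_freshness` is a hypothetical name.

```shell
# Hypothetical helper: report whether a sync is stale.
# Arguments: <last_sync_epoch> <max_lag_minutes> [now_epoch]
# now_epoch defaults to the current time; pass it explicitly for repeatable checks.
check_sync_freshness() {
  last_sync=$1; max_lag_min=$2; now=${3:-$(date +%s)}
  lag_min=$(( (now - last_sync) / 60 ))
  if [ "$lag_min" -gt "$max_lag_min" ]; then
    echo "STALE: last sync ${lag_min} min ago (threshold ${max_lag_min} min)"
    return 1
  fi
  echo "FRESH: last sync ${lag_min} min ago"
}
```

The threshold is yours to choose. A value somewhat above your connector's normal sync interval avoids flagging an ordinary gap between runs.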

Flight Cancellation / Disruption (P2):

  1. Use Ava AI to check rebooking options — Ava handles most rebookings automatically
  2. Verify traveler's profile has correct loyalty program numbers
  3. Check /v1/bookings for the affected booking UUID
  4. Coordinate with travel admin for policy exception approvals if rebooking exceeds budget

Step 5 — Escalation Path

| Level | Contact | When |
|---|---|---|
| L1 | Ava AI assistant | Always start here |
| L2 | Navan Help Center (app.navan.com/app/helpcenter) | Ava cannot resolve |
| L3 | Navan account manager | P1/P2 unresolved after 30 minutes |
| L4 | Internal executive sponsor | Business-critical travel disruption |
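
The L3 rule is time-based, so it is easy to miss mid-incident. A small check like the following could run from a timer or be called by hand. It is a sketch: `should_escalate_l3` is a hypothetical name, and it reads "after 30 minutes" as 30 minutes or more elapsed.

```shell
# Hypothetical helper: succeeds (exit 0) when the L3 threshold is met.
# Arguments: <severity> <incident_start_epoch> [now_epoch]
should_escalate_l3() {
  severity=$1; started=$2; now=${3:-$(date +%s)}
  elapsed_min=$(( (now - started) / 60 ))
  case "$severity" in
    P1|P2) [ "$elapsed_min" -ge 30 ] ;;   # only P1/P2 escalate to L3 on time alone
    *)     return 1 ;;
  esac
}
```

Record the start time once with `START=$(date +%s)` when the incident opens, then run `should_escalate_l3 P1 "$START" && echo "Contact Navan account manager"` as the incident progresses.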

Step 6 — Post-Incident Review

After resolution, create a post-incident record:

cat > "incident-$(date +%Y%m%d-%H%M%S).md" <<'INCEOF'
## Incident Report
- **Severity**: P?
- **Duration**: Start — End
- **Impact**: Number of affected travelers/bookings
- **Root Cause**: (API outage / credential issue / sync failure / ...)
- **Resolution**: Steps taken
- **Prevention**: Changes to avoid recurrence
INCEOF
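
The template above uses a quoted heredoc delimiter, so nothing inside it is expanded and every field is filled in by hand. When the details are already known, a variant with an unquoted delimiter can pre-fill them. This is a sketch, and `write_incident_report` is a hypothetical name.

```shell
# Hypothetical helper: write a pre-filled incident report and print its path.
# Arguments: <severity> <start> <end> <impact> <root_cause>
write_incident_report() {
  report="incident-$(date +%Y%m%d-%H%M%S).md"
  # Unquoted delimiter: the positional parameters below are expanded
  cat > "$report" <<INCEOF
## Incident Report
- **Severity**: $1
- **Duration**: $2 to $3
- **Impact**: $4
- **Root Cause**: $5
- **Resolution**: (fill in)
- **Prevention**: (fill in)
INCEOF
  echo "$report"
}
```

Resolution and Prevention are left as placeholders because they usually need a written explanation rather than a single value.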

Output

  • Severity classification for the incident
  • API health check results confirming platform vs local issue
  • Executed playbook steps with outcomes
  • Escalation actions taken with timestamps
  • Post-incident report document
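
Escalation actions are easiest to timestamp at the moment they happen. One possible approach is a one-line logger. `log_action` and the `INCIDENT_LOG` variable are hypothetical names chosen for this sketch.

```shell
# Hypothetical helper: append a UTC-timestamped entry to an incident log.
# INCIDENT_LOG is an assumed variable; the log defaults to ./incident.log.
log_action() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "${INCIDENT_LOG:-incident.log}"
}
```

For example, `log_action "Escalated to L2 via Help Center"` appends one line that can later be copied into the post-incident report.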

Error Handling

| HTTP Code | Meaning | Runbook Action |
|---|---|---|
| 401 | Authentication failed | Check credential rotation; re-authenticate |
| 403 | Access denied | Verify API integration is enabled in admin |
| 429 | Rate limited | Back off; check Retry-After header value |
| 500 | Server error | Navan-side issue; escalate to L2/L3 |
| 502/503 | Service unavailable | Platform outage; escalate immediately |
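
The table can be turned into a lookup for use in scripts, together with a parser for the Retry-After header mentioned in the 429 row. Both function names are hypothetical. The parser handles only the delay-in-seconds form of Retry-After, not the HTTP-date form, and whether the API sends the header on a given response is not confirmed here.

```shell
# Hypothetical helper: map an HTTP status code to the runbook action above.
runbook_action() {
  case "$1" in
    401)     echo "Check credential rotation; re-authenticate" ;;
    403)     echo "Verify API integration is enabled in admin" ;;
    429)     echo "Back off; check Retry-After header value" ;;
    500)     echo "Navan-side issue; escalate to L2/L3" ;;
    502|503) echo "Platform outage; escalate immediately" ;;
    *)       echo "No runbook action defined for HTTP $1" ;;
  esac
}

# Hypothetical helper: read response headers on stdin (for example a file
# written by curl -D) and print Retry-After when it is a number of seconds.
retry_after_seconds() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "retry-after" && $2 ~ /^[0-9]+$/ { print $2; exit }'
}
```

To capture headers, add `-D headers.txt` to any curl command in this runbook, then run `retry_after_seconds < headers.txt`. Empty output means the header was absent or not expressed in seconds.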

Examples

Quick API status check during an incident:

# One-liner health probe
curl -s -o /dev/null -w "Auth: %{http_code} (%{time_total}s)\n" \
  -X POST "https://api.navan.com/ta-auth/oauth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&client_id=$NAVAN_CLIENT_ID&client_secret=$NAVAN_CLIENT_SECRET"

Next Steps

  • Use navan-debug-bundle to collect full diagnostic data for support tickets
  • Use navan-prod-checklist to harden your integration against future incidents
  • Use navan-common-errors for detailed error code interpretation

Info
Category: Product & Business
Name: navan-incident-runbook
Version: v20260423
Size: 4.1KB
Updated: 2026-04-28