Clay Incident Response Runbook

v20260311

clay-incident-runbook

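Guides Clay incident detection, mitigation, and follow-up. It defines the response steps and communication requirements so the team can restore service quickly when the integration fails and record evidence for the postmortem.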

Clay, incident response, runbook, reliability, on-call

157 downloads

Clay Incident Runbook

Overview

Rapid incident response procedures for Clay-related outages.

Prerequisites

Access to Clay dashboard and status page
kubectl access to production cluster
Prometheus/Grafana access
Communication channels (Slack, PagerDuty)

Severity Levels

Level	Definition	Response Time	Examples
P1	Complete outage	< 15 min	Clay API unreachable
P2	Degraded service	< 1 hour	High latency, partial failures
P3	Minor impact	< 4 hours	Webhook delays, non-critical errors
P4	No user impact	Next business day	Monitoring gaps
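The severity table can be kept as a small lookup so that tooling and humans agree on response times. The sketch below is illustrative only: the `Severity` type, the `classify` helper, and its three boolean inputs are assumptions for this example and are not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Severity:
    level: str
    definition: str
    response_time: str


# Values copied from the severity table above.
SEVERITIES = {
    "P1": Severity("P1", "Complete outage", "< 15 min"),
    "P2": Severity("P2", "Degraded service", "< 1 hour"),
    "P3": Severity("P3", "Minor impact", "< 4 hours"),
    "P4": Severity("P4", "No user impact", "Next business day"),
}


def classify(api_reachable: bool, degraded: bool, minor_impact: bool) -> Severity:
    """Pick the most severe level that applies (hypothetical helper)."""
    if not api_reachable:
        return SEVERITIES["P1"]
    if degraded:
        return SEVERITIES["P2"]
    if minor_impact:
        return SEVERITIES["P3"]
    return SEVERITIES["P4"]


if __name__ == "__main__":
    sev = classify(api_reachable=True, degraded=True, minor_impact=False)
    print(sev.level, sev.response_time)
```

Checking the most severe condition first means an incident that is both degraded and has minor side effects is still treated as P2.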

Instructions

Step 1: Quick Triage

Check Clay status page, your integration health endpoint, error rate metrics, and recent pod logs.
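One way to keep the four triage signals together is a small record that is filled in by hand or by whatever tooling you already have. This is a minimal sketch: the `TriageSnapshot` fields, the `findings` helper, and the 5% error-rate threshold are assumptions chosen for illustration, not values from the skill.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TriageSnapshot:
    clay_incident_posted: bool   # from the Clay status page
    health_endpoint_ok: bool     # from your integration health endpoint
    error_count: int             # from your error rate metrics
    request_count: int
    recent_log_errors: int       # error lines seen in recent pod logs

    def error_rate(self) -> float:
        if self.request_count == 0:
            return 0.0
        return self.error_count / self.request_count


def findings(snapshot: TriageSnapshot, error_rate_threshold: float = 0.05) -> list[str]:
    """List the signals that look unhealthy (threshold is an assumed default)."""
    out = []
    if snapshot.clay_incident_posted:
        out.append("Clay status page reports an incident")
    if not snapshot.health_endpoint_ok:
        out.append("integration health endpoint is failing")
    if snapshot.error_rate() > error_rate_threshold:
        out.append("error rate is above threshold")
    if snapshot.recent_log_errors > 0:
        out.append("recent pod logs contain errors")
    return out


if __name__ == "__main__":
    snap = TriageSnapshot(True, False, 10, 100, 0)
    for line in findings(snap):
        print("-", line)
```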

Step 2: Follow Decision Tree

If the Clay API returns errors and status.clay.com shows an incident, wait and enable fallback. If the Clay API returns errors but no incident is posted, check your credentials and config. If there are no API errors but your service is unhealthy, investigate infrastructure.
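The decision tree can be written as a single function, which makes the branch order explicit. This is a sketch under assumptions: the function name and boolean inputs are hypothetical, and the final "no action required" branch is not in the runbook; it is added here only so that every input has an answer.

```python
def next_action(clay_api_errors: bool, clay_incident_posted: bool, service_healthy: bool) -> str:
    """Return the runbook's recommended next step for the observed state."""
    if clay_api_errors:
        if clay_incident_posted:
            # Clay is down on their side: nothing to fix locally.
            return "wait and enable fallback"
        # Errors without a posted incident point at our own setup.
        return "check credentials and config"
    if not service_healthy:
        return "investigate infrastructure"
    # Assumption: not covered by the runbook's decision tree.
    return "no action required"


if __name__ == "__main__":
    print(next_action(clay_api_errors=True, clay_incident_posted=False, service_healthy=True))
```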

Step 3: Execute Immediate Actions

401/403: Verify API key in secrets, update if rotated, restart pods
429: Check rate limit headers, enable request queuing
500/503: Enable graceful degradation, monitor Clay status
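The status-code actions above map naturally onto a lookup table. The sketch below only restates the list; the names `ACTIONS` and `immediate_actions` are hypothetical, and returning an empty list for unlisted codes is an assumption rather than runbook guidance.

```python
from __future__ import annotations

AUTH_ACTIONS = ["Verify API key in secrets", "Update key if rotated", "Restart pods"]
RATE_LIMIT_ACTIONS = ["Check rate limit headers", "Enable request queuing"]
SERVER_ERROR_ACTIONS = ["Enable graceful degradation", "Monitor Clay status"]

# HTTP status code -> immediate actions from Step 3.
ACTIONS = {
    401: AUTH_ACTIONS,
    403: AUTH_ACTIONS,
    429: RATE_LIMIT_ACTIONS,
    500: SERVER_ERROR_ACTIONS,
    503: SERVER_ERROR_ACTIONS,
}


def immediate_actions(status_code: int) -> list[str]:
    """Return the Step 3 actions for a status code, or [] if the code is not listed."""
    return list(ACTIONS.get(status_code, []))


if __name__ == "__main__":
    for step in immediate_actions(429):
        print("-", step)
```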

Step 4: Communicate Status

Post to internal Slack with severity, impact, current action, and next update time. Update external status page with user-facing impact description.
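A consistent message shape makes internal updates easier to scan. The following is a minimal sketch of a formatter covering the four fields named above; the layout and the `format_status_update` name are assumptions, and the actual communication templates live in the reference guide.

```python
from datetime import datetime, timezone


def format_status_update(severity: str, impact: str, current_action: str,
                         next_update_at: datetime) -> str:
    """Build a plain-text internal status update (hypothetical layout)."""
    when = next_update_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return "\n".join([
        f"Severity: {severity}",
        f"Impact: {impact}",
        f"Current action: {current_action}",
        f"Next update: {when}",
    ])


if __name__ == "__main__":
    print(format_status_update(
        "P2",
        "High latency on Clay-backed requests",
        "Enabled request queuing",
        datetime(2026, 3, 12, 14, 30, tzinfo=timezone.utc),
    ))
```

Normalizing the next-update time to UTC avoids ambiguity when responders are in different time zones.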

For the complete triage scripts, remediation commands, communication templates, and postmortem template, load the reference guide: Read(${CLAUDE_SKILL_DIR}/references/implementation-guide.md)

Output

Issue identified and categorized
Remediation applied
Stakeholders notified
Evidence collected for postmortem

Error Handling

Issue	Cause	Solution
Can't reach status page	Network issue	Try a mobile connection or VPN
kubectl fails	Auth expired	Re-authenticate
Metrics unavailable	Prometheus down	Check backup metrics
Secret rotation fails	Permission denied	Escalate to admin

Resources

Next Steps

For data handling, see clay-data-handling.

Examples

Basic usage: Apply the clay-incident-runbook skill to a standard project setup with the default configuration.

Advanced scenario: Customize the clay-incident-runbook skill for production environments with multiple constraints and team-specific requirements.

Information

Category Productivity tools

Name clay-incident-runbook

Version v20260311

Size 3.23KB

Source jeremylongshore/claude-code-plugins-plus-skills

Updated 2026-03-12