incident-responder
sickn33/antigravity-awesome-skills
A comprehensive guide for Site Reliability Engineers (SREs) covering the entire incident lifecycle. It provides structured methodologies for assessing initial severity, establishing an incident command structure, investigating with observability data (traces, metrics, logs), troubleshooting systematically, and conducting post-incident root cause analysis through blameless post-mortems. Ideal for managing complex, large-scale system outages.