databricks-incident-runbook
jeremylongshore/claude-code-plugins-plus-skills
A runbook for on-call engineers handling Databricks incidents, outages, and major job failures. It provides structured procedures for immediate triage, a decision tree covering cluster failures, code errors, and data quality issues, and steps for evidence collection and postmortem documentation, with the aim of fast, systematic incident resolution.
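
To illustrate the decision-tree idea, here is a minimal Python sketch that sorts an incident into the three categories named above. It is not taken from the runbook itself: the `IncidentSignals` fields, the `triage` function, and the order of the checks (infrastructure first, then code, then data) are all assumptions made for illustration, and the sketch calls no Databricks API.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CLUSTER_FAILURE = "cluster failure"
    CODE_ERROR = "code error"
    DATA_QUALITY = "data quality issue"
    UNKNOWN = "unknown"


@dataclass
class IncidentSignals:
    # Hypothetical inputs that an on-call engineer fills in by hand.
    cluster_reachable: bool
    job_raised_exception: bool
    output_checks_passed: bool


def triage(signals: IncidentSignals) -> Category:
    """Return the first matching category; the check order is an assumption."""
    # Infrastructure first: a job cannot be judged if its cluster is down.
    if not signals.cluster_reachable:
        return Category.CLUSTER_FAILURE
    # The cluster is up, so an exception points at the job's own code.
    if signals.job_raised_exception:
        return Category.CODE_ERROR
    # The job ran cleanly, but its output failed validation.
    if not signals.output_checks_passed:
        return Category.DATA_QUALITY
    return Category.UNKNOWN


if __name__ == "__main__":
    example = IncidentSignals(
        cluster_reachable=True,
        job_raised_exception=False,
        output_checks_passed=False,
    )
    print(triage(example).value)
```

Returning the first match keeps the sketch simple; a real incident can show several signals at once, so the category is best read as a starting point for investigation rather than a final diagnosis.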