Step-by-step procedures for responding to Langfuse-related incidents, from initial triage through resolution and post-incident review.
Run the quick diagnosis script, which checks the Langfuse status page, API connectivity, authentication, and application metrics.
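The diagnosis script itself is not included here. A minimal standard-library sketch of the same idea follows: run a set of named checks and report which ones pass.

Several details are assumptions to verify against your own deployment:

- the status page URL;
- the `/api/public/health` and `/api/public/projects` endpoints;
- the helper names, which are hypothetical.

The sketch relies on Langfuse's public API using HTTP Basic auth, with the public key as the username and the secret key as the password. Application-metrics checks are left to you: add any zero-argument callable to the dict.

```python
import base64
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str


def http_check(url: str, headers: Optional[Dict[str, str]] = None,
               timeout: float = 5.0) -> Callable[[], str]:
    """Build a check that passes when `url` responds without an HTTP error."""
    def check() -> str:
        req = urllib.request.Request(url, headers=headers or {})
        # urlopen raises HTTPError for 4xx/5xx, e.g. 401/403 on bad keys
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    return check


def basic_auth_header(public_key: str, secret_key: str) -> Dict[str, str]:
    """HTTP Basic auth header: public key as username, secret key as password."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def default_checks(host: str, public_key: str,
                   secret_key: str) -> Dict[str, Callable[[], str]]:
    """Hypothetical check set; the URLs and endpoints are assumptions."""
    auth = basic_auth_header(public_key, secret_key)
    return {
        "status_page": http_check("https://status.langfuse.com"),
        "api_connectivity": http_check(f"{host}/api/public/health"),
        "auth": http_check(f"{host}/api/public/projects", headers=auth),
    }


def run_diagnosis(checks: Dict[str, Callable[[], str]]) -> List[CheckResult]:
    """Run every check; any exception marks that check as failed."""
    results = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, True, check()))
        except Exception as exc:
            results.append(CheckResult(name, False, f"{type(exc).__name__}: {exc}"))
    return results
```

To run the checks, call `run_diagnosis(default_checks(host, public_key, secret_key))` and print each result. A failing `auth` check with `HTTPError: HTTP Error 401` points to the 401/403 row of the symptom table.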
| Symptom | Likely Cause | Action |
|---|---|---|
| No traces appearing | SDK not flushing buffered events | Confirm shutdown handlers call flush; reduce the flush batch size |
| 401/403 errors | Auth issue | Verify keys match project, check rotation |
| High latency | Rate limits | Increase batching, implement circuit breaker |
| Missing data | Partial failures | Ensure spans end in finally blocks |
| Complete outage | Langfuse service unavailable | Enable fallback; queue events locally |
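The "No traces appearing" and "Missing data" rows come down to two habits:

- close every span in a `finally` block;
- flush buffered events before the process exits.

The sketch below shows both patterns against a hypothetical `InMemoryTracer`. It is not the Langfuse SDK API, and its `flush_at` parameter only stands in for whatever batch-size setting your SDK exposes.

Keep in mind that `atexit` handlers run on normal interpreter exit. They do not run on `SIGKILL`, on `os._exit()`, or on an unhandled `SIGTERM`, so containerized services may also need a signal handler that flushes.

```python
import atexit
from contextlib import contextmanager
from typing import Dict, Iterator, List


class InMemoryTracer:
    """Hypothetical stand-in for an SDK client that batches finished spans."""

    def __init__(self, flush_at: int = 10):
        self.flush_at = flush_at            # analogue of the SDK batch size
        self.pending: List[Dict[str, str]] = []
        self.sent: List[Dict[str, str]] = []

    def end_span(self, span: Dict[str, str]) -> None:
        self.pending.append(span)
        if len(self.pending) >= self.flush_at:
            self.flush()

    def flush(self) -> None:
        # A real client would send the batch over HTTP here.
        self.sent.extend(self.pending)
        self.pending.clear()


@contextmanager
def span(tracer: InMemoryTracer, name: str) -> Iterator[Dict[str, str]]:
    record = {"name": name, "status": "ok"}
    try:
        yield record
    except Exception as exc:
        record["status"] = f"error: {exc}"
        raise
    finally:
        # Runs on success and on exception, so the span is never left open.
        tracer.end_span(record)


tracer = InMemoryTracer(flush_at=10)
# Flush whatever is still buffered when the interpreter exits normally.
atexit.register(tracer.flush)
```

Lowering `flush_at` shortens the window in which a crash can lose buffered spans, at the cost of more network calls.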
Follow the resolution steps for the matching symptom in the table above. For outages, switch to graceful degradation mode.
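Graceful degradation for the "High latency" and "Complete outage" rows can be approximated with a circuit breaker in front of the export path, plus a bounded local queue. This is a generic sketch, not a Langfuse feature. `send` is a hypothetical callable that ships one event to Langfuse, and the thresholds are illustrative.

The sketch behaves as follows:

- While the breaker is open, events go to the buffer instead of the network.
- After `reset_after` seconds, a trial send is allowed.
- Once sends succeed again, `drain()` replays the backlog.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: normal operation
        # Half-open: after the cool-down, let trial requests through.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


class DegradingExporter:
    """Sends events via `send`; queues them locally while Langfuse is failing."""

    def __init__(self, send: Callable[[dict], None], breaker: CircuitBreaker,
                 max_buffer: int = 10_000):
        self.send = send
        self.breaker = breaker
        # Oldest events are dropped once the buffer is full.
        self.buffer: Deque[dict] = deque(maxlen=max_buffer)

    def export(self, event: dict) -> None:
        if not self.breaker.allow():
            self.buffer.append(event)
            return
        try:
            self.send(event)
            self.breaker.record_success()
        except Exception:
            self.breaker.record_failure()
            self.buffer.append(event)

    def drain(self) -> int:
        """Replay buffered events while sends succeed; return how many were sent."""
        sent = 0
        while self.buffer and self.breaker.allow():
            try:
                self.send(self.buffer[0])
            except Exception:
                self.breaker.record_failure()
                break
            self.buffer.popleft()
            self.breaker.record_success()
            sent += 1
        return sent
```

The bounded `deque` trades completeness for predictable memory during a long outage. Replayed events can also arrive out of order relative to new ones, which is usually acceptable for traces.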
Confirm that traces are appearing again and error rates have returned to normal, then schedule a post-mortem for P1 and P2 incidents.
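"Error rates normalized" can be made concrete as a simple gate: traces are being ingested again, and the current error rate is within a tolerance of the pre-incident baseline. The sketch below makes several assumptions:

- The `WindowStats` fields come from your own application metrics.
- The 1-percentage-point default tolerance is an arbitrary choice.
- All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counts from your own metrics for one observation window (hypothetical shape)."""
    requests: int          # calls made to Langfuse in the window
    errors: int            # calls that failed (timeouts, 4xx/5xx)
    traces_ingested: int   # traces confirmed as visible in Langfuse


def error_rate(stats: WindowStats) -> float:
    return stats.errors / stats.requests if stats.requests else 0.0


def recovered(current: WindowStats, baseline_error_rate: float,
              tolerance: float = 0.01) -> bool:
    """True when traces are flowing and errors are within `tolerance` of baseline."""
    return (current.traces_ingested > 0
            and error_rate(current) <= baseline_error_rate + tolerance)


def needs_postmortem(severity: str) -> bool:
    return severity in {"P1", "P2"}
```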
See detailed implementation for advanced patterns.
| Severity | Description | Response Time |
|---|---|---|
| P1 | Complete outage | 15 min |
| P2 | Degraded, partial loss | 1 hour |
| P3 | Slow/delayed traces | 4 hours |
| P4 | Minor issues | 24 hours |
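The severity table can be encoded directly, so an alerting hook can compute a response deadline from the detection time. The three symptom flags and the `classify` helper are hypothetical simplifications of the table's descriptions. The response times are taken from the table.

```python
from datetime import datetime, timedelta

# Response times from the severity table.
RESPONSE_TIME = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=4),
    "P4": timedelta(hours=24),
}


def classify(complete_outage: bool, partial_loss: bool, traces_delayed: bool) -> str:
    """Hypothetical mapping of observed symptoms to severity, most severe first."""
    if complete_outage:
        return "P1"
    if partial_loss:
        return "P2"
    if traces_delayed:
        return "P3"
    return "P4"


def respond_by(severity: str, detected_at: datetime) -> datetime:
    return detected_at + RESPONSE_TIME[severity]
```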
| Level | Contact | When |
|---|---|---|
| L1 | On-call engineer | All incidents |
| L2 | Platform team lead | P1/P2 unresolved after 30 min |
| L3 | Langfuse support | Service-side issues |
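The escalation table reduces to a small rule set. The sketch below assumes two things:

- "unresolved 30min" means 30 minutes or more;
- Langfuse support is contacted whenever the issue is service-side, independent of severity.

The function name is hypothetical.

```python
from typing import List


def escalation_contacts(severity: str, minutes_unresolved: float,
                        service_side: bool) -> List[str]:
    contacts = ["On-call engineer"]  # L1: every incident
    if severity in {"P1", "P2"} and minutes_unresolved >= 30:
        contacts.append("Platform team lead")  # L2
    if service_side:
        contacts.append("Langfuse support")  # L3
    return contacts
```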