Monitor Lindy AI agent execution health, task completion rates, step-level failures, trigger frequency, and credit consumption. Lindy provides built-in task history in the dashboard. External observability requires webhook callbacks, the Task Completed trigger, and application-side metrics collection.
| Signal | Source | Why It Matters |
|---|---|---|
| Task completion rate | Tasks tab / callback | Measures agent reliability |
| Task duration | Task detail view | Tracks performance over time |
| Step failure rate | Task detail (red steps) | Identifies broken actions |
| Credit consumption | Billing dashboard | Budget tracking |
| Trigger frequency | Task count over time | Detects trigger storms |
| Agent error rate | Failed tasks / total tasks | Overall health indicator |
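The ratio signals in the table can be derived from any list of task outcomes. The following is a minimal sketch, assuming a hypothetical `TaskRecord` shape; the field names are illustrative and are not a documented Lindy export format.

```typescript
// Hypothetical record shape; field names are assumptions, not a Lindy API.
interface TaskRecord {
  agent: string;
  status: "completed" | "failed";
  durationSeconds: number;
}

interface HealthSummary {
  total: number;
  completionRate: number;      // completed / total, 0 when there are no tasks
  errorRate: number;           // failed / total, 0 when there are no tasks
  meanDurationSeconds: number; // arithmetic mean over all tasks
}

function summarizeTasks(tasks: TaskRecord[]): HealthSummary {
  const total = tasks.length;
  if (total === 0) {
    return { total: 0, completionRate: 0, errorRate: 0, meanDurationSeconds: 0 };
  }
  const failed = tasks.filter((t) => t.status === "failed").length;
  const durationSum = tasks.reduce((sum, t) => sum + t.durationSeconds, 0);
  return {
    total,
    completionRate: (total - failed) / total,
    errorRate: failed / total,
    meanDurationSeconds: durationSum / total,
  };
}
```

Completion rate and error rate sum to 1 here because the sketch only models two terminal states; if your agents report other statuses, count them separately.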
Lindy's Tasks tab provides per-agent monitoring, including each task's status and a task detail view in which failed steps are shown in red.
Use Lindy's built-in Task Completed trigger to build an observability agent:
Monitoring Agent:
  Trigger: Task Completed (from Production Support Agent)
  Condition: "Go down this path if the task failed"
    → Action: Slack Send Channel Message to #ops-alerts
      Message: "Support Agent task failed: {{task.error}}"
  Condition: "Go down this path if task duration > 30 seconds"
    → Action: Slack Send Channel Message to #ops-alerts
      Message: "Support Agent slow: {{task.duration}}s"
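The same routing logic can be expressed in code, which is useful if you later move alerting out of Lindy and into your own service. This is a sketch under assumptions: the `CompletedTask` shape mirrors the `{{task.*}}` fields used in the messages above but is hypothetical, and both conditions are evaluated independently.

```typescript
// Hypothetical payload; mirrors the {{task.*}} template fields, not a Lindy type.
interface CompletedTask {
  status: string;
  error?: string;
  durationSeconds: number;
}

// Same threshold as the second condition in the monitoring agent.
const SLOW_THRESHOLD_SECONDS = 30;

// Returns the alert messages that should be posted for one completed task.
function buildAlerts(agentName: string, task: CompletedTask): string[] {
  const alerts: string[] = [];
  if (task.status === "failed") {
    alerts.push(`${agentName} task failed: ${task.error ?? "unknown error"}`);
  }
  if (task.durationSeconds > SLOW_THRESHOLD_SECONDS) {
    alerts.push(`${agentName} slow: ${task.durationSeconds}s`);
  }
  return alerts;
}
```

A task that both fails and runs long produces two messages in this sketch; collapse them into one if that is too noisy for your channel.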
Configure agents to call your metrics endpoint on task completion:
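// metrics-collector.ts: receive agent metrics via HTTP Request action
import express from 'express';
import { Counter, Histogram, register } from 'prom-client';

const app = express();
app.use(express.json());

// Prometheus metrics
const taskCounter = new Counter({
  name: 'lindy_tasks_total',
  help: 'Total Lindy agent tasks',
  labelNames: ['agent', 'status'],
});

const taskDuration = new Histogram({
  name: 'lindy_task_duration_seconds',
  help: 'Lindy task execution duration',
  labelNames: ['agent'],
  buckets: [1, 2, 5, 10, 30, 60, 120],
});

// Counter rather than Gauge: rate() and increase() queries only make sense
// on a value that accumulates across tasks.
const creditCounter = new Counter({
  name: 'lindy_credits_consumed',
  help: 'Cumulative credits consumed by agent tasks',
  labelNames: ['agent'],
});

// Receive metrics from Lindy HTTP Request action
app.post('/lindy/metrics', (req, res) => {
  // Reject all requests when the secret is unset, so "Bearer undefined" never matches.
  const secret = process.env.LINDY_WEBHOOK_SECRET;
  if (!secret || req.headers.authorization !== `Bearer ${secret}`) {
    return res.status(401).json({ error: 'Unauthorized' });
  }

  const { agent, status } = req.body;
  // Template values arrive as JSON strings, so coerce them before recording.
  const duration = Number(req.body.duration);
  const credits = Number(req.body.credits);
  if (!agent || !status || !Number.isFinite(duration) || !Number.isFinite(credits) || credits < 0) {
    return res.status(400).json({ error: 'Invalid metrics payload' });
  }

  taskCounter.inc({ agent, status });
  taskDuration.observe({ agent }, duration);
  creditCounter.inc({ agent }, credits);
  res.json({ recorded: true });
});

// Prometheus scrape endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

// The port is an assumption; adjust it to your deployment.
app.listen(Number(process.env.PORT) || 3000);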
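Lindy agent configuration: Add an HTTP Request action as the last step in each monitored agent. Send a POST request with the header Authorization: Bearer followed by your LINDY_WEBHOOK_SECRET value (the collector rejects requests without it) to the URL below, using the JSON body that follows:
https://monitoring.yourapp.com/lindy/metrics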
{
"agent": "support-bot",
"status": "{{task.status}}",
"duration": "{{task.duration}}",
"credits": "{{task.credits}}"
}
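Because the template values in this body are quoted, they reach the collector as strings, and an unrendered template such as `{{task.duration}}` would arrive verbatim. A stricter parser can be sketched as follows; the `MetricsPayload` type and both function names are illustrative assumptions, not part of Lindy or prom-client.

```typescript
interface MetricsPayload {
  agent: string;
  status: string;
  durationSeconds: number;
  credits: number;
}

// Accepts numbers or numeric strings; rejects empty, negative, or non-numeric input.
function toNonNegativeNumber(value: unknown): number | null {
  if (typeof value !== "number" && typeof value !== "string") return null;
  if (typeof value === "string" && value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) && n >= 0 ? n : null;
}

// Returns null when any required field is missing or malformed.
function parseMetricsPayload(body: unknown): MetricsPayload | null {
  if (typeof body !== "object" || body === null) return null;
  const raw = body as Record<string, unknown>;
  const agent = raw.agent;
  const status = raw.status;
  if (typeof agent !== "string" || agent === "") return null;
  if (typeof status !== "string" || status === "") return null;
  const durationSeconds = toNonNegativeNumber(raw.duration);
  const credits = toNonNegativeNumber(raw.credits);
  if (durationSeconds === null || credits === null) return null;
  return { agent, status, durationSeconds, credits };
}
```

Rejecting malformed payloads with a 400 response, rather than recording zeros, keeps bad template output from silently skewing duration and credit metrics.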
Key panels for a Lindy monitoring dashboard:
| Panel | Metric | Type |
|---|---|---|
| Task Success Rate | sum(rate(lindy_tasks_total{status="completed"}[1h])) / sum(rate(lindy_tasks_total[1h])) | Percentage gauge |
| Task Failures | rate(lindy_tasks_total{status="failed"}[1h]) | Counter |
| Duration p50/p95 | histogram_quantile(0.95, sum by (le) (rate(lindy_task_duration_seconds_bucket[5m]))) (use 0.5 for p50) | Time series |
| Credit Burn Rate | rate(lindy_credits_consumed[1h]) | Counter |
| Active Agents | Count of agents with tasks in last 24h | Stat panel |
| Trigger Frequency | Tasks per hour by agent | Bar chart |
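The Duration p50/p95 panel estimates quantiles from histogram bucket counts rather than from raw durations. The sketch below approximates the linear interpolation that Prometheus's `histogram_quantile` performs within a bucket; the `Bucket` type and the function name are illustrative and not part of any library.

```typescript
// One cumulative histogram bucket: count of observations <= le.
interface Bucket {
  le: number;              // upper bound; the last bucket uses Infinity
  cumulativeCount: number; // non-decreasing across buckets
}

// Buckets must be sorted ascending by le and end with the +Inf bucket.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (buckets.length === 0 || q < 0 || q > 1) return NaN;
  const total = buckets[buckets.length - 1].cumulativeCount;
  if (total === 0) return NaN;

  const rank = q * total;
  const i = buckets.findIndex((b) => b.cumulativeCount >= rank);
  const upper = buckets[i].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;

  // A quantile that lands in the +Inf bucket is reported as the highest finite bound.
  if (!Number.isFinite(upper)) return lower;

  const below = i === 0 ? 0 : buckets[i - 1].cumulativeCount;
  const inBucket = buckets[i].cumulativeCount - below;
  if (inBucket === 0) return lower;

  // Assume observations are spread evenly inside the bucket.
  return lower + (upper - lower) * ((rank - below) / inBucket);
}
```

The estimate is only as precise as the bucket boundaries: with the buckets `[1, 2, 5, 10, 30, 60, 120]`, a p95 between 30 and 60 seconds is interpolated rather than measured, so add boundaries around the latencies you care about.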
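# Prometheus alert rules
groups:
  - name: lindy
    rules:
      - alert: LindyAgentHighFailureRate
        # Ratio of failed tasks to all tasks per agent, so 0.1 means 10%
        expr: |
          sum by (agent) (rate(lindy_tasks_total{status="failed"}[30m]))
            / sum by (agent) (rate(lindy_tasks_total[30m])) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Lindy agent {{ $labels.agent }} failure rate > 10%"
      - alert: LindyAgentDown
        # Fires when the counter has not increased in 1h or the series is missing entirely
        expr: |
          sum(increase(lindy_tasks_total{agent="support-bot"}[1h])) == 0
            or absent(lindy_tasks_total{agent="support-bot"})
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "No tasks from support-bot in 1 hour"
      - alert: LindyCreditsBurnRate
        # Credits used in the last hour, projected over a 30-day month (720 hours).
        # Assumes lindy_credits_consumed is exported as a counter.
        expr: sum(increase(lindy_credits_consumed[1h])) * 720 > 5000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Credit burn rate will exhaust monthly budget"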
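The credit alert compares a projected monthly total against a budget. The arithmetic behind the `* 720 > 5000` comparison can be sketched as below, reading 720 as the number of hours in a 30-day month and 5000 as the monthly credit budget; both function names are illustrative.

```typescript
// 24 hours * 30 days, the same multiplier used in the alert expression.
const HOURS_PER_30_DAY_MONTH = 720;

// Extrapolates one hour of credit usage to a 30-day month.
function projectMonthlyCredits(creditsLastHour: number): number {
  return creditsLastHour * HOURS_PER_30_DAY_MONTH;
}

// True when the projected monthly usage is above the budget.
function exceedsBudget(creditsLastHour: number, monthlyBudget: number): boolean {
  return projectMonthlyCredits(creditsLastHour) > monthlyBudget;
}
```

With a 5000-credit budget the break-even point is roughly 6.9 credits per hour: 7 credits per hour projects to 5040 and would alert, while 6 projects to 4320 and would not. A single busy hour can cross the line, which is why the rule also requires the condition to hold for 15 minutes.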
Use Lindy Evals to catch quality regressions:
Score 1 (pass) if the response is professional, accurate, and under 200 words.
Score 0 (fail) if the response contains hallucinations or exceeds 200 words.
Note: Eval runs consume credits but do NOT execute real actions (safe simulation).
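Binary eval scores become a regression signal once you compare pass rates between runs. The sketch below assumes you have collected the 0/1 scores from two eval runs yourself; it does not call any Lindy API, and the function names and the 5-point tolerance are illustrative choices. Only the word-count half of the rubric is checkable in code; hallucination detection still needs the eval's own judgment.

```typescript
// Counts whitespace-separated words, matching the "under 200 words" criterion.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Fraction of eval cases that scored 1 (pass); 0 for an empty run.
function passRate(scores: number[]): number {
  if (scores.length === 0) return 0;
  const passed = scores.filter((s) => s === 1).length;
  return passed / scores.length;
}

// Flags a regression when the pass rate drops by more than the tolerance.
function isRegression(
  baselineScores: number[],
  currentScores: number[],
  tolerance = 0.05,
): boolean {
  return passRate(baselineScores) - passRate(currentScores) > tolerance;
}
```

Comparing against a stored baseline rather than a fixed threshold keeps the check meaningful for agents whose rubric is strict enough that they never reach a 100% pass rate.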
| Level | What You Monitor | How |
|---|---|---|
| L0 | Nothing | Manual dashboard checks |
| L1 | Task failures | Task Completed trigger + Slack alerts |
| L2 | Success rate + duration | HTTP Request action + Prometheus |
| L3 | Credit burn + quality | Evals + Grafana dashboards |
| L4 | Automated remediation | Monitoring agent auto-restarts failed agents |
| Issue | Cause | Solution |
|---|---|---|
| Metrics endpoint down | Monitoring server crashed | Alert on scrape failures |
| Task Completed not firing | Monitoring agent paused | Check monitoring agent is active |
| Credit burn alert false positive | Legitimate traffic spike | Tune alert threshold |
| Eval scores dropping | Prompt drift or model change | Review recent prompt/model changes |
Proceed to lindy-incident-runbook for incident response procedures.