Every loop has time complexity and space complexity. A loop that calls a paid API has a third: dollars per execution. The model tracks the first two automatically. It does not track the third, so it ships code where a single bug — a retry without bound, a stream reconnect storm, an agent that re-queues itself, a webhook that fires the same job twice — silently spends real money.
The canonical incident: a developer writes a Fal.ai image-generation loop. The loop "obviously terminates" because it iterates over a fixed list. The list comes from a callback that fires on every Inngest retry. Each retry doubles the list. By morning, the bill is $200. Tests passed. Code review passed. The bug is not in the loop body. The bug is that no one stated the wallet invariant.
runaway-guard fixes this. State the max calls. State the max dollars per run. State the max dollars per day. Set the same caps in the provider dashboard so a code bug cannot bypass them. Then write the code.
Violating the letter of these rules is violating the spirit of the skill. "I'm only testing locally" is the exact rationalization that ships the $200 bill — local code hits the same paid API as production.
Use runaway-guard when:
- The code imports a paid-API SDK: `@fal-ai/*`, `fal-client`, `@anthropic-ai/sdk`, `anthropic`, `openai`, `replicate`, `elevenlabs`, `together-ai`, `groq-sdk`, `cohere-ai`, `@mistralai/*`.

NO CALL TO A PAID API WITHOUT A WRITTEN $-CAP AT BOTH THE CODE AND PROVIDER LEVEL
A cap only in code can be bypassed by a bug in that code. A cap only at the provider can be hit during normal usage and degrade the product. You need both. If you cannot state both in one sentence each, you have not designed the call site — you have written a wish.
Every call site gets a one-line cost contract. Before writing any paid-API call, state in one sentence:
- Max calls per run.
- Max $ per run: `max_calls × unit_cost`. Compute it, don't estimate.
- Max $ per day, matching the provider-side cap.
If you cannot fill in all three numbers, you have not designed the call site.
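A minimal sketch of the contract as a value the code can check at startup, assuming unit prices are known constants taken from the provider's pricing page. `CostContract` and `makeContract` are hypothetical names, not part of any provider SDK; the numbers mirror the flux-pro example ($0.05/image, 50 images per run, $50/day).

```typescript
// Hypothetical shape for the one-line cost contract; not part of any provider SDK.
interface CostContract {
  provider: string;
  unitCostUsd: number;    // price of one call, from the provider's pricing page
  maxCallsPerRun: number; // the concrete integer bound enforced in code
  maxUsdPerRun: number;   // computed: maxCallsPerRun × unitCostUsd
  maxUsdPerDay: number;   // must equal the provider-side hard cap
}

function makeContract(
  provider: string,
  unitCostUsd: number,
  maxCallsPerRun: number,
  maxUsdPerDay: number,
): CostContract {
  if (!Number.isInteger(maxCallsPerRun) || maxCallsPerRun <= 0) {
    throw new Error(`maxCallsPerRun must be a positive integer, got ${maxCallsPerRun}`);
  }
  const maxUsdPerRun = maxCallsPerRun * unitCostUsd; // compute it, don't estimate
  if (maxUsdPerRun > maxUsdPerDay) {
    throw new Error(`one run can cost $${maxUsdPerRun}, above the $${maxUsdPerDay}/day cap`);
  }
  return { provider, unitCostUsd, maxCallsPerRun, maxUsdPerRun, maxUsdPerDay };
}

// Numbers from the flux-pro example: $0.05/image, 50 images/run, $50/day.
const fluxContract = makeContract('fal flux-pro', 0.05, 50, 50);
console.log(`${fluxContract.provider}: max $${fluxContract.maxUsdPerRun.toFixed(2)}/run`);
// prints "fal flux-pro: max $2.50/run"
```

Failing fast at module load means a contract that cannot fit inside the daily cap never reaches a deploy.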
Every loop calling a paid API gets an explicit iteration bound, not just a termination argument. `invariant-guard` requires a termination measure; runaway-guard additionally requires the bound to be a concrete integer in code, not just "eventually terminates":
```typescript
// ❌ Terminates in theory. Bills $200 in practice.
while (job.status !== 'done') {
  await fal.run(...);
}

// ✅ Concrete bound: wallet invariant explicit.
const MAX_CALLS = 20;
for (let i = 0; i < MAX_CALLS && job.status !== 'done'; i++) {
  await fal.run(...);
}
if (job.status !== 'done') throw new Error('exceeded MAX_CALLS budget');
```
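The same rule applies to polling, where the loop body is a paid status check. A sketch, assuming a caller-supplied `poll` function; `pollBounded` and its option names are hypothetical. It enforces an integer poll count and an absolute deadline together, so the worst case is at most `maxPolls` paid calls whichever limit trips first.

```typescript
// Hypothetical helper: each `poll()` is a paid call, so bound it by count AND wall clock.
async function pollBounded<T>(
  poll: () => Promise<T>,
  isDone: (result: T) => boolean,
  opts: { maxPolls: number; maxWaitMs: number; intervalMs: number },
): Promise<T> {
  const deadline = Date.now() + opts.maxWaitMs; // absolute deadline, fixed up front
  for (let i = 0; i < opts.maxPolls; i++) {
    const result = await poll();
    if (isDone(result)) return result;
    // Stop early if sleeping would carry the next poll past the deadline.
    if (Date.now() + opts.intervalMs > deadline) break;
    await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  throw new Error(`job not done after ${opts.maxPolls} polls or ${opts.maxWaitMs} ms`);
}
```

Throwing on exhaustion, rather than returning the last status, forces the caller to decide explicitly what an unfinished job means.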
Every retry path is bounded by attempts AND total elapsed cost, not by time alone. Exponential backoff with no attempt cap is a wallet attack on yourself.
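A sketch of a retry wrapper that stops on whichever comes first: the attempt cap or the dollar budget. It assumes every attempt bills its full unit cost even when it fails (the conservative worst case) and counts money in integer cents to avoid floating-point drift. `retryWithBudget`, `RetryBudget`, and `BudgetExceededError` are hypothetical names.

```typescript
class BudgetExceededError extends Error {}

interface RetryBudget {
  maxAttempts: number;   // hard cap on attempts, including the first
  unitCostCents: number; // billed per attempt (assume failures bill too)
  maxCents: number;      // dollar cap for this retry path, in cents
  baseDelayMs: number;   // first backoff delay
  maxDelayMs: number;    // backoff ceiling
}

async function retryWithBudget<T>(call: () => Promise<T>, budget: RetryBudget): Promise<T> {
  let spentCents = 0;
  let lastError: unknown = new Error('maxAttempts must be at least 1');
  for (let attempt = 1; attempt <= budget.maxAttempts; attempt++) {
    if (spentCents + budget.unitCostCents > budget.maxCents) {
      throw new BudgetExceededError(
        `next attempt would exceed ${budget.maxCents} cents (spent ${spentCents})`,
      );
    }
    spentCents += budget.unitCostCents; // charge before calling: a failed attempt may still bill
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt === budget.maxAttempts) break;
      // Exponential backoff with a ceiling; the attempt cap, not the delay, bounds the cost.
      const delayMs = Math.min(budget.maxDelayMs, budget.baseDelayMs * 2 ** (attempt - 1));
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

Both limits are checked before the call, so the worst case is `min(maxAttempts, floor(maxCents / unitCostCents))` billed attempts.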
Every fan-out path declares a concurrency limit. Parallel calls multiply cost per wall-clock second. State the limit in code, at the queue (Inngest concurrency), and at the provider where supported:
- At the queue: `concurrency: { limit: N }` on the Inngest function.
- In code: `p-limit`, a semaphore, or batched `Promise.all` chunks; never an unbounded `Promise.all(items.map(...))` on a paid API.

Every paid API has a matching provider-side hard cap, configured out of band. Defense in depth: if the code is wrong, the provider stops the bleeding. Document the cap in the same file as the call site so future readers know it exists.
| Provider | Where to set the hard cap |
|---|---|
| Fal.ai | Dashboard → Billing → Spend Limit (e.g. $50/day). Hard stop on exceed. |
| Anthropic | Console → Workspaces → Workspace Budget with hard limit. Per-workspace, per-month. |
| OpenAI | Org → Settings → Usage limits (org-level hard limit blocks requests). ⚠️ Per-project monthly budgets are soft thresholds only — they alert but do not block. For a real hard cap use the org-level Usage limit, a billing gateway, or your own fail-closed budget check. |
| Replicate | Account → Billing → Spend limit. Per account. |
| ElevenLabs | Workspace → Usage limits per workspace / API key. |
| Together / Groq / Cohere / Mistral | Each has a billing dashboard with a monthly spend cap — set it before first deploy, not after. |
No hard cap, no call site. Set the cap before the first request, not after the first incident.
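For the code-level concurrency limit in the fan-out rule, a dependency-free stand-in for `p-limit` can be sketched as follows. `mapWithConcurrency` is a hypothetical helper: a fixed pool of workers pulls indices from a shared counter, so at most `limit` paid calls are in flight at once.

```typescript
// Hypothetical stand-in for p-limit: run `fn` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error(`limit must be a positive integer, got ${limit}`);
  }
  const results = new Array<R>(items.length);
  let next = 0;
  let failed = false;
  // Each worker claims the next unclaimed index; JS is single-threaded, so `next++` is race-free.
  async function worker(): Promise<void> {
    while (!failed && next < items.length) {
      const i = next++;
      try {
        results[i] = await fn(items[i], i);
      } catch (err) {
        failed = true; // stop claiming new items so a failing batch does not keep billing
        throw err;
      }
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Unlike a bare `Promise.all(items.map(...))`, a failure here stops new calls from starting; calls already in flight still finish and bill.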
Idempotency keys on every mutating or charging call. A webhook that fires twice should bill once. Without an idempotency key, retry policies you cannot see (load balancer, framework, gateway) silently double-charge.
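A sketch of charging at most once per key, with keys built the same way as the campaign example's step IDs (scope plus a SHA-1 of the payload). `idempotencyKey`, `chargeOnce`, and the in-memory `ledger` are hypothetical; the `Map` stands in for a durable store, so as written it only protects a single process.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical in-memory ledger. It only dedupes within one process; production
// needs a durable store (for example, a table with a unique constraint on the key).
const ledger = new Map<string, Promise<unknown>>();

// Scope plus a hash of the payload, e.g. `${campaignId}:${sha1(prompt)}`.
function idempotencyKey(scope: string, payload: string): string {
  return `${scope}:${createHash('sha1').update(payload).digest('hex')}`;
}

// Runs `call` at most once per key; concurrent and later callers share the first result.
async function chargeOnce<T>(key: string, call: () => Promise<T>): Promise<T> {
  const existing = ledger.get(key);
  if (existing !== undefined) return existing as Promise<T>;
  const pending = call();
  ledger.set(key, pending);
  try {
    return await pending;
  } catch (err) {
    // Forget failures so a deliberate retry is possible. If the provider may have billed
    // a failed call, keep the entry instead and reconcile by hand.
    ledger.delete(key);
    throw err;
  }
}
```

Storing the promise (not just the result) is what makes a duplicate webhook that arrives mid-call cost $0: it awaits the first call instead of starting a second.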
Make the "amplifier" patterns explicit and forbidden by default. These are the shapes that turn small bugs into large bills:
- `while (!done) await poll()` with no `maxWaitMs` is a wallet leak.
- Cache-miss stampedes: coalesce concurrent misses with `singleflight` / request coalescing.

Before producing code that calls a paid API, your message must contain, in this order:
- Max $ per run: `max_calls × unit_cost`. Compute it.

If any of 1–7 is missing, do not emit code.
This is the canonical case. Observe how each rule would have caught it.
What shipped:
```typescript
// inngest function: generate images for a campaign
export const generateCampaign = inngest.createFunction(
  { id: 'gen-campaign' }, // ❌ no concurrency limit
  { event: 'campaign/start' },
  async ({ event, step }) => {
    const prompts = await step.run('fetch', () => fetchPrompts(event.data.id));
    // ❌ unbounded fan-out, no per-run cap, no idempotency
    await Promise.all(prompts.map(p => fal.run('fal-ai/flux-pro', { input: { prompt: p } })));
  }
);
```
What went wrong. `fetchPrompts` had a bug: on a transient DB error it returned the partial list with the previous run's list appended. Inngest retried the failing function under its default retry policy, and each retry re-ran `fetchPrompts` and doubled the list (40 → 80 → 160 → 320 prompts). `Promise.all` fanned all 320 out concurrently. At $0.05/image, the 320-prompt attempt alone bills $16, and the four attempts together bill (40 + 80 + 160 + 320) × $0.05 = $30. Repeated across the night's scheduled runs, that compounded to ~$200 by morning.
Why each rule would have caught it.
| Rule | Catch |
|---|---|
| 1. Cost contract | Forces writing "max calls per run". The number prompts.length is not a known integer → rule fails → write a cap. |
| 2. Concrete iteration bound | Promise.all(prompts.map(...)) has no integer bound → rule fails → wrap in chunks with MAX_IMAGES_PER_RUN. |
| 3. Retry policy | Inngest default retries × no idempotency key = double-billed work. Rule forces an idempotency key per (campaignId, promptHash). |
| 4. Concurrency limit | Promise.all is unbounded concurrency. Rule forces p-limit(3) and Inngest concurrency: { limit: 3 }. |
| 5. Provider hard cap | Fal Spend Limit $50/day would have stopped the bleeding at $50 instead of $200. |
| 7. Amplifier audit | "Self-rescheduling jobs" — Inngest's retry IS self-rescheduling. The audit forces you to consider it. |
The fix that survives the protocol:
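```typescript
import { Inngest, NonRetriableError } from 'inngest';
import { fal } from '@fal-ai/client';
import pLimit from 'p-limit';
import { createHash } from 'node:crypto';
import { fetchPrompts } from './prompts'; // hypothetical module: the still-buggy loader

// cost contract:
//   provider:          Fal flux-pro @ $0.05/image
//   max calls per run: 50
//   max $ per run:     50 × $0.05 = $2.50
//   provider hard cap: $50/day (set in Fal dashboard 2026-05-22)
//   concurrency:       3 (Inngest + p-limit, matching)
//   idempotency:       step id = `${campaignId}:${sha1(prompt)}`; Inngest memoizes completed
//                      steps, so a retry does not re-bill them. Duplicate campaign/start
//                      events are dropped for 24h via `idempotency: 'event.data.id'`.
const MAX_IMAGES_PER_RUN = 50;
const limit = pLimit(3);
const inngest = new Inngest({ id: 'campaigns' }); // app id is illustrative
const sha1 = (s: string) => createHash('sha1').update(s).digest('hex');

export const generateCampaign = inngest.createFunction(
  {
    id: 'gen-campaign',
    concurrency: { limit: 3 },
    retries: 2, // attempts = 1 + retries = 3
    idempotency: 'event.data.id',
  },
  { event: 'campaign/start' },
  async ({ event, step }) => {
    const prompts = await step.run('fetch', () => fetchPrompts(event.data.id));
    // Reject oversized input before the first paid call; NonRetriableError stops Inngest retries.
    if (prompts.length > MAX_IMAGES_PER_RUN) {
      throw new NonRetriableError(
        `prompt count ${prompts.length} exceeds MAX_IMAGES_PER_RUN=${MAX_IMAGES_PER_RUN}`
      );
    }
    await Promise.all(prompts.map(p => limit(() => step.run(
      `img:${event.data.id}:${sha1(p)}`, // idempotency key
      () => fal.run('fal-ai/flux-pro', { input: { prompt: p } })
    ))));
  }
);
```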
Note: the bug in fetchPrompts is still there. The protocol does not fix that bug — it makes the bug cost $2.50 instead of $200 while you find it. That is the entire point of defense in depth.
| Pattern | Wallet invariant to write | Hard cap to set |
|---|---|---|
| Fan-out over a list of items | total_cost ≤ list_len × unit_cost ≤ MAX_$_PER_RUN | provider daily spend limit |
| Retry on transient error | total_cost ≤ attempts × unit_cost, attempts ≤ 5 | provider daily spend limit; alert at 50% |
| Agent loop ("ask model what to do next") | total_cost ≤ MAX_STEPS × per_step_cost, depth ≤ MAX_DEPTH | per-agent-run cost ceiling, kill-switch |
| Polling for job completion | total_cost ≤ ceil(MAX_WAIT_MS / poll_interval) × poll_cost | absolute deadline + alert |
| Webhook handler → API call | idempotency key required; check for a cycle if the webhook is triggered by the same API | provider rate limit per key |
| Stream reconnect | attempts ≤ MAX_RECONNECTS, exponential backoff with cap | provider connection cap |
| Cache miss stampede | singleflight → cost ≤ 1 × unit_cost per key per window | n/a (deduped in code) |
| Self-scheduling job | recursion depth bounded by ledger row, not by code | scheduler-level dedup + max runs/day |
| Multi-provider fallback | sum across providers ≤ MAX_$_PER_RUN | hard cap on each provider separately |
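For the agent-loop row, a sketch of a per-run guard that the loop consults before every model call. `AgentRunGuard` and `runAgent` are hypothetical; costs are integer cents, and the estimate passed to `beforeStep` is assumed to be an upper bound (for an LLM, max input tokens × input rate + max output tokens × output rate).

```typescript
// Hypothetical per-run guard for an agent loop. Costs are integer cents.
class AgentRunGuard {
  private steps = 0;
  private spentCents = 0;
  private readonly maxSteps: number;
  private readonly maxDepth: number;
  private readonly maxCentsPerRun: number;

  constructor(maxSteps: number, maxDepth: number, maxCentsPerRun: number) {
    this.maxSteps = maxSteps;
    this.maxDepth = maxDepth;
    this.maxCentsPerRun = maxCentsPerRun;
  }

  // Call BEFORE each model call with an upper-bound cost estimate; throws instead of continuing.
  beforeStep(depth: number, estimatedCents: number): void {
    if (this.steps >= this.maxSteps) throw new Error(`agent exceeded MAX_AGENT_STEPS=${this.maxSteps}`);
    if (depth > this.maxDepth) throw new Error(`agent exceeded MAX_DEPTH=${this.maxDepth}`);
    if (this.spentCents + estimatedCents > this.maxCentsPerRun) {
      throw new Error(`next step would exceed the ${this.maxCentsPerRun}-cent run ceiling`);
    }
    this.steps++;
  }

  // Call AFTER each model call with the cost actually billed.
  afterStep(actualCents: number): void {
    this.spentCents += actualCents;
  }

  get spent(): number {
    return this.spentCents;
  }
}

// Usage sketch: the loop's only natural exit is `done`, so the guard is what bounds it.
// Depth is 0 for the top-level run; a sub-agent would pass its parent's depth + 1.
async function runAgent(
  step: (depth: number) => Promise<{ done: boolean; costCents: number }>,
  guard: AgentRunGuard,
  estimatedCentsPerStep: number,
): Promise<number> {
  let steps = 0;
  for (;;) {
    guard.beforeStep(0, estimatedCentsPerStep);
    const result = await step(0);
    guard.afterStep(result.costCents);
    steps++;
    if (result.done) return steps;
  }
}
```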
Set these before the first deploy. None of them require code changes.
- `concurrency: { limit: N }` on every function that calls a paid API.
- `retries: 2` for functions that make paid calls (Inngest's default is 4 retries, i.e. up to 5 attempts including the initial one; confirm against current Inngest docs); fewer attempts on idempotent failures. Worst-case wallet math: attempts = 1 + retries, so a default `step.run()` can bill 5×, not 4×.
- `NonRetriableError` for 4xx: never retry a 4xx into a paid API.
- `idempotency: ...` on events you cannot deduplicate at the call site.

| Scenario | Expected behavior |
|---|---|
| Empty input list | 0 calls, 0 cost, return early — do not even auth |
| Input list longer than MAX | reject with NonRetriableError, do not partial-process |
| All calls fail with 4xx | 1 attempt each, no retry, surface error |
| All calls fail with 5xx | bounded retries, total cost ≤ attempts × unit, alert on full exhaustion |
| Concurrent invocation of the same job | idempotency key dedups; second invocation costs $0 |
| Network partition mid-batch | partial cost banked; on resume, idempotency key prevents re-charge |
| Provider rate-limit (429) | respect Retry-After; do not multiply retries inside SDK and outside |
| Webhook retried by provider | idempotency at the handler boundary |
| Local dev accidentally pointing at prod key | per-env keys + per-env caps make this cost $0.50, not $50 |
| Cron fires while previous run still executing | concurrency limit = 1 OR explicit overlap-tolerant design |
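The 4xx, 5xx, and 429 rows of the table reduce to one pure decision function, sketched here. `decideRetry` and `parseRetryAfter` are hypothetical; the 1-second fallback for a 429 without `Retry-After` and the 500 ms / 30 s backoff bounds are illustrative values, not provider guidance. In an Inngest function, a `retry: false` result is the place to throw `NonRetriableError`.

```typescript
type RetryDecision =
  | { retry: false; reason: string }
  | { retry: true; delayMs: number };

// 4xx (except 429): never retry. 429: wait for Retry-After. 5xx: bounded, capped backoff.
function decideRetry(
  status: number,
  attempt: number,     // 1-based number of the attempt that just failed
  maxAttempts: number, // total attempts allowed, including the first
  retryAfterHeader: string | null,
  nowMs: number = Date.now(),
): RetryDecision {
  if (attempt >= maxAttempts) return { retry: false, reason: `exhausted ${maxAttempts} attempts` };
  if (status === 429) {
    return { retry: true, delayMs: parseRetryAfter(retryAfterHeader, nowMs) ?? 1000 };
  }
  if (status >= 400 && status < 500) return { retry: false, reason: `${status} is not retryable` };
  if (status >= 500) return { retry: true, delayMs: Math.min(30_000, 500 * 2 ** (attempt - 1)) };
  return { retry: false, reason: `unexpected status ${status}` };
}

// Retry-After is either a number of seconds or an HTTP-date; returns null if unparseable.
function parseRetryAfter(header: string | null, nowMs: number): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}
```

Keep this decision in exactly one layer. When an outer layer owns retries, turn off the SDK's own retries so the layers do not multiply; for example, the `openai` and `@anthropic-ai/sdk` clients accept a `maxRetries` option.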
Code you emit must:
- Include a `// cost contract:` block above each call site with the four numbers (unit cost, max calls, max $/run, provider hard cap setting).
- Use a named constant (`MAX_IMAGES_PER_RUN`, `MAX_AGENT_STEPS`) for the bound, never a magic number inline.
- Wrap fan-out in `p-limit` or equivalent, never raw `Promise.all` over a paid API.
- Block 4xx retries via `NonRetriableError` or equivalent.

| Excuse | Reality |
|---|---|
| "I'm only testing locally." | Local hits the same paid endpoint. A retry bug in test code bills the same dollars. |
| "The list is small, fan-out is fine." | The list is small today. Next week it is fetched from a table that grew 50×. The cap exists for next week. |
| "Inngest already retries, so I don't need a retry policy." | Inngest retries × your retry wrapper × SDK retries: three layers of 3 attempts each is 3 × 3 × 3 = 27 attempts. Each one bills. |
| "The API call is cheap, $0.001." | At 10,000 unintended invocations that is $10 — and the count is exactly what you failed to bound. |
| "I'll set the provider cap later." | The bug ships before "later". Set the cap in the 60 seconds it takes; the code can wait. |
| "Idempotency is overkill for this." | Webhooks retry. Load balancers retry. Browsers retry. Without an idempotency key, something will duplicate. |
| "We have monitoring, we'll catch it." | Monitoring catches it after $200 is spent. Caps prevent the $200 from being spent. |
| "It obviously terminates." | The $200/night incident also "obviously terminated". Write the integer bound. |
If any of these sound familiar mid-thought: stop, write the cost contract, set the provider cap, then write the code.
- `await Promise.all(items.map(x => paidApi(x)))` with no `p-limit`.
- `while (!done) await paidApi(...)` with no integer bound.
- `.env` shared across environments.

All of these mean: stop, write the cost contract, set the provider cap, then write the code.
Before shipping any code that calls a paid API:
- [ ] Concurrency limit set in code (`p-limit`) AND at the queue (Inngest `concurrency`).

Cannot check every box? The code is example-correct, not bill-correct. Either fill the gap or do not connect a billing-enabled key.
- For token-metered APIs, `max calls` is a proxy: the real cap is max input tokens × input rate + max output tokens × output rate. Adapt the protocol: replace `max calls per run` with `max tokens per run`.
- Use `complexity-cuts` for the corrective rewrite after an incident; runaway-guard prevents the next one, not the current one.

Time bounds prevent stalls. Space bounds prevent OOMs. Dollar bounds prevent $200 mornings. AI assistants enforce the first two by default and ignore the third. runaway-guard makes them reason about the wallet first.