Skills Development Optimizing LLM Observability Rate Limiting

Optimizing LLM Observability Rate Limiting

v20260423
langfuse-rate-limits
A comprehensive guide on implementing robust strategies for high-volume LLM observability workloads using Langfuse. Learn to manage API rate limits by implementing exponential backoff with jitter, optimizing SDK batching settings, controlling concurrency with queues, and utilizing configurable sampling to prevent data loss during periods of high traffic.
Get Skill
446 downloads
Overview

Langfuse Rate Limits

Overview

Handle Langfuse API rate limits with optimized SDK batching, exponential backoff with jitter, concurrent request limiting, and configurable sampling for ultra-high-volume workloads.

Prerequisites

  • Langfuse SDK installed and configured
  • High-volume trace workload (1,000+ events/minute)

Instructions

Step 1: Optimize SDK Batching Configuration

The Langfuse SDK batches events internally before sending. Tuning batch settings is the first defense against rate limits.

// v3 Legacy: Direct configuration
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  flushAt: 50,           // Events per batch (default: 15, max ~200)
  flushInterval: 10000,  // Milliseconds between flushes (default: 10000)
  requestTimeout: 30000, // Timeout per batch request
});

// v4+: Configure via OTel span processor
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { NodeSDK } from "@opentelemetry/sdk-node";

const processor = new LangfuseSpanProcessor({
  exportIntervalMillis: 10000, // Flush interval
  maxExportBatchSize: 50,      // Events per batch
});

const sdk = new NodeSDK({ spanProcessors: [processor] });
sdk.start();

Step 2: Implement Retry with Exponential Backoff

For custom API calls (scores, datasets, prompts) that hit rate limits:

async function withRetry<T>(
  fn: () => Promise<T>,
  options: { maxRetries?: number; baseDelayMs?: number; maxDelayMs?: number } = {}
): Promise<T> {
  const { maxRetries = 5, baseDelayMs = 1000, maxDelayMs = 30000 } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const status = error?.status || error?.response?.status;

      // Only retry on rate limits (429) and server errors (5xx)
      if (attempt === maxRetries || (status && status < 429)) {
        throw error;
      }

      // Honor Retry-After header if present
      const retryAfter = error?.response?.headers?.["retry-after"];
      let delay: number;

      if (retryAfter) {
        delay = parseInt(retryAfter, 10) * 1000;
      } else {
        // Exponential backoff with jitter
        delay = Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
        delay += Math.random() * 500; // Jitter
      }

      console.warn(`Rate limited. Retry ${attempt + 1}/${maxRetries} in ${Math.round(delay)}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw new Error("Unreachable");
}

// Usage with Langfuse client operations
const langfuse = new LangfuseClient();

await withRetry(() =>
  langfuse.score.create({
    traceId: "trace-123",
    name: "quality",
    value: 0.95,
    dataType: "NUMERIC",
  })
);

Step 3: Queue-Based Concurrency Limiting

Use p-queue to cap concurrent Langfuse API calls:

import PQueue from "p-queue";
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

// Max 10 concurrent API calls, 50 per second
const queue = new PQueue({
  concurrency: 10,
  interval: 1000,
  intervalCap: 50,
});

// Queue score submissions
async function queueScore(params: {
  traceId: string;
  name: string;
  value: number;
}) {
  return queue.add(() =>
    langfuse.score.create({
      ...params,
      dataType: "NUMERIC",
    })
  );
}

// Queue dataset item creation
async function queueDatasetItem(datasetName: string, item: any) {
  return queue.add(() =>
    langfuse.api.datasetItems.create({
      datasetName,
      input: item.input,
      expectedOutput: item.expectedOutput,
    })
  );
}

// Monitor queue health
setInterval(() => {
  console.log(`Queue: ${queue.pending} pending, ${queue.size} queued`);
}, 10000);

Step 4: Configurable Sampling for Ultra-High Volume

When tracing volume exceeds rate limits, sample traces instead of dropping them:

import { observe, updateActiveObservation, startActiveObservation } from "@langfuse/tracing";

class TraceSampler {
  private rate: number;
  private windowCounts: number[] = [];
  private windowMs = 60000; // 1 minute window
  private maxPerWindow: number;

  constructor(sampleRate: number, maxPerMinute: number) {
    this.rate = sampleRate;
    this.maxPerWindow = maxPerMinute;
  }

  shouldSample(tags?: string[]): boolean {
    // Always sample errors
    if (tags?.includes("error") || tags?.includes("critical")) {
      return true;
    }

    // Check window limit
    const now = Date.now();
    this.windowCounts = this.windowCounts.filter((t) => t > now - this.windowMs);
    if (this.windowCounts.length >= this.maxPerWindow) {
      return false;
    }

    // Probabilistic sampling
    if (Math.random() > this.rate) {
      return false;
    }

    this.windowCounts.push(now);
    return true;
  }
}

// 10% sampling, max 1000 traces/minute
const sampler = new TraceSampler(0.1, 1000);

async function sampledOperation(name: string, fn: () => Promise<any>) {
  if (!sampler.shouldSample()) {
    return fn(); // Run without tracing
  }

  return startActiveObservation(name, async () => {
    updateActiveObservation({ metadata: { sampled: true } });
    return fn();
  });
}

Rate Limit Reference

Tier Traces/min Batch Size Strategy
Hobby ~500 15 Default settings
Pro ~5,000 50 Increase flushAt
Team ~10,000 100 + Queue-based limiting
Enterprise Custom Custom + Sampling

Error Handling

Error Response Action
429 Too Many Requests Retry-After: N Backoff for N seconds
503 Service Unavailable Server overloaded Backoff 30s+
Flush timeout Large batch Reduce flushAt, increase requestTimeout
Memory growth Queue backup Add maxSize to PQueue

Resources

Info
Category Development
Name langfuse-rate-limits
Version v20260423
Size 4.98KB
Updated At 2026-04-26
Language