技能 编程开发 LLM可观测数据合规管理

LLM可观测数据合规管理

v20260423
langfuse-data-handling
本技能用于管理大型语言模型(LLM)可观测性平台(如Langfuse)的完整数据生命周期。核心功能包括导出详细的调用轨迹和评分数据、配置数据保留策略、执行GDPR等合规操作,以及对个人身份信息(PII)进行脱敏处理。适用于需要确保AI应用数据安全和合规性的场景。
获取技能
432 次下载
概览

Langfuse Data Handling

Overview

Manage the Langfuse data lifecycle: export traces and scores via the API, configure retention policies, handle GDPR data subject requests, anonymize data for analytics, and maintain audit trails.

Prerequisites

  • @langfuse/client installed
  • Langfuse API keys with appropriate permissions
  • Understanding of your compliance requirements (GDPR, SOC2, HIPAA)

Instructions

Step 1: Export Trace Data via API

import { LangfuseClient } from "@langfuse/client";
import { writeFileSync } from "fs";

const langfuse = new LangfuseClient();

async function exportTraces(options: {
  fromDate: string;
  toDate: string;
  outputFile: string;
  includeObservations?: boolean;
}) {
  const allTraces: any[] = [];
  let page = 1;
  let hasMore = true;

  while (hasMore) {
    const result = await langfuse.api.traces.list({
      fromTimestamp: options.fromDate,
      toTimestamp: options.toDate,
      limit: 100,
      page,
    });

    for (const trace of result.data) {
      const exportItem: any = {
        id: trace.id,
        name: trace.name,
        timestamp: trace.timestamp,
        userId: trace.userId,
        sessionId: trace.sessionId,
        metadata: trace.metadata,
        tags: trace.tags,
      };

      if (options.includeObservations) {
        const observations = await langfuse.api.observations.list({
          traceId: trace.id,
        });
        exportItem.observations = observations.data;
      }

      allTraces.push(exportItem);
    }

    hasMore = result.data.length === 100;
    page++;

    // Rate limit respect
    await new Promise((r) => setTimeout(r, 200));
  }

  writeFileSync(options.outputFile, JSON.stringify(allTraces, null, 2));
  console.log(`Exported ${allTraces.length} traces to ${options.outputFile}`);
}

// Usage
await exportTraces({
  fromDate: "2025-01-01T00:00:00Z",
  toDate: "2025-01-31T23:59:59Z",
  outputFile: "traces-january.json",
  includeObservations: true,
});

Step 2: Export Scores

async function exportScores(fromDate: string, outputFile: string) {
  const scores: any[] = [];
  let page = 1;
  let hasMore = true;

  while (hasMore) {
    const result = await langfuse.api.scores.list({
      fromTimestamp: fromDate,
      limit: 100,
      page,
    });

    scores.push(...result.data);
    hasMore = result.data.length === 100;
    page++;
    await new Promise((r) => setTimeout(r, 200));
  }

  writeFileSync(outputFile, JSON.stringify(scores, null, 2));
  console.log(`Exported ${scores.length} scores to ${outputFile}`);
}

Step 3: Data Retention Configuration

Self-hosted: Set retention via environment variable:

# docker-compose.yml
services:
  langfuse:
    environment:
      - LANGFUSE_RETENTION_DAYS=90

Cloud: Programmatic cleanup of old data:

async function enforceRetention(maxAgeDays: number) {
  const cutoff = new Date(Date.now() - maxAgeDays * 86400000).toISOString();

  const oldTraces = await langfuse.api.traces.list({
    toTimestamp: cutoff,
    limit: 100,
  });

  console.log(`Found ${oldTraces.data.length} traces older than ${maxAgeDays} days`);

  for (const trace of oldTraces.data) {
    await langfuse.api.traces.delete(trace.id);
    await new Promise((r) => setTimeout(r, 100)); // Rate limit
  }
}

// Run as cron job
await enforceRetention(90);

Step 4: GDPR Data Subject Requests

// Handle "Right to Access" -- export all data for a user
async function handleAccessRequest(userId: string) {
  const traces = await langfuse.api.traces.list({
    userId,
    limit: 1000,
  });

  const userData = {
    userId,
    exportDate: new Date().toISOString(),
    traceCount: traces.data.length,
    traces: traces.data.map((t) => ({
      id: t.id,
      name: t.name,
      timestamp: t.timestamp,
      input: t.input,
      output: t.output,
      metadata: t.metadata,
    })),
  };

  writeFileSync(`gdpr-export-${userId}.json`, JSON.stringify(userData, null, 2));
  return userData;
}

// Handle "Right to Erasure" -- delete all data for a user
async function handleDeletionRequest(userId: string) {
  const traces = await langfuse.api.traces.list({
    userId,
    limit: 1000,
  });

  let deleted = 0;
  for (const trace of traces.data) {
    await langfuse.api.traces.delete(trace.id);
    deleted++;
    await new Promise((r) => setTimeout(r, 100));
  }

  console.log(`Deleted ${deleted} traces for user ${userId}`);
  return { userId, tracesDeleted: deleted };
}

Step 5: Data Anonymization for Analytics

import crypto from "crypto";

function anonymizeTrace(trace: any): any {
  return {
    ...trace,
    userId: trace.userId ? crypto.createHash("sha256").update(trace.userId).digest("hex").slice(0, 16) : null,
    sessionId: trace.sessionId ? crypto.createHash("sha256").update(trace.sessionId).digest("hex").slice(0, 16) : null,
    input: "[REDACTED]",
    output: "[REDACTED]",
    metadata: {
      model: trace.metadata?.model,
      // Keep operational fields, remove PII
    },
  };
}

async function exportAnonymized(fromDate: string, outputFile: string) {
  const traces = await langfuse.api.traces.list({
    fromTimestamp: fromDate,
    limit: 1000,
  });

  const anonymized = traces.data.map(anonymizeTrace);
  writeFileSync(outputFile, JSON.stringify(anonymized, null, 2));
}

Data Categories and Retention

Category Contains PII? Default Retention Compliance Note
Traces (inputs/outputs) Likely 90 days Scrub PII before tracing
Generations (LLM I/O) Likely 90 days May contain user data
Scores Rarely 1 year Typically safe to retain
Sessions User ID linked 90 days Link to user data requests
Prompts No Indefinite Template data only
Datasets Maybe Per use case Review test data for PII

Error Handling

Issue Cause Solution
Export timeout Too many traces Reduce date range, use pagination
Missing user data Different userId format Verify exact userId used in traces
Deletion not immediate Async processing Allow time for propagation
Rate limited during export Too many API calls Add 200ms delay between pages

Resources

信息
Category 编程开发
Name langfuse-data-handling
版本 v20260423
大小 4.48KB
更新时间 2026-04-28
语言