LangChain生产环境就绪度清单

v20260423

langchain-prod-checklist

这是一份全面的LangChain应用程序生产环境就绪度清单。它系统性地覆盖了从配置管理、错误处理、可观测性（如LangSmith）、性能优化、安全加固（防止提示注入、处理PII）到严格测试和部署策略等多个维度，帮助开发者确保AI应用具备高可靠性和可维护性，适用于应用上线前或现有系统的审计。

LangChain 大模型 AI 生产环境部署清单安全审计

获取技能

131 次下载

概览

LangChain Production Checklist

Overview

Comprehensive go-live checklist for deploying LangChain applications to production. Covers configuration, resilience, observability, performance, security, testing, deployment, and cost management.

1. Configuration & Secrets

All API keys in secrets manager (not .env in production)
Environment-specific configs (dev/staging/prod) validated with Zod
Startup validation fails fast on missing config
.env files in .gitignore

// Startup validation
import { z } from "zod";

const ProdConfig = z.object({
  OPENAI_API_KEY: z.string().startsWith("sk-"),
  LANGSMITH_API_KEY: z.string().startsWith("lsv2_"),
  NODE_ENV: z.literal("production"),
});

try {
  ProdConfig.parse(process.env);
} catch (e) {
  console.error("Invalid production config:", e);
  process.exit(1);
}

2. Error Handling & Resilience

maxRetries configured on all models (3-5)
timeout set on all models (30-60s)
Fallback models configured with .withFallbacks()
Error responses return safe messages (no stack traces to users)

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  maxRetries: 5,
  timeout: 30000,
}).withFallbacks({
  fallbacks: [new ChatAnthropic({ model: "claude-sonnet-4-20250514" })],
});

3. Observability

LangSmith tracing enabled (LANGSMITH_TRACING=true)
LANGCHAIN_CALLBACKS_BACKGROUND=true (non-serverless only)
Structured logging on all LLM/tool calls
Prometheus metrics exported (requests, latency, tokens, errors)
Alerting rules configured (error rate >5%, P95 latency >5s)

4. Performance

Caching enabled for repeated queries (Redis or SQLite)
maxConcurrency set on batch operations
Streaming enabled for user-facing responses
Connection pooling configured
Prompt length optimized (no unnecessary verbosity)

5. Security

User input isolated in human messages (never in system prompts)
Input length limits enforced
Prompt injection patterns logged/flagged
Tools restricted to allowlisted operations
LLM output validated before display (no PII/key leakage)
Audit logging on all LLM and tool calls
Rate limiting per user/IP

6. Testing

Unit tests for all chains (using FakeListChatModel, no API calls)
Integration tests with real LLMs (gated behind CI secrets)
RAG pipeline validation (retrieval relevance + no hallucination)
Tool unit tests (valid input, invalid input, error cases)
Load testing completed (concurrent users, batch operations)

7. Deployment

Health check endpoint returns LLM connectivity status
Graceful shutdown handles in-flight requests
Rolling deployment (zero downtime)
Rollback procedure documented and tested
Container resource limits set (memory, CPU)

// Health check endpoint
app.get("/health", async (_req, res) => {
  const checks: Record<string, string> = { server: "ok" };

  try {
    await model.invoke("ping");
    checks.llm = "ok";
  } catch (e: any) {
    checks.llm = `error: ${e.message.slice(0, 100)}`;
  }

  const healthy = Object.values(checks).every((v) => v === "ok");
  res.status(healthy ? 200 : 503).json({ status: healthy ? "healthy" : "degraded", checks });
});

// Graceful shutdown
process.on("SIGTERM", async () => {
  console.log("Shutting down gracefully...");
  server.close(() => process.exit(0));
  setTimeout(() => process.exit(1), 10000); // force after 10s
});

8. Cost Management

Token usage tracking callback attached
Daily/monthly budget limits enforced
Model tiering: cheap model for simple tasks, powerful for complex
Cost alerts configured (Slack/email on threshold)
Cost per user/tenant tracked

Pre-Launch Validation Script

async function validateProduction() {
  const results: Record<string, string> = {};

  // 1. Config
  try {
    ProdConfig.parse(process.env);
    results["Config"] = "PASS";
  } catch { results["Config"] = "FAIL: missing env vars"; }

  // 2. LLM connectivity
  try {
    await model.invoke("ping");
    results["LLM"] = "PASS";
  } catch (e: any) { results["LLM"] = `FAIL: ${e.message.slice(0, 50)}`; }

  // 3. Fallback
  try {
    const fallbackModel = model.withFallbacks({ fallbacks: [fallback] });
    await fallbackModel.invoke("ping");
    results["Fallback"] = "PASS";
  } catch { results["Fallback"] = "FAIL"; }

  // 4. LangSmith
  results["LangSmith"] = process.env.LANGSMITH_TRACING === "true" ? "PASS" : "WARN: disabled";

  // 5. Health endpoint
  try {
    const res = await fetch("http://localhost:8000/health");
    results["Health"] = res.ok ? "PASS" : "FAIL";
  } catch { results["Health"] = "FAIL: not reachable"; }

  console.table(results);
  const allPass = Object.values(results).every((v) => v === "PASS");
  console.log(allPass ? "READY FOR PRODUCTION" : "ISSUES FOUND - FIX BEFORE LAUNCH");
  return allPass;
}

Error Handling

Issue	Cause	Fix
API key missing at startup	Secrets not mounted	Check deployment config
No fallback on outage	`.withFallbacks()` not configured	Add fallback model
LangSmith trace gaps	Background callbacks in serverless	Set `LANGCHAIN_CALLBACKS_BACKGROUND=false`
Cache miss storm	Redis down	Implement graceful degradation

Resources

Next Steps

After launch, use langchain-observability for monitoring and langchain-incident-runbook for incident response.

信息

Category 人工智能

Name langchain-prod-checklist

版本 v20260423

大小 4.5KB

Source jeremylongshore/claude-code-plugins-plus-skills

更新时间 2026-04-28