Complete pre-launch checklist for deploying Groq-powered applications to production. Covers API key security, model selection, rate limit planning, fallback strategies, and monitoring setup.
.env files)gsk_ pattern in codellama-3.3-70b-versatile)llama-3.1-8b-instant)max_tokens set to actual expected output size (not context max)retry-after header implementedp-queue or similar)Groq.APIError, Groq.APIConnectionError)async function completionWithFallback(messages: any[]) {
try {
return await groq.chat.completions.create({
model: "llama-3.3-70b-versatile",
messages,
timeout: 15_000,
});
} catch (err: any) {
if (err.status === 429 || err.status >= 500) {
console.warn("Groq primary failed, trying fallback model");
try {
return await groq.chat.completions.create({
model: "llama-3.1-8b-instant",
messages,
timeout: 10_000,
});
} catch {
console.error("Groq fully unavailable, degrading gracefully");
return { choices: [{ message: { content: "Service temporarily unavailable. Please try again." } }] };
}
}
throw err;
}
}
// /api/health or /healthz
export async function GET() {
const checks: Record<string, any> = { status: "healthy" };
const start = performance.now();
try {
await groq.chat.completions.create({
model: "llama-3.1-8b-instant",
messages: [{ role: "user", content: "OK" }],
max_tokens: 1,
temperature: 0,
});
checks.groq = { status: "connected", latencyMs: Math.round(performance.now() - start) };
} catch (err: any) {
checks.status = "degraded";
checks.groq = { status: "error", error: err.status || err.message };
}
return Response.json(checks, { status: checks.status === "healthy" ? 200 : 503 });
}
groq-incident-runbook)set -euo pipefail
# Pre-flight checks
echo "1. Groq API status..."
curl -sf https://status.groq.com > /dev/null && echo "OK" || echo "ISSUE"
echo "2. Production key valid..."
curl -sf https://api.groq.com/openai/v1/models \
-H "Authorization: Bearer $GROQ_API_KEY_PROD" | jq '.data | length'
echo "3. Health endpoint..."
curl -sf https://your-app.com/api/health | jq .
echo "4. Rate limit headroom..."
curl -si https://api.groq.com/openai/v1/chat/completions \
-H "Authorization: Bearer $GROQ_API_KEY_PROD" \
-H "Content-Type: application/json" \
-d '{"model":"llama-3.1-8b-instant","messages":[{"role":"user","content":"ping"}],"max_tokens":1}' \
2>/dev/null | grep -i "x-ratelimit-remaining"
| Alert | Condition | Severity |
|---|---|---|
| API errors spike | 5xx rate > 5/min | P1 |
| Latency degraded | p95 > 1000ms | P2 |
| Rate limited | 429 count > 5/min | P2 |
| Auth failure | Any 401 error | P1 |
| Spending near cap | >90% of monthly budget | P3 |
For version upgrades and model migrations, see the groq-upgrade-migration guide.