Cohere Production Deployment Checklist

v20260423
cohere-prod-checklist
This checklist covers moving a Cohere API v2 integration from staging to production: API key management, code quality checks, model selection, performance tuning, a health check endpoint, a circuit breaker, gradual rollout, monitoring alerts, and rollback.
Overview

Complete go-live checklist for deploying Cohere API v2 integrations to production with safety gates, health checks, and rollback procedures.

Prerequisites

  • Staging environment tested and verified
  • Production API key (not trial) from dashboard.cohere.com
  • Deployment pipeline configured
  • Monitoring and alerting ready

Checklist

API & Authentication

  • Using production API key, not a trial key (trial keys are rate-limited, for example to 20 Chat calls/min, and are not meant for production traffic)
  • CO_API_KEY stored in secret manager (Vault, AWS Secrets Manager, GCP Secret Manager)
  • Key rotation procedure documented and tested
  • Billing alerts configured at dashboard.cohere.com
  • Using API v2 endpoints (CohereClientV2, not CohereClient)
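
The key-handling items above can be enforced at startup rather than discovered on the first failed request. The following is a minimal sketch, assuming the secret manager injects the key as the CO_API_KEY environment variable; `loadCohereKey` and `maskKey` are hypothetical helpers, and the placeholder pattern is only illustrative.

```typescript
// Hypothetical startup guard: fail fast when the key is missing or is a placeholder.
// Assumption: the secret manager exposes the key as the CO_API_KEY environment variable.
type Env = Record<string, string | undefined>;

function loadCohereKey(env: Env): string {
  const key = env.CO_API_KEY?.trim();
  if (!key) {
    throw new Error('CO_API_KEY is not set; refusing to start');
  }
  // Illustrative placeholder check; adjust to whatever your templates use.
  if (/^(changeme|your[-_ ]?api[-_ ]?key|xxx+)$/i.test(key)) {
    throw new Error('CO_API_KEY looks like a placeholder value');
  }
  return key;
}

// Mask the key for logs: only the last four characters are shown.
function maskKey(key: string): string {
  return key.length <= 4 ? '****' : `****${key.slice(-4)}`;
}

// Usage (not executed here):
//   const cohere = new CohereClientV2({ token: loadCohereKey(process.env) });
```

Failing at boot keeps a misconfigured pod out of rotation, and logging only the masked form makes it possible to confirm which key is active after a rotation without exposing it.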

Code Quality

  • All API calls specify model parameter explicitly
  • embeddingTypes set for all Embed calls (required for v3+)
  • inputType set for all Embed calls (required for v3+)
  • Error handling catches CohereError and CohereTimeoutError
  • Retry logic with exponential backoff for 429 and 5xx
  • No hardcoded API keys in source code
  • Request/response logging excludes API keys and PII
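
The retry item above could look roughly like the sketch below. It assumes thrown errors expose the HTTP status as `statusCode` (the property the health check in this document reads from CohereError); `withRetry`, its option names, and its defaults are hypothetical. The SDK may already retry some failures on its own, so check its retry settings before layering this on top.

```typescript
// Sketch: retry with exponential backoff and full jitter for 429 and 5xx.
// Assumption: errors carry an HTTP status in `statusCode`; errors without one are not retried.
interface RetryOptions {
  maxAttempts?: number; // total attempts, including the first
  baseDelayMs?: number;
  maxDelayMs?: number;
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

function isRetryable(status: number | undefined): boolean {
  return status === 429 || (status !== undefined && status >= 500 && status <= 599);
}

// Upper bound of the delay after failed attempt number `attempt` (1-based).
function backoffCeilingMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const { maxAttempts = 4, baseDelayMs = 500, maxDelayMs = 8_000 } = opts;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { statusCode?: number } | null)?.statusCode;
      // Give up on the last attempt or on errors that a retry cannot fix (e.g. 400, 401).
      if (attempt >= maxAttempts || !isRetryable(status)) throw err;
      // Full jitter: a random delay in [0, ceiling) avoids synchronized retries.
      await sleep(Math.random() * backoffCeilingMs(attempt, baseDelayMs, maxDelayMs));
    }
  }
}
```

A call site would wrap a single request, for example `withRetry(() => cohere.chat({ ... }))`. Client errors such as 400 or 401 are rethrown immediately because repeating them only adds load.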

Model Selection

  • Correct model IDs used (not deprecated names):
    Use Case          Recommended Model     Fallback
    Chat/generation   command-a-03-2025     command-r-plus-08-2024
    Lightweight chat  command-r7b-12-2024   command-r-08-2024
    Embeddings        embed-v4.0            embed-english-v3.0
    Reranking         rerank-v3.5           rerank-english-v3.0
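
One way to keep these model IDs out of scattered string literals is a single typed map plus a helper that tries the recommended model first. This is a sketch only: the model IDs are copied from the table above, while `MODELS`, `withModelFallback`, and the policy of falling back on any error are assumptions.

```typescript
// Model IDs copied from the table above; the fall-back-on-any-error policy is an assumption.
const MODELS = {
  chat: { primary: 'command-a-03-2025', fallback: 'command-r-plus-08-2024' },
  lightChat: { primary: 'command-r7b-12-2024', fallback: 'command-r-08-2024' },
  embed: { primary: 'embed-v4.0', fallback: 'embed-english-v3.0' },
  rerank: { primary: 'rerank-v3.5', fallback: 'rerank-english-v3.0' },
} as const;

type UseCase = keyof typeof MODELS;

// Runs `call` with the recommended model, then once with the fallback if the first attempt throws.
async function withModelFallback<T>(
  useCase: UseCase,
  call: (model: string) => Promise<T>,
): Promise<T> {
  const { primary, fallback } = MODELS[useCase];
  try {
    return await call(primary);
  } catch {
    console.warn(`${primary} failed, retrying with ${fallback}`);
    return call(fallback);
  }
}
```

Runtime fallback suits chat and rerank. For embeddings it needs care: vectors produced by different embedding models are not comparable, so a stored index should only be queried with the model that built it.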

Performance

  • Embed calls batched (up to 96 texts per request)
  • Rerank calls limited to 1000 documents per request
  • Streaming enabled for user-facing chat (chatStream)
  • Connection pooling / keep-alive configured
  • Response caching for repeated embed/rerank queries
  • maxTokens set to prevent runaway generation costs
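
The batching item above can be sketched as follows. The limit of 96 comes from this checklist; `chunk` and `embedAll` are hypothetical helpers, and `embedBatch` stands in for whatever function issues one Embed request and returns one vector per input, in order.

```typescript
// Splits inputs into batches no larger than `size` (96 is the per-request limit cited above).
function chunk<T>(items: readonly T[], size = 96): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// `embedBatch` is a hypothetical callback wrapping one Embed request.
async function embedAll(
  texts: readonly string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts)) {
    // Sequential on purpose: parallel batches are faster but hit rate limits sooner.
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

With 200 texts this issues three requests (96, 96, and 8 inputs) instead of 200 single-text calls.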

Health Check Endpoint

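// /api/health
import { CohereClientV2, CohereError } from 'cohere-ai';

// No token passed: the SDK is expected to fall back to the CO_API_KEY environment variable.
const cohere = new CohereClientV2();

export async function GET() {
  const start = Date.now();
  let cohereStatus: 'healthy' | 'degraded' | 'down' = 'down';

  try {
    // Minimal chat call: one output token keeps the probe cheap,
    // but every probe is still a billed, rate-limited request.
    await cohere.chat(
      {
        model: 'command-r7b-12-2024',
        messages: [{ role: 'user', content: 'ping' }],
        maxTokens: 1,
      },
      { timeoutInSeconds: 5 }, // fail fast instead of hanging the health endpoint
    );
    cohereStatus = 'healthy';
  } catch (err) {
    if (err instanceof CohereError && err.statusCode === 429) {
      cohereStatus = 'degraded'; // Rate limited but reachable
    }
    // Any other error (timeout, 401, 5xx, network) leaves the status as 'down'.
  }

  return Response.json(
    {
      status: cohereStatus === 'healthy' ? 'ok' : cohereStatus,
      cohere: { status: cohereStatus, latencyMs: Date.now() - start },
      timestamp: new Date().toISOString(),
    },
    // Return 503 when Cohere is down so load balancers and `curl -f` can detect it.
    { status: cohereStatus === 'down' ? 503 : 200 },
  );
}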
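
Because the health check above issues a real chat request, frequent probes from load balancers or orchestrators add cost and consume rate limit. One option is to cache the probe result for a short time. The sketch below is an assumption rather than part of the original endpoint: `cachedProbe` and the 30-second default TTL are hypothetical, and `probe` stands in for the minimal chat call.

```typescript
type Status = 'healthy' | 'degraded' | 'down';

// Wraps a probe so the upstream call runs at most once per `ttlMs`.
// `now` is injectable so the behavior can be tested without waiting.
function cachedProbe(
  probe: () => Promise<Status>,
  ttlMs = 30_000,
  now: () => number = Date.now,
): () => Promise<Status> {
  let cached: { status: Status; at: number } | undefined;
  let inFlight: Promise<Status> | undefined;

  return async () => {
    if (cached && now() - cached.at < ttlMs) return cached.status;
    if (!inFlight) {
      // Concurrent callers share one in-flight probe instead of each calling upstream.
      inFlight = probe()
        .then((status) => {
          cached = { status, at: now() };
          return status;
        })
        .finally(() => {
          inFlight = undefined;
        });
    }
    return inFlight;
  };
}
```

The trade-off is detection delay: with a 30-second TTL, an outage can go unreported for up to 30 seconds, so the TTL should stay below the alerting window.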

Circuit Breaker

class CohereCircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private threshold = 5,
    private resetMs = 60_000
  ) {}

  async call<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.resetMs) {
        this.state = 'half-open';
      } else if (fallback) {
        return fallback();
      } else {
        throw new Error('Cohere circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures++;
      this.lastFailure = Date.now();

      if (this.failures >= this.threshold) {
        this.state = 'open';
        console.error(`Cohere circuit breaker OPEN after ${this.failures} failures`);
      }
      throw err;
    }
  }
}

const breaker = new CohereCircuitBreaker();
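
A usage sketch for the breaker follows. Note that the `fallback` passed to `call` only runs while the circuit is open; an ordinary failure in the closed state is rethrown, so the caller still needs its own catch. `Breaker` is a structural type matching the `call` method of the class above, and `rerankOrPassThrough` is a hypothetical helper that degrades to the original document order.

```typescript
// Structural type matching the `call` method of the CohereCircuitBreaker class above.
interface Breaker {
  call<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T>;
}

type Ranked = { index: number; score: number };

// Hypothetical helper: rerank through the breaker, and keep the original
// document order when Cohere is unavailable.
async function rerankOrPassThrough(
  breaker: Breaker,
  rerank: () => Promise<Ranked[]>,
  docCount: number,
): Promise<Ranked[]> {
  const passThrough = (): Ranked[] =>
    Array.from({ length: docCount }, (_, index) => ({ index, score: 0 }));
  try {
    // While the circuit is open, `passThrough` is returned without calling Cohere.
    return await breaker.call(rerank, passThrough);
  } catch {
    // Closed or half-open circuit with a failing call: degrade the same way.
    return passThrough();
  }
}
```

Reranking is a natural candidate for this pattern because unranked results are still usable. For chat, the fallback would more likely be a short "temporarily unavailable" message.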

Gradual Rollout

# Pre-flight
curl -sf https://staging.example.com/api/health | jq '.cohere'
curl -s https://status.cohere.com/api/v2/status.json | jq '.status'

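# Start the rollout, then pause it part-way as a rough canary.
# Note: pausing does not guarantee a 10% split. The share of new pods depends on
# timing and on the Deployment's maxSurge/maxUnavailable settings. Use a separate
# canary Deployment or a canary tool when an exact traffic percentage is required.
kubectl apply -f k8s/production.yaml
kubectl rollout pause deployment/app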

# Monitor for 10 minutes: error rate, latency, 429s
# Check: No increase in CohereError rate
# Check: P95 latency < 5s for chat, < 500ms for embed/rerank

# Proceed to 100%
kubectl rollout resume deployment/app
kubectl rollout status deployment/app
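
The manual checks in the rollout script can be turned into an explicit go/no-go gate. The thresholds below mirror the comments in the script; the `CanaryMetrics` shape and `canaryFailures` are hypothetical, and the numbers would come from your monitoring system.

```typescript
// Hypothetical metric snapshot taken during the 10-minute observation window.
interface CanaryMetrics {
  baselineErrorRate: number; // fraction of failed Cohere calls before the deploy
  canaryErrorRate: number; // same metric on the new pods
  chatP95Ms: number;
  embedRerankP95Ms: number;
}

// Returns the list of failed checks; an empty list means the rollout may resume.
function canaryFailures(m: CanaryMetrics): string[] {
  const failures: string[] = [];
  if (m.canaryErrorRate > m.baselineErrorRate) {
    failures.push('CohereError rate increased');
  }
  if (m.chatP95Ms >= 5_000) {
    failures.push('chat P95 latency is not below 5s');
  }
  if (m.embedRerankP95Ms >= 500) {
    failures.push('embed/rerank P95 latency is not below 500ms');
  }
  return failures;
}
```

In practice the error-rate comparison usually needs a small tolerance, since low-traffic canaries produce noisy rates. The strict comparison here simply follows the "no increase" wording of the script.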

Monitoring Alerts

Alert               Condition                      Severity
Cohere unreachable  Health check fails 3x          P1
High error rate     5xx > 5% of requests/5min      P1
Rate limited        429 > 10/min                   P2
High latency        Chat P95 > 10s                 P2
Auth failure        Any 401 response               P1
Budget exceeded     Daily token cost > threshold   P2
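
The alert conditions above can be encoded as data so they are reviewed and tested like any other code. This is a sketch under assumptions: the rule names, thresholds, and severities come from the table, while `WindowStats`, its field names, and `firingAlerts` are hypothetical and would map onto whatever your metrics backend provides.

```typescript
type Severity = 'P1' | 'P2';

// Hypothetical counters for the current evaluation window.
interface WindowStats {
  consecutiveHealthFailures: number;
  requests5m: number;
  serverErrors5m: number; // 5xx responses in the last 5 minutes
  rateLimited1m: number; // 429 responses in the last minute
  chatP95Ms: number;
  authFailures: number; // 401 responses
  dailyTokenCost: number;
  dailyBudget: number; // the "threshold" from the table
}

interface AlertRule {
  name: string;
  severity: Severity;
  fires: (s: WindowStats) => boolean;
}

const RULES: AlertRule[] = [
  { name: 'Cohere unreachable', severity: 'P1', fires: (s) => s.consecutiveHealthFailures >= 3 },
  {
    name: 'High error rate',
    severity: 'P1',
    fires: (s) => s.requests5m > 0 && s.serverErrors5m / s.requests5m > 0.05,
  },
  { name: 'Rate limited', severity: 'P2', fires: (s) => s.rateLimited1m > 10 },
  { name: 'High latency', severity: 'P2', fires: (s) => s.chatP95Ms > 10_000 },
  { name: 'Auth failure', severity: 'P1', fires: (s) => s.authFailures > 0 },
  { name: 'Budget exceeded', severity: 'P2', fires: (s) => s.dailyTokenCost > s.dailyBudget },
];

// Returns every rule whose condition holds for the given window.
function firingAlerts(s: WindowStats): AlertRule[] {
  return RULES.filter((rule) => rule.fires(s));
}
```

Keeping the rules in one array also makes it easy to export them to an alerting system's own format instead of evaluating them in application code.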

Rollback

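# Immediate rollback
# If the rollout is still paused from the canary step, resume it first;
# kubectl refuses to undo a paused deployment:
#   kubectl rollout resume deployment/app
kubectl rollout undo deployment/app
kubectl rollout status deployment/app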

# Verify rollback
curl -sf https://api.example.com/api/health | jq '.cohere'

Output

  • Production-ready Cohere integration with health checks
  • Circuit breaker preventing cascade failures
  • Monitoring alerts for Cohere-specific error conditions
  • Documented rollback procedure

Next Steps

For version upgrades, see cohere-upgrade-migration.

Info
Name cohere-prod-checklist
Version v20260423
Updated At 2026-04-28