技能 编程开发 Fly.io生产环境部署检查清单

Fly.io生产环境部署检查清单

v20260423
flyio-prod-checklist
本清单旨在指导开发者为Fly.io的生产环境部署做好全方位的准备。它涵盖了从多区域冗余、安全密钥管理到健康检查、优雅关闭和回滚测试等关键环节。使用此清单可以确保应用程序具备高可用性、高可靠性,能够抵御各种生产环境故障。
获取技能
245 次下载
概览

Fly.io Production Checklist

Overview

Fly.io runs applications on edge infrastructure across 30+ regions with Machines, Volumes, and managed Postgres. A production deployment requires multi-region redundancy, proper secret management, health checks, and rollback procedures. Misconfigured auto-scaling means cold starts; missing volume backups mean data loss. This checklist ensures your Fly.io app is production-hardened.

Authentication & Secrets

  • FLY_API_TOKEN stored in CI secrets (never in fly.toml or source)
  • All app secrets set via fly secrets (not [env] block)
  • Deploy tokens scoped per app (not org-wide personal tokens)
  • Key rotation scheduled (quarterly, or after team changes)
  • No hardcoded secrets in Dockerfile or codebase

API Integration

  • Production base URL: app deployed to https://<app>.fly.dev
  • force_https = true in fly.toml http_service
  • Custom domain with TLS certificate active and auto-renewing
  • min_machines_running = 1 to avoid cold starts
  • Machines deployed in 2+ regions for redundancy
  • Concurrency limits tuned (soft_limit/hard_limit per workload)
  • Volumes backed up if using persistent storage

Error Handling & Resilience

  • Health check endpoint configured with appropriate grace period
  • Graceful shutdown handles SIGTERM within 10s window
  • Auto-stop/auto-start configured for cost optimization
  • Postgres standby replica provisioned for database apps
  • Rollback procedure tested: fly releases rollback <N>
  • Dockerfile builds and runs identically local vs deployed

Monitoring & Alerting

  • fly logs streaming configured for centralized logging
  • Machine health monitored via fly machine status
  • Platform status checked: https://status.flyio.net
  • Alert on health check failures across any region
  • VM resource utilization tracked (fly scale show)

Validation Script

async function checkFlyioReadiness(): Promise<void> {
  const checks: { name: string; pass: boolean; detail: string }[] = [];
  // Fly.io API connectivity
  try {
    const res = await fetch('https://api.machines.dev/v1/apps', {
      headers: { Authorization: `Bearer ${process.env.FLY_API_TOKEN}` },
    });
    checks.push({ name: 'Fly API', pass: res.ok, detail: res.ok ? 'Connected' : `HTTP ${res.status}` });
  } catch (e: any) { checks.push({ name: 'Fly API', pass: false, detail: e.message }); }
  // Token present
  checks.push({ name: 'API Token Set', pass: !!process.env.FLY_API_TOKEN, detail: process.env.FLY_API_TOKEN ? 'Present' : 'MISSING' });
  // Platform status
  try {
    const res = await fetch('https://status.flyio.net/api/v2/status.json');
    const data = await res.json();
    const status = data?.status?.indicator || 'unknown';
    checks.push({ name: 'Platform Status', pass: status === 'none', detail: status === 'none' ? 'Operational' : status });
  } catch (e: any) { checks.push({ name: 'Platform Status', pass: false, detail: e.message }); }
  for (const c of checks) console.log(`[${c.pass ? 'PASS' : 'FAIL'}] ${c.name}: ${c.detail}`);
}
checkFlyioReadiness();

Error Handling

Check Risk if Skipped Priority
Multi-region deployment Single region outage = full downtime P1
Volume backups Data loss on machine replacement P1
Health check config Dead machines receive traffic P2
SIGTERM handling Dropped requests during deploys P2
Rollback procedure Stuck on broken release P3

Resources

Next Steps

See flyio-security-basics for network policies and secret management.

信息
Category 编程开发
Name flyio-prod-checklist
版本 v20260423
大小 4.15KB
更新时间 2026-04-28
语言