技能 硬件工程 CAST AI升级与迁移指南

CAST AI升级与迁移指南

v20260423
castai-upgrade-migration
本指南详细介绍了CAST AI组件的升级和迁移流程,涵盖了Helm Charts(代理、控制器等)和Terraform Provider的更新。内容包括升级前置条件、分步操作指南、验证检查以及详细的回滚步骤,确保系统版本升级过程的安全性和稳定性。
获取技能
355 次下载
概览

CAST AI Upgrade & Migration

Overview

Upgrade CAST AI components: Helm charts for the agent/controller/evictor, Terraform provider version, and workload autoscaler. Includes rollback procedures for each component.

Prerequisites

  • Current CAST AI components installed
  • Staging cluster for testing upgrades first
  • Helm and kubectl access
  • Change management approval for production

Instructions

Step 1: Check Current Versions

# Helm chart versions
helm list -n castai-agent -o json | jq '.[] | {name: .name, chart: .chart, appVersion: .app_version}'

# Available versions
helm repo update
helm search repo castai-helm --versions | head -20

# Terraform provider version
grep "castai/castai" .terraform.lock.hcl
terraform providers

Step 2: Review Changelog

# Check CAST AI changelog for breaking changes
# https://docs.cast.ai/changelog/

# Check Terraform provider releases
# https://github.com/castai/terraform-provider-castai/releases

Step 3: Upgrade on Staging First

# Upgrade agent
helm upgrade castai-agent castai-helm/castai-agent \
  -n castai-agent --reuse-values

# Upgrade cluster controller
helm upgrade cluster-controller castai-helm/castai-cluster-controller \
  -n castai-agent --reuse-values

# Upgrade evictor
helm upgrade castai-evictor castai-helm/castai-evictor \
  -n castai-agent --reuse-values

# Upgrade workload autoscaler
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
  -n castai-agent --reuse-values

# Verify all pods restarted cleanly
kubectl get pods -n castai-agent -w

Step 4: Upgrade Terraform Provider

# Update version constraint in versions.tf
terraform {
  required_providers {
    castai = {
      source  = "castai/castai"
      version = "~> 7.5"  # Update to target version
    }
  }
}
# Upgrade provider
terraform init -upgrade
terraform plan -var-file=environments/staging.tfvars

# Review plan carefully for resource recreation
# Apply if plan looks safe
terraform apply -var-file=environments/staging.tfvars

Step 5: Validate After Upgrade

# Verify agent is online
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}" \
  | jq '{agentStatus, name}'

# Verify autoscaler policies still applied
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
  | jq '.enabled'

# Check for errors in new agent version
kubectl logs -n castai-agent deployment/castai-agent --tail=50 | grep -i error

Rollback Procedure

# Helm rollback to previous release
helm rollback castai-agent -n castai-agent
helm rollback cluster-controller -n castai-agent

# Terraform rollback
terraform plan -var-file=environments/staging.tfvars  # Review
# If needed, pin previous provider version:
# version = "= 7.4.2"
terraform init -upgrade
terraform apply -var-file=environments/staging.tfvars

Error Handling

Issue Cause Solution
Agent CrashLoop after upgrade Breaking chart change helm rollback to previous
Terraform plan shows destroy Major provider version jump Pin intermediate version
Policies reset after upgrade Chart default values changed Pass --reuse-values
Spot handler incompatible Node format changed Upgrade all components together

Resources

Next Steps

For CI pipeline integration, see castai-ci-integration.

信息
Category 硬件工程
Name castai-upgrade-migration
版本 v20260423
大小 4.25KB
更新时间 2026-04-26
语言