Skills Engineering Upgrade And Migrate CAST AI Infrastructure

Upgrade And Migrate CAST AI Infrastructure

v20260423
castai-upgrade-migration
A comprehensive guide detailing the systematic process for upgrading and migrating CAST AI components. This includes updating Helm charts (agent, controller, evictor) and Terraform providers. It outlines prerequisites, step-by-step upgrade procedures, validation checks, and necessary rollback procedures to ensure a safe transition between versions.
Get Skill
355 downloads
Overview

CAST AI Upgrade & Migration

Overview

Upgrade CAST AI components: Helm charts for the agent/controller/evictor, Terraform provider version, and workload autoscaler. Includes rollback procedures for each component.

Prerequisites

  • Current CAST AI components installed
  • Staging cluster for testing upgrades first
  • Helm and kubectl access
  • Change management approval for production

Instructions

Step 1: Check Current Versions

# Helm chart versions
helm list -n castai-agent -o json | jq '.[] | {name: .name, chart: .chart, appVersion: .app_version}'

# Available versions
helm repo update
helm search repo castai-helm --versions | head -20

# Terraform provider version
grep "castai/castai" .terraform.lock.hcl
terraform providers

Step 2: Review Changelog

# Check CAST AI changelog for breaking changes
# https://docs.cast.ai/changelog/

# Check Terraform provider releases
# https://github.com/castai/terraform-provider-castai/releases

Step 3: Upgrade on Staging First

# Upgrade agent
helm upgrade castai-agent castai-helm/castai-agent \
  -n castai-agent --reuse-values

# Upgrade cluster controller
helm upgrade cluster-controller castai-helm/castai-cluster-controller \
  -n castai-agent --reuse-values

# Upgrade evictor
helm upgrade castai-evictor castai-helm/castai-evictor \
  -n castai-agent --reuse-values

# Upgrade workload autoscaler
helm upgrade castai-workload-autoscaler castai-helm/castai-workload-autoscaler \
  -n castai-agent --reuse-values

# Verify all pods restarted cleanly
kubectl get pods -n castai-agent -w

Step 4: Upgrade Terraform Provider

# Update version constraint in versions.tf
terraform {
  required_providers {
    castai = {
      source  = "castai/castai"
      version = "~> 7.5"  # Update to target version
    }
  }
}
# Upgrade provider
terraform init -upgrade
terraform plan -var-file=environments/staging.tfvars

# Review plan carefully for resource recreation
# Apply if plan looks safe
terraform apply -var-file=environments/staging.tfvars

Step 5: Validate After Upgrade

# Verify agent is online
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}" \
  | jq '{agentStatus, name}'

# Verify autoscaler policies still applied
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
  | jq '.enabled'

# Check for errors in new agent version
kubectl logs -n castai-agent deployment/castai-agent --tail=50 | grep -i error

Rollback Procedure

# Helm rollback to previous release
helm rollback castai-agent -n castai-agent
helm rollback cluster-controller -n castai-agent

# Terraform rollback
terraform plan -var-file=environments/staging.tfvars  # Review
# If needed, pin previous provider version:
# version = "= 7.4.2"
terraform init -upgrade
terraform apply -var-file=environments/staging.tfvars

Error Handling

Issue Cause Solution
Agent CrashLoop after upgrade Breaking chart change helm rollback to previous
Terraform plan shows destroy Major provider version jump Pin intermediate version
Policies reset after upgrade Chart default values changed Pass --reuse-values
Spot handler incompatible Node format changed Upgrade all components together

Resources

Next Steps

For CI pipeline integration, see castai-ci-integration.

Info
Category Engineering
Name castai-upgrade-migration
Version v20260423
Size 4.25KB
Updated At 2026-04-26
Language