
Cloud ML Workload Migration Guide

v20260423
coreweave-migration-deep-dive
This skill pack provides detailed guidance on migrating machine learning workloads (including inference services and training pipelines) from major cloud platforms such as AWS, GCP, and Azure to the CoreWeave GPU cloud. It covers cost comparison, containerization steps, Kubernetes configuration adaptation, and phased deployment, helping you keep the migration smooth, efficient, and cost-effective.
Overview

CoreWeave Migration Deep Dive

Cost Comparison

| Instance | AWS | CoreWeave | Savings |
|---|---|---|---|
| 1x A100 80GB | ~$3.60/hr (p4d) | ~$2.21/hr | ~39% |
| 8x A100 80GB | ~$32/hr (p4d.24xl) | ~$17.70/hr | ~45% |
| 1x H100 80GB | ~$6.50/hr (p5) | ~$4.76/hr | ~27% |
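To put the hourly rates in perspective, a quick back-of-the-envelope calculation (a sketch using the approximate rates from the table above and 730 hours per month) shows what an always-on 8x A100 node costs on each side:

```shell
# Rough monthly cost of an always-on 8x A100 80GB node (730 hours/month),
# using the approximate hourly rates from the table above.
HOURS=730
aws_monthly=$(awk -v h="$HOURS" 'BEGIN { printf "%.0f", 32 * h }')
cw_monthly=$(awk -v h="$HOURS" 'BEGIN { printf "%.0f", 17.70 * h }')
echo "AWS p4d.24xl:     ~\$${aws_monthly}/month"
echo "CoreWeave 8xA100: ~\$${cw_monthly}/month"
```

At these rates the difference is roughly $10k/month per always-on node, which is usually what justifies the migration effort.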

Migration Steps

Phase 1: Containerize

```shell
# If running on bare EC2/GCE, containerize first
docker build -t inference-server:v1 .
docker push ghcr.io/myorg/inference-server:v1
```
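If the workload is not containerized yet, the image built above might start from a Dockerfile along these lines (a minimal sketch: the CUDA base image, port, and `serve.py` entrypoint are illustrative assumptions, not part of the original guide):

```dockerfile
# Illustrative Dockerfile for a Python GPU inference server.
# Base image, port, and entrypoint are assumptions.
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
# serve.py is a placeholder for your inference entrypoint
CMD ["python3", "serve.py"]
```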

Phase 2: Adapt YAML for CoreWeave

Key changes from AWS EKS / GKE:

  1. Node affinity: Use gpu.nvidia.com/class instead of nvidia.com/gpu.product
  2. Storage: Use CoreWeave storage classes (shared-ssd-ord1)
  3. Networking: CoreWeave provides flat networking within VPC
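Putting the changes above together, the CoreWeave-side manifest might look like this (a sketch: the GPU class value, image, and PVC name are illustrative, not prescribed by the original guide):

```yaml
# Sketch of a Deployment adapted for CoreWeave node scheduling.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class   # CoreWeave label, not nvidia.com/gpu.product
                    operator: In
                    values:
                      - A100_PCIE_80GB          # illustrative class value
      containers:
        - name: server
          image: ghcr.io/myorg/inference-server:v1
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: model-store
              mountPath: /models
      volumes:
        - name: model-store
          persistentVolumeClaim:
            claimName: model-store   # a PVC created with storageClassName: shared-ssd-ord1
```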

Phase 3: Parallel Deploy

Run the old and new infrastructure side by side, and gradually shift traffic to CoreWeave.
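One way to shift traffic gradually is a weighted canary at the ingress layer. This sketch assumes an NGINX ingress controller fronting both environments; the hostname, service name, and weight are illustrative:

```yaml
# Canary ingress sending a fraction of traffic to the CoreWeave deployment.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% of traffic to CoreWeave
spec:
  rules:
    - host: inference.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-server   # service backing the CoreWeave deployment
                port:
                  number: 8000
```

Raise the weight in steps (10% → 50% → 100%) while watching latency and error rates before cutting over.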

Phase 4: Cut Over

Decommission old GPU instances after validation period.

Common Gotchas

| Issue | Solution |
|---|---|
| Different CUDA drivers | Match the container's CUDA version to CoreWeave node drivers |
| Storage migration | Use rclone or rsync to move data to a CoreWeave PVC |
| DNS changes | Update ingress/load balancer DNS records |
| IAM differences | CoreWeave uses kubeconfig-based access, not IAM roles |
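For the storage migration, the rclone route might look like this (a sketch: it assumes an rclone remote named `s3remote` is already configured, and that the target PVC is mounted at `/mnt/pvc` inside a transfer pod; the paths are illustrative):

```shell
# Copy training data from S3 into a CoreWeave PVC mounted in a transfer pod.
# "s3remote" and the paths below are placeholders for your own setup.
rclone copy s3remote:my-training-data /mnt/pvc/training-data \
  --progress --transfers 16
```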

Next Steps

This completes the CoreWeave skill pack. Start with coreweave-install-auth for new deployments.

Info
Category: AI
Name: coreweave-migration-deep-dive
Version: v20260423
Size: 2.11 KB
Updated: 2026-04-28
Language: