技能 编程开发 CoreWeave GPU部署CI/CD集成

CoreWeave GPU部署CI/CD集成

v20260423
coreweave-ci-integration
用于自动化CoreWeave GPU云工作负载的完整部署生命周期。该流程集成了单元测试、容器镜像构建(推送到GHCR)和到CoreWeave Kubernetes集群的滚动部署。适用于需要稳定、高性能部署的GPU依赖型机器学习推理模型。核心功能包括验证GPU资源请求和使用标准的Kubernetes API进行部署管理。
获取技能
358 次下载
概览

CoreWeave CI Integration

Overview

Set up CI/CD for CoreWeave GPU cloud workloads: run unit tests with mocked Kubernetes clients on every PR, deploy inference containers to CoreWeave namespaces on merge to main, and validate GPU resource requests against quota. CoreWeave uses standard Kubernetes APIs with GPU-specific scheduling, so CI pipelines authenticate via kubeconfig and manage deployments through kubectl.

GitHub Actions Workflow

# .github/workflows/coreweave-ci.yml
name: CoreWeave CI
on:
  pull_request:
    paths: ['src/**', 'k8s/**', 'Dockerfile']
  push:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm test -- --reporter=verbose

  deploy:
    if: github.ref == 'refs/heads/main'
    needs: unit-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push container
        run: |
          echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker build -t ghcr.io/${{ github.repository }}/inference:${{ github.sha }} .
          docker push ghcr.io/${{ github.repository }}/inference:${{ github.sha }}
      - name: Deploy to CoreWeave
        env:
          KUBECONFIG_DATA: ${{ secrets.COREWEAVE_KUBECONFIG }}
        run: |
          echo "$KUBECONFIG_DATA" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          kubectl set image deployment/inference \
            inference=ghcr.io/${{ github.repository }}/inference:${{ github.sha }}
          kubectl rollout status deployment/inference --timeout=300s

Mock-Based Unit Tests

// tests/coreweave-service.test.ts
import { describe, it, expect, vi } from 'vitest';
import { deployInferenceModel } from '../src/coreweave-service';

vi.mock('@kubernetes/client-node', () => ({
  KubeConfig: vi.fn().mockImplementation(() => ({
    loadFromDefault: vi.fn(),
    makeApiClient: vi.fn().mockReturnValue({
      patchNamespacedDeployment: vi.fn().mockResolvedValue({ body: { status: { readyReplicas: 1 } } }),
      listNamespacedPod: vi.fn().mockResolvedValue({
        body: { items: [{ metadata: { name: 'inference-abc' }, status: { phase: 'Running' } }] },
      }),
    }),
  })),
  AppsV1Api: vi.fn(),
}));

describe('CoreWeave Service', () => {
  it('deploys inference model with GPU requests', async () => {
    const result = await deployInferenceModel('llama-70b', { gpu: 'A100', count: 4 });
    expect(result.status).toBe('deployed');
    expect(result.gpuType).toBe('A100');
  });
});

Integration Tests

// tests/integration/coreweave.integration.test.ts
import { describe, it, expect } from 'vitest';
import { KubeConfig, CoreV1Api } from '@kubernetes/client-node';

const hasKubeconfig = !!process.env.COREWEAVE_KUBECONFIG;

describe.skipIf(!hasKubeconfig)('CoreWeave Live API', () => {
  it('lists GPU nodes in namespace', async () => {
    const kc = new KubeConfig();
    kc.loadFromString(Buffer.from(process.env.COREWEAVE_KUBECONFIG!, 'base64').toString());
    const k8sApi = kc.makeApiClient(CoreV1Api);
    const { body } = await k8sApi.listNamespacedPod('default');
    expect(Array.isArray(body.items)).toBe(true);
  });
});

Error Handling

CI Issue Cause Fix
KUBECONFIG_DATA empty Secret not set Run gh secret set COREWEAVE_KUBECONFIG --body "$(base64 -w0 kubeconfig)"
Rollout timeout GPU nodes unavailable Increase --timeout or check CoreWeave GPU availability dashboard
Image pull backoff GHCR auth expired Verify GHCR_TOKEN secret and image registry permissions
Quota exceeded GPU request exceeds namespace limit Check namespace quota with kubectl describe quota
Pod pending No matching GPU node type Verify nodeSelector matches available GPU SKUs (A100, H100)

Resources

Next Steps

For deployment patterns, see coreweave-deploy-integration.

信息
Category 编程开发
Name coreweave-ci-integration
版本 v20260423
大小 4.77KB
更新时间 2026-04-26
语言