Set up CI/CD for CoreWeave GPU cloud workloads: run unit tests with mocked Kubernetes clients on every PR, deploy inference containers to CoreWeave namespaces on merge to main, and validate GPU resource requests against quota. CoreWeave uses standard Kubernetes APIs with GPU-specific scheduling, so CI pipelines authenticate via kubeconfig and manage deployments through kubectl.
```yaml
# .github/workflows/coreweave-ci.yml
name: CoreWeave CI
on:
  pull_request:
    paths: ['src/**', 'k8s/**', 'Dockerfile']
  push:
    branches: [main]
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm test -- --reporter=verbose
  deploy:
    if: github.ref == 'refs/heads/main'
    needs: unit-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push container
        run: |
          echo "${{ secrets.GHCR_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker build -t ghcr.io/${{ github.repository }}/inference:${{ github.sha }} .
          docker push ghcr.io/${{ github.repository }}/inference:${{ github.sha }}
      - name: Deploy to CoreWeave
        env:
          KUBECONFIG_DATA: ${{ secrets.COREWEAVE_KUBECONFIG }}
        run: |
          echo "$KUBECONFIG_DATA" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          kubectl set image deployment/inference \
            inference=ghcr.io/${{ github.repository }}/inference:${{ github.sha }}
          kubectl rollout status deployment/inference --timeout=300s
```
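The intro calls for validating GPU requests against namespace quota, which the workflow itself does not do. One option is a small pre-deploy script run before `kubectl set image`. The following is a minimal sketch, assuming the namespace ResourceQuota tracks the extended resource under `requests.nvidia.com/gpu` and that the script runs with the same `KUBECONFIG` the deploy step exports; the script path and `CW_NAMESPACE` variable are illustrative, not part of the workflow above:

```ts
// scripts/validate-gpu-quota.ts -- hypothetical pre-deploy check, not part of the workflow above.
import { KubeConfig, CoreV1Api } from '@kubernetes/client-node';

// Compare a requested GPU count against the namespace ResourceQuota before deploying.
async function validateGpuQuota(namespace: string, requestedGpus: number): Promise<void> {
  const kc = new KubeConfig();
  kc.loadFromDefault(); // honors the KUBECONFIG env var exported in the deploy job
  const api = kc.makeApiClient(CoreV1Api);

  const { body } = await api.listNamespacedResourceQuota(namespace);
  for (const quota of body.items) {
    const hard = quota.status?.hard?.['requests.nvidia.com/gpu'];
    const used = quota.status?.used?.['requests.nvidia.com/gpu'];
    if (hard === undefined) continue; // this quota object doesn't cap GPUs
    // GPU quantities are plain integers, so parseInt is sufficient here.
    const available = parseInt(hard, 10) - parseInt(used ?? '0', 10);
    if (requestedGpus > available) {
      throw new Error(
        `GPU quota exceeded in ${namespace}: requested ${requestedGpus}, available ${available}`,
      );
    }
  }
}

validateGpuQuota(process.env.CW_NAMESPACE ?? 'default', 4).catch((err) => {
  console.error(err.message);
  process.exit(1);
});
```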
```ts
// tests/coreweave-service.test.ts
import { describe, it, expect, vi } from 'vitest';
import { deployInferenceModel } from '../src/coreweave-service';

// Mock the Kubernetes client module so unit tests never touch a real cluster.
vi.mock('@kubernetes/client-node', () => ({
  KubeConfig: vi.fn().mockImplementation(() => ({
    loadFromDefault: vi.fn(),
    makeApiClient: vi.fn().mockReturnValue({
      patchNamespacedDeployment: vi.fn().mockResolvedValue({ body: { status: { readyReplicas: 1 } } }),
      listNamespacedPod: vi.fn().mockResolvedValue({
        body: { items: [{ metadata: { name: 'inference-abc' }, status: { phase: 'Running' } }] },
      }),
    }),
  })),
  AppsV1Api: vi.fn(),
}));

describe('CoreWeave Service', () => {
  it('deploys inference model with GPU requests', async () => {
    const result = await deployInferenceModel('llama-70b', { gpu: 'A100', count: 4 });
    expect(result.status).toBe('deployed');
    expect(result.gpuType).toBe('A100');
  });
});
```
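The `deployInferenceModel` function exercised above is not shown in this section. A minimal sketch consistent with the mocked client might look like the following; the namespace, the `gpu.nvidia.com/class` node-selector label, the patch shape, and the return fields are assumptions inferred from the test, not CoreWeave-confirmed values:

```ts
// src/coreweave-service.ts -- minimal sketch matching the mocked client above; details are assumptions.
import { KubeConfig, AppsV1Api } from '@kubernetes/client-node';

interface GpuSpec {
  gpu: string; // GPU SKU, e.g. 'A100'
  count: number;
}

export async function deployInferenceModel(model: string, spec: GpuSpec) {
  const kc = new KubeConfig();
  kc.loadFromDefault();
  const apps = kc.makeApiClient(AppsV1Api);

  // Strategic merge patch: set the GPU limit and pin the pod to the requested SKU.
  const patch = {
    spec: {
      template: {
        spec: {
          nodeSelector: { 'gpu.nvidia.com/class': spec.gpu }, // label key is an assumption
          containers: [{
            name: 'inference',
            resources: { limits: { 'nvidia.com/gpu': String(spec.count) } },
          }],
        },
      },
    },
  };

  // The Content-Type override is needed for merge patches; the number of
  // optional positional args varies slightly across client versions.
  const { body } = await apps.patchNamespacedDeployment(
    'inference', 'default', patch,
    undefined, undefined, undefined, undefined, undefined,
    { headers: { 'Content-Type': 'application/strategic-merge-patch+json' } },
  );

  return {
    status: body.status?.readyReplicas ? 'deployed' : 'pending',
    gpuType: spec.gpu,
    model,
  };
}
```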
```ts
// tests/integration/coreweave.integration.test.ts
import { describe, it, expect } from 'vitest';
import { KubeConfig, CoreV1Api } from '@kubernetes/client-node';

// Live tests run only when a CoreWeave kubeconfig is available (e.g. from CI secrets).
const hasKubeconfig = !!process.env.COREWEAVE_KUBECONFIG;

describe.skipIf(!hasKubeconfig)('CoreWeave Live API', () => {
  it('lists pods in the namespace', async () => {
    const kc = new KubeConfig();
    kc.loadFromString(Buffer.from(process.env.COREWEAVE_KUBECONFIG!, 'base64').toString());
    const k8sApi = kc.makeApiClient(CoreV1Api);
    const { body } = await k8sApi.listNamespacedPod('default');
    expect(Array.isArray(body.items)).toBe(true);
  });
});
```
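When deployments stall with pods stuck in Pending (see the troubleshooting table below), it helps to confirm the cluster actually advertises the GPU SKU being requested. A diagnostic sketch, assuming the standard `nvidia.com/gpu` allocatable resource and the same hypothetical `gpu.nvidia.com/class` node label used above:

```ts
// scripts/list-gpu-capacity.ts -- diagnostic sketch; the label key is an assumption about the cluster.
import { KubeConfig, CoreV1Api } from '@kubernetes/client-node';

async function listGpuCapacity(): Promise<void> {
  const kc = new KubeConfig();
  kc.loadFromDefault();
  const api = kc.makeApiClient(CoreV1Api);

  const { body } = await api.listNode(); // nodes are cluster-scoped, not namespaced
  for (const node of body.items) {
    const gpus = node.status?.allocatable?.['nvidia.com/gpu'];
    if (!gpus) continue; // skip CPU-only nodes
    console.log(
      node.metadata?.name,
      node.metadata?.labels?.['gpu.nvidia.com/class'] ?? 'unknown SKU',
      `${gpus} GPU(s) allocatable`,
    );
  }
}

listGpuCapacity().catch((err) => { console.error(err); process.exit(1); });
```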
| CI Issue | Cause | Fix |
|---|---|---|
| `KUBECONFIG_DATA` empty | Secret not set | Run `gh secret set COREWEAVE_KUBECONFIG --body "$(base64 -w0 kubeconfig)"` |
| Rollout timeout | GPU nodes unavailable | Increase `--timeout` or check the CoreWeave GPU availability dashboard |
| Image pull backoff | GHCR auth expired | Verify the `GHCR_TOKEN` secret and image registry permissions |
| Quota exceeded | GPU request exceeds namespace limit | Check namespace quota with `kubectl describe quota` |
| Pod pending | No matching GPU node type | Verify `nodeSelector` matches available GPU SKUs (A100, H100) |
For deployment patterns, see coreweave-deploy-integration.