
CoreWeave GPU Workload Security Configuration

v20260423
coreweave-security-basics
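This skill guides users through deploying GPU workloads securely on the CoreWeave platform. It covers security best practices ranging from API key management and RBAC permission control to network policies (NetworkPolicy). It helps users isolate namespaces effectively, protect model weights and sensitive training data, and keep their cloud-native environment secure.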
200 downloads

CoreWeave Security Basics

Overview

CoreWeave provides bare-metal GPU cloud on Kubernetes. Security concerns center on compute credential management (kubeconfig, deploy tokens), network isolation between inference workloads, secrets for model registry access (HuggingFace, container registries), and protecting sensitive training data on persistent volumes. A compromised namespace can expose GPU resources, model weights, and customer inference data.

API Key Management

import { KubeConfig, CoreV1Api } from "@kubernetes/client-node";

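function createCoreWeaveClient(): CoreV1Api {
  const apiKey = process.env.COREWEAVE_API_KEY;
  // Assumed variable holding the cluster API server URL (the `server` field of your kubeconfig)
  const server = process.env.COREWEAVE_API_SERVER;
  if (!apiKey || !server) {
    throw new Error("Missing COREWEAVE_API_KEY or COREWEAVE_API_SERVER (set via secrets manager)");
  }
  const kc = new KubeConfig();
  // Build the config in memory from the token so no kubeconfig file is written to disk.
  // Add `caData` to the cluster entry if the API server uses a private CA.
  kc.loadFromClusterAndUser(
    { name: "coreweave", server, skipTLSVerify: false },
    { name: "coreweave-user", token: apiKey },
  );
  const api = kc.makeApiClient(CoreV1Api);
  // Never log kubeconfig or API key contents
  console.log("CoreWeave client initialized for namespace:", process.env.CW_NAMESPACE);
  return api;
}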
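Reading the credential from an environment variable works. However, secrets managers and mounted Kubernetes Secrets usually expose credentials as files. The sketch below is a minimal loader that prefers a file path. It assumes the `<NAME>_FILE` convention, which is a common pattern and not a CoreWeave requirement. `loadSecret` is a hypothetical helper name.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helper. Reads a credential from the file named by `<NAME>_FILE`
// (for example a mounted Kubernetes Secret or a secrets-manager agent's output),
// falling back to the `<NAME>` environment variable.
function loadSecret(name: string, env: NodeJS.ProcessEnv = process.env): string {
  const filePath = env[`${name}_FILE`];
  const raw = filePath ? readFileSync(filePath, "utf8") : env[name];
  // Secret files usually end with a newline; strip surrounding whitespace
  const value = raw?.trim();
  if (!value) {
    // Name the missing variable, never echo any value
    throw new Error(`Missing credential ${name}: set ${name}_FILE or ${name}`);
  }
  return value;
}
```

For example, `loadSecret("COREWEAVE_API_KEY")` could replace the direct `process.env` read in `createCoreWeaveClient`. With a file, the token does not show up in the process environment or in `kubectl describe pod` output.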

Webhook Signature Verification

import crypto from "crypto";
import { Request, Response, NextFunction } from "express";

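// Mount with express.raw({ type: "application/json" }) so req.body is the raw Buffer;
// an HMAC over re-serialized JSON will not match the sender's signature.
function verifyCoreWeaveWebhook(req: Request, res: Response, next: NextFunction): void {
  const signature = req.get("x-coreweave-signature");
  const secret = process.env.COREWEAVE_WEBHOOK_SECRET;
  if (!secret) {
    res.status(500).send("Webhook secret not configured");
    return;
  }
  if (!signature || !Buffer.isBuffer(req.body)) {
    res.status(401).send("Invalid signature");
    return;
  }
  const expected = crypto.createHmac("sha256", secret).update(req.body).digest();
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws when lengths differ, so compare lengths first
  if (received.length !== expected.length || !crypto.timingSafeEqual(received, expected)) {
    res.status(401).send("Invalid signature");
    return;
  }
  next();
}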
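The HMAC check is easier to unit-test when it is separated from Express. The sketch below assumes the sender signs the raw request body with HMAC-SHA256 and sends the hex digest, which is the scheme the middleware above expects. The function names and the example payload are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sender side: hex HMAC-SHA256 over the exact bytes that go on the wire.
function signPayload(secret: string, rawBody: Buffer): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Receiver side: constant-time comparison of a hex signature against the raw body.
function isValidSignature(secret: string, rawBody: Buffer, signatureHex: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws when lengths differ, so reject those first
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Usage: the exact bytes verify; the same JSON with different whitespace does not.
const body = Buffer.from('{"event":"job.completed","id":"42"}');
const sig = signPayload("example-secret", body);
console.log(isValidSignature("example-secret", body, sig)); // true
console.log(isValidSignature("example-secret", Buffer.from('{"event": "job.completed","id":"42"}'), sig)); // false
```

The second check fails because one extra space changes the bytes and therefore the digest. This is why the middleware must see the unparsed body.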

Input Validation

import { z } from "zod";

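const WorkloadRequestSchema = z.object({
  // Namespace names are DNS-1123 labels: lowercase alphanumerics and hyphens,
  // starting and ending with an alphanumeric, at most 63 characters
  namespace: z.string().max(63).regex(/^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/),
  gpu_type: z.enum(["A100_80GB", "A100_40GB", "H100_80GB", "RTX_A6000"]),
  gpu_count: z.number().int().min(1).max(8),
  // repository:tag or repository@sha256:<digest> (digests pin the exact image).
  // Registries with an explicit port (host:5000/...) are not accepted by this pattern.
  image: z.string().max(255).regex(/^[a-z0-9._\-/]+(:\w[\w.\-]{0,127}|@sha256:[a-f0-9]{64})$/),
  model_id: z.string().min(1).max(200),
}).strict(); // reject unknown keys instead of silently passing them through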

function validateWorkloadRequest(data: unknown) {
  return WorkloadRequestSchema.parse(data);
}
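After a request passes validation, its fields end up in a pod spec. The sketch below turns `image` and `gpu_count` into a hardened container definition. It assumes the NVIDIA device plugin's `nvidia.com/gpu` extended resource and a Secret named `hf-token` with key `token`. The Secret name, key, container name and `HF_TOKEN` variable are all hypothetical. `gpu_type` is left out because the node label used to pick a GPU model depends on the cluster.

```typescript
// Subset of the validated request that the container spec needs.
interface GpuWorkload {
  image: string;
  gpu_count: number;
}

// Hypothetical helper: a hardened container spec as a plain object whose
// field names follow the Kubernetes core/v1 Container schema.
function buildGpuContainer(w: GpuWorkload, hfSecretName = "hf-token") {
  return {
    name: "inference",
    image: w.image,
    resources: {
      // For extended resources such as GPUs the request defaults to the limit
      limits: { "nvidia.com/gpu": String(w.gpu_count) },
    },
    env: [
      {
        // The token is referenced from a Secret, never written literally in the manifest
        name: "HF_TOKEN",
        valueFrom: { secretKeyRef: { name: hfSecretName, key: "token" } },
      },
    ],
    securityContext: {
      runAsNonRoot: true, // the image must run as a non-root UID
      allowPrivilegeEscalation: false,
      readOnlyRootFilesystem: true,
      capabilities: { drop: ["ALL"] },
    },
  };
}

console.log(JSON.stringify(buildGpuContainer({ image: "ghcr.io/example/server:v1", gpu_count: 2 }), null, 2));
```

`readOnlyRootFilesystem` breaks servers that write model caches into the image filesystem. Mounting an `emptyDir` or a PVC at the cache path keeps the setting intact.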

Data Protection

const CW_SENSITIVE_FIELDS = ["kubeconfig", "hf_token", "registry_password", "api_key", "model_weights_url"];

function redactCoreWeaveLog(record: Record<string, unknown>): Record<string, unknown> {
  const redacted = { ...record };
  for (const field of CW_SENSITIVE_FIELDS) {
    if (field in redacted) redacted[field] = "[REDACTED]";
  }
  return redacted;
}
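`redactCoreWeaveLog` only checks top-level keys and needs an exact, case-sensitive match. A token nested under another object, inside an array, or spelled `HF_TOKEN` would still be logged. The sketch below is a minimal deep variant that reuses the same field list. `redactDeep` is a hypothetical name, and the function is meant for JSON-like log records.

```typescript
const SENSITIVE_KEYS = new Set([
  "kubeconfig", "hf_token", "registry_password", "api_key", "model_weights_url",
]);

// Recursively redact sensitive keys at any depth (case-insensitive),
// descending into arrays, and return a new value without mutating the input.
function redactDeep(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactDeep);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redactDeep(v);
    }
    return out;
  }
  return value;
}

console.log(JSON.stringify(redactDeep({ job: "train-1", auth: { HF_TOKEN: "hf_abc" } })));
// {"job":"train-1","auth":{"HF_TOKEN":"[REDACTED]"}}
```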

Security Checklist

  • Kubeconfig stored in secrets manager, never in repos
  • Model tokens stored in Kubernetes Secrets and referenced via secretKeyRef, never written as literal env values in YAML manifests
  • Network policies restrict inference endpoint access
  • RBAC limits namespace access per team
  • Container images scanned for CVEs before deployment
  • PVCs encrypted at rest for training data
  • GPU workload namespaces isolated with NetworkPolicy
  • Deploy tokens scoped per-namespace, not cluster-wide
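The two NetworkPolicy items above can be sketched as a pair of policies. The first is a default-deny for all ingress in a GPU namespace. The second re-opens one port on the inference pods for a single client namespace. The namespace names, the `app: inference` label and port 8080 are hypothetical. The policies only take effect if the cluster's network plugin enforces NetworkPolicy.

```typescript
const NP_API = "networking.k8s.io/v1";

// Empty podSelector selects every pod in the namespace; listing "Ingress" in
// policyTypes with no ingress rules denies all inbound traffic to them.
function defaultDenyIngress(namespace: string) {
  return {
    apiVersion: NP_API,
    kind: "NetworkPolicy",
    metadata: { name: "default-deny-ingress", namespace },
    spec: { podSelector: {}, policyTypes: ["Ingress"] },
  };
}

// Allow one TCP port on the inference pods, only from pods in one client namespace.
function allowInferenceFrom(namespace: string, clientNamespace: string, port: number) {
  return {
    apiVersion: NP_API,
    kind: "NetworkPolicy",
    metadata: { name: "allow-inference-clients", namespace },
    spec: {
      podSelector: { matchLabels: { app: "inference" } },
      policyTypes: ["Ingress"],
      ingress: [
        {
          from: [
            {
              // Label set automatically on every namespace in Kubernetes 1.21+
              namespaceSelector: {
                matchLabels: { "kubernetes.io/metadata.name": clientNamespace },
              },
            },
          ],
          ports: [{ protocol: "TCP", port }],
        },
      ],
    },
  };
}

// Emit a v1 List, which `kubectl apply -f` accepts as JSON.
const manifest = {
  apiVersion: "v1",
  kind: "List",
  items: [defaultDenyIngress("team-a"), allowInferenceFrom("team-a", "team-a-gateway", 8080)],
};
console.log(JSON.stringify(manifest, null, 2));
```

Policies are additive. The allow rule opens only what it names, and everything else stays blocked by the default-deny policy.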

Error Handling

| Vulnerability | Risk | Mitigation |
|---|---|---|
| Leaked kubeconfig | Full cluster access, GPU resource theft | Secrets manager + RBAC scoping |
| Open inference endpoints | Unauthorized model access | NetworkPolicy ingress rules |
| Unscanned container images | CVE exploitation in GPU pods | CI image scanning before deploy |
| Overly broad RBAC | Cross-namespace data leakage | Per-team namespace RBAC bindings |
| Unencrypted PVCs | Training data exposure | Encrypted storage classes |
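A per-team namespace RBAC binding pairs a namespaced Role with a RoleBinding. Because neither is cluster-scoped, the grant cannot reach other namespaces. The sketch below builds both objects. The role name, the rule set and the group name are hypothetical. The group must match what your identity provider puts in user tokens.

```typescript
const RBAC_API = "rbac.authorization.k8s.io/v1";

// Hypothetical helper: a Role and RoleBinding confined to one team's namespace.
function teamRbac(namespace: string, group: string) {
  const role = {
    apiVersion: RBAC_API,
    kind: "Role",
    metadata: { name: "gpu-workload-editor", namespace },
    rules: [
      // Read-only view of pods, logs, services and volume claims
      {
        apiGroups: [""],
        resources: ["pods", "pods/log", "services", "persistentvolumeclaims"],
        verbs: ["get", "list", "watch"],
      },
      // Manage workloads through controllers rather than bare pods
      {
        apiGroups: ["apps", "batch"],
        resources: ["deployments", "jobs"],
        verbs: ["get", "list", "watch", "create", "update", "patch", "delete"],
      },
      // "secrets" is deliberately absent. Anyone who can create workloads here can
      // still mount this namespace's Secrets, so keep each team in its own namespace.
    ],
  };
  const binding = {
    apiVersion: RBAC_API,
    kind: "RoleBinding",
    metadata: { name: "gpu-workload-editor-binding", namespace },
    subjects: [{ kind: "Group", name: group, apiGroup: "rbac.authorization.k8s.io" }],
    roleRef: { kind: "Role", name: role.metadata.name, apiGroup: "rbac.authorization.k8s.io" },
  };
  return { role, binding };
}

console.log(JSON.stringify(teamRbac("team-a", "team-a-ml"), null, 2));
```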

Next Steps

See coreweave-prod-checklist.

Info
Category: Programming & Development
Name: coreweave-security-basics
Version: v20260423
Size: 4.16KB
Updated: 2026-04-28