Skills Data Science Deploy GPU Workloads on CoreWeave

Deploy GPU Workloads on CoreWeave

v20260423
coreweave-hello-world
This guide demonstrates how to deploy GPU workloads on the CoreWeave platform using Kubernetes (kubectl). It covers deploying vLLM inference services for large language model (LLM) testing and running batch CUDA jobs for benchmarking. This is ideal for users running their first GPU workloads, testing inference pipelines, or verifying cluster access on bare-metal GPU nodes (A100, H100).
Get Skill
436 downloads
Overview

CoreWeave Hello World

Overview

Deploy your first GPU workload on CoreWeave: a simple inference service using vLLM or a batch CUDA job. CoreWeave runs Kubernetes on bare-metal GPU nodes with A100, H100, and L40 GPUs.

Prerequisites

  • Completed coreweave-install-auth setup
  • kubectl configured with CoreWeave kubeconfig
  • Namespace with GPU quota

Instructions

Step 1: Deploy a vLLM Inference Server

# vllm-inference.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model"
            - "meta-llama/Llama-3.1-8B-Instruct"
            - "--port"
            - "8000"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
              memory: 48Gi
              cpu: "8"
            requests:
              nvidia.com/gpu: 1
              memory: 32Gi
              cpu: "4"
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class
                    operator: In
                    values: ["A100_PCIE_80GB"]
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  selector:
    app: vllm-server
  ports:
    - port: 8000
      targetPort: 8000
  type: ClusterIP
# Create HuggingFace token secret
kubectl create secret generic hf-token --from-literal=token="${HF_TOKEN}"

# Deploy
kubectl apply -f vllm-inference.yaml
kubectl get pods -w  # Wait for Running state

# Port-forward and test
kubectl port-forward svc/vllm-server 8000:8000 &
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'

Step 2: Batch GPU Job

# gpu-batch-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-benchmark
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: benchmark
          image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
          command: ["python3", "-c"]
          args:
            - |
              import torch
              print(f"CUDA available: {torch.cuda.is_available()}")
              print(f"GPU: {torch.cuda.get_device_name(0)}")
              x = torch.randn(10000, 10000, device="cuda")
              y = torch.matmul(x, x)
              print(f"Matrix multiply result shape: {y.shape}")
              print("CoreWeave GPU test passed!")
          resources:
            limits:
              nvidia.com/gpu: 1
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class
                    operator: In
                    values: ["A100_PCIE_80GB"]
kubectl apply -f gpu-batch-job.yaml
kubectl logs job/gpu-benchmark --follow

Error Handling

Error Cause Solution
Pod stuck Pending No GPU capacity Try different GPU type or check quota
nvidia-smi not found Wrong base image Use NVIDIA CUDA images
OOMKilled Insufficient GPU memory Use larger GPU (80GB A100)
Image pull error Registry auth Create imagePullSecret

Resources

Next Steps

Proceed to coreweave-local-dev-loop for development workflow setup.

Info
Category Data Science
Name coreweave-hello-world
Version v20260423
Size 4.6KB
Updated At 2026-04-28
Language