CoreWeave Resource Quota and Rate Limit Management

v20260423

coreweave-rate-limits

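This skill manages GPU quotas and rate limits for the CoreWeave cloud service. It shows how to check resource quotas with `kubectl` and demonstrates an inference request queue built on Python `asyncio`, so that resource allocation stays stable and efficient under high concurrency.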

CoreWeave, GPU quota, Kubernetes, inference rate limiting, async, cloud services


438 downloads


CoreWeave Rate Limits

Overview

CoreWeave limits are primarily GPU quotas rather than API rate limits. Each namespace is allocated a separate quota for each GPU type.

Check GPU Quota

kubectl describe resourcequota -n my-namespace
kubectl get resourcequota -n my-namespace -o json | jq '.items[].status'
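
The JSON output can also be read programmatically to see how much GPU headroom remains. The sketch below is a minimal illustration, not part of the skill itself: it assumes the quota is declared on a resource name ending in `nvidia.com/gpu` (for example `requests.nvidia.com/gpu`, the usual Kubernetes form for extended resources) and that GPU quantities are plain integers. The helper name `gpu_headroom` and the sample quota name `gpu-quota` are hypothetical.

```python
import json

# Assumption: GPU quotas are declared on a resource name with this suffix.
GPU_SUFFIX = "nvidia.com/gpu"


def gpu_headroom(quota_json: str) -> dict:
    """Return {(quota_name, resource): hard - used} for GPU resources only."""
    data = json.loads(quota_json)
    remaining = {}
    for item in data.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        status = item.get("status", {})
        hard = status.get("hard", {})
        used = status.get("used", {})
        for resource, limit in hard.items():
            if not resource.endswith(GPU_SUFFIX):
                continue  # skip pods, memory, and other non-GPU entries
            # GPU counts are whole numbers, so int() is enough here.
            remaining[(name, resource)] = int(limit) - int(used.get(resource, "0"))
    return remaining


# Hand-written sample shaped like `kubectl get resourcequota -o json` output.
sample = json.dumps({
    "items": [{
        "metadata": {"name": "gpu-quota"},
        "status": {
            "hard": {"requests.nvidia.com/gpu": "8", "pods": "50"},
            "used": {"requests.nvidia.com/gpu": "5", "pods": "12"},
        },
    }]
})
headroom = gpu_headroom(sample)
```

For the sample above, `headroom` holds a single entry showing 3 GPUs still available under `gpu-quota`. In practice the input string would come from the `kubectl get resourcequota ... -o json` command.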

Inference Request Queuing

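import asyncio


class InferenceQueue:
    """Caps concurrent inference calls and tracks queued plus running requests."""

    def __init__(self, max_concurrent: int = 10):
        # At most max_concurrent calls run at once; the rest wait on the semaphore.
        self.semaphore = asyncio.Semaphore(max_concurrent)
        # Number of requests currently waiting or in flight.
        self.queue_depth = 0

    async def inference(self, client, prompt: str) -> str:
        self.queue_depth += 1
        try:
            async with self.semaphore:
                # to_thread keeps a blocking client.generate off the event loop.
                return await asyncio.to_thread(client.generate, prompt)
        finally:
            # Also runs if the task is cancelled while still waiting for a slot.
            self.queue_depth -= 1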
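
A short usage sketch may help show what the semaphore buys you. It is an illustration under stated assumptions: `StubClient` is a hypothetical stand-in for a real blocking inference client, `run_batch` is a hypothetical helper, and a minimal copy of the queue class is repeated so the snippet runs on its own. The stub records the highest number of simultaneous `generate` calls so the cap can be observed.

```python
import asyncio
import threading
import time


class InferenceQueue:
    # Minimal copy of the queue class so this snippet runs standalone.
    def __init__(self, max_concurrent: int = 10):
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.queue_depth = 0

    async def inference(self, client, prompt: str) -> str:
        self.queue_depth += 1
        try:
            async with self.semaphore:
                return await asyncio.to_thread(client.generate, prompt)
        finally:
            self.queue_depth -= 1


class StubClient:
    """Hypothetical blocking client that records peak concurrent calls."""

    def __init__(self, latency: float = 0.05):
        self.latency = latency
        self._lock = threading.Lock()  # generate() runs in worker threads
        self.active = 0
        self.peak = 0

    def generate(self, prompt: str) -> str:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(self.latency)  # simulate model latency
        with self._lock:
            self.active -= 1
        return f"echo: {prompt}"


async def run_batch(prompts, max_concurrent: int = 3):
    # Build the queue inside the running loop so the semaphore binds to it.
    queue = InferenceQueue(max_concurrent=max_concurrent)
    client = StubClient()
    # gather returns results in the same order as the prompts.
    results = await asyncio.gather(
        *(queue.inference(client, p) for p in prompts)
    )
    return results, client.peak, queue.queue_depth


if __name__ == "__main__":
    results, peak, depth = asyncio.run(run_batch([f"p{i}" for i in range(8)]))
    print(results, peak, depth)
```

With `max_concurrent=3`, the recorded peak never exceeds 3 however many prompts are submitted, and `queue_depth` returns to 0 once the batch finishes. A reasonable starting point is to set `max_concurrent` no higher than the number of requests the GPU-backed replicas can serve at once.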

Resources

CoreWeave Node Pools

Next Steps

For security, see coreweave-security-basics.

Info

Category Programming & Development

Name coreweave-rate-limits

Version v20260423

Size 1.51KB

Source jeremylongshore/claude-code-plugins-plus-skills

Updated 2026-04-28