coreweave-performance-tuning
jeremylongshore/claude-code-plugins-plus-skills
This guide provides expert strategies for optimizing GPU inference performance on CoreWeave infrastructure. It covers GPU selection by workload (LLM inference, image generation, training), advanced techniques such as continuous batching with vLLM, autoscaling via the Kubernetes Horizontal Pod Autoscaler (HPA), and benchmarking data. Use it to maximize GPU utilization, minimize latency, and run large-scale AI model deployments efficiently.
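
As a concrete illustration of the continuous-batching technique named above, here is a minimal sketch using vLLM's offline engine. The model name, GPU memory fraction, and sampling settings are illustrative assumptions, not values prescribed by this guide:

```python
# Minimal sketch of continuous batching with vLLM (pip install vllm).
# The model and parameters below are illustrative assumptions.
from vllm import LLM, SamplingParams

# vLLM schedules requests at the iteration level (continuous batching):
# new prompts join in-flight batches instead of waiting for a batch to drain.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model, swap for yours
    gpu_memory_utilization=0.90,  # reserve 90% of GPU memory for KV cache + weights
)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain continuous batching in one sentence.",
    "List three ways to raise GPU utilization.",
]

# generate() batches these prompts together; under sustained load the
# scheduler interleaves many concurrent requests on the same GPU.
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.outputs[0].text)
```

The same engine can be served over HTTP (`vllm serve <model>`), which is the form an HPA would typically scale against.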