serving-llms-vllm
Orchestra-Research/AI-Research-SKILLs
Serve production-grade LLM APIs with vLLM, leveraging PagedAttention, continuous batching, quantization, and tensor parallelism to maximize throughput while keeping latency and GPU memory in check and scaling OpenAI-compatible deployments.
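As a minimal sketch of the kind of deployment this skill covers, the following launches vLLM's OpenAI-compatible server with tensor parallelism and quantization enabled. The model name and flag values here are illustrative placeholders, not recommendations; `--quantization awq` assumes an AWQ-quantized checkpoint.

```shell
# Launch an OpenAI-compatible vLLM server (illustrative values, not a tuned config).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size 2 \
  --quantization awq \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192
```

Once running, any OpenAI-compatible client can target `http://localhost:8000/v1` with the served model name; PagedAttention and continuous batching are enabled by default and need no extra flags.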