
Skill UI

Browse and discover 6006+ curated skills


Search results for "LLM Serving": 4 skills found


LoRA PEFT Fine-Tuning

peft-fine-tuning

Orchestra-Research/AI-Research-SKILLs

Guides parameter-efficient fine-tuning for 7B-70B LLMs using PEFT/LoRA/QLoRA adapters, quantization, and multi-adapter serving so you can train <1% of parameters on consumer GPUs with minimal accuracy loss.
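As a rough illustration of the "<1% of parameters" claim: LoRA freezes a weight matrix of shape d_out × d_in and trains two low-rank factors, B (d_out × r) and A (r × d_in), which adds r·(d_out + d_in) trainable parameters per adapted matrix. The sketch below assumes hypothetical Llama-style 7B shapes (hidden size 4096, 32 layers, rank 8, adapters on two square projections per layer). Real models and adapter targets vary.

```python
def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable params LoRA adds to one d_out x d_in weight: B (d_out x r) + A (r x d_in)."""
    return rank * (d_out + d_in)

# Hypothetical, Llama-style 7B shapes chosen for illustration only.
HIDDEN = 4096
LAYERS = 32
TOTAL_PARAMS = 7_000_000_000  # nominal "7B" base model
RANK = 8
TARGETS_PER_LAYER = 2  # e.g. two HIDDEN x HIDDEN projections per layer

trainable = LAYERS * TARGETS_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)
fraction = trainable / TOTAL_PARAMS

print(f"trainable LoRA params: {trainable:,}")
print(f"fraction of base model: {fraction:.4%}")
```

Under these assumptions the adapters hold about 4.2 million parameters, well under 0.1% of the base model; raising the rank or adapting more matrices scales the count linearly.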

High-Throughput LLMs

serving-llms-vllm

Orchestra-Research/AI-Research-SKILLs

Deploy LLMs with vLLM to maximize throughput and minimize latency through PagedAttention, continuous batching, quantization, and tensor parallelism for production APIs or batch inference on constrained GPUs.
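A minimal sketch of the bookkeeping behind PagedAttention, assuming a 16-token block size: the KV cache is split into fixed-size physical blocks, and each sequence keeps a block table that maps its logical blocks to physical ones, allocating a new block only when the last one fills. `BlockAllocator` and `Sequence` are hypothetical names for illustration, not vLLM's API, and the attention kernel itself is omitted.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed for this sketch)


class BlockAllocator:
    """Pool of fixed-size physical KV-cache blocks (hypothetical, not vLLM's API)."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("KV cache exhausted")
        return self.free.pop()

    def release(self, block: int) -> None:
        self.free.append(block)


@dataclass
class Sequence:
    num_tokens: int = 0
    block_table: list[int] = field(default_factory=list)  # logical -> physical block

    def append_token(self, alloc: BlockAllocator) -> None:
        # A new physical block is needed only when the current last block is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(alloc.allocate())
        self.num_tokens += 1

    def slot(self, pos: int) -> tuple[int, int]:
        """Physical (block, offset) where token `pos` stores its keys and values."""
        return self.block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

    def free_all(self, alloc: BlockAllocator) -> None:
        # Return every block to the pool when the request finishes.
        for block in self.block_table:
            alloc.release(block)
        self.block_table.clear()
        self.num_tokens = 0


alloc = BlockAllocator(num_blocks=8)
seq = Sequence()
for _ in range(40):  # 40 tokens need ceil(40 / 16) = 3 blocks
    seq.append_token(alloc)
print(seq.block_table, seq.slot(17), len(alloc.free))
```

Because blocks are allocated on demand and need not be contiguous, memory waste per sequence is limited to the unfilled tail of its last block.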

RadixAttention Structured Serving

Orchestra-Research/AI-Research-SKILLs

High-performance LLM/VLM serving framework that caches shared prompt prefixes in a radix tree to accelerate structured JSON/regex decoding, agentic workflows, and multi-turn tool-enabled conversations.
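The prefix-reuse idea can be sketched with a trie keyed on token IDs: when a request arrives, the server walks the tree to find the longest already-cached prefix and computes KV entries only for the remaining tokens. This is a simplified, hypothetical sketch. A true radix tree compresses single-child chains into multi-token edges, and a real server also attaches KV-cache blocks to nodes and evicts cold entries under memory pressure.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[int, "TrieNode"] = {}


class PrefixCache:
    """Uncompressed token trie standing in for a radix-tree prefix cache (hypothetical)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tokens: list[int]) -> None:
        # Record a processed sequence so later requests can reuse its prefix.
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())

    def match_prefix(self, tokens: list[int]) -> int:
        # Length of the longest cached prefix of `tokens`.
        node, matched = self.root, 0
        for t in tokens:
            nxt = node.children.get(t)
            if nxt is None:
                break
            node, matched = nxt, matched + 1
        return matched


cache = PrefixCache()
cache.insert([1, 2, 3, 4, 5])       # e.g. system prompt + first turn
request = [1, 2, 3, 4, 9, 9]        # next turn shares the first 4 tokens
hit = cache.match_prefix(request)
print(f"reused {hit} tokens, computing {len(request) - hit}")
```

In a multi-turn conversation or agent loop, the shared system prompt and earlier turns hit the cache, so only the new suffix is prefilled.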

TensorRT LLM Optimizer

Orchestra-Research/AI-Research-SKILLs

Optimizes LLM inference on NVIDIA GPUs, delivering 10-100× higher throughput, sub-10 ms latency, and multi-GPU scaling with FP8/INT4 quantization, in-flight batching, and production-ready serving for real-time deployments.
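To illustrate what INT4 weight quantization does, here is a generic symmetric per-tensor scheme: each weight is divided by a scale of max|w| / 7, rounded, and clamped to the signed 4-bit range [-8, 7]. This is an assumption-laden sketch, not TensorRT's implementation. Production toolkits typically use per-channel or per-group scales, calibration data, and packed 4-bit storage.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Generic symmetric per-tensor INT4 quantization (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # map max|w| onto integer 7
    # Round to the nearest integer and clamp to the signed 4-bit range [-8, 7].
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int4(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from INT4 codes."""
    return [v * scale for v in q]


weights = [0.7, -0.3, 0.12, -0.7]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
print(q, [round(r, 3) for r in restored])
```

Each restored weight lies within half a quantization step (scale / 2) of the original, which is the accuracy cost traded for storing 4 bits instead of 16 per weight.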

