Skill UI
Browse and discover 6006+ curated skills
Search: "LLM Serving", 4 results found
LoRA PEFT Fine-Tuning
peft-fine-tuning
Orchestra-Research/AI-Research-SKILLs
344
Guides parameter-efficient fine-tuning for 7B-70B LLMs using PEFT/LoRA/QLoRA adapters, quantization, and multi-adapter serving so you can train <1% of parameters on consumer GPUs with minimal accuracy loss.
High-Throughput LLMs
serving-llms-vllm
Orchestra-Research/AI-Research-SKILLs
284
Deploy LLMs with vLLM to maximize throughput and minimize latency through PagedAttention, continuous batching, quantization, and tensor parallelism for production APIs or batch inference on constrained GPUs.
RadixAttention Structured Serving
sglang
Orchestra-Research/AI-Research-SKILLs
326
High-performance LLM/VLM serving framework that caches shared prompt prefixes in a radix tree (RadixAttention) to accelerate structured JSON/regex decoding, agentic workflows, and multi-turn tool-enabled conversations.
TensorRT LLM Optimizer
tensorrt-llm
Orchestra-Research/AI-Research-SKILLs
376
Optimizes LLM inference on NVIDIA GPUs, delivering 10-100× faster throughput, sub-10ms latency, and multi-GPU scaling with FP8/INT4 quantization, in-flight batching, and production-ready serving for real-time deployments.
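
The "<1% of parameters" figure in the LoRA PEFT Fine-Tuning entry can be sanity-checked with simple arithmetic. The sketch below assumes a single frozen weight matrix of shape 4096 x 4096 and a LoRA rank of 8; both values and the function name are illustrative assumptions, not taken from the skill itself.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Ratio of LoRA adapter parameters to the frozen weight they adapt.

    LoRA keeps the d_out x d_in weight W frozen and trains two small
    matrices: B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in               # parameters in the frozen weight W
    adapter = rank * (d_out + d_in)   # parameters in B plus A
    return adapter / full


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    frac = lora_trainable_fraction(4096, 4096, 8)
    print(f"{frac:.4%}")
```

Under these assumptions the adapter holds well under 1% of the layer's parameters; a larger rank or a smaller layer raises the ratio.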