Skill UI: Browse and discover 5145+ curated skills

A search for "inference" found 33 results.

AWQ Weight Quantization (awq-quantization)
Orchestra-Research/AI-Research-SKILLs · 151 downloads
AWQ provides activation-aware 4-bit quantization for large language models, delivering roughly 3x inference speedup with under 5% accuracy loss, so instruction-tuned and multimodal models can run on memory-constrained GPUs via vLLM integration and Marlin kernels.
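As a quick illustration of the vLLM integration this card mentions, here is a minimal sketch of loading an AWQ-quantized checkpoint; the model ID is only an example, not something the skill prescribes.

```python
# Minimal sketch: serving a 4-bit AWQ checkpoint with vLLM.
# The checkpoint name is an example; any AWQ-quantized model
# (e.g. one produced with AutoAWQ) should work.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # example AWQ checkpoint
    quantization="awq",  # select vLLM's AWQ kernels (Marlin where supported)
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize activation-aware quantization."], params)
print(outputs[0].outputs[0].text)
```
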
Batch Inference Pipeline (batch-inference-pipeline)
jeremylongshore/claude-code-plugins-plus-skills · 50 downloads
Guides ML teams through automated batch inference pipelines, suggesting best practices, monitoring, and production-readiness checks, and generating code and configs for deployment.
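For context on what such a pipeline automates, a bare-bones batch loop might look like the sketch below; `predict_batch` is a hypothetical stand-in for any real serving call.

```python
# Bare-bones batch inference loop: chunk inputs, call the model once per
# batch, collect outputs. `predict_batch` is a placeholder for a real
# model call (REST endpoint, local model, etc.).
from typing import Callable, Iterator

def batched(items: list[str], size: int) -> Iterator[list[str]]:
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_pipeline(
    inputs: list[str],
    predict_batch: Callable[[list[str]], list[str]],
    batch_size: int = 32,
) -> list[str]:
    results: list[str] = []
    for batch in batched(inputs, batch_size):
        results.extend(predict_batch(batch))  # one model call per batch
    return results

# Example with a dummy model:
print(run_pipeline(["a", "b", "c"], lambda xs: [x.upper() for x in xs], batch_size=2))
```
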
Groq Cost Tuning (groq-cost-tuning)
jeremylongshore/claude-code-plugins-plus-skills · 375 downloads
Guide to reducing Groq inference spend by routing requests to cost-effective models, trimming token usage, caching repeated calls, batching requests, and setting spend limits for Groq Cloud billing scenarios.
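To make the routing idea concrete, here is a hedged sketch using the groq Python SDK; the model IDs and the prompt-length heuristic are assumptions for illustration, not part of the skill.

```python
# Sketch of cost-aware routing: cheap small model by default, escalate
# long prompts to the large one, and cap output tokens to bound spend.
# Model IDs are illustrative and may change; check Groq's current list.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def complete(prompt: str) -> str:
    # Crude heuristic: route by prompt length; a real router might
    # classify task complexity instead.
    model = "llama-3.3-70b-versatile" if len(prompt) > 2000 else "llama-3.1-8b-instant"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,  # trim output tokens to control cost
    )
    return resp.choices[0].message.content
```
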
Groq Reference Architecture (groq-reference-architecture)
jeremylongshore/claude-code-plugins-plus-skills · 54 downloads
Defines a best-practice Groq deployment with tiered model routing, middleware, streaming pipelines, and fallback chains for ultra-fast LLM inference and production monitoring when launching new Groq integrations.
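The fallback-chain pattern this card describes might be sketched as below; the model tiers are assumptions, and error handling is deliberately coarse.

```python
# Minimal fallback chain: try tiered models in order; on an error
# (rate limit, timeout, outage) fall through to the next tier.
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])
TIERS = ["llama-3.3-70b-versatile", "llama-3.1-8b-instant"]  # primary, fallback

def complete_with_fallback(prompt: str) -> str:
    last_err: Exception | None = None
    for model in TIERS:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception as err:
            last_err = err  # try the next tier
    raise RuntimeError("all fallback tiers failed") from last_err
```
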
GroqCloud Automation Suite (groqcloud-automation)
ComposioHQ/awesome-claude-skills · 302 downloads
GroqCloud Automation orchestrates high-performance GroqCloud APIs through Composio, covering inference, chat completions, audio translation, and TTS voice selection for production workflows.
Inference Latency Profiler (inference-latency-profiler)
jeremylongshore/claude-code-plugins-plus-skills · 208 downloads
Automates inference latency profiling in ML deployment scenarios, offering step-by-step guidance on model serving, MLOps pipelines, monitoring, and production optimization, and generating production-ready code validated against best practices.
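As a rough picture of what such profiling involves, the sketch below times repeated calls and reports tail latencies; `infer` is a hypothetical stand-in for any serving call.

```python
# Time repeated inference calls and report p50/p95/p99 latency in ms.
import time
import statistics
from typing import Callable

def profile_latency(infer: Callable[[str], str], prompt: str, runs: int = 100) -> dict[str, float]:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(prompt)  # the call under test
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    return {"p50": statistics.median(samples), "p95": cuts[94], "p99": cuts[98]}

# Example with a dummy 10 ms "model":
print(profile_latency(lambda _: time.sleep(0.01) or "", "hello", runs=20))
```
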
LLM Knowledge Distillation (knowledge-distillation)
Orchestra-Research/AI-Research-SKILLs · 81 downloads
Compress large language models via teacher-student distillation, covering temperature scaling, soft targets, reverse KLD, and response distillation so you can deploy smaller LLMs with GPT-4-level behavior and lower inference cost.
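To ground the temperature-scaling and soft-target terms, here is the classic (Hinton-style) distillation loss as a PyTorch sketch; the skill itself also covers variants such as reverse KLD that this snippet does not show.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    # Soft targets: KL between temperature-softened teacher and student.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients on a comparable scale
    hard = F.cross_entropy(student_logits, labels)  # usual supervised loss
    return alpha * soft + (1.0 - alpha) * hard
```
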
Lambda Labs GPU Cloud (lambda-labs-gpu-cloud)
Orchestra-Research/AI-Research-SKILLs · 160 downloads
Lambda Labs GPU cloud offers reserved and on-demand instances with SSH access, persistent filesystems, and 1-Click multi-node clusters, making it ideal for long-running training and inference workloads that need high-performance GPUs.
llama.cpp CPU Inference (llama-cpp)
Orchestra-Research/AI-Research-SKILLs · 382 downloads
Deploy llama.cpp to run LLM inference across CPUs, Apple Silicon, and non-NVIDIA GPUs, making it ideal for edge devices or CUDA-free setups with GGUF quantization for faster, lower-memory results.
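A minimal sketch of CPU inference over a GGUF file via the llama-cpp-python bindings; the model path is a placeholder for whatever quantized checkpoint you have locally.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # placeholder GGUF file
    n_ctx=4096,    # context window
    n_threads=8,   # CPU threads
)
out = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```
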
LlamaGuard Content Moderation (llamaguard)
Orchestra-Research/AI-Research-SKILLs · 441 downloads
LlamaGuard is Meta's 7–8B safety-specialized LLM that filters both prompts and responses by classifying six threat categories, enabling fast inference via vLLM/SageMaker and integration into NeMo Guardrails for end-to-end moderation.
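Classifying a prompt follows the usual Hugging Face generate flow; a hedged sketch is below (the checkpoint is access-gated on Hugging Face, and the model's chat template handles the moderation prompt formatting).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # access-gated checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "How do I pick a lock?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=24)
# Prints "safe", or "unsafe" plus the violated category code.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
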
Long Context Extensions (long-context)
Orchestra-Research/AI-Research-SKILLs · 438 downloads
Extends transformer models' context windows using RoPE, YaRN, ALiBi, and interpolation so LLMs can process documents of 32k–128k+ tokens, extrapolate to longer lengths, and deploy efficient positional encodings and bias strategies for fine-tuning or inference.
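For instance, linear position interpolation (the "interpolation" this card mentions) can be enabled through a rope_scaling config in transformers; the checkpoint and scaling factor below are examples, and quality at the extended lengths generally needs fine-tuning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example RoPE-based checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Linear position interpolation: stretch a 4k window toward 16k by
# mapping position m to m/4 before computing RoPE angles.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 4.0},
)
```
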
Mamba Selective State Models (mamba-architecture)
Orchestra-Research/AI-Research-SKILLs · 491 downloads
Mamba provides selective state-space models with O(n) inference complexity, letting you handle million-token sequences faster than transformers while skipping KV caches and benefiting from a hardware-aware design. Use it for long-context language modeling, streaming applications, and scalable low-memory sequence learners.
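To see why inference is O(n) with constant memory, consider the toy (non-selective, single-channel) state-space recurrence below: each token updates a fixed-size state instead of growing a KV cache. Mamba additionally makes A, B, and C input-dependent ("selective") and fuses the scan into a hardware-aware kernel; this sketch shows only the recurrence.

```python
import numpy as np

def ssm_scan(x: np.ndarray, A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    h = np.zeros(A.shape[0])       # fixed-size state, independent of sequence length
    y = np.empty_like(x)
    for t, x_t in enumerate(x):    # one O(d_state) update per token
        h = A @ h + B * x_t
        y[t] = C @ h
    return y

# 16-token toy sequence, 4-dimensional state:
y = ssm_scan(np.random.randn(16), A=0.9 * np.eye(4), B=np.ones(4), C=np.ones(4) / 4)
```
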
Page 1 of 3