Skill UI: browse and discover 5143+ curated skills
Categories: Development, Artificial Intelligence, Design & Creative, Product & Business, Data Science, Marketing, Soft Skills, Productivity, Engineering, Languages
Search: "inference" (47 results)
AWQ Weight Quantization (awq-quantization) | Orchestra-Research/AI-Research-SKILLs | 151 downloads
AWQ provides activation-aware 4-bit quantization for large language models, delivering roughly 3x inference speedup with under 5% accuracy loss, enabling deployment of instruction-tuned or multimodal models on memory-constrained GPUs via vLLM integration and Marlin kernels.
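The core idea behind activation-aware quantization can be sketched in a few lines: channels whose activations are large ("salient" channels) are scaled up before a shared 4-bit rounding step, so they lose less relative precision. The following is a toy pure-Python illustration under that assumption, not the actual AWQ algorithm or its scale-search procedure; the function name and the `alpha` parameter are hypothetical.

```python
# Toy sketch of activation-aware 4-bit quantization (the idea behind AWQ),
# NOT the real implementation: salient channels (large average activations)
# are scaled up before shared-step rounding so they lose less precision.

def awq_style_quantize(weights, act_magnitudes, alpha=0.5):
    """Quantize a group of per-channel weights to 4 bits, then dequantize.

    act_magnitudes: average absolute activation per channel (assumed known).
    alpha: how strongly activation saliency influences the per-channel scale.
    """
    scales = [max(a, 1e-8) ** alpha for a in act_magnitudes]
    scaled = [w * s for w, s in zip(weights, scales)]
    step = max(abs(x) for x in scaled) / 7 or 1e-8   # shared 4-bit step
    deq = [max(-8, min(7, round(x / step))) * step for x in scaled]
    return [d / s for d, s in zip(deq, scales)]

# Channel 0 has the largest activations, so it is reconstructed
# almost exactly; low-saliency channels absorb more rounding error.
recon = awq_style_quantize([1.0, 0.5, -0.3], [10.0, 1.0, 1.0])
```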
Batch Inference Pipeline (batch-inference-pipeline) | jeremylongshore/claude-code-plugins-plus-skills | 50 downloads
Guides ML teams through automated batch inference pipelines: suggests best practices, covers monitoring and production readiness, and generates deployment code and configs.
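The shape of such a pipeline is simple to sketch: chunk the inputs, call the model once per batch rather than once per item, and collect results in order. Below is a minimal stdlib-only sketch; the `model` callable is a stand-in (it upper-cases strings) for a real model client.

```python
# Minimal batch-inference loop sketch; `model` is a stand-in callable,
# not any particular serving framework's API.
from typing import Callable, List

def batched(items: List[str], batch_size: int):
    """Yield fixed-size chunks of the input list (last chunk may be short)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def run_batch_inference(inputs: List[str],
                        model: Callable[[List[str]], List[str]],
                        batch_size: int = 8) -> List[str]:
    outputs: List[str] = []
    for batch in batched(inputs, batch_size):
        outputs.extend(model(batch))   # one call per batch, not per item
    return outputs

# Stub "model": echoes inputs upper-cased.
results = run_batch_inference(["a", "b", "c"],
                              lambda xs: [x.upper() for x in xs],
                              batch_size=2)
```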
Feature Store Connector (feature-store-connector) | jeremylongshore/claude-code-plugins-plus-skills | 213 downloads
Automates guidance for setting up and validating feature store connectors within ML deployment pipelines, covering best practices, code generation, and production monitoring for inference.
GGUF Quantization Guide (gguf-quantization) | Orchestra-Research/AI-Research-SKILLs | 412 downloads
Provides GGUF-format and llama.cpp quantization workflows for efficient CPU/Apple Silicon inference, including conversion, quantization, and runtime commands suited to deploying large models without GPUs.
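The quantization step these workflows rely on is block-wise: weights are grouped into fixed-size blocks, each stored as 4-bit integers plus one float scale. The sketch below shows that structure in pure Python; it is a simplified illustration, not a byte-exact reimplementation of any GGUF type such as Q4_0.

```python
# Simplified sketch of block-wise 4-bit quantization in the style of
# GGUF quant types (one scale per block of 32 weights); not byte-exact.

BLOCK = 32  # weights per block in common GGUF 4-bit formats

def quantize_block(ws):
    """Quantize one block to 4-bit ints [-8, 7] plus a single float scale."""
    amax = max((abs(w) for w in ws), default=0.0)
    scale = amax / 7 if amax else 1.0
    qs = [max(-8, min(7, round(w / scale))) for w in ws]
    return scale, qs

def dequantize_block(scale, qs):
    """Reconstruct approximate weights from the stored ints and scale."""
    return [q * scale for q in qs]
```

Storing one scale per small block (instead of per tensor) is what keeps the reconstruction error bounded even when weight magnitudes vary across the tensor.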
Groq Cost Tuning (groq-cost-tuning) | jeremylongshore/claude-code-plugins-plus-skills | 375 downloads
Guide to reducing Groq inference spend by routing requests to cost-effective models, trimming token usage, caching repeated calls, batching requests, and setting spend limits for Groq Cloud billing scenarios.
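Of the tactics listed, caching repeated calls is the easiest to sketch: key the cache on the model name plus the exact prompt, and only hit the API on a miss. The stdlib-only sketch below stubs out the API call; `call_model` and `cached_completion` are hypothetical names, not Groq SDK functions.

```python
# Exact-match response cache sketch to cut spend on repeated calls.
# `call_model` is a stub standing in for a real API client.
import hashlib

_cache: dict = {}
calls = {"n": 0}

def call_model(prompt: str) -> str:
    calls["n"] += 1                    # count real (uncached) invocations
    return f"answer to: {prompt}"      # stub response

def cached_completion(prompt: str, model: str = "example-model") -> str:
    # Key on model + prompt so the same prompt to different models
    # does not collide.
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]
```

An exact-match cache only helps with verbatim repeats; the semantic caching mentioned elsewhere in this list additionally matches near-duplicate prompts via embeddings.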
Groq Deployment Guide (groq-deploy-integration) | jeremylongshore/claude-code-plugins-plus-skills | 60 downloads
Deploy Groq-powered applications to Vercel, Fly.io, and Cloud Run while configuring secrets, streaming responses, and health checks for low-latency LLM inference.
Groq Enterprise RBAC (groq-enterprise-rbac) | jeremylongshore/claude-code-plugins-plus-skills | 404 downloads
Guide for configuring Groq enterprise SSO, token scoping, and org-wide constraints so teams can safely use the LPU inference API with model-level permissions, rate limits, spending caps, and automated key rotation.
Groq Performance Tuning (groq-performance-tuning) | jeremylongshore/claude-code-plugins-plus-skills | 70 downloads
Guides developers to optimize Groq API calls through model selection, streaming, semantic caching, and parallel orchestration so latency-sensitive applications get consistent sub-100ms inference.
Groq Reference Architecture (groq-reference-architecture) | jeremylongshore/claude-code-plugins-plus-skills | 54 downloads
Defines a best-practice Groq deployment with tiered model routing, middleware, streaming pipelines, and fallback chains for ultra-fast LLM inference, plus production monitoring for new Groq integrations.
GroqCloud Automation Suite (groqcloud-automation) | ComposioHQ/awesome-claude-skills | 302 downloads
GroqCloud Automation orchestrates high-performance GroqCloud APIs through Composio, covering inference, chat completions, audio translation, and TTS voice selection for production workflows.
Inference Latency Profiler (inference-latency-profiler) | jeremylongshore/claude-code-plugins-plus-skills | 208 downloads
Automates inference latency profiling in ML deployment scenarios: offers step-by-step guidance on model serving, MLOps pipelines, monitoring, and production optimization, and generates production-ready code validated against best practices.
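The basic mechanics of latency profiling are straightforward to sketch: wrap each inference call with a timer and report percentiles over the recorded samples. The stdlib-only class below is an illustrative minimum (using `time.perf_counter` and `statistics.quantiles`), not the skill's actual implementation.

```python
# Minimal latency-profiler sketch: time each wrapped call and report
# percentiles (e.g. p50/p95) from the recorded samples. Stdlib only.
import statistics
import time
from typing import Callable, List

class LatencyProfiler:
    def __init__(self):
        self.samples: List[float] = []

    def timed(self, fn: Callable) -> Callable:
        """Wrap fn so every call records its wall-clock duration."""
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.samples.append(time.perf_counter() - t0)
        return wrapper

    def percentile(self, p: float) -> float:
        """Approximate p-th percentile via 100-quantile cut points."""
        qs = statistics.quantiles(self.samples, n=100)
        return qs[min(98, max(0, int(p) - 1))]
```

In production you would typically export these samples to a metrics backend rather than keep them in memory, but the measurement primitive is the same.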
LLM Knowledge Distillation (knowledge-distillation) | Orchestra-Research/AI-Research-SKILLs | 81 downloads
Compresses large language models via teacher-student distillation, covering temperature scaling, soft targets, reverse KLD, and response distillation so you can deploy smaller LLMs with GPT-4-level behavior at lower inference cost.
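Two of the ingredients named above, temperature scaling and soft targets, reduce to a few lines of math: divide logits by a temperature T before the softmax (higher T gives a softer distribution), then have the student minimize a KL divergence against the teacher's soft distribution. A pure-Python sketch of those pieces, for illustration only (the logit values are made up):

```python
# Sketch of temperature-scaled soft targets and the KL term used in
# teacher-student distillation; pure Python, illustrative values.
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer targets."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q); the student minimizes this against teacher targets p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([4.0, 1.0, 0.5], T=2.0)   # soft teacher distribution
student = softmax([3.0, 1.5, 0.2], T=2.0)   # student's current distribution
loss = kl_divergence(teacher, student)
```

Reverse KLD, also listed in the description, swaps the arguments to KL(student || teacher), which makes the student mode-seeking rather than mode-covering.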