Skill UI: browse and discover 5143+ curated skills
Categories: Development, Artificial Intelligence, Design & Creative, Product & Business, Data Science, Marketing, Soft Skills, Productivity, Engineering, Languages
Search: "inference" (47 results)
AWQ Weight Quantization (awq-quantization) | Orchestra-Research/AI-Research-SKILLs | 151 downloads
AWQ provides activation-aware 4-bit quantization for large language models, delivering roughly 3x inference speedup with under 5% accuracy loss, enabling deployment of instruction-tuned or multimodal models on memory-constrained GPUs via vLLM integration and Marlin kernels.
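The core idea behind activation-aware quantization can be sketched in a few lines: channels whose activations are large ("salient" channels) are scaled up before a shared 4-bit rounding step, so they lose less relative precision. The following is a toy pure-Python illustration under that assumption, not the actual AWQ algorithm or its scale-search procedure; the function name and the `alpha` parameter are hypothetical.

```python
# Toy sketch of activation-aware 4-bit quantization (the idea behind AWQ),
# NOT the real implementation: salient channels (large average activations)
# are scaled up before shared-step rounding so they lose less precision.

def awq_style_quantize(weights, act_magnitudes, alpha=0.5):
    """Quantize a group of per-channel weights to 4 bits, then dequantize.

    act_magnitudes: average absolute activation per channel (assumed known).
    alpha: how strongly activation saliency influences the per-channel scale.
    """
    scales = [max(a, 1e-8) ** alpha for a in act_magnitudes]
    scaled = [w * s for w, s in zip(weights, scales)]
    step = max(abs(x) for x in scaled) / 7 or 1e-8   # shared 4-bit step
    deq = [max(-8, min(7, round(x / step))) * step for x in scaled]
    return [d / s for d, s in zip(deq, scales)]

# Channel 0 has the largest activations, so it is reconstructed
# almost exactly; low-saliency channels absorb more rounding error.
recon = awq_style_quantize([1.0, 0.5, -0.3], [10.0, 1.0, 1.0])
```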
Batch Inference Pipeline (batch-inference-pipeline) | jeremylongshore/claude-code-plugins-plus-skills | 50 downloads
Guides ML teams through automated batch inference pipelines: suggests best practices, covers monitoring and production readiness, and generates deployment code and configs.
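The shape of such a pipeline is simple to sketch: chunk the inputs, call the model once per batch rather than once per item, and collect results in order. Below is a minimal stdlib-only sketch; the `model` callable is a stand-in (it upper-cases strings) for a real model client.

```python
# Minimal batch-inference loop sketch; `model` is a stand-in callable,
# not any particular serving framework's API.
from typing import Callable, List

def batched(items: List[str], batch_size: int):
    """Yield fixed-size chunks of the input list (last chunk may be short)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def run_batch_inference(inputs: List[str],
                        model: Callable[[List[str]], List[str]],
                        batch_size: int = 8) -> List[str]:
    outputs: List[str] = []
    for batch in batched(inputs, batch_size):
        outputs.extend(model(batch))   # one call per batch, not per item
    return outputs

# Stub "model": echoes inputs upper-cased.
results = run_batch_inference(["a", "b", "c"],
                              lambda xs: [x.upper() for x in xs],
                              batch_size=2)
```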
Feature Store Connector (feature-store-connector) | jeremylongshore/claude-code-plugins-plus-skills | 213 downloads
Automates guidance for setting up and validating feature store connectors within ML deployment pipelines, covering best practices, code generation, and production monitoring for inference.
GGUF Quantization Guide (gguf-quantization) | Orchestra-Research/AI-Research-SKILLs | 412 downloads
Provides GGUF-format and llama.cpp quantization workflows for efficient CPU/Apple Silicon inference, including conversion, quantization, and runtime commands suited to deploying large models without GPUs.
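The quantization step these workflows rely on is block-wise: weights are grouped into fixed-size blocks, each stored as 4-bit integers plus one float scale. The sketch below shows that structure in pure Python; it is a simplified illustration, not a byte-exact reimplementation of any GGUF type such as Q4_0.

```python
# Simplified sketch of block-wise 4-bit quantization in the style of
# GGUF quant types (one scale per block of 32 weights); not byte-exact.

BLOCK = 32  # weights per block in common GGUF 4-bit formats

def quantize_block(ws):
    """Quantize one block to 4-bit ints [-8, 7] plus a single float scale."""
    amax = max((abs(w) for w in ws), default=0.0)
    scale = amax / 7 if amax else 1.0
    qs = [max(-8, min(7, round(w / scale))) for w in ws]
    return scale, qs

def dequantize_block(scale, qs):
    """Reconstruct approximate weights from the stored ints and scale."""
    return [q * scale for q in qs]
```

Storing one scale per small block (instead of per tensor) is what keeps the reconstruction error bounded even when weight magnitudes vary across the tensor.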
Groq Cost Tuning (groq-cost-tuning) | jeremylongshore/claude-code-plugins-plus-skills | 375 downloads
Guide to reducing Groq inference spend by routing requests to cost-effective models, trimming token usage, caching repeated calls, batching requests, and setting spend limits for Groq Cloud billing scenarios.
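Of the tactics listed, caching repeated calls is the easiest to sketch: key the cache on the model name plus the exact prompt, and only hit the API on a miss. The stdlib-only sketch below stubs out the API call; `call_model` and `cached_completion` are hypothetical names, not Groq SDK functions.

```python
# Exact-match response cache sketch to cut spend on repeated calls.
# `call_model` is a stub standing in for a real API client.
import hashlib

_cache: dict = {}
calls = {"n": 0}

def call_model(prompt: str) -> str:
    calls["n"] += 1                    # count real (uncached) invocations
    return f"answer to: {prompt}"      # stub response

def cached_completion(prompt: str, model: str = "example-model") -> str:
    # Key on model + prompt so the same prompt to different models
    # does not collide.
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]
```

An exact-match cache only helps with verbatim repeats; the semantic caching mentioned elsewhere in this list additionally matches near-duplicate prompts via embeddings.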
Groq Deployment Guide (groq-deploy-integration) | jeremylongshore/claude-code-plugins-plus-skills | 60 downloads
Deploy Groq-powered applications to Vercel, Fly.io, and Cloud Run while configuring secrets, streaming responses, and health checks for low-latency LLM inference.
Groq Enterprise RBAC (groq-enterprise-rbac) | jeremylongshore/claude-code-plugins-plus-skills | 404 downloads
Guide for configuring Groq enterprise SSO, token scoping, and org-wide constraints so teams can safely use the LPU inference API with model-level permissions, rate limits, spending caps, and automated key rotation.
Groq Performance Tuning (groq-performance-tuning) | jeremylongshore/claude-code-plugins-plus-skills | 70 downloads
Guides developers to optimize Groq API calls through model selection, streaming, semantic caching, and parallel orchestration so latency-sensitive applications get consistent sub-100ms inference.
Groq Reference Architecture (groq-reference-architecture) | jeremylongshore/claude-code-plugins-plus-skills | 54 downloads
Defines a best-practice Groq deployment with tiered model routing, middleware, streaming pipelines, and fallback chains for ultra-fast LLM inference, plus production monitoring for new Groq integrations.
GroqCloud Automation Suite (groqcloud-automation) | ComposioHQ/awesome-claude-skills | 302 downloads
GroqCloud Automation orchestrates high-performance GroqCloud APIs through Composio, covering inference, chat completions, audio translation, and TTS voice selection for production workflows.
Inference Latency Profiler (inference-latency-profiler) | jeremylongshore/claude-code-plugins-plus-skills | 208 downloads
Automates inference latency profiling in ML deployment scenarios: offers step-by-step guidance on model serving, MLOps pipelines, monitoring, and production optimization, and generates production-ready code validated against best practices.
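The basic mechanics of latency profiling are straightforward to sketch: wrap each inference call with a timer and report percentiles over the recorded samples. The stdlib-only class below is an illustrative minimum (using `time.perf_counter` and `statistics.quantiles`), not the skill's actual implementation.

```python
# Minimal latency-profiler sketch: time each wrapped call and report
# percentiles (e.g. p50/p95) from the recorded samples. Stdlib only.
import statistics
import time
from typing import Callable, List

class LatencyProfiler:
    def __init__(self):
        self.samples: List[float] = []

    def timed(self, fn: Callable) -> Callable:
        """Wrap fn so every call records its wall-clock duration."""
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.samples.append(time.perf_counter() - t0)
        return wrapper

    def percentile(self, p: float) -> float:
        """Approximate p-th percentile via 100-quantile cut points."""
        qs = statistics.quantiles(self.samples, n=100)
        return qs[min(98, max(0, int(p) - 1))]
```

In production you would typically export these samples to a metrics backend rather than keep them in memory, but the measurement primitive is the same.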
LLM Knowledge Distillation (knowledge-distillation) | Orchestra-Research/AI-Research-SKILLs | 81 downloads
Compresses large language models via teacher-student distillation, covering temperature scaling, soft targets, reverse KLD, and response distillation so you can deploy smaller LLMs with GPT-4-level behavior at lower inference cost.
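Two of the ingredients named above, temperature scaling and soft targets, reduce to a few lines of math: divide logits by a temperature T before the softmax (higher T gives a softer distribution), then have the student minimize a KL divergence against the teacher's soft distribution. A pure-Python sketch of those pieces, for illustration only (the logit values are made up):

```python
# Sketch of temperature-scaled soft targets and the KL term used in
# teacher-student distillation; pure Python, illustrative values.
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer targets."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q); the student minimizes this against teacher targets p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = softmax([4.0, 1.0, 0.5], T=2.0)   # soft teacher distribution
student = softmax([3.0, 1.5, 0.2], T=2.0)   # student's current distribution
loss = kl_divergence(teacher, student)
```

Reverse KLD, also listed in the description, swaps the arguments to KL(student || teacher), which makes the student mode-seeking rather than mode-covering.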