Skill UI
Browse and discover 6006+ curated skills
Categories: All · Development · Artificial Intelligence · Design & Creative · Product & Business · Data Science · Marketing · Soft Skills · Productivity · Engineering · Languages
Search for "Fast Inference": found 8 results.
Groq Reference Architecture (groq-reference-architecture)
jeremylongshore/claude-code-plugins-plus-skills · 54 downloads
Defines a best-practice Groq deployment for ultra-fast LLM inference, covering tiered model routing, middleware, streaming pipelines, fallback chains, and production monitoring for new Groq integrations.
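A minimal sketch of the fallback-chain idea using the groq Python SDK; the model names and the simple retry policy are illustrative assumptions, not taken from the skill itself.

    # Tiered routing sketch: try the fast model first, fall back to a larger one on failure.
    # Assumes GROQ_API_KEY is set; model ids are examples only.
    from groq import Groq

    client = Groq()
    TIERS = ["llama-3.1-8b-instant", "llama-3.3-70b-versatile"]  # fast tier first, larger fallback second

    def complete(prompt: str) -> str:
        last_error = None
        for model in TIERS:
            try:
                resp = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp.choices[0].message.content
            except Exception as exc:  # on rate limits or model errors, fall through to the next tier
                last_error = exc
        raise RuntimeError("all tiers failed") from last_error

    print(complete("Summarize tiered model routing in one sentence."))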
llama.cpp CPU Inference (llama-cpp)
Orchestra-Research/AI-Research-SKILLs · 382 downloads
Deploy llama.cpp to run LLM inference across CPUs, Apple Silicon, and non-NVIDIA GPUs, making it ideal for edge devices or CUDA-free setups with GGUF quantization for faster, lower-memory results.
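A minimal sketch of CPU-only GGUF inference through the llama-cpp-python bindings; the model path, thread count, and prompt are illustrative assumptions.

    # Load a quantized GGUF model and generate on CPU (pip install llama-cpp-python).
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local GGUF file
        n_ctx=4096,    # context window
        n_threads=8,   # CPU threads; tune to the machine
    )

    out = llm("Q: What does GGUF quantization trade off? A:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])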
LlamaGuard Content Moderation (llamaguard)
Orchestra-Research/AI-Research-SKILLs · 441 downloads
LlamaGuard is Meta’s 7–8B safety-specialized LLM that filters both prompts and responses by classifying six threat categories, enabling fast inference via vLLM/SageMaker and integration into NeMo Guardrails for end-to-end moderation.
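A minimal sketch of prompt screening with vLLM; the model id and the simplified prompt are assumptions, and a production setup would apply the full Llama Guard chat template with its category definitions.

    # Classify a user message as safe/unsafe with a Llama Guard model served by vLLM (pip install vllm).
    from vllm import LLM, SamplingParams

    guard = LLM(model="meta-llama/LlamaGuard-7b")  # assumed Hugging Face model id
    params = SamplingParams(temperature=0.0, max_tokens=32)

    def is_safe(user_message: str) -> bool:
        prompt = f"[INST] Task: Check if the following message is safe.\n\n{user_message} [/INST]"
        result = guard.generate([prompt], params)[0].outputs[0].text.strip()
        return result.lower().startswith("safe")

    print(is_safe("How do I bake bread?"))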
Mamba Selective State Models (mamba-architecture)
Orchestra-Research/AI-Research-SKILLs · 491 downloads
Mamba provides selective state-space models with O(n) inference complexity, letting you handle million-token sequences faster than transformers while skipping KV caches and benefiting from a hardware-aware design. Use it for long-context language modeling, streaming applications, and scalable, low-memory sequence modeling.
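A minimal sketch of a single Mamba block from the mamba-ssm package; the hyperparameters are illustrative assumptions and the package requires a CUDA GPU.

    # Run one Mamba block over a (batch, seq_len, d_model) tensor; no KV cache is kept.
    import torch
    from mamba_ssm import Mamba

    batch, seq_len, d_model = 2, 1024, 256
    x = torch.randn(batch, seq_len, d_model, device="cuda")

    block = Mamba(
        d_model=d_model,  # model width
        d_state=16,       # SSM state dimension
        d_conv=4,         # local convolution width
        expand=2,         # inner expansion factor
    ).to("cuda")

    y = block(x)  # output keeps the (batch, seq_len, d_model) shape
    print(y.shape)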
Modal Serverless GPU (modal-serverless-gpu)
Orchestra-Research/AI-Research-SKILLs · 94 downloads
Modal's serverless GPU cloud platform lets teams run ML training, inference, and batch jobs with pay-per-second pricing, automatic scaling, Python-native infra definitions, fast cold starts, and container caching.
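A minimal sketch of a serverless GPU function defined in Python with Modal; the image contents, GPU type, and model are illustrative assumptions.

    # Define a container image and a GPU function; Modal schedules and bills it per second.
    import modal

    app = modal.App("gpu-inference-demo")
    image = modal.Image.debian_slim().pip_install("transformers", "torch")

    @app.function(gpu="A10G", image=image)
    def generate(prompt: str) -> str:
        from transformers import pipeline
        pipe = pipeline("text-generation", model="gpt2", device=0)
        return pipe(prompt, max_new_tokens=32)[0]["generated_text"]

    @app.local_entrypoint()
    def main():
        print(generate.remote("Serverless GPUs are"))

Invoking this with `modal run file.py` would run `main` locally and `generate` in a cloud container, matching the pay-per-second, auto-scaling model the description refers to.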
Structured Text Generation (outlines)
Orchestra-Research/AI-Research-SKILLs · 137 downloads
Outlines guarantees valid JSON/XML/code generation via CFG-driven FSM filtering, Pydantic schemas, and fast local or API models (Transformers, vLLM, llama.cpp), making structured inference safe and high-performance.
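A minimal sketch of schema-constrained generation with Outlines; the model id and Pydantic schema are illustrative assumptions, and the exact API differs between Outlines releases.

    # Constrain generation so the output always parses into the Person schema.
    from pydantic import BaseModel
    import outlines

    class Person(BaseModel):
        name: str
        age: int

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, Person)

    person = generator("Extract the person: Ada Lovelace, 36 years old.")
    print(person.name, person.age)  # guaranteed to match the schema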
TensorRT LLM Optimizer (tensorrt-llm)
Orchestra-Research/AI-Research-SKILLs · 376 downloads
Optimizes LLM inference on NVIDIA GPUs, delivering 10-100× faster throughput, sub-10ms latency, and multi-GPU scaling with FP8/INT4 quantization, in-flight batching, and production-ready serving for real-time deployments.
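A minimal sketch of the TensorRT-LLM high-level Python API as documented in recent releases; the model id and sampling settings are illustrative assumptions, and an NVIDIA GPU is required.

    # Build/load an optimized engine and generate; quantization and batching are handled by the library.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    params = SamplingParams(max_tokens=64, temperature=0.8)

    for output in llm.generate(["Explain in-flight batching in one sentence."], params):
        print(output.outputs[0].text)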
Phylogenetics Pipeline Toolkit (phylogenetics)
K-Dense-AI/claude-scientific-skills · 128 downloads
Build and analyze phylogenetic trees with MAFFT, IQ-TREE 2, FastTree, and visualize via ETE3 or FigTree, covering alignment, trimming, ML inference, and microbial/viral evolutionary studies.
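A minimal sketch of an alignment-to-tree pipeline driven from Python; the file names are illustrative assumptions, and mafft and iqtree2 must already be installed and on PATH.

    # Align with MAFFT, infer a maximum-likelihood tree with IQ-TREE 2, then render with ETE3.
    import subprocess
    from ete3 import Tree

    # 1. Align sequences with MAFFT (writes the alignment to stdout).
    with open("aligned.fasta", "w") as out:
        subprocess.run(["mafft", "--auto", "sequences.fasta"], stdout=out, check=True)

    # 2. ML tree with automatic model selection and 1000 ultrafast bootstraps.
    subprocess.run(["iqtree2", "-s", "aligned.fasta", "-m", "MFP", "-B", "1000"], check=True)

    # 3. Visualize the resulting Newick tree (rendering needs ETE3's optional Qt dependency).
    tree = Tree("aligned.fasta.treefile")
    tree.render("tree.png")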