Skill UI
Browse and discover 6006+ curated skills
Categories: All · Development · Artificial Intelligence · Design & Creative · Product & Business · Data Science · Marketing · Soft Skills · Productivity · Engineering · Languages
Search for "Fast Inference": found 8 results.
Groq Reference Architecture (groq-reference-architecture)
jeremylongshore/claude-code-plugins-plus-skills · 54 downloads
Defines a best-practice Groq deployment for ultra-fast LLM inference, covering tiered model routing, middleware, streaming pipelines, fallback chains, and production monitoring for new Groq integrations.
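A minimal sketch of the fallback-chain idea using the groq Python SDK; the model names and the simple retry policy are illustrative assumptions, not taken from the skill itself.

    # Tiered routing sketch: try the fast model first, fall back to a larger one on failure.
    # Assumes GROQ_API_KEY is set; model ids are examples only.
    from groq import Groq

    client = Groq()
    TIERS = ["llama-3.1-8b-instant", "llama-3.3-70b-versatile"]  # fast tier first, larger fallback second

    def complete(prompt: str) -> str:
        last_error = None
        for model in TIERS:
            try:
                resp = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp.choices[0].message.content
            except Exception as exc:  # on rate limits or model errors, fall through to the next tier
                last_error = exc
        raise RuntimeError("all tiers failed") from last_error

    print(complete("Summarize tiered model routing in one sentence."))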
llama.cpp CPU Inference (llama-cpp)
Orchestra-Research/AI-Research-SKILLs · 382 downloads
Deploy llama.cpp to run LLM inference across CPUs, Apple Silicon, and non-NVIDIA GPUs, making it ideal for edge devices or CUDA-free setups with GGUF quantization for faster, lower-memory results.
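A minimal sketch of CPU-only GGUF inference through the llama-cpp-python bindings; the model path, thread count, and prompt are illustrative assumptions.

    # Load a quantized GGUF model and generate on CPU (pip install llama-cpp-python).
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local GGUF file
        n_ctx=4096,    # context window
        n_threads=8,   # CPU threads; tune to the machine
    )

    out = llm("Q: What does GGUF quantization trade off? A:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])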
LlamaGuard Content Moderation (llamaguard)
Orchestra-Research/AI-Research-SKILLs · 441 downloads
LlamaGuard is Meta’s 7–8B safety-specialized LLM that filters both prompts and responses by classifying six threat categories, enabling fast inference via vLLM/SageMaker and integration into NeMo Guardrails for end-to-end moderation.
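A minimal sketch of prompt screening with vLLM; the model id and the simplified prompt are assumptions, and a production setup would apply the full Llama Guard chat template with its category definitions.

    # Classify a user message as safe/unsafe with a Llama Guard model served by vLLM (pip install vllm).
    from vllm import LLM, SamplingParams

    guard = LLM(model="meta-llama/LlamaGuard-7b")  # assumed Hugging Face model id
    params = SamplingParams(temperature=0.0, max_tokens=32)

    def is_safe(user_message: str) -> bool:
        prompt = f"[INST] Task: Check if the following message is safe.\n\n{user_message} [/INST]"
        result = guard.generate([prompt], params)[0].outputs[0].text.strip()
        return result.lower().startswith("safe")

    print(is_safe("How do I bake bread?"))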
Mamba Selective State Models (mamba-architecture)
Orchestra-Research/AI-Research-SKILLs · 491 downloads
Mamba provides selective state-space models with O(n) inference complexity, letting you handle million-token sequences faster than transformers while skipping KV caches and benefiting from a hardware-aware design. Use it for long-context language modeling, streaming applications, and scalable, low-memory sequence modeling.
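A minimal sketch of a single Mamba block from the mamba-ssm package; the hyperparameters are illustrative assumptions and the package requires a CUDA GPU.

    # Run one Mamba block over a (batch, seq_len, d_model) tensor; no KV cache is kept.
    import torch
    from mamba_ssm import Mamba

    batch, seq_len, d_model = 2, 1024, 256
    x = torch.randn(batch, seq_len, d_model, device="cuda")

    block = Mamba(
        d_model=d_model,  # model width
        d_state=16,       # SSM state dimension
        d_conv=4,         # local convolution width
        expand=2,         # inner expansion factor
    ).to("cuda")

    y = block(x)  # output keeps the (batch, seq_len, d_model) shape
    print(y.shape)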
Modal Serverless GPU (modal-serverless-gpu)
Orchestra-Research/AI-Research-SKILLs · 94 downloads
Modal's serverless GPU cloud platform lets teams run ML training, inference, and batch jobs with pay-per-second pricing, automatic scaling, Python-native infra definitions, fast cold starts, and container caching.
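A minimal sketch of a serverless GPU function defined in Python with Modal; the image contents, GPU type, and model are illustrative assumptions.

    # Define a container image and a GPU function; Modal schedules and bills it per second.
    import modal

    app = modal.App("gpu-inference-demo")
    image = modal.Image.debian_slim().pip_install("transformers", "torch")

    @app.function(gpu="A10G", image=image)
    def generate(prompt: str) -> str:
        from transformers import pipeline
        pipe = pipeline("text-generation", model="gpt2", device=0)
        return pipe(prompt, max_new_tokens=32)[0]["generated_text"]

    @app.local_entrypoint()
    def main():
        print(generate.remote("Serverless GPUs are"))

Invoking this with `modal run file.py` would run `main` locally and `generate` in a cloud container, matching the pay-per-second, auto-scaling model the description refers to.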
Structured Text Generation (outlines)
Orchestra-Research/AI-Research-SKILLs · 137 downloads
Outlines guarantees valid JSON/XML/code generation via CFG-driven FSM filtering, Pydantic schemas, and fast local or API models (Transformers, vLLM, llama.cpp), making structured inference safe and high-performance.
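A minimal sketch of schema-constrained generation with Outlines; the model id and Pydantic schema are illustrative assumptions, and the exact API differs between Outlines releases.

    # Constrain generation so the output always parses into the Person schema.
    from pydantic import BaseModel
    import outlines

    class Person(BaseModel):
        name: str
        age: int

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, Person)

    person = generator("Extract the person: Ada Lovelace, 36 years old.")
    print(person.name, person.age)  # guaranteed to match the schema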
TensorRT LLM Optimizer (tensorrt-llm)
Orchestra-Research/AI-Research-SKILLs · 376 downloads
Optimizes LLM inference on NVIDIA GPUs, delivering 10-100× faster throughput, sub-10ms latency, and multi-GPU scaling with FP8/INT4 quantization, in-flight batching, and production-ready serving for real-time deployments.
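A minimal sketch of the TensorRT-LLM high-level Python API as documented in recent releases; the model id and sampling settings are illustrative assumptions, and an NVIDIA GPU is required.

    # Build/load an optimized engine and generate; quantization and batching are handled by the library.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    params = SamplingParams(max_tokens=64, temperature=0.8)

    for output in llm.generate(["Explain in-flight batching in one sentence."], params):
        print(output.outputs[0].text)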
Phylogenetics Pipeline Toolkit (phylogenetics)
K-Dense-AI/claude-scientific-skills · 128 downloads
Build and analyze phylogenetic trees with MAFFT, IQ-TREE 2, FastTree, and visualize via ETE3 or FigTree, covering alignment, trimming, ML inference, and microbial/viral evolutionary studies.
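A minimal sketch of an alignment-to-tree pipeline driven from Python; the file names are illustrative assumptions, and mafft and iqtree2 must already be installed and on PATH.

    # Align with MAFFT, infer a maximum-likelihood tree with IQ-TREE 2, then render with ETE3.
    import subprocess
    from ete3 import Tree

    # 1. Align sequences with MAFFT (writes the alignment to stdout).
    with open("aligned.fasta", "w") as out:
        subprocess.run(["mafft", "--auto", "sequences.fasta"], stdout=out, check=True)

    # 2. ML tree with automatic model selection and 1000 ultrafast bootstraps.
    subprocess.run(["iqtree2", "-s", "aligned.fasta", "-m", "MFP", "-B", "1000"], check=True)

    # 3. Visualize the resulting Newick tree (rendering needs ETE3's optional Qt dependency).
    tree = Tree("aligned.fasta.treefile")
    tree.render("tree.png")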