Skill UI
Browse and discover 10245+ curated skills
Search: Health Check, found 17 results
Deploying GPU Inference on CoreWeave
coreweave-deploy-integration
jeremylongshore/claude-code-plugins-plus-skills
146
This skill provides comprehensive guidance for deploying and managing GPU-accelerated AI inference services on CoreWeave Kubernetes (CKS). It covers best practices including containerization using NVIDIA CUDA base images, configuring specific GPU resource limits (A100/H100), setting up robust health checks, and executing controlled rolling updates. Ideal for managing multi-model inference and scaling demanding AI workloads in a cloud environment.
CoreWeave Incident Troubleshooting Runbook
coreweave-incident-runbook
jeremylongshore/claude-code-plugins-plus-skills
430
This runbook provides structured steps for responding to critical production incidents on the CoreWeave platform. Use it when dealing with GPU workload failures, inference service outages, or general Kubernetes resource issues. It guides users through checking pod status, node health, and diagnosing common model loading errors to ensure rapid service recovery.
Fly.io Deployment and Edge Computing Strategy
flyio-deploy-integration
jeremylongshore/claude-code-plugins-plus-skills
208
This skill guides the deployment of applications on Fly.io, with a focus on edge computing architectures. It covers building production-ready Docker images, configuring `fly.toml` for services and health checks, and running deployment strategies such as blue-green and canary releases. It also addresses multi-region deployment, automated rollbacks, and micro-VM optimization.
Monitoring and Automating Fly.io Deployments
flyio-webhooks-events
jeremylongshore/claude-code-plugins-plus-skills
131
This guide provides methods for robust monitoring and automation tailored for services deployed on Fly.io. It covers key areas including polling the Machines API to detect state changes, implementing comprehensive health check endpoints, processing structured logs using tools like `jq`, and automating deployment notifications within CI/CD pipelines. Essential for achieving high availability and reliable operations.
Robust OneNote Container Deployment
onenote-deploy-integration
jeremylongshore/claude-code-plugins-plus-skills
246
This skill provides a production-ready deployment approach for OneNote integrations in containerized environments. It addresses MSAL token persistence (supporting both file and Redis caches), deep health/readiness checks that validate actual Graph API connectivity, and graceful shutdown handling. Ideal for containerizing OneNote services for scalable production use with Docker and Kubernetes.