cohere-performance-tuning
jeremylongshore/claude-code-plugins-plus-skills
A guide to optimizing Cohere API v2 performance: improving throughput, reducing latency, and managing cost across the Chat, Embed, and Rerank endpoints. Techniques covered include selecting a model dynamically from a latency budget, streaming responses to cut perceived response time, batching embedding requests, compressing vectors, and caching repeated requests.
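To make two of the listed techniques concrete, here is a minimal sketch of batched embedding with an in-memory cache. It is not taken from the skill itself. `CachedBatchEmbedder`, `chunked`, and `embed_fn` are hypothetical names, and `embed_fn` stands in for whatever function actually calls the Embed endpoint. The batch size of 96 is an assumption; check it against the current Embed documentation before relying on it.

```python
import hashlib
from typing import Callable, Dict, Iterator, List, Sequence

# Assumption: upper bound on texts per embed request. Verify against the
# provider's documented limit before using this value.
MAX_BATCH = 96


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class CachedBatchEmbedder:
    """Hypothetical helper: embeds texts in batches and caches results by content hash."""

    def __init__(
        self,
        embed_fn: Callable[[Sequence[str]], List[List[float]]],
        batch_size: int = MAX_BATCH,
    ) -> None:
        # embed_fn is a stand-in for the real API call; it must return
        # one vector per input text, in the same order.
        self._embed_fn = embed_fn
        self._batch_size = batch_size
        self._cache: Dict[str, List[float]] = {}
        self.calls = 0  # number of upstream requests made so far

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        # Collect texts that are not cached yet, dropping duplicates
        # so each distinct text is sent upstream at most once.
        missing: List[str] = []
        seen = set()
        for text in texts:
            key = self._key(text)
            if key not in self._cache and key not in seen:
                seen.add(key)
                missing.append(text)

        # Send the uncached texts in batches and store each vector.
        for batch in chunked(missing, self._batch_size):
            vectors = self._embed_fn(batch)
            self.calls += 1
            for text, vector in zip(batch, vectors):
                self._cache[self._key(text)] = list(vector)

        # Return vectors in the caller's original order.
        return [self._cache[self._key(text)] for text in texts]
```

In real use, `embed_fn` would wrap the SDK's embed call and return its float vectors. The cache here lives in process memory only; a shared store would be the natural next step if several workers embed overlapping text.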