advanced-evaluation
sickn33/antigravity-awesome-skills
A guide to building production-grade evaluation systems for Large Language Models (LLMs). This skill covers evaluation methodologies, including Direct Scoring and Pairwise Comparison, along with techniques for mitigating systematic biases (e.g., Position Bias, Length Bias). Learn how to select appropriate metrics and structure prompts so that quality assessments stay reliable, consistent, and objective across AI applications.
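
As a rough illustration of one common way to reduce Position Bias in Pairwise Comparison, the sketch below runs the comparison twice with the response order swapped and accepts a winner only when both passes agree. It is a minimal sketch under stated assumptions, not the skill's own implementation: the `judge` callable, its `"first"`/`"second"`/`"tie"` return values, and the two stand-in judges are all hypothetical names introduced for this example. In practice `judge` would wrap a call to an LLM.

```python
from typing import Callable, Literal

Verdict = Literal["first", "second", "tie"]   # what the (hypothetical) judge returns
Winner = Literal["A", "B", "tie"]             # what the caller gets back


def compare_with_position_swap(
    judge: Callable[[str, str, str], Verdict],
    prompt: str,
    response_a: str,
    response_b: str,
) -> Winner:
    """Run a pairwise comparison in both orders; disagreement counts as a tie."""
    # Pass 1: response A is shown in the first position.
    verdict_1 = judge(prompt, response_a, response_b)
    # Pass 2: response B is shown in the first position.
    verdict_2 = judge(prompt, response_b, response_a)

    # Map positional verdicts back to the underlying responses.
    winner_1: Winner = {"first": "A", "second": "B", "tie": "tie"}[verdict_1]
    winner_2: Winner = {"first": "B", "second": "A", "tie": "tie"}[verdict_2]

    # A verdict that flips when the order flips is treated as unreliable.
    return winner_1 if winner_1 == winner_2 else "tie"


# Stand-in judges for demonstration only (assumptions, not real LLM calls).
def always_first_judge(prompt: str, first: str, second: str) -> Verdict:
    """Simulates a judge with pure position bias."""
    return "first"


def keyword_judge(prompt: str, first: str, second: str) -> Verdict:
    """Simulates a judge that looks only at content (contains 'Paris')."""
    first_ok, second_ok = "Paris" in first, "Paris" in second
    if first_ok == second_ok:
        return "tie"
    return "first" if first_ok else "second"


question = "What is the capital of France?"
biased_result = compare_with_position_swap(
    always_first_judge, question, "Paris", "Lyon"
)
content_result = compare_with_position_swap(
    keyword_judge, question, "Paris", "Lyon"
)
```

With the position-biased stand-in, the two passes disagree and the result collapses to a tie. With the content-based stand-in, both passes pick the same response regardless of order. The swap doubles the number of judge calls, which is the usual cost of this kind of check.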