arize-evaluator
github/awesome-copilot
This skill provides LLM-as-judge evaluation within the Arize platform. Users can create, update, and manage evaluators, each defined by a prompt template, classification criteria, and a judge model, and run them against live spans, historical experiments, or continuous data streams. Column mapping and configurable data granularity (span, trace, or session) support rigorous model performance monitoring.
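To make the moving parts concrete, here is a minimal, self-contained sketch of the LLM-as-judge pattern the skill manages. This is not the Arize SDK: the `Evaluator` class, `run_evaluation` function, the mock judge, and the span field names are all hypothetical, illustrating only how a prompt template, classification labels (rails), and column mapping fit together.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Evaluator:
    """Hypothetical LLM-as-judge evaluator: a prompt template plus allowed labels."""
    template: str                  # prompt with {placeholders} for span data
    labels: list                   # classification rails the judge must choose from
    column_mapping: dict = field(default_factory=dict)  # span column -> template variable

    def render(self, span: dict) -> str:
        # Rename span columns per the mapping, then fill the template.
        variables = {self.column_mapping.get(k, k): v for k, v in span.items()}
        return self.template.format(**variables)

def run_evaluation(evaluator: Evaluator, spans: list,
                   judge: Callable[[str], str]) -> list:
    """Score each span; reject judge outputs outside the allowed labels."""
    results = []
    for span in spans:
        label = judge(evaluator.render(span))
        results.append(label if label in evaluator.labels else "unparseable")
    return results

# Stand-in for a real model call: a trivial rule instead of an LLM.
mock_judge = lambda prompt: "correct" if "Paris" in prompt else "incorrect"

evaluator = Evaluator(
    template="Question: {input}\nAnswer: {output}\nIs the answer correct or incorrect?",
    labels=["correct", "incorrect"],
    column_mapping={"attributes.input": "input", "attributes.output": "output"},
)
spans = [
    {"attributes.input": "Capital of France?", "attributes.output": "Paris"},
    {"attributes.input": "Capital of France?", "attributes.output": "Lyon"},
]
print(run_evaluation(evaluator, spans, mock_judge))  # ['correct', 'incorrect']
```

In the real platform the judge is an LLM call and the spans come from tracing data; constraining outputs to a fixed label set is what makes the results aggregatable for monitoring.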