langchain-eval-harness
jeremylongshore/claude-code-plugins-plus-skills
This harness provides comprehensive, reproducible evaluation pipelines for complex LLM chains and agents built on LangChain/LangGraph 1.0. It integrates golden-dataset management, LangSmith evaluation runs, RAGAS metrics, deepeval's LLM-as-judge checks, and structured analysis of agent trajectories. Use it to establish quality benchmarks for new chains, diagnose performance regressions after a model switch, or implement CI/CD gates that block quality drops.
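The CI/CD-gate idea above can be sketched with a small, stdlib-only check: compare the current run's metric scores (e.g. produced by RAGAS or deepeval) against a stored golden baseline and fail the build on regressions beyond a tolerance. The function and metric names here are illustrative assumptions, not the harness's actual API.

```python
# Illustrative quality-gate step: `check_quality_gate`, the metric names,
# and the tolerance value are hypothetical, not part of this harness's API.

def check_quality_gate(
    scores: dict[str, float],
    baseline: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return regression messages; an empty list means the gate passes."""
    failures = []
    for metric, base in baseline.items():
        current = scores.get(metric)
        if current is None:
            failures.append(f"{metric}: missing from current run")
        elif current < base - tolerance:
            failures.append(
                f"{metric}: {current:.3f} below baseline {base:.3f} - {tolerance}"
            )
    return failures

baseline = {"faithfulness": 0.90, "answer_relevancy": 0.85}
current = {"faithfulness": 0.91, "answer_relevancy": 0.78}
print(check_quality_gate(current, baseline))
# → ['answer_relevancy: 0.780 below baseline 0.850 - 0.05']
```

In a CI pipeline, a non-empty failure list would exit non-zero so the merge is blocked until the regression is investigated.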