nemo-evaluator-sdk
Orchestra-Research/AI-Research-SKILLs
NeMo Evaluator SDK runs containerized, enterprise-grade LLM evaluations across 100+ benchmarks (including MMLU, HumanEval, GSM8K, safety, and VLM tasks) drawn from 18+ harnesses. Evaluations can execute on Docker, Slurm HPC clusters, or cloud platforms, giving reproducible results across backends.