Skill UI

Browse and discover 11170+ curated skills

Search "MBPP": found 1 result

Code Model Evaluation and Benchmarking

evaluating-code-models

Orchestra-Research/AI-Research-SKILLs

A toolkit for evaluating the performance of code generation models. It supports benchmarking on standard benchmarks such as HumanEval, MBPP, and MultiPL-E across multiple programming languages. Use it to quantitatively compare the coding abilities of different LLMs and to measure code synthesis quality with pass@k metrics.
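
To make the pass@k metric concrete, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), where n samples are generated per problem and c of them pass the unit tests. The function names `pass_at_k` and `mean_pass_at_k` are illustrative assumptions, not part of this skill's actual interface.

```python
import math
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the tests
    k: sampling budget being evaluated (k <= n)
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must not be empty")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical per-problem results: (samples generated, samples passing).
    demo = [(10, 3), (10, 0), (10, 10)]
    print(f"pass@1 = {mean_pass_at_k(demo, 1):.3f}")
    print(f"pass@5 = {mean_pass_at_k(demo, 5):.3f}")
```

Generating more samples than k (n > k) and averaging over subsets in this way gives a lower-variance estimate than drawing exactly k samples per problem and checking whether any one passes.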