Skill UI
Browse and discover 9191+ curated skills
Search: MBPP, found 1 result
Code Model Evaluation and Benchmarking
evaluating-code-models
Orchestra-Research/AI-Research-SKILLs
200
A toolkit for evaluating code generation models. It supports benchmarking against standard suites such as HumanEval, MBPP, and MultiPL-E across multiple programming languages. Use it to compare the coding abilities of different LLMs quantitatively and to measure code synthesis quality with pass@k metrics.
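For readers unfamiliar with pass@k, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from this skill, which may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated, c: samples that passed, k: sampling budget.
    Hypothetical helper for illustration, not the skill's own API.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as an (n, c) pair."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

A benchmark score on a suite such as HumanEval or MBPP would then be the mean of the per-problem estimates, for example `mean_pass_at_k([(10, 3), (10, 0)], k=1)`.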