Skill UI
Browse and discover 7087+ curated skills
Search: "Test Framework", found 2 results
Agent Evaluation for LLM Systems
agent-evaluation
sickn33/antigravity-awesome-skills
153
This framework addresses the gap between academic benchmarks and real-world production performance for LLM agents. It teaches quality engineering practices such as behavioral contract testing, adversarial testing, and statistical analysis, so that agents stay reliable and robust once deployed. It helps developers move beyond simple pass/fail rates toward comprehensive capability assessment.
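As an illustration of what statistical analysis beyond a raw pass/fail rate can look like, here is a minimal sketch that puts a Wilson score confidence interval around an agent's pass rate over repeated runs. It is not taken from the skill itself; the function name `wilson_interval` and the example counts are assumptions made for illustration.

```python
import math


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95% coverage).

    Hypothetical helper for illustration; not part of the listed skill.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= passes <= trials:
        raise ValueError("passes must be between 0 and trials")

    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Example: an agent passes 8 of 10 repeated runs of the same task.
low, high = wilson_interval(8, 10)
print(f"pass rate 0.80, 95% interval [{low:.2f}, {high:.2f}]")
```

With only ten runs the interval is wide, which is the point: a single pass rate from a small sample says little about how the agent will behave in production.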
Causal Intervention for PyTorch Models
pyvene-interventions
Orchestra-Research/AI-Research-SKILLs
319
Pyvene is a declarative framework designed for performing causal interventions on PyTorch neural networks. It allows researchers to conduct advanced experiments such as activation patching, causal tracing (ROME-style), and interchange intervention training (IIT). Use this library when you need to test causal hypotheses about model behavior, deeply interpret model components, or ensure reproducibility in advanced AI research.
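To make the idea of activation patching concrete, the following is a hand-rolled toy sketch in plain Python. It deliberately does not use Pyvene's API or PyTorch; the two-layer network, its weights, and the names `forward` and `patching_effect` are all hypothetical and chosen only to show the mechanics: cache an activation from a clean run, splice it into a corrupted run, and measure how much of the clean output is restored.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(weights, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


# Toy two-layer network (hypothetical weights, for illustration only).
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[2.0, -1.0]]


def forward(x, patch=None):
    """Run the network; `patch` maps hidden-unit index -> value to overwrite."""
    hidden = relu(matvec(W1, x))
    if patch:
        for index, value in patch.items():
            hidden[index] = value
    return matvec(W2, hidden)[0], hidden


CLEAN_INPUT = [1.0, 0.0]
CORRUPT_INPUT = [0.0, 1.0]

clean_out, clean_hidden = forward(CLEAN_INPUT)
corrupt_out, _ = forward(CORRUPT_INPUT)


def patching_effect(unit):
    """Fraction of the clean-vs-corrupt output gap restored by patching one unit."""
    patched_out, _ = forward(CORRUPT_INPUT, patch={unit: clean_hidden[unit]})
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)


for unit in range(2):
    print(f"hidden unit {unit}: restored fraction = {patching_effect(unit):.3f}")
```

A unit whose patched value restores a larger fraction of the gap is more causally implicated in the behavior being studied. A library such as Pyvene applies the same idea to real PyTorch models.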