Skill UI
Search results for "Reference-Free": 1 result
Simple Preference Optimization for LLM Alignment
simpo-training
Orchestra-Research/AI-Research-SKILLs
112
SimPO (Simple Preference Optimization) is a reference-free method for aligning Large Language Models (LLMs) with human preference data. It is an efficient alternative to DPO and PPO: it requires no separate reference model, and it is reported to outperform DPO. It suits practitioners who need faster, simpler, and more resource-efficient fine-tuning for preference alignment.
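To make "reference-free" concrete, here is a minimal sketch of the SimPO objective for a single preference pair, written in plain Python. It is an illustration and not the skill's own code: the function name `simpo_loss` and the default values of `beta` and `gamma` are assumptions chosen for the example. The implicit reward is the policy's length-normalized log-probability scaled by `beta`, and `gamma` is a target margin between the chosen and rejected responses. No reference-model log-probabilities appear anywhere.

```python
import math


def simpo_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """Sketch of the SimPO loss for one (chosen, rejected) pair.

    Inputs are per-token log-probabilities under the policy being trained.
    beta and gamma defaults are illustrative assumptions, not tuned values.
    """
    # Implicit reward: average (length-normalized) log-probability, scaled by beta.
    chosen_reward = beta * sum(chosen_token_logps) / len(chosen_token_logps)
    rejected_reward = beta * sum(rejected_token_logps) / len(rejected_token_logps)

    # The chosen response should beat the rejected one by at least gamma.
    margin = chosen_reward - rejected_reward - gamma

    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

For comparison, DPO computes its margin from log-probability ratios against a frozen reference model, so it needs a second forward pass (or cached reference log-probabilities) per batch. The sketch above uses only the policy's own outputs, which is where the memory and compute savings come from.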