ORPO

ORPO: Preference Optimization without the Supervised Fine-tuning (SFT) Step

A less expensive alignment method performing as well as DPO

There are many methods to align large language models (LLMs) with human preferences. Reinforcement learning from human feedback (RLHF) was one of the...
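To make the title concrete: ORPO folds the preference signal directly into the fine-tuning objective, so no separate SFT stage or reference model is needed. Below is a minimal sketch of an ORPO-style loss, not the article's or any library's reference implementation. It assumes length-normalized sequence log-probabilities for the chosen and rejected responses are already available; the helper name `orpo_loss` and the weight `lam` are illustrative.

```python
import torch
import torch.nn.functional as F

def orpo_loss(logps_chosen, logps_rejected, nll_chosen, lam=0.1):
    """Sketch of an ORPO-style objective (assumption: inputs are
    length-normalized sequence log-probabilities and the chosen-response NLL).
    The loss is the usual SFT term plus a log-odds-ratio penalty that pushes
    the chosen response's odds above the rejected response's odds."""
    # odds(y|x) = p / (1 - p); compute log-odds in log space for stability:
    # log odds = log p - log(1 - p) = logp - log1p(-exp(logp))
    log_odds_chosen = logps_chosen - torch.log1p(-torch.exp(logps_chosen))
    log_odds_rejected = logps_rejected - torch.log1p(-torch.exp(logps_rejected))
    # Preference term: -log sigmoid(log-odds ratio), small when chosen >> rejected
    l_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return (nll_chosen + lam * l_or).mean()

# Toy usage with a batch of two preference pairs (illustrative values only)
logps_c = torch.tensor([-0.8, -1.1])  # avg log-prob of chosen responses
logps_r = torch.tensor([-1.5, -1.3])  # avg log-prob of rejected responses
nll_c = -logps_c                      # SFT loss is the chosen-response NLL
print(orpo_loss(logps_c, logps_r, nll_c))
```

Because the preference penalty is just added on top of the standard fine-tuning loss, a single training run on preference pairs replaces the usual SFT-then-DPO pipeline, which is where the cost savings come from.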
