Artificial Intelligence
Scaling laws for reward model overoptimization
In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Since the reward model is an imperfect proxy, optimizing its value excessively...
ASK ANA - March 15, 2023
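The effect the excerpt describes is easy to reproduce in miniature. The sketch below is a hypothetical toy, not the paper's experimental setup: it assumes each candidate has a Gaussian ground-truth ("gold") reward, forms an imperfect proxy by adding heavy-tailed error, and selects the best of n candidates by proxy score. As n grows, the proxy reward of the selected sample keeps climbing, while its gold reward tends to rise and then fall back, which is the Goodhart-style overoptimization in question.

```python
# Toy illustration of proxy-reward overoptimization (assumptions noted above;
# this is not the paper's method). Best-of-n selection against a noisy proxy.
import numpy as np

rng = np.random.default_rng(0)

def best_of_n(n: int, trials: int = 5000) -> tuple[float, float]:
    """Average gold and proxy reward of the proxy-argmax candidate over many trials."""
    gold_picks, proxy_picks = [], []
    for _ in range(trials):
        gold = rng.normal(size=n)                    # assumed ground-truth reward
        proxy = gold + rng.standard_t(df=3, size=n)  # imperfect, heavy-tailed proxy
        best = int(np.argmax(proxy))                 # optimize against the proxy only
        gold_picks.append(gold[best])
        proxy_picks.append(proxy[best])
    return float(np.mean(gold_picks)), float(np.mean(proxy_picks))

for n in (1, 4, 16, 64, 256, 1024):
    gold, proxy = best_of_n(n)
    print(f"n={n:5d}  proxy reward={proxy:+.2f}  gold reward={gold:+.2f}")
```

In this toy setting the proxy score grows monotonically with optimization pressure, while the gold reward of the chosen sample improves only up to a point and then degrades as the selection increasingly rewards proxy error rather than true quality.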