Weak-to-strong generalization

There are still important disanalogies between our current empirical setup and the ultimate problem of aligning superhuman models. For instance, it might be easier for future models to mimic weak human errors than...
