
Improving Mathematical Reasoning with Process Supervision


We have trained a model to achieve a new state of the art in mathematical problem solving by rewarding each correct step of reasoning (“process supervision”) instead of simply rewarding the correct final answer (“outcome supervision”). In addition to boosting performance relative to outcome supervision, process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans.
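To make the contrast concrete, here is a minimal sketch of the two reward signals on a toy problem. It assumes that per-step correctness labels are already available (for instance, from human annotators). The function names `outcome_reward`, `process_rewards`, and `first_error_index` are hypothetical, chosen for illustration, and are not taken from the original work.

```python
from typing import List, Optional


def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Outcome supervision: a single reward based only on the final answer."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0


def process_rewards(step_labels: List[bool]) -> List[float]:
    """Process supervision: one reward per reasoning step.

    step_labels[i] is True if step i was judged correct (assumed to be
    provided by a labeler; this sketch does not compute it).
    """
    return [1.0 if ok else 0.0 for ok in step_labels]


def first_error_index(step_labels: List[bool]) -> Optional[int]:
    """Return the index of the first incorrect step, or None if all are correct."""
    for i, ok in enumerate(step_labels):
        if not ok:
            return i
    return None


# Toy problem: solve 2x + 6 = 10 (reference answer: x = 2).
steps = ["2x = 10 - 6", "2x = 4", "x = 3"]   # the last step is wrong
labels = [True, True, False]                 # hypothetical human labels

print(outcome_reward("3", "2"))      # one signal for the whole solution
print(process_rewards(labels))       # one signal per step
print(first_error_index(labels))     # where the reasoning went wrong
```

The outcome signal only says that the solution failed. The per-step signal also shows that the first two steps were sound and that the error entered at the last step, which is the extra information process supervision gives the model during training.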
