Microsoft (MS) has unveiled 'rStar-Math', a new technique that significantly improves the mathematical reasoning of small language models (sLMs). According to the researchers, it dramatically improved the sLMs' math problem-solving ability and outperformed OpenAI's 'o1-preview'.
Researchers from Microsoft, Peking University, and Tsinghua University posted a technical paper on rStar-Math to arXiv on the 9th (local time), describing how the technique improves the mathematical reasoning performance of sLMs.
At the core of rStar-Math is the combination of 'Monte Carlo Tree Search (MCTS)' with chain-of-thought (CoT) reasoning, which solves problems step by step. MCTS breaks a complex math problem into simpler steps and, rather than exploring every possibility, uses simulations to choose the most promising path.
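To make the search idea concrete, here is a minimal Python sketch of generic MCTS with UCT selection on a hypothetical toy problem: reach 10 from 1 in exactly four steps, each step being '+1' or '*2'. The toy problem, the exploration constant `c=1.4`, and all function names are assumptions for illustration. In rStar-Math the steps are generated by a language model and scored by learned models, which this sketch does not attempt to reproduce.

```python
import math
import random

# Hypothetical toy problem standing in for a math question:
# starting from START, apply exactly MAX_DEPTH steps ("+1" or "*2") and land on TARGET.
START, TARGET, MAX_DEPTH = 1, 10, 4
ACTIONS = ("+1", "*2")


def apply(value, action):
    """Apply one reasoning 'step' to the current partial solution."""
    return value + 1 if action == "+1" else value * 2


def reward(value):
    """Terminal reward: 1.0 for a correct final answer, otherwise 0.0."""
    return 1.0 if value == TARGET else 0.0


class Node:
    def __init__(self, value, path, parent=None):
        self.value = value      # partial result after the steps in `path`
        self.path = path        # tuple of steps taken from the root
        self.parent = parent
        self.children = {}      # step -> child Node
        self.visits = 0
        self.total = 0.0        # sum of rewards backpropagated through this node

    def untried(self):
        if len(self.path) >= MAX_DEPTH:
            return []
        return [a for a in ACTIONS if a not in self.children]

    def uct_child(self, c=1.4):
        # UCT: mean reward (exploitation) plus a bonus for rarely visited children (exploration).
        return max(
            self.children.values(),
            key=lambda n: n.total / n.visits
            + c * math.sqrt(math.log(self.visits) / n.visits),
        )


def mcts(iterations=1000, seed=0):
    rng = random.Random(seed)
    root = Node(START, ())
    best_path, best_reward = (), -1.0
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down fully expanded nodes, picking children by UCT.
        while not node.untried() and node.children:
            node = node.uct_child()
        # 2. Expansion: add one untried step as a new child node.
        if node.untried():
            action = rng.choice(node.untried())
            child = Node(apply(node.value, action), node.path + (action,), node)
            node.children[action] = child
            node = child
        # 3. Simulation: complete the solution with random steps (a rollout).
        value, path = node.value, node.path
        while len(path) < MAX_DEPTH:
            action = rng.choice(ACTIONS)
            value, path = apply(value, action), path + (action,)
        r = reward(value)
        if r > best_reward:
            best_path, best_reward = path, r
        # 4. Backpropagation: credit the reward to every node on the selected path.
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    return best_path, best_reward


print(mcts())
```

Calling `mcts()` returns the best step sequence found together with its reward. For this toy problem, a successful answer is always either `('+1', '*2', '+1', '*2')` or `('*2', '*2', '+1', '*2')`, because both possible first steps lead to 2 and only the route 2 → 4 → 5 → 10 works from there.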
The researchers trained a 'policy model' on 747,000 publicly available math word problems to generate problem-solving steps. The policy model was designed to express each step in both natural language and Python code, while a 'process preference model (PPM)' selected the best step during generation. The two models improved each other's performance through four rounds of self-evolution.
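The step-selection loop described above can be sketched in Python, assuming (as a simplification) that a candidate step is discarded when its Python code fails to run, and that the PPM can be treated as a function returning one score per candidate. The `select_step` helper, the toy rectangle problem, and the hand-set `toy_ppm_scores` are all hypothetical; a real PPM is a trained neural model, not a lookup table.

```python
from typing import Callable, Optional


def runs_ok(code: str) -> bool:
    """Return True if the Python snippet executes without raising."""
    try:
        exec(code, {})  # fresh namespace for each check
        return True
    except Exception:
        return False


def select_step(prior_code: str, candidates: list, score: Callable[[dict], float]) -> Optional[dict]:
    # Keep only candidates whose code runs on top of the steps chosen so far.
    valid = [c for c in candidates if runs_ok(prior_code + "\n" + c["code"])]
    if not valid:
        return None
    # Among the valid candidates, pick the one the stand-in preference model scores highest.
    return max(valid, key=score)


# Hypothetical problem: "A rectangle is 6 by 4. What is its area?"
prior_code = "length, width = 6, 4"
candidates = [
    {"text": "Multiply length by width.", "code": "area = length * width"},
    {"text": "Add length and width.", "code": "area = length + width"},
    {"text": "Multiply length by height.", "code": "area = length * height"},  # raises NameError
]
# Hypothetical hand-set scores standing in for a trained PPM.
toy_ppm_scores = {
    "Multiply length by width.": 0.9,
    "Add length and width.": 0.2,
    "Multiply length by height.": 0.95,
}
best = select_step(prior_code, candidates, score=lambda c: toy_ppm_scores[c["text"]])
print(best["text"])  # Multiply length by width.
```

The highest-scoring candidate is rejected here because its code raises a `NameError`, which illustrates why checking a step's code before scoring it can matter.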
The technique was applied to sLMs such as MS's 'Phi-3 Mini' and Alibaba's 'Qwen-1.5B' and 'Qwen-7B', and performance improved substantially in every model.
In particular, Qwen-7B's accuracy on the MATH benchmark rose from 58.8% to 90.0%, outperforming o1-preview. On the American Invitational Mathematics Examination (AIME), it solved 53.3% of the problems, a score that would place it among the top 20% of high school students.
This approach departs from the prevailing strategy of improving performance by scaling up language models. With its focus on efficiency, rStar-Math is seen as showing that sLMs can match or even surpass large language models (LLMs).
The researchers said they plan to release rStar-Math's code and data on GitHub and that an internal review is currently under way.
Reporter Park Chan cpark@aitimes.com