MS unveils math-specific inference technology… “Outperforms o1 with an sLM”


(Photo = Shutterstock)

Microsoft (MS) announced ‘rStar-Math’, a new technique that significantly improves the mathematical reasoning ability of small language models (sLMs). The company reported that the technique dramatically improved sLMs’ mathematical problem-solving and outperformed OpenAI’s ‘o1-Preview’.

Researchers at Microsoft, Peking University, and Tsinghua University posted the ‘rStar-Math’ technical paper on arXiv on the 9th (local time), showing that it can improve the mathematical inference performance of sLMs.

The core of rStar-Math is combining ‘Monte Carlo Tree Search (MCTS)’ with ‘Chain of Thought (CoT)’ reasoning, which solves problems step by step. Instead of exploring every possibility, MCTS breaks a complex math problem into steps and uses simulation to choose the most promising path at each step.
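The selection–simulation loop described above can be sketched with a minimal, generic MCTS over a toy space of solution steps. This is an illustrative implementation of standard MCTS (UCB1 selection, random rollouts), not the paper’s actual code; the action names and reward function are invented for the example.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # sequence of solution steps chosen so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb1(node, c=1.4):
    # UCB1: balance exploitation (mean reward) against exploration
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def rollout(state, depth, actions, reward_fn):
    # Random playout to a fixed depth, then score the full trajectory
    while len(state) < depth:
        state = state + [random.choice(actions)]
    return reward_fn(state)

def mcts(actions, depth, reward_fn, iters=2000):
    root = Node([])
    for _ in range(iters):
        node = root
        # 1. Selection: descend via UCB1 while the node is fully expanded
        while node.children and len(node.children) == len(actions):
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one untried child step (if not terminal)
        if len(node.state) < depth:
            tried = {tuple(c.state) for c in node.children}
            for a in actions:
                if tuple(node.state + [a]) not in tried:
                    node = Node(node.state + [a], parent=node)
                    node.parent.children.append(node)
                    break
        # 3. Simulation: estimate the value of this partial solution
        reward = rollout(list(node.state), depth, actions, reward_fn)
        # 4. Backpropagation: update statistics along the path to the root
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # The most-visited first step is the "most promising path"
    return max(root.children, key=lambda n: n.visits).state[0]
```

With a toy reward that only credits trajectories starting with “factor”, the search concentrates its visits on that branch and returns it as the best first step.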

CoT example augmented with Python code (Photo = arXiv)

The researchers trained a ‘policy model’ on 747,000 publicly available math word problems to generate problem-solving steps. The policy model was designed to express each step in both natural language and Python code, and a ‘process preference model (PPM)’ was used to select the optimal step during generation. The two models mutually improved their performance through four rounds of self-evolution.
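The step-selection role of the PPM can be illustrated with a toy sketch: a stand-in “policy” proposes candidate next steps and a stand-in “preference model” ranks them. The function names, candidate steps, and hard-coded scores are all invented for illustration; a real PPM is a trained model scoring steps mined during MCTS.

```python
def propose_steps(prefix: str) -> list[str]:
    # Stand-in for the policy model: a real one would emit each candidate
    # step as natural language paired with executable Python code.
    return [prefix + s for s in [" simplify", " factor", " guess"]]

def ppm_score(step: str) -> float:
    # Stand-in for the process preference model: a real PPM is trained on
    # step-level preference pairs; this toy scorer simply prefers "factor".
    return {"simplify": 0.6, "factor": 0.9, "guess": 0.1}[step.split()[-1]]

def best_next_step(prefix: str) -> str:
    # Generation proceeds by keeping the candidate the PPM ranks highest.
    return max(propose_steps(prefix), key=ppm_score)
```

For example, `best_next_step("x^2-1=0:")` returns the “factor” candidate because the scorer ranks it highest; in self-evolution, trajectories selected this way are fed back to retrain both models.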

Benchmark results (photo = arXiv)
Benchmark results (photo = arXiv)

The method was applied to sLMs such as MS’s ‘Phi-3 Mini’ and Alibaba’s ‘Qwen-1.5B’ and ‘Qwen-7B’, and performance improved substantially across all models.

Notably, Qwen-7B’s accuracy on the MATH benchmark rose from 58.8% to 90.0%, outperforming o1-Preview. On the American Invitational Mathematics Examination (AIME), it solved 53.3% of the problems, a score equivalent to the top 20% of high school students.

The approach marks a departure from the prevailing development strategy of boosting performance by scaling up model size. By emphasizing efficiency, rStar-Math shows that an sLM can match or surpass an LLM.

The researchers said they plan to release rStar-Math’s code and data through GitHub and are currently conducting an internal review.

Reporter Park Chan cpark@aitimes.com
