‘o1’ is ahead in reasoning, but ‘R1’ is more useful in actual use


(Photo = Shutterstock)

Although OpenAI’s ‘o1’ reasons somewhat better than ‘DeepSeek-R1’, the analysis is that R1 is more useful because its mistakes can be seen as it works. Even a model that scores highly on benchmarks can run into problems in actual use, and it is quite important for users to be able to follow the reasoning process so they can discover and correct the model’s mistakes.

VentureBeat reported the results of comparing o1 and R1 on the 31st (local time) using Perplexity Pro Search.

The goal of the experiment was not to run yet another numerical benchmark test, but to check which model is more useful in actual use.

The first test checked whether the model can calculate the return on investment (ROI) of a portfolio.

Suppose a user invested $140 in the Magnificent 7 stocks (Alphabet, Amazon, Apple, Meta, Microsoft, Nvidia, Tesla) on the first day of every month from January to December 2024. The model was asked to calculate the portfolio’s value as of the current date.

To do this, the model must retrieve the stock price for the first day of each month, allocate $20 to each stock, and then sum up the portfolio’s value based on each stock’s current price.
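For reference, the calculation the models were asked to perform amounts to a simple dollar-cost-averaging loop. The Python sketch below is a minimal illustration of that logic; the ticker and all prices are made-up placeholders, not data from the experiment.

```python
# Minimal sketch of the test's ROI calculation, with hypothetical prices.
# Real runs would use actual 2024 opening prices for all seven stocks.

# {ticker: [opening price on the 1st of each month, Jan-Dec 2024]} -- illustrative values
monthly_open = {
    "AAPL": [185.0, 184.0, 179.0, 171.0, 170.0, 192.0,
             212.0, 224.0, 229.0, 233.0, 222.0, 237.0],
    # ...the other Magnificent 7 tickers would follow the same shape
}
# Hypothetical current prices on the evaluation date
current_price = {"AAPL": 240.0}

MONTHLY_INVESTMENT = 20.0  # $20 per stock per month, $140 across all seven


def portfolio_value_and_roi(monthly_open, current_price, monthly_investment):
    """Accumulate shares bought each month, then value them at today's prices."""
    invested = 0.0
    value = 0.0
    for ticker, prices in monthly_open.items():
        shares = 0.0
        for open_price in prices:
            shares += monthly_investment / open_price  # buy on the 1st of the month
            invested += monthly_investment
        value += shares * current_price[ticker]
    roi = (value - invested) / invested
    return value, roi


value, roi = portfolio_value_and_roi(monthly_open, current_price, MONTHLY_INVESTMENT)
print(f"Portfolio value: ${value:,.2f}  ROI: {roi:.1%}")
```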

Nonetheless, neither model produced the correct answer. o1 listed the stock prices and laid out a calculation, but never actually carried it out and answered that there was no ROI. R1, on the other hand, made the mistake of assuming the investment was made only in January 2024 and calculating only the return through January 2025.

This is an example of o1’s reasoning ability being ahead of R1’s.

Nonetheless, o1 did not explain how it reached its result. R1, by contrast, showed in its reasoning that Perplexity’s search engine had failed to retrieve the stock price data properly. This suggests that search-based models fail because of incorrect search results rather than a lack of capability.

Therefore, the second test ran the same experiment as the first, but the data file was given to the model directly instead of having it fetch information from the web. The file was an HTML table containing, for each stock, the price on the first day of every month from January to December 2024 as well as the final stock price. However, the data was not summarized, so the model had to organize it and select the right values itself.

Again, o1 showed somewhat better ability than R1, but neither model provided an accurate answer.

o1 succeeded in extracting the data from the file, but responded that the ROI calculation should be done manually in a tool like Excel.

R1 also failed, but it provided useful information along the way. For example, the model analyzed the HTML stock data, extracted the needed figures, calculated the monthly investments, summed them, and computed the final value according to the latest stock prices. However, that final value appeared only in the reasoning process and was never included in the actual answer. In addition, it handled Nvidia’s 10-for-1 stock split of June 10, 2024 incorrectly.
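To make the split issue concrete, here is a minimal, hypothetical sketch of the kind of adjustment both models needed to get right when reading the HTML file. The table layout, column names, and figures are illustrative assumptions, not the actual experiment file.

```python
# Sketch: parse a provided HTML price table and adjust Nvidia's pre-split prices
# so that all months are comparable after the 10-for-1 split.
from io import StringIO

import pandas as pd

# Illustrative stand-in for the experiment's data file (layout is assumed)
html = """
<table>
  <tr><th>ticker</th><th>date</th><th>open</th></tr>
  <tr><td>NVDA</td><td>2024-06-03</td><td>1150.0</td></tr>
  <tr><td>NVDA</td><td>2024-07-01</td><td>123.5</td></tr>
</table>
"""

prices = pd.read_html(StringIO(html))[0]
prices["date"] = pd.to_datetime(prices["date"])

# Nvidia began trading on a split-adjusted basis on June 10, 2024 (10-for-1);
# prices recorded before that date must be divided by 10 to stay comparable.
split_date = pd.Timestamp("2024-06-10")
pre_split = (prices["ticker"] == "NVDA") & (prices["date"] < split_date)
prices.loc[pre_split, "open"] = prices.loc[pre_split, "open"] / 10

print(prices)
```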

R1’s visible reasoning therefore provided more information, making it easier to understand the model’s limits and to see how to adjust the data to get the right result.

The third test gave the model the statistics of four NBA centers and asked it to find the player who improved the most in the 2023/24 season compared with the 2022/23 season. The key is comparing multiple data points, and the set includes a player who entered the NBA in 2023, who therefore has to be excluded.

This experiment is relatively easy, because NBA player statistics are covered in the news and fan communities and are included in Wikipedia and NBA profiles. Both models found the right answer, Giannis Antetokounmpo, but there were some differences depending on the data used. In particular, both models ignored the fact that the 2023 rookie was included and instead used statistics from his time in the European league.

Nonetheless, R1 provided a hint for revising the prompt by presenting a comparison table and links alongside its answer. When the prompt was clarified to compare FG% from the NBA season only, R1 excluded the rookie from the results.
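In code, the comparison the clarified prompt asks for looks something like the following minimal sketch: compute each player’s FG% change between seasons and drop anyone without a 2022/23 NBA season. The player list and FG% figures are illustrative placeholders, not the experiment’s actual input.

```python
# Sketch: find the center whose FG% improved most from 2022/23 to 2023/24,
# excluding anyone who did not play an NBA season in 2022/23.
stats = {
    # player: {season: FG%} -- illustrative numbers
    "Giannis Antetokounmpo": {"2022/23": 55.3, "2023/24": 61.1},
    "Nikola Jokic":          {"2022/23": 63.2, "2023/24": 58.3},
    "Joel Embiid":           {"2022/23": 54.8, "2023/24": 52.9},
    "Rookie Center":         {"2023/24": 46.5},  # entered the NBA in 2023
}


def most_improved(stats, prev="2022/23", curr="2023/24"):
    """Return the player with the largest FG% gain across the two NBA seasons."""
    eligible = {
        name: seasons[curr] - seasons[prev]
        for name, seasons in stats.items()
        if prev in seasons and curr in seasons  # excludes the 2023 rookie
    }
    return max(eligible, key=eligible.get)


print(most_improved(stats))  # -> Giannis Antetokounmpo
```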

As such, o1 and R1 showed that, despite excellent reasoning scores on benchmarks, problems remain in actual use. In particular, the assessment is that a detailed and specific prompt is essential to get the correct answer.

This is also where R1, which shows its reasoning process, shines. In basic reasoning ability o1 is ahead of R1, but R1 is more useful in actual model use.

Experts also point out that this matters. A domestic startup said, “Because DeepSeek models expose their reasoning process, users can see what logic the model followed, which increases reliability.”

OpenAI CEO Sam Altman likewise said during a user chat event held on Reddit, “We are looking for ways to show more of the model’s thinking process.”

By Park Chan, reporter cpark@aitimes.com
