AI benchmarking with a ‘ball inside a rotating shape’ … “DeepSeek is better than OpenAI”


(Photo: X, Ivan Fioravanti)

Recently, the benchmarks used to evaluate the performance of artificial intelligence (AI) models have been diversifying and evolving, and a ‘ball inside a rotating shape’ test is drawing attention. In particular, results from this test have prompted claims that the open-source ‘R1’ model from China’s DeepSeek is more capable than OpenAI’s ‘o1-pro’.

TechCrunch introduced the benchmark, which involves a yellow ball bouncing inside a slowly rotating shape, on the 24th (local time).

It is a test of how accurately a model can write a Python script that simulates the physics of the ball bouncing off the walls of the shape.

In this simulation, a collision detection algorithm is essential: the exact moment the ball hits a wall of the shape must be identified. If the algorithm is wrong, the physics becomes inaccurate, for example the ball passes through the boundary and escapes the shape.
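For illustration, the following is a minimal Python sketch of this kind of simulation, not any model’s actual output. It bounces a ball inside a slowly rotating regular heptagon using per-edge collision detection and reflection about the wall normal; all parameter names and values are assumptions chosen for the example, and the wall’s own tangential motion is ignored for simplicity.

```python
# Minimal sketch: a ball bouncing inside a slowly rotating regular heptagon.
# Pure-Python physics step with per-edge collision detection; no rendering.
# Assumptions (not from the article): unit-scale parameters, simple gravity,
# and reflection about the wall normal, ignoring the wall's tangential motion.
import math

N_SIDES = 7          # heptagon
POLY_RADIUS = 1.0    # circumradius of the rotating shape
BALL_RADIUS = 0.05
OMEGA = 0.5          # rotation speed of the shape (rad/s)
GRAVITY = -2.0       # downward acceleration
DT = 0.002           # time step

def polygon_vertices(angle):
    """Vertices of the regular polygon rotated by `angle` (counter-clockwise)."""
    return [(POLY_RADIUS * math.cos(angle + 2 * math.pi * k / N_SIDES),
             POLY_RADIUS * math.sin(angle + 2 * math.pi * k / N_SIDES))
            for k in range(N_SIDES)]

def step(pos, vel, t):
    """Advance the ball one time step and handle collisions with the walls."""
    x, y = pos
    vx, vy = vel
    vy += GRAVITY * DT                       # apply gravity
    x, y = x + vx * DT, y + vy * DT          # integrate position

    verts = polygon_vertices(OMEGA * t)
    for i in range(N_SIDES):
        ax, ay = verts[i]
        bx, by = verts[(i + 1) % N_SIDES]
        # Inward unit normal of edge (a -> b); the polygon is convex and CCW.
        ex, ey = bx - ax, by - ay
        length = math.hypot(ex, ey)
        nx, ny = -ey / length, ex / length
        # Signed distance from the ball centre to the edge line (positive = inside).
        dist = (x - ax) * nx + (y - ay) * ny
        approaching = vx * nx + vy * ny < 0
        if dist < BALL_RADIUS and approaching:
            # Push the ball back to the wall surface and reflect its velocity.
            x += (BALL_RADIUS - dist) * nx
            y += (BALL_RADIUS - dist) * ny
            dot = vx * nx + vy * ny
            vx -= 2 * dot * nx
            vy -= 2 * dot * ny
    return (x, y), (vx, vy)

if __name__ == "__main__":
    pos, vel = (0.0, 0.0), (0.6, 0.9)
    t = 0.0
    for _ in range(20000):
        pos, vel = step(pos, vel, t)
        t += DT
    # Rough sanity check: the ball should still be within the shape's circumradius.
    print("final position:", pos, "inside:", math.hypot(*pos) <= POLY_RADIUS)
```

The test the article describes hinges on exactly the kind of logic in the inner loop: if the signed-distance check or the reflection is wrong, the ball drifts through a wall and leaves the shape.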

X (Twitter) user ‘N8 Programs’, a researcher at AI startup Nous Research, said, “It took about two hours to program a ball bouncing inside a rotating heptagon from start to finish,” adding that there was a lot to keep track of and that the code had to be designed carefully.

Some AI models performed well on this benchmark. In particular, Ivan Fioravanti, founder of CoreView, said that DeepSeek’s R1 model from China performed significantly better than OpenAI’s o1-pro.

By contrast, Anthropic’s ‘Claude 3.5 Sonnet’ and Google’s ‘Gemini 1.5 Pro’ misjudged the physics, producing results in which the ball escaped the shape.

Some users also reported that Google’s ‘Gemini 2.0 Flash Thinking’ model and OpenAI’s ‘GPT-4o’ passed the evaluation on the first attempt.

Programming tests with bouncing balls and rotating shapes can be a valid way to gauge an AI model’s coding ability, but some argue that such benchmarks are not sufficient on their own.

Critics point out that small changes to the prompt can have a large effect on the results, making the outcome partly a matter of ‘luck’, so the test is not an objective criterion for judging a model’s programming ability.

On the other hand, the test also shows that as evaluating AI model performance becomes increasingly difficult, a variety of new methods are drawing attention.

In fact, many note that it is quite difficult for most people to tell models apart, and this benchmark stands out because anyone can easily check a model’s ability at a glance.

By Park Chan, reporter cpark@aitimes.com
