[January, Week 1] Leaderboard Season 2 evaluation 86% complete… Overseas developers take the top spots with ‘Gemma 2’


Top of the Open Ko-LLM Leaderboard Season 2 rankings as of the 3rd (Photo = Upstage, NIA)

‘Open Ko-LLM Leaderboard Season 2’ has entered the countdown to its official opening, with evaluation complete for 86% of all target models. Among them, recent models from overseas developers based on ‘Gemma 2’ took the top spots.

Upstage (CEO Kim Seong-hoon) announced that as of the 3rd, it had completed the evaluation of 1,089 out of 1,250 models, a progress rate of 86.4%. The number of models awaiting evaluation has now been reduced to 173.

Meanwhile, meaningful movement continues in the leaderboard rankings. A model with a top average score exceeding 50 points appeared in November last year, and a model scoring in the 55-point range appeared two months later. The models from 1st place Nicholas Beerbower, 2nd place Byron Everson, and 3rd place Unsloth AI all scored in the 55-point range.

In particular, these overseas developers’ models all scored in the 70-point range in the ‘Ko-GSM8K’ category, which measures elementary-school-level math skills, widening the gap with other models. Beerbower’s model (nbeerbower/gemma2-gutenberg-27B) scored as high as 71.72 points.

In other words, models from overseas developers ranked at the top on the strength of overwhelming performance in Korean math skills. It is also worth noting that all of the top three places were based on Google’s Gemma 2.

This is a change from last year’s Season 1, where ‘Solar’ and ‘Llama’ were the main players. Along with Gemma 2, the current top-ranking base models are mostly models that appeared in the middle of last year, such as Alibaba’s ‘Qwen 2.5’.

Upstage said, “Season 2 is clearly showing a different picture from last year’s Season 1,” and added, “Because the Korean-language performance of open-source models such as Gemma and Qwen has improved across the board, overseas developers also appear to have benefited and achieved good results.”

Among domestic companies, Linkbricks, ESTsoft, Yanolja, and T3Q ranked high.

Details of the leaderboard, co-hosted by Upstage and the National Information Society Agency (NIA), can be found on the NIA homepage and on the leaderboard’s Hugging Face page.

Reporter Jang Se-min semim99@aitimes.com
