First model to average over 80 points appears on Hugging Face leaderboard… “Rapid development of open source”

admin

February 12, 2024

‘Smaug’, the dragon who protects the treasure (Photo = Abacus AI)

For the first time, a model with an average score above 80 has appeared on the Hugging Face open source large language model (LLM) leaderboard. In addition, the open source camp has been gaining momentum since the beginning of the year, with Alibaba releasing the latest version of 'Qwen'.

VentureBeat recently reported that 'Smaug-72B', a model from American artificial intelligence (AI) startup Abacus AI, took first place on the Hugging Face leaderboard.

According to the report, Smaug-72B scored ▲77.15 points in language comprehension (MMLU) ▲89.27 points in commonsense ability (HellaSwag) ▲76.02 points in reasoning ability (ARC) ▲85.05 points in commonsense reasoning (WinoGrande) ▲78.7 points in mathematical reasoning (GSM8K) ▲76.67 points in hallucination prevention (TruthfulQA), for an average score of 80.48. This is the first time an average score has exceeded 80 on the open source leaderboard.
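The reported average can be checked against the six individual scores. The short Python sketch below assumes the leaderboard average is a simple unweighted mean of the six benchmarks; the article does not state the formula, and the function name leaderboard_average is only illustrative.

```python
# Smaug-72B benchmark scores as quoted in the article.
scores = {
    "MMLU": 77.15,
    "HellaSwag": 89.27,
    "ARC": 76.02,
    "WinoGrande": 85.05,
    "GSM8K": 78.7,
    "TruthfulQA": 76.67,
}


def leaderboard_average(benchmark_scores: dict[str, float]) -> float:
    """Unweighted mean of the scores, rounded to two decimals.

    Assumption: the leaderboard averages all benchmarks equally.
    """
    return round(sum(benchmark_scores.values()) / len(benchmark_scores), 2)


average = leaderboard_average(scores)
print(f"Average score: {average}")
```

Under that assumption, the six scores sum to 482.86, and dividing by six gives roughly 80.477, which rounds to the reported 80.48.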

This model is a fine-tuned version of Alibaba's open source model 'Qwen-72B', which became popular at the end of last year. Qwen scored an average of 73.6 points on the benchmark.

The company is headquartered in San Francisco and led by Bindu Reddy, former general manager of AI at AWS. It also has key engineers who have worked at Google or Uber. The model name Smaug is taken from the dragon that guards the treasure in J.R.R. Tolkien's novel 'The Hobbit'.

Abacus AI released the benchmark results and announced that Smaug outperforms not only Qwen, but also OpenAI's 'GPT-3.5', Google's 'Gemini Pro', and Mistral AI's 'Mistral Medium'. By category, it trailed only Mistral Medium in commonsense reasoning and recorded the highest score in all other areas.

It also surpasses the average score of 74.2 recorded by Upstage's 'Solar', which was released last December and ranked first on the leaderboard. However, it should be noted that Solar has less than one-sixth as many parameters as Smaug.

Smaug-72B benchmark results (Photo = Abacus AI)

Bindu Reddy, CEO of Abacus AI, announced the results and said, “I’ll explain in a paper why the model achieved excellent scores, especially in reasoning and mathematics.”

Prior to this, Alibaba last week also released 'Qwen 1.5', its latest open source model. It comes in six parameter sizes, from a 0.5B model for local AI up to 72B, and performance is enhanced with a 32k context. In addition, a new large-scale vision language model, 'Qwen-VL-Max', was also released.

Alibaba also announced that Qwen 1.5 is the best open source model, surpassing 'Claude-2.1', Mistral, and GPT-3.5.

Benchmark results of Qwen 1.5 (Photo = Alibaba)

In addition, the AI community is becoming more active as a series of powerful open source models appear, including the recently leaked 'Miqu'.

Sahar Mor, an AI influencer and analyst, expressed surprise in a LinkedIn post, saying, “Just a year ago, we were excited about models like 'Dolly', but now the performance is catching up with closed models.”

VentureBeat also said, “Smaug and Qwen 1.5 are just the latest examples showing the rapid and surprising development of open source AI,” and predicted, “It may be difficult for Smaug to stay at the top of the Hugging Face leaderboard for long.”

Reporter Lim Da-jun ydj@aitimes.com
