NVIDIA joins the frontier model competition… “Launches open-source LMM comparable to GPT-4o”


(Photo = Hugging Face)

NVIDIA has unveiled a frontier-class large multimodal model (LMM). It is drawing particular attention because NVIDIA has positioned it as a competitor to OpenAI’s ‘GPT-4o’.

VentureBeat reported on the 1st (local time) that NVIDIA has released ‘NVLM-D-72B’, an LMM with 72 billion parameters, as open source on Hugging Face.

The model was first announced in a paper posted to arXiv on the 17th of last month. However, because there was no separate announcement, it drew little attention at first. It has recently gained notice thanks to a series of favorable reviews from industry figures on X (Twitter) and other platforms.

“We introduce an LMM that achieves state-of-the-art results on vision-language tasks,” the researchers wrote in their paper. “It competes with leading proprietary and open-access models such as GPT-4o.” They also promised to release the model weights and the training code publicly.

Impressive benchmark results were also revealed. On most benchmarks, the model performs comparably to GPT-4o, ‘Claude 3.5 Sonnet’, ‘Gemini 1.5 Pro’, and ‘Llama 3-V 405B’, and it achieved the highest scores in visual question answering (VQA v2) and optical character recognition (OCR).

In particular, the researchers emphasized that performance on text-only tasks improved after multimodal training. Whereas other models showed degraded text performance in this situation, NVLM-D-72B increased its accuracy by an average of 4.3 points on key text benchmarks.

Benchmark results (photo = arXiv)

When the model was revealed, experts gave favorable reviews. An AI researcher named Phil exclaimed on

With NVIDIA releasing such a strong model as open source, research in the LMM field, which has been the exclusive domain of closed-model companies, is expected to accelerate. Meta also unveiled its LMM ‘Llama 3.2’, with 11B and 90B parameter versions, at its ‘Connect’ event on the 25th.

It is especially notable that NVIDIA, which has focused on its platform business, has joined the race to launch frontier models. NVIDIA has released large language models (LLMs) and published many papers before, but most of that work focused on on-device models optimized for GPUs, frameworks that support the deployment of other models, and synthetic data generation models.

In response, VentureBeat commented, “Nvidia has taken a shot at the AI industry,” and “the true impact of this model will likely become clear in the coming months.”

Reporter Lim Da-jun ydj@aitimes.com
