Kimi K2.5 is the latest open vision language model (VLM) from the Kimi family of models. It is a general-purpose multimodal model that excels at today's high-demand tasks such as agentic AI workflows, chat, reasoning, coding, and mathematics.
The model was trained using the open source Megatron-LM framework. Megatron-LM provides accelerated computing for scalability and GPU optimization through several forms of parallelism (tensor, data, and sequence) for training massive transformer-based models.
The model architecture builds on leading state-of-the-art large open models for efficiency and capability. It consists of 384 experts with a single dense layer, which allows for smaller experts and specialized routing across modalities. Kimi K2.5 activates only 3.2% of its parameters per token.
| Kimi K2.5 | |
| --- | --- |
| Modalities | Text, image, video |
| Total parameters | 1T |
| Active parameters | 32.86B |
| Activation rate | 3.2% |
| Input context length | 262K |
| Additional configuration information | |
| # experts | 384 |
| # shared experts | 1 |
| # experts per token | 8 |
| # layers | 61 (1 dense, 60 MoE) |
| # attention heads | 64 |
| Vocab size | ~164K |
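As a quick sanity check, the quoted activation rate follows directly from the parameter counts in the table (the 1T total is a rounded figure, so the exact quotient differs slightly from the quoted 3.2%):

```python
# Parameter counts from the table above (nominal, rounded values)
total_params = 1.0e12     # 1T total parameters
active_params = 32.86e9   # 32.86B active parameters per token

# Fraction of parameters activated for each token
activation_rate = active_params / total_params
print(f"{activation_rate:.2%}")  # 3.29%, quoted as ~3.2% due to rounding
```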
For vision capability, the large 164K training vocabulary incorporates vision-specific tokens. Kimi created the MoonViT3d Vision Tower for the visual processing component of this model, which converts images and video frames into embeddings.


Build with NVIDIA GPU-accelerated endpoints
You can start building with Kimi K2.5, with free access for prototyping, through GPU-accelerated endpoints on build.nvidia.com as part of the NVIDIA Developer Program. You can use your own data within the browser experience. NVIDIA NIM microservices, containers for production inference, are coming soon.
You can also use the NVIDIA-hosted model through the API, free with registration in the NVIDIA Developer Program.
import os

import requests

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"

headers = {
    # Read the API key from the environment instead of hardcoding it
    "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
    "Accept": "application/json",
}

payload = {
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ],
    "model": "moonshotai/kimi-k2.5",
    "chat_template_kwargs": {
        "thinking": True
    },
    "frequency_penalty": 0,
    "max_tokens": 16384,
    "presence_penalty": 0,
    "stream": False,  # response.json() below expects a single JSON body
    "temperature": 1,
    "top_p": 1
}

# Reuse the TCP connection across requests
session = requests.Session()

response = session.post(invoke_url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
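When streaming is enabled ("stream": True), the endpoint returns OpenAI-style server-sent events rather than a single JSON body, with each event line prefixed by "data: ". A minimal parsing sketch (the sample chunk below is illustrative, not actual model output):

```python
import json

def parse_sse_line(line: str):
    """Parse one OpenAI-style SSE line into a dict; return None for
    keep-alives and the terminal [DONE] sentinel."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data.strip() == "[DONE]":
        return None
    return json.loads(data)

# Illustrative chunk in the OpenAI streaming format (not real model output)
sample = 'data: {"choices": [{"delta": {"content": "Hello"}}]}'
chunk = parse_sse_line(sample)
print(chunk["choices"][0]["delta"]["content"])  # Hello
```

With requests, you would pass stream=True to session.post and feed response.iter_lines() through a parser like this one.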
To make the most of tool calling, define an array of OpenAI-compatible tools and add it to the chat completions tools parameter.
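As a sketch, a tools array in the OpenAI function-calling format can be attached to the request payload like this (the get_weather function and its parameters are hypothetical examples, not part of the API):

```python
# Hypothetical tool in the OpenAI-compatible function-calling format
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # example function, defined by you
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

# Add the array to the chat completions payload via the tools parameter
tool_payload = {
    "model": "moonshotai/kimi-k2.5",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}
print(tool_payload["tools"][0]["function"]["name"])  # get_weather
```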
Deploying with vLLM
When deploying models with the vLLM serving framework, use the following commands. For more information, see the vLLM recipe for Kimi K2.5.
$ uv venv
$ source .venv/bin/activate
$ uv pip install -U vllm --pre \
    --extra-index-url https://wheels.vllm.ai/nightly/cu129 \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --index-strategy unsafe-best-match
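Once installed, serving typically follows the standard vllm serve pattern. The model ID and parallelism degree below are assumptions for illustration, not values from this article; consult the vLLM recipe for the validated configuration.

```shell
# Sketch only: model ID and tensor-parallel degree are assumptions
$ vllm serve moonshotai/Kimi-K2.5 \
    --tensor-parallel-size 8 \
    --trust-remote-code
```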
Fine-tuning with NVIDIA NeMo Framework
Kimi K2.5 can be customized and fine-tuned with the open source NeMo Framework using the NeMo AutoModel library to adapt the model for domain-specific multimodal tasks, agentic workflows, and enterprise reasoning use cases.
NeMo Framework is a collection of open libraries enabling scalable model pretraining and post-training, including supervised fine-tuning, parameter-efficient methods, and reinforcement learning for models of all sizes and modalities.
NeMo AutoModel is a PyTorch Distributed native training library inside NeMo Framework that delivers high-throughput training directly on the Hugging Face checkpoint, with no conversion required. This provides a lightweight, flexible tool for developers and researchers to rapidly experiment with the latest frontier models.
Try fine-tuning Kimi K2.5 with the NeMo AutoModel recipe.
Start with Kimi K2.5
From data center deployments on NVIDIA Blackwell to the fully managed enterprise NVIDIA NIM microservice, NVIDIA offers solutions for integrating Kimi K2.5. To get started, visit the Kimi K2.5 model page on Hugging Face and the Kimi API Platform, and try Kimi K2.5 in the build.nvidia.com playground.
