Kimi K2.5 is the latest open vision language model (VLM) from the Kimi family of models. It is a general-purpose multimodal model that excels at today's high-demand tasks such as agentic AI workflows, chat, reasoning, coding, and mathematics.
The model was trained using the open source Megatron-LM framework. Megatron-LM provides accelerated computing for scalability and GPU optimization through several forms of parallelism (tensor, data, and sequence) for training massive transformer-based models.
The model architecture builds on leading state-of-the-art large open models for efficiency and capability. It consists of 384 experts with a single dense layer, which allows for smaller experts and specialized routing across modalities. Kimi K2.5 activates only 3.2% of its parameters per token.
| Kimi K2.5 | |
| --- | --- |
| Modalities | Text, image, video |
| Total parameters | 1T |
| Active parameters | 32.86B |
| Activation rate | 3.2% |
| Input context length | 262K |
| Additional configuration information | |
| # experts | 384 |
| # shared experts | 1 |
| # experts per token | 8 |
| # layers | 61 (1 dense, 60 MoE) |
| # attention heads | 64 |
| Vocab size | ~164K |
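As a quick sanity check, the quoted activation rate follows directly from the parameter counts in the table (the 1T total is a rounded figure, so the exact quotient differs slightly from the quoted 3.2%):

```python
# Parameter counts from the table above (nominal, rounded values)
total_params = 1.0e12     # 1T total parameters
active_params = 32.86e9   # 32.86B active parameters per token

# Fraction of parameters activated for each token
activation_rate = active_params / total_params
print(f"{activation_rate:.2%}")  # 3.29%, quoted as ~3.2% due to rounding
```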
For vision capability, the large 164K training vocabulary incorporates vision-specific tokens. Kimi created the MoonViT3d Vision Tower for the visual processing component of this model, which converts images and video frames into embeddings.


Build with NVIDIA GPU-accelerated endpoints
You can start building with Kimi K2.5, with free access for prototyping, through GPU-accelerated endpoints on build.nvidia.com as part of the NVIDIA Developer Program. You can use your own data within the browser experience. NVIDIA NIM microservices, containers for production inference, are coming soon.
You can also use the NVIDIA-hosted model through the API, free with registration in the NVIDIA Developer Program.
import os

import requests

invoke_url = "https://integrate.api.nvidia.com/v1/chat/completions"

headers = {
    # Read the API key from the environment instead of hardcoding it
    "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
    "Accept": "application/json",
}

payload = {
    "messages": [
        {
            "role": "user",
            "content": ""
        }
    ],
    "model": "moonshotai/kimi-k2.5",
    "chat_template_kwargs": {
        "thinking": True
    },
    "frequency_penalty": 0,
    "max_tokens": 16384,
    "presence_penalty": 0,
    "stream": False,  # response.json() below expects a single JSON body
    "temperature": 1,
    "top_p": 1
}

# Reuse the TCP connection across requests
session = requests.Session()

response = session.post(invoke_url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
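When streaming is enabled ("stream": True), the endpoint returns OpenAI-style server-sent events rather than a single JSON body, with each event line prefixed by "data: ". A minimal parsing sketch (the sample chunk below is illustrative, not actual model output):

```python
import json

def parse_sse_line(line: str):
    """Parse one OpenAI-style SSE line into a dict; return None for
    keep-alives and the terminal [DONE] sentinel."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data.strip() == "[DONE]":
        return None
    return json.loads(data)

# Illustrative chunk in the OpenAI streaming format (not real model output)
sample = 'data: {"choices": [{"delta": {"content": "Hello"}}]}'
chunk = parse_sse_line(sample)
print(chunk["choices"][0]["delta"]["content"])  # Hello
```

With requests, you would pass stream=True to session.post and feed response.iter_lines() through a parser like this one.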
To make the most of tool calling, define an array of OpenAI-compatible tools and add it to the chat completions tools parameter.
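As a sketch, a tools array in the OpenAI function-calling format can be attached to the request payload like this (the get_weather function and its parameters are hypothetical examples, not part of the API):

```python
# Hypothetical tool in the OpenAI-compatible function-calling format
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # example function, defined by you
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

# Add the array to the chat completions payload via the tools parameter
tool_payload = {
    "model": "moonshotai/kimi-k2.5",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}
print(tool_payload["tools"][0]["function"]["name"])  # get_weather
```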
Deploying with vLLM
When deploying models with the vLLM serving framework, use the following commands. For more information, see the vLLM recipe for Kimi K2.5.
$ uv venv
$ source .venv/bin/activate
$ uv pip install -U vllm --pre \
    --extra-index-url https://wheels.vllm.ai/nightly/cu129 \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --index-strategy unsafe-best-match
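Once installed, serving typically follows the standard vllm serve pattern. The model ID and parallelism degree below are assumptions for illustration, not values from this article; consult the vLLM recipe for the validated configuration.

```shell
# Sketch only: model ID and tensor-parallel degree are assumptions
$ vllm serve moonshotai/Kimi-K2.5 \
    --tensor-parallel-size 8 \
    --trust-remote-code
```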
Fine-tuning with NVIDIA NeMo Framework
Kimi K2.5 can be customized and fine-tuned with the open source NeMo Framework using the NeMo AutoModel library to adapt the model for domain-specific multimodal tasks, agentic workflows, and enterprise reasoning use cases.
NeMo Framework is a collection of open libraries enabling scalable model pretraining and post-training, including supervised fine-tuning, parameter-efficient methods, and reinforcement learning for models of all sizes and modalities.
NeMo AutoModel is a PyTorch Distributed native training library inside NeMo Framework that delivers high-throughput training directly on the Hugging Face checkpoint, with no conversion required. This provides a lightweight, flexible tool for developers and researchers to rapidly experiment with the latest frontier models.
Try fine-tuning Kimi K2.5 with the NeMo AutoModel recipe.
Start with Kimi K2.5
From data center deployments on NVIDIA Blackwell to the fully managed enterprise NVIDIA NIM microservice, NVIDIA offers solutions for integrating Kimi K2.5. To get started, visit the Kimi K2.5 model page on Hugging Face and the Kimi API Platform, and try Kimi K2.5 in the build.nvidia.com playground.
