Welcome DeepSeek-V3 0324
This week, a new model from DeepSeek quietly landed on the Hub. It’s an updated version of DeepSeek-V3, the base model underlying the R1 reasoning model. There isn’t much information shared yet on this new model, but we do know a few things!
What we know so far
The model has the same architecture as the original DeepSeek-V3 and now also comes with an MIT license, while the previous V3 model had a custom model license. The main focus of this release was on improving instruction following as well as code and math capabilities. Let’s take a look!
How good is it?
The DeepSeek team has evaluated the model on a variety of math and coding tasks, and we can see the model’s strong capabilities compared to other frontier models:
Clearly, the model plays in the top league: often on par with GPT-4.5 and sometimes stronger than Claude-Sonnet-3.7.
To summarise, the model has seen significant improvements across benchmarks:
- MMLU-Pro: 75.9 → 81.2 (+5.3) (An excellent benchmark for overall understanding)
- GPQA: 59.1 → 68.4 (+9.3)
- AIME: 39.6 → 59.4 (+19.8) (proxy for MATH capabilities)
- LiveCodeBench: 39.2 → 49.2 (+10.0) (indicator of coding abilities)
Specifically, in the model card DeepSeek mentions targeted improvements in the following areas:
- Front-End Web Development
  - Improved executability of the code
  - More aesthetically pleasing web pages and game front-ends
- Chinese Writing Proficiency
  - Enhanced style and content quality
  - Aligned with the R1 writing style
  - Higher quality in medium-to-long-form writing
- Feature Enhancements
  - Improved multi-turn interactive rewriting
  - Optimized translation quality and letter writing
  - Enhanced style and content quality
- Chinese Search Capabilities
  - Enhanced report analysis requests with more detailed outputs
- Function Calling Improvements
  - Increased accuracy in Function Calling, fixing issues in previous V3 versions
So the question might pop up: how did they actually do that? Let’s speculate a bit!
How did they do it?
Given the naming and architecture, it’s fairly safe to assume that the new model is based on the previous V3 model and trained on top of it. There are two main areas where they could have improved the model:
- Continual pretraining: Starting with the V3 model, it’s possible to continue the pretraining process with a) newer, more up-to-date data and b) data that has been better curated and is thus of higher quality. This can improve factuality on recent events and improve the capabilities in general.
- Improved post-training: Especially in the area of instruction following and style, post-training plays a crucial role. Likely they improved the post-training data mix and possibly even the algorithm.
Until the team releases a technical report we don’t know for sure what they tweaked, but changes to the post-training pipeline are quite likely, potentially along with a bit of additional pretraining. So let’s have a look at how to use the model next!
How to use the model
Inference Providers
You can use Hugging Face’s Inference Providers to quickly experiment with this model. It’s available through Fireworks, Hyperbolic, and Novita.
Here’s an example using the huggingface_hub library. You can also use the OpenAI client library (see the sketch after the snippet below).
from huggingface_hub import InferenceClient

# The client picks up your Hugging Face token from the environment or the cached login.
client = InferenceClient(
    provider="fireworks-ai",
)

messages = [
    {
        "role": "user",
        "content": "My first is second in line; I send shivers up your spine; not quite shining bright. I glitter in the light."
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=messages,
    temperature=0.3,
)

print(completion.choices[0].message.content)
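For reference, here is a minimal sketch of the same request with the OpenAI client. The router base URL and the HF_TOKEN environment variable are assumptions on our side; check the Inference Providers documentation for the exact endpoint and authentication details.

import os
from openai import OpenAI

# Assumes an OpenAI-compatible router endpoint and a Hugging Face token in HF_TOKEN.
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Write a one-line riddle about the moon."}],
    temperature=0.3,
)

print(completion.choices[0].message.content)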
Text Generation Inference
TGI supports running DeepSeek V3-0324 with its latest release as well. You can use it directly with the tagged docker image on a node of H100s:
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:3.2.1 --model-id deepseek-ai/DeepSeek-V3-0324
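Once the container is running, you can send requests to TGI’s /generate endpoint; the prompt and generation parameters below are just a minimal sketch.

import requests

# Query the local TGI server started above (port 8080 on the host).
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Write a short Python function that reverses a string.",
        "parameters": {"max_new_tokens": 128, "temperature": 0.3},
    },
)

print(response.json()["generated_text"])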
SGLang
SGLang supports running DeepSeek V3-0324 out of the box, together with the Multi-head Latent Attention (MLA) and Data Parallelism optimisations. To use it, you can simply run the following on a node of H100s. For more information, follow along here.
docker pull lmsysorg/sglang:latest
docker run --gpus all --shm-size 32g -p 30000:30000 -v ~/.cache/huggingface:/root/.cache/huggingface --ipc=host --network=host --privileged lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3-0324 --tp 8 --trust-remote-code --port 30000
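The launched server exposes an OpenAI-compatible API on the chosen port, so you can query it with the OpenAI client as in the minimal sketch below; the model name should match what the server is actually serving.

from openai import OpenAI

# Point the OpenAI client at the local SGLang server started above.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Explain Multi-head Latent Attention in two sentences."}],
    temperature=0.3,
)

print(response.choices[0].message.content)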
Dynamic Quants from Unsloth and Llama.cpp
Running large LLMs like DeepSeek V3-0324 can be quite compute intensive and requires a large amount of GPU VRAM. This is where quantization comes in: it allows the end user to run the same model with much lower VRAM consumption, at a small trade-off in downstream performance.
Unsloth AI created dynamic quantisations which allow one to run DeepSeek V3 with half the amount of compute of one node of H100s, and which can be run with llama.cpp without as much degradation in benchmarks. Read more about it here: https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF
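One possible way to fetch a single quantization variant from that repository with Python is sketched below; the allow_patterns filter is a hypothetical example, so check the repository’s file listing for the exact quant names before downloading.

from huggingface_hub import snapshot_download

# Download only one quantization variant; the pattern below is a placeholder example.
snapshot_download(
    repo_id="unsloth/DeepSeek-V3-0324-GGUF",
    local_dir="DeepSeek-V3-0324-GGUF",
    allow_patterns=["*Q2_K*"],
)

The downloaded GGUF files can then be served with llama.cpp as described in the Unsloth model card.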
Is it safe?
Running language models safely has always been a focus ever since the first GPT models were released. With the immense popularity of the DeepSeek models and their origin, the question has found new interest. Let us run down the things that are safe to do and the areas where some caution is a good idea. This isn’t DeepSeek specific but true for any open model!
First of all – is it safe to even download the model?
Downloading and running the model
Yes, downloading the model is safe. There are a few precautions on the Hub side that make sure it’s safe to download and run models:
- Safetensors: The safetensors format is used to store the DeepSeek model weights on the Hub, ensuring that no hidden code execution is possible, which was a risk with the older PyTorch pickle format. Thus no malicious code can be hidden in the weights file. Read more in the Safetensors blog.
- Modeling code: To run the model, the modeling code also needs to be downloaded along with the weight files. There are three mechanisms in place to improve safety there: 1. the files are fully visible on the Hub, 2. the user must explicitly set trust_remote_code=True to execute any code related to the model, 3. a security scanner runs over files on the Hub and flags any malicious code files. If you want to be extra careful, you can pin the model version with the revision setting to make sure you download the version of the modeling code that has been reviewed (a minimal sketch of both settings follows below).
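As a minimal sketch of what that looks like with transformers (the revision value below is a placeholder; replace it with the specific commit hash you reviewed):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3-0324"

# Pin the repository revision you reviewed and explicitly opt in to running the custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(model_id, revision="main", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision="main",          # placeholder: use a reviewed commit hash instead
    trust_remote_code=True,   # required to execute the model's custom code
    torch_dtype="auto",
    device_map="auto",        # requires accelerate; the full model also needs multi-GPU hardware
)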
So downloading the weights is safe, and upon code review so is executing the modeling code. This means you can run the DeepSeek model locally without the risk of backdoors or malicious code execution.
So what could be the main risks outside of downloading and running the model? It depends on what you do with the model outputs!
Model outputs
The advice that follows isn’t specific to any model and applies to both open and closed models: whether considering risks stemming from built-in secret behaviours in the model or from a model unintentionally producing bad outputs.
We’ll cover risks in three areas: alignment, code generation and agents.
Alignment mismatch: Every model provider chooses how and to which values their models are aligned. What these values are and how they’re chosen typically stays opaque, and they may also change over time (see this study). The advantage of open models is that the alignment can still be modified with custom fine-tuning at a later stage, as the example of Perplexity’s DeepSeek 1776 shows.


As a rule, users should be aware that any LLM is biased in one way or another and treat the model outputs accordingly.
Code generation: One of the most popular use-cases for LLMs is as coding assistants. However, this is also where indiscriminate usage of the model outputs can have the most negative effects. Models are trained on vast amounts of published code, new and old. This typically includes potentially malicious code or code that contains known vulnerabilities, so models might reproduce similar vulnerabilities when proposing code solutions.
So, how can you prevent security issues when using LLMs for code development? Run thorough code reviews of the proposed changes and scan the code with appropriate tools for vulnerabilities, as you would with any other code contribution.
Agents: In the past few months, agent applications have gained significant interest; giving LLMs more autonomy and agency also bears risks. It’s important to be careful about what kind of system access agents have and which information you provide them. Some good practices:
- Sandboxes: don’t run agents directly on your machine where they have access to and control of your computer. This avoids leaking private information or unintentionally deleting important files.
- Private information: don’t share private information such as logins with the LLM. If you need to give the model access to a system, use dedicated access keys with strict access rules.
- Human-in-the-loop: for high-stakes processes that you want to automate with agents, make sure there’s a human in the loop for final confirmation.
TL;DR: Is it safe to run the models? Yes, downloading and running the models is safe, but, as with any model, you should take precautions and use the model’s generations with appropriate safety measures.


