Founded by alumni of Google DeepMind and Meta, the Paris-based startup Mistral AI has consistently made waves in the AI community since 2023.
Mistral AI first caught the world’s attention with its debut model, Mistral 7B, released in 2023. This 7-billion-parameter model quickly gained traction for its impressive performance, surpassing larger models like Llama 2 13B on various benchmarks and even rivaling Llama 1 34B on many metrics. What set Mistral 7B apart was not only its performance but also its accessibility: the model could be easily downloaded from GitHub or even via a 13.4-gigabyte torrent, making it available to researchers and developers worldwide.
The company’s unconventional approach to releases, often foregoing traditional papers, blogs, or press releases, has proven remarkably effective at capturing the AI community’s attention. This strategy, coupled with a commitment to open-source principles, has positioned Mistral AI as a formidable player in the AI landscape.
Mistral AI’s rapid ascent in the industry is further evidenced by its recent funding success. The company reached a staggering $2 billion valuation following a funding round led by Andreessen Horowitz. This came on the heels of a historic $118 million seed round – the largest in European history – showcasing the immense faith investors have in Mistral AI’s vision and capabilities.
Beyond its technological advancements, Mistral AI has also been actively involved in shaping AI policy, particularly in discussions around the EU AI Act, where it has advocated for reduced regulation of open-source AI.
Now, in 2024, Mistral AI has once again raised the bar with two groundbreaking models: Mistral Large 2 (also known as Mistral-Large-Instruct-2407) and Mistral NeMo. In this comprehensive guide, we’ll dive deep into the features, performance, and potential applications of these impressive AI models.
Key specifications of Mistral Large 2 include:
- 123 billion parameters
- 128k context window
- Support for dozens of languages
- Proficiency in 80+ coding languages
- Advanced function calling capabilities
The model is designed to push the boundaries of cost efficiency, speed, and performance, making it an attractive option for both researchers and enterprises looking to leverage cutting-edge AI.
Mistral NeMo: The Latest Smaller Model
While Mistral Large 2 represents the best of Mistral AI’s large-scale models, Mistral NeMo, released in July 2024, takes a different approach. Developed in collaboration with NVIDIA, Mistral NeMo is a more compact 12-billion-parameter model that still offers impressive capabilities:
- 12 billion parameters
- 128k context window
- State-of-the-art performance in its size category
- Apache 2.0 license for open use
- Quantization-aware training for efficient inference
Mistral NeMo is positioned as a drop-in replacement for systems currently using Mistral 7B, offering enhanced performance while maintaining ease of use and compatibility.
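To make the “drop-in” point concrete, here is a minimal sketch against Mistral’s hosted API using the official `mistralai` Python client (v1.x); it assumes a valid `MISTRAL_API_KEY` environment variable, and the exact client interface may differ between versions:

```python
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Swapping Mistral 7B out for Mistral NeMo is just a model-name change:
# model = "open-mistral-7b"
model = "open-mistral-nemo"

response = client.chat.complete(
    model=model,
    messages=[{"role": "user", "content": "Summarize the benefits of a 128k context window."}],
)
print(response.choices[0].message.content)
```

Because the rest of the request is unchanged, migrating an existing Mistral 7B integration amounts to updating the model identifier.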
Key Features and Capabilities
Both Mistral Large 2 and Mistral NeMo share several key features that set them apart in the AI landscape:
- Large Context Windows: With 128k-token context lengths, both models can process and understand much longer pieces of text, enabling more coherent and contextually relevant outputs.
- Multilingual Support: The models excel across a wide range of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Arabic, and Hindi.
- Advanced Coding Capabilities: Both models show exceptional proficiency in code generation across numerous programming languages.
- Instruction Following: Significant improvements have been made in the models’ ability to follow precise instructions and handle multi-turn conversations.
- Function Calling: Native support for function calling allows these models to interact dynamically with external tools and services.
- Reasoning and Problem-Solving: Enhanced capabilities in mathematical reasoning and complex problem-solving tasks.
Let’s delve deeper into some of these features and examine how they perform in practice.
Performance Benchmarks
To understand the true capabilities of Mistral Large 2 and Mistral NeMo, it’s essential to look at their performance across various benchmarks. Let’s examine some key metrics:
Mistral Large 2 Benchmarks
The coding benchmarks compare the proficiency of various LLMs across programming languages. Models like Mistral Large 2 (24.07), Llama 3.1 405B, and GPT-4o are evaluated on their ability to understand and generate code in languages such as Python, C++, Bash, Java, TypeScript, PHP, and C#.
Llama 3.1 405B shows strong performance across multiple languages, with particularly high scores in Python and Java. This proficiency is critical for applications involving code generation, debugging, and software development, making these models valuable tools for developers.
Mistral NeMo Benchmarks
While detailed benchmark scores for Mistral NeMo are not reproduced here, the model is described as offering state-of-the-art performance in its size category, outperforming other recent open-source pre-trained models like Gemma 2 9B and Llama 3 8B across various tasks.
Multilingual and Coding Prowess
One of the standout features of both Mistral Large 2 and Mistral NeMo is their exceptional multilingual capability. This is particularly important in our increasingly globalized world, where AI models need to understand and generate content in multiple languages.
Mistral Large 2, for instance, supports dozens of languages, including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean. This multilingual proficiency extends beyond mere translation, encompassing nuanced understanding and generation in these languages.
To illustrate this, let’s look at an example of Mistral Large 2 generating text in multiple languages:
```
[INST] Generate a short greeting in English, French, German, Spanish, and Japanese. [/INST]

Here are short greetings in the requested languages:

English: Hello! How are you today?
French: Bonjour ! Comment allez-vous aujourd’hui ?
German: Hallo! Wie geht es Ihnen heute?
Spanish: ¡Hola! ¿Cómo estás hoy?
Japanese: こんにちは！今日はお元気ですか？
```
```
[INST] Now, write a simple "Hello, World!" program in Python, Java, and JavaScript. [/INST]
```
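Since the prompt names Python, Java, and JavaScript, a representative completion (illustrative, not verbatim model output) would be the canonical “Hello, World!” program in each:

```python
print("Hello, World!")
```

```java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
```

```javascript
console.log("Hello, World!");
```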
These examples showcase both the multilingual text generation and the coding capabilities of the model.
Both models are available on various platforms such as Hugging Face, Mistral AI’s own platform, and major cloud service providers like Google Cloud Platform, Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai.
The Agentic Paradigm and Function Calling
Both Mistral Large 2 and Mistral NeMo embrace an agent-centric design, which represents a paradigm shift in how we interact with AI models. This approach focuses on building models capable of interacting with their environment, making decisions, and taking actions to achieve specific goals.
A key feature enabling this paradigm is native support for function calling, which allows the models to interact dynamically with external tools and services, effectively expanding their capabilities beyond simple text generation.
Let’s look at an example of how function calling might work with Mistral Large 2:
```python
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Initialize tokenizer and model
mistral_models_path = "path/to/mistral/models"  # Ensure this path is correct
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)

# Define a function for getting weather information
weather_function = Function(
    name="get_current_weather",
    description="Get the current weather",
    parameters={
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
            "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "The temperature unit to use. Infer this from the user's location.",
            },
        },
        "required": ["location", "format"],
    },
)

# Create a chat completion request that exposes the function as a tool
completion_request = ChatCompletionRequest(
    tools=[Tool(function=weather_function)],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

# Encode the request
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Generate a response
out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=256,
    temperature=0.7,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.decode(out_tokens[0])
print(result)
```
In this example, we define a function for getting weather information and include it in our chat completion request. The model can then request that function to retrieve real-time weather data, demonstrating how it can interact with external systems to provide more accurate and up-to-date information.
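In practice, the model does not execute anything itself; it emits a structured tool call that your application must parse and dispatch. Here is a minimal sketch of that step, where the `get_current_weather` implementation and the exact JSON layout of the decoded output are assumptions for illustration:

```python
import json

# Hypothetical local implementation of the tool the model may request
def get_current_weather(location: str, format: str) -> str:
    # A real system would query a weather API; hard-coded here for illustration
    return f"22 degrees {format}, sunny in {location}"

# Assume the decoded output contains a JSON list of tool calls, e.g.
# '[{"name": "get_current_weather", "arguments": {"location": "Paris, France", "format": "celsius"}}]'
def dispatch_tool_calls(raw_tool_calls: str) -> list[str]:
    results = []
    for call in json.loads(raw_tool_calls):
        if call["name"] == "get_current_weather":
            results.append(get_current_weather(**call["arguments"]))
    return results
```

The tool results would then be fed back to the model in a follow-up request so it can compose a natural-language answer.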
Tekken: A More Efficient Tokenizer
Mistral NeMo introduces a new tokenizer called Tekken, which is based on Tiktoken and was trained on over 100 languages. This new tokenizer offers significant improvements in text-compression efficiency compared to previous tokenizers like SentencePiece.
Key features of Tekken include:
- 30% more efficient compression for source code, Chinese, Italian, French, German, Spanish, and Russian
- 2x more efficient compression for Korean
- 3x more efficient compression for Arabic
- Outperforms the Llama 3 tokenizer in compressing text for roughly 85% of all languages
This improved tokenization efficiency translates to better model performance, especially when dealing with multilingual text and source code. It allows the model to process more information within the same context window, leading to more coherent and contextually relevant outputs.
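One rough way to see compression efficiency for yourself is to count how many tokens different tokenizers produce for the same text. The sketch below assumes you have access to the listed Hugging Face checkpoints (some are gated); the resulting counts are illustrative, not benchmark figures:

```python
from transformers import AutoTokenizer

# Compare how many tokens each tokenizer needs for the same snippet of source code
text = "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)"

for name in ["mistralai/Mistral-Nemo-Instruct-2407", "mistralai/Mistral-7B-v0.1"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(f"{name}: {len(tokenizer.encode(text))} tokens")
```

Fewer tokens for the same input means more content fits inside the 128k context window.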
Licensing and Availability
Mistral Large 2 and Mistral NeMo have different licensing models, reflecting their intended use cases:
Mistral Large 2
- Released under the Mistral Research License
- Allows usage and modification for research and non-commercial purposes
- Commercial usage requires a Mistral Commercial License
Mistral NeMo
- Released under the Apache 2.0 license
- Allows open use, including commercial applications
Both models are available through various platforms:
- Hugging Face: Weights for both base and instruct models are hosted here
- Mistral AI: Available as `mistral-large-2407` (Mistral Large 2) and `open-mistral-nemo-2407` (Mistral NeMo)
- Cloud Service Providers: Available on Google Cloud Platform’s Vertex AI, Azure AI Studio, Amazon Bedrock, and IBM watsonx.ai
For developers looking to use these models, here’s a quick example of how to load and use Mistral Large 2 with Hugging Face transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-Large-Instruct-2407"
device = "cuda"  # Use GPU if available

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move the model to the appropriate device
model.to(device)

# Prepare input as a chat-formatted conversation
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain the concept of neural networks in simple terms."},
]

# Encode input using the model's chat template
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

# Generate a response
output_ids = model.generate(input_ids, max_new_tokens=500, do_sample=True)

# Decode and print the response
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)
```
This code demonstrates how to load the model, prepare input in a chat format, generate a response, and decode the output.
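Note that at 123 billion parameters, Mistral Large 2 will not fit on a single consumer GPU in full precision. One common workaround, sketched here under the assumption that the `bitsandbytes` and `accelerate` packages are installed alongside transformers, is 4-bit quantized loading:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "mistralai/Mistral-Large-Instruct-2407"

# Load the weights in 4-bit precision to cut memory use roughly 4x vs. fp16
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",  # Spread layers across all available GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Generation then works exactly as in the previous example, at some cost in output quality from the quantization.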