Text Generation Model

Mistral Large 3

Mistral Large 3 is Mistral’s first mixture-of-experts model since the seminal Mixtral series, and represents a substantial step forward in pretraining at Mistral.

Publisher Mistral
Type Text
Context Window 256,000 tokens
Training Data n/a
Input $0.50/MTok
Output $1.50/MTok
OPEN SOURCE

Open-weight MoE model with multilingual and vision support

Mistral Large 3 is a 675-billion-parameter mixture-of-experts (MoE) text generation model developed by Mistral. It is the first MoE model Mistral has released since the Mixtral series, and was trained from scratch on 3,000 NVIDIA H200 GPUs. The model is released under a permissive open-weight license, making the weights publicly available for download and self-hosting.

Mistral Large 3 supports a 256,000-token context window and includes image understanding alongside text generation. It is particularly noted for multilingual conversation handling, with Mistral highlighting non-English and non-Chinese language performance as a focus area. The model is well-suited for tasks requiring long-context reasoning, multilingual text processing, and instruction following across general-purpose prompts.
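
As a quick illustration, here is a minimal sketch of a chat completion using Mistral's official Python SDK (mistralai, v1). The model identifier "mistral-large-3" is an assumption; check Mistral's documentation for the exact name served by the API.

import os
from mistralai import Mistral

# Minimal sketch: a single chat completion against Mistral's API.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-3",  # assumed identifier, not confirmed by this page
    messages=[
        {"role": "user", "content": "Summarize the Mixtral architecture in two sentences."},
    ],
)
print(response.choices[0].message.content)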

What Mistral Large 3 supports

Long Context Window

Processes up to 256,000 tokens in a single context, enabling analysis of long documents, codebases, or extended conversations.
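
As a rough sketch of how the long window gets used in practice, the snippet below estimates whether a document fits before sending it as a single prompt. The four-characters-per-token ratio is a crude heuristic, not Mistral's tokenizer, and the file name is hypothetical.

# Crude pre-check: will this document fit in the 256K-token window?
def fits_context(text: str, context_tokens: int = 256_000, chars_per_token: int = 4) -> bool:
    estimated_tokens = len(text) // chars_per_token  # rough heuristic, not a tokenizer
    return estimated_tokens <= context_tokens

with open("annual_report.txt") as f:  # hypothetical long document
    document = f.read()

if fits_context(document):
    prompt = f"Using only the document below, answer questions about it.\n\n{document}"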

Mixture-of-Experts Architecture

Uses a sparse MoE design across 675 billion total parameters, activating only a subset of experts per token during inference.
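
To make the routing idea concrete, here is a toy top-k sketch in NumPy. It illustrates the general sparse-MoE pattern, not Mistral's actual architecture, expert count, or code.

import numpy as np

def moe_forward(token, experts, router_w, k=2):
    scores = router_w @ token                # one routing score per expert
    top_k = np.argsort(scores)[-k:]          # keep only the k best-scoring experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the chosen experts run for this token; the rest stay idle,
    # which is why total parameters can far exceed active parameters.
    return sum(w * experts[i](token) for w, i in zip(weights, top_k))

d = 8
experts = [lambda x, W=np.random.randn(d, d): W @ x for _ in range(16)]  # toy experts
router_w = np.random.randn(16, d)                                        # toy router
output = moe_forward(np.random.randn(d), experts, router_w)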

Multilingual Text Generation

Handles conversations and instructions in a wide range of languages, with Mistral specifically highlighting performance on non-English and non-Chinese languages.

Image Understanding

Accepts image inputs alongside text, enabling tasks such as visual question answering and image-based reasoning.
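
The sketch below shows one way to pass an image alongside a text prompt, following the content-parts format Mistral documents for its vision models. The model identifier and file name are assumptions.

import base64
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

with open("chart.png", "rb") as f:  # hypothetical image file
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.complete(
    model="mistral-large-3",  # assumed identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            {"type": "image_url", "image_url": f"data:image/png;base64,{image_b64}"},
        ],
    }],
)
print(response.choices[0].message.content)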

Open-Weight Access

Model weights are publicly available on Hugging Face under a permissive license, supporting local deployment and fine-tuning.
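
For self-hosting, a download sketch using huggingface_hub follows. The repository id is an assumption; at 675B total parameters the checkpoint runs to hundreds of gigabytes, and inference requires a multi-GPU serving stack rather than a single workstation.

from huggingface_hub import snapshot_download

# Fetch the full open-weight checkpoint (assumed repo id; very large download).
local_dir = snapshot_download(repo_id="mistralai/Mistral-Large-3")
print("Weights downloaded to:", local_dir)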

Instruction Following

Post-training aligns the model to follow general-purpose instructions, with Mistral reporting parity with leading instruction-tuned open-weight models on general prompts.


Benchmark scores

Scores represent accuracy: the percentage of questions answered correctly on each test.

Benchmark | What it tests | Score
MMLU-Pro | Expert knowledge across 14 academic disciplines | 80.7%
GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 68.0%
LiveCodeBench | Real-world coding tasks from recent competitions | 46.5%
HLE | Questions that challenge frontier models across many domains | 4.1%
SciCode | Scientific research coding and numerical methods | 36.2%

Common questions about Mistral Large 3

What is the context window for Mistral Large 3?

Mistral Large 3 supports a context window of 256,000 tokens.

How many parameters does Mistral Large 3 have?

Mistral Large 3 has 675 billion total parameters and uses a mixture-of-experts architecture, meaning only a subset of those parameters is active for any given token.

Is Mistral Large 3 open source?

Yes. Mistral Large 3 is released as an open-weight model, meaning the weights are publicly available. The model can be downloaded from Hugging Face and run locally or fine-tuned.

What input types does Mistral Large 3 support?

Mistral Large 3 supports text input and also includes image understanding capabilities, allowing it to process image inputs alongside text prompts.

What hardware was used to train Mistral Large 3?

According to Mistral, the model was trained from scratch on 3,000 NVIDIA H200 GPUs.

Is there a knowledge cutoff date for Mistral Large 3?

A specific training data cutoff date has not been published in the available metadata for Mistral Large 3.

What people think about Mistral Large 3

Community discussion on r/LocalLLaMA has been active around Mistral Large 3, with threads covering its availability on Hugging Face and interest in GGUF quantizations for local deployment. Users have shown enthusiasm for the model's open-weight release and its large 675B parameter count.

Some users have expressed mixed impressions about real-world performance relative to expectations, as reflected in a thread titled "Unimpressed with Mistral Large 3 675B." Discussions have also touched on inference framework support, including upcoming vLLM compatibility and EQ-Bench evaluation results.


Parameters & options

Max Temperature 1
Max Response Size 16,000 tokens
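
As a sketch, these limits map onto the standard request parameters in the mistralai SDK; the model identifier remains an assumption.

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-3",  # assumed identifier
    messages=[{"role": "user", "content": "Draft a short release note."}],
    temperature=1.0,          # the listed maximum temperature
    max_tokens=16_000,        # the listed maximum response size
)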

Start building with Mistral Large 3

No API keys required. Create AI-powered workflows with Mistral Large 3 in minutes — free.