Text Generation Model

Mistral Large 3

Mistral Large 3 is Mistral’s first mixture-of-experts model since the seminal Mixtral series, and represents a substantial step forward in pretraining at Mistral.

Publisher Mistral
Type Text
Context Window 256,000 tokens
Training Data n/a
Input $0.50/MTok
Output $1.50/MTok
OPEN SOURCE

Open-weight MoE model with multilingual and vision support

Mistral Large 3 is a 675-billion-parameter mixture-of-experts (MoE) text generation model developed by Mistral. It is the first MoE model Mistral has released since the Mixtral series, and was trained from scratch on 3,000 NVIDIA H200 GPUs. The model is released under a permissive open-weight license, making the weights publicly available for download and self-hosting.

Mistral Large 3 supports a 256,000-token context window and includes image understanding alongside text generation. It is particularly noted for multilingual conversation handling, with Mistral highlighting non-English and non-Chinese language performance as a focus area. The model is well-suited for tasks requiring long-context reasoning, multilingual text processing, and instruction following across general-purpose prompts.
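
As a quick illustration, here is a minimal sketch of a chat completion using Mistral's official Python SDK (mistralai, v1). The model identifier "mistral-large-3" is an assumption; check Mistral's documentation for the exact name served by the API.

import os
from mistralai import Mistral

# Minimal sketch: a single chat completion against Mistral's API.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-3",  # assumed identifier, not confirmed by this page
    messages=[
        {"role": "user", "content": "Summarize the Mixtral architecture in two sentences."},
    ],
)
print(response.choices[0].message.content)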

What Mistral Large 3 supports

Long Context Window

Processes up to 256,000 tokens in a single context, enabling analysis of long documents, codebases, or extended conversations.
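
As a rough sketch of how the long window gets used in practice, the snippet below estimates whether a document fits before sending it as a single prompt. The four-characters-per-token ratio is a crude heuristic, not Mistral's tokenizer, and the file name is hypothetical.

# Crude pre-check: will this document fit in the 256K-token window?
def fits_context(text: str, context_tokens: int = 256_000, chars_per_token: int = 4) -> bool:
    estimated_tokens = len(text) // chars_per_token  # rough heuristic, not a tokenizer
    return estimated_tokens <= context_tokens

with open("annual_report.txt") as f:  # hypothetical long document
    document = f.read()

if fits_context(document):
    prompt = f"Using only the document below, answer questions about it.\n\n{document}"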

Mixture-of-Experts Architecture

Uses a sparse MoE design across 675 billion total parameters, activating only a subset of experts per token during inference.
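
To make the routing idea concrete, here is a toy top-k sketch in NumPy. It illustrates the general sparse-MoE pattern, not Mistral's actual architecture, expert count, or code.

import numpy as np

def moe_forward(token, experts, router_w, k=2):
    scores = router_w @ token                # one routing score per expert
    top_k = np.argsort(scores)[-k:]          # keep only the k best-scoring experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the chosen experts run for this token; the rest stay idle,
    # which is why total parameters can far exceed active parameters.
    return sum(w * experts[i](token) for w, i in zip(weights, top_k))

d = 8
experts = [lambda x, W=np.random.randn(d, d): W @ x for _ in range(16)]  # toy experts
router_w = np.random.randn(16, d)                                        # toy router
output = moe_forward(np.random.randn(d), experts, router_w)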

Multilingual Text Generation

Handles conversations and instructions in a wide range of languages, with Mistral specifically highlighting performance on non-English and non-Chinese languages.

Image Understanding

Accepts image inputs alongside text, enabling tasks such as visual question answering and image-based reasoning.
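
The sketch below shows one way to pass an image alongside a text prompt, following the content-parts format Mistral documents for its vision models. The model identifier and file name are assumptions.

import base64
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

with open("chart.png", "rb") as f:  # hypothetical image file
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.complete(
    model="mistral-large-3",  # assumed identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            {"type": "image_url", "image_url": f"data:image/png;base64,{image_b64}"},
        ],
    }],
)
print(response.choices[0].message.content)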

Open-Weight Access

Model weights are publicly available on Hugging Face under a permissive license, supporting local deployment and fine-tuning.
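
For self-hosting, a download sketch using huggingface_hub follows. The repository id is an assumption; at 675B total parameters the checkpoint runs to hundreds of gigabytes, and inference requires a multi-GPU serving stack rather than a single workstation.

from huggingface_hub import snapshot_download

# Fetch the full open-weight checkpoint (assumed repo id; very large download).
local_dir = snapshot_download(repo_id="mistralai/Mistral-Large-3")
print("Weights downloaded to:", local_dir)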

Instruction Following

Post-training aligns the model to follow general-purpose instructions, with Mistral reporting parity with leading instruction-tuned open-weight models on general prompts.


Benchmark scores

Scores represent accuracy: the percentage of questions answered correctly on each test.

Benchmark | What it tests | Score
MMLU-Pro | Expert knowledge across 14 academic disciplines | 80.7%
GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 68.0%
LiveCodeBench | Real-world coding tasks from recent competitions | 46.5%
HLE | Questions that challenge frontier models across many domains | 4.1%
SciCode | Scientific research coding and numerical methods | 36.2%

Common questions about Mistral Large 3

What is the context window for Mistral Large 3?

Mistral Large 3 supports a context window of 256,000 tokens.

How many parameters does Mistral Large 3 have?

Mistral Large 3 has 675 billion total parameters and uses a mixture-of-experts architecture, meaning only a subset of those parameters is active for any given token.

Is Mistral Large 3 open source?

Yes. Mistral Large 3 is released as an open-weight model, meaning the weights are publicly available. The model can be downloaded from Hugging Face and run locally or fine-tuned.

What input types does Mistral Large 3 support?

Mistral Large 3 supports text input and also includes image understanding capabilities, allowing it to process image inputs alongside text prompts.

What hardware was used to train Mistral Large 3?

According to Mistral, the model was trained from scratch on 3,000 NVIDIA H200 GPUs.

Is there a knowledge cutoff date for Mistral Large 3?

A specific training data cutoff date has not been published in the available metadata for Mistral Large 3.

What people think about Mistral Large 3

Community discussion on r/LocalLLaMA has been active around Mistral Large 3, with threads covering its availability on Hugging Face and interest in GGUF quantizations for local deployment. Users have shown enthusiasm for the model's open-weight release and its large 675B parameter count.

Some users have expressed mixed impressions about real-world performance relative to expectations, as reflected in a thread titled "Unimpressed with Mistral Large 3 675B." Discussions have also touched on inference framework support, including upcoming vLLM compatibility and EQ-Bench evaluation results.


Parameters & options

Max Temperature 1
Max Response Size 16,000 tokens
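
As a sketch, these limits map onto the standard request parameters in the mistralai SDK; the model identifier remains an assumption.

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-large-3",  # assumed identifier
    messages=[{"role": "user", "content": "Draft a short release note."}],
    temperature=1.0,          # the listed maximum temperature
    max_tokens=16_000,        # the listed maximum response size
)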

Start building with Mistral Large 3

No API keys required. Create AI-powered workflows with Mistral Large 3 in minutes — free.