Text Generation Model

Llama 4 Scout

Llama 4 Scout is a powerful multimodal model with 17 billion active parameters, offering state-of-the-art performance in its class.

Publisher Meta
Type Text
Context Window 130,000 tokens
Training Data Early 2025
Input $0.11/MTok
Output $0.34/MTok
Provider Groq

Multimodal MoE model with 17B active parameters

Llama 4 Scout is a multimodal AI model developed by Meta, released in early 2025 as part of the Llama 4 model family. It uses a Mixture of Experts (MoE) architecture with 17 billion active parameters, 16 experts, and 109 billion total parameters, meaning only a subset of parameters is activated per token during inference. The model processes both text and image inputs within a unified backbone and supports a 130,000-token context window.

Llama 4 Scout is designed for developers and enterprises building applications that require combined text and vision understanding. Its MoE design makes it more compute-efficient during training and inference compared to dense models of similar total parameter counts. On MindStudio, it is served via Groq, which provides low-latency inference for the instruct-tuned variant.
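As a concrete illustration, here is a minimal sketch of building a chat-completions request for the instruct-tuned variant. The payload shape follows the widely used OpenAI-compatible format; the function name and defaults are illustrative assumptions (MindStudio's workflow builder does not require direct API calls or keys), and the model id is taken from this page's metadata.

```python
import json

# Hypothetical request payload for an OpenAI-compatible chat-completions
# endpoint serving Llama 4 Scout. The model id comes from this page's
# metadata; the helper and its defaults are illustrative, not an official API.
def build_request(prompt: str, max_tokens: int = 1024, temperature: float = 0.7) -> dict:
    return {
        "model": "llama-4-scout-17b-16e-instruct",  # instruct-tuned variant
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,       # response size is capped at 8,192 tokens here
        "temperature": temperature,     # max temperature on this deployment is 1
    }

payload = build_request("Summarize the Llama 4 Scout model card in two sentences.")
print(json.dumps(payload, indent=2))
```

The same payload shape works for multimodal requests by adding image content parts to the `messages` array, per whichever OpenAI-compatible serving stack is in use.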

What Llama 4 Scout supports

Multimodal Input

Processes both text and image inputs within a single unified model backbone, enabling tasks that combine visual and language understanding.

Long Context Window

Supports up to 130,000 tokens of context, allowing it to handle long documents, extended conversations, or large code files in a single request.
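A quick way to sanity-check whether an input fits in that window is a character-count heuristic. The 4-characters-per-token ratio below is a rough rule of thumb for English text, not the model's actual tokenizer, so treat the result as an estimate only:

```python
# Rough check that a prompt fits in Llama 4 Scout's 130,000-token context
# window. CHARS_PER_TOKEN is a common heuristic average for English text,
# not an exact tokenizer; precise counts require the model's own tokenizer.
CONTEXT_WINDOW = 130_000
CHARS_PER_TOKEN = 4

def fits_in_context(text: str, reserved_for_output: int = 8_192) -> bool:
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens <= CONTEXT_WINDOW - reserved_for_output

# ~15,000 estimated tokens: comfortably inside the window
print(fits_in_context("hello " * 10_000))
```

Reserving room for the output (8,192 tokens, matching this deployment's max response size) keeps the prompt from crowding out the reply.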

Mixture of Experts

Uses a 16-expert MoE architecture with 109 billion total parameters, activating only 17 billion per token to reduce compute cost while maintaining output quality.
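The routing idea can be sketched in a few lines. This toy example mirrors the 16-expert layout but uses a seeded random scorer as a stand-in for the learned router; it is illustrative only, not Meta's actual implementation:

```python
# Toy sketch of Mixture-of-Experts routing: a router scores every expert,
# but only the top-scoring subset runs for each token, so most of the
# 109B total parameters stay idle on any given forward pass.
import random

NUM_EXPERTS = 16  # matches Llama 4 Scout's expert count
TOP_K = 1         # only a fraction of experts activates per token

def route_token(token_id: int, top_k: int = TOP_K) -> list[int]:
    rng = random.Random(token_id)  # deterministic stand-in for a learned router
    scores = [rng.random() for _ in range(NUM_EXPERTS)]
    ranked = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)
    return ranked[:top_k]

active = route_token(42)
print(f"token 42 routes to expert(s) {active} of {NUM_EXPERTS}")
print(f"active fraction of parameters: ~{17 / 109:.0%}")  # 17B of 109B
```

The last line shows why this matters for cost: only about 16% of the total parameters (17B of 109B) do work per token, which is what makes the model cheaper to run than a dense model of the same total size.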

Instruction Following

Fine-tuned as an instruct model, enabling it to follow natural language instructions for tasks like summarization, Q&A, and structured generation.

Fast Inference via Groq

Served on Groq's LPU infrastructure, which is designed to deliver low-latency token generation for real-time applications.

Code Generation

Capable of generating, explaining, and debugging code across common programming languages as part of its general instruction-following training.


Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
MMLU-Pro Expert knowledge across 14 academic disciplines 75.2%
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 58.7%
MATH-500 Undergraduate and competition-level math problems 84.4%
AIME 2024 American Invitational Mathematics Examination problems (2024) 28.3%
LiveCodeBench Real-world coding tasks from recent competitions 29.9%
HLE Humanity's Last Exam: expert-written questions designed to challenge frontier models across many domains 4.3%
SciCode Scientific research coding and numerical methods 17.0%

Common questions about Llama 4 Scout

What is the context window for Llama 4 Scout?

Llama 4 Scout supports a context window of 130,000 tokens, which allows for long documents, extended conversations, or large inputs to be processed in a single request.

How many parameters does Llama 4 Scout have?

Llama 4 Scout has 109 billion total parameters, but uses a Mixture of Experts architecture that activates only 17 billion parameters per token during inference.

Does Llama 4 Scout support image inputs?

Yes. Llama 4 Scout is a multimodal model that can process both text and image inputs within a unified model backbone.

When was Llama 4 Scout trained?

According to the model metadata, Llama 4 Scout's training data has a cutoff in early 2025.

Who publishes Llama 4 Scout and where is it hosted on MindStudio?

Llama 4 Scout is developed and published by Meta. On MindStudio, it is served via Groq using the llama-4-scout-17b-16e-instruct model variant.

What people think about Llama 4 Scout

Community reception of Llama 4 Scout on Reddit has been mixed, with some users acknowledging the model's architectural novelty and its availability on platforms like Hugging Face shortly after release. However, the most upvoted threads reflect significant disappointment, with many users feeling the model did not meet expectations set by Meta's pre-release benchmarks.

Common concerns include perceived gaps between benchmark performance and real-world usability, as well as comparisons to what users hoped the Llama 4 generation would deliver. The highest-engagement threads (2,179 and 541 upvotes, respectively) both center on unmet expectations rather than successful use cases.


Parameters & options

Max Temperature 1
Max Response Size 8,192 tokens

Start building with Llama 4 Scout

No API keys required. Create AI-powered workflows with Llama 4 Scout in minutes — free.