MindStudio
Text Generation Model

Gemini 3 Flash

A fast, capable thinking model from Google designed for agentic workflows, coding, and multi-turn chat with near Pro-level reasoning at lower latency.

Publisher Google
Type Text
Context Window 1,048,576 tokens
Training Data December 2025
Input $0.50/MTok
Output $3.00/MTok
Large Context · Real-Time Latency · Latest · Tools

Fast thinking model for agentic and coding workflows

Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth.

The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.
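To make the API surface concrete, here is a minimal sketch of building a generateContent-style request body. The field names (`contents`, `parts`, `generationConfig`, `maxOutputTokens`) follow the shape the Gemini REST API uses for earlier Gemini versions and are assumed to carry over; the model ID string `gemini-3-flash` is a placeholder, so check the official Gemini API documentation for the exact identifier.

```python
import json

# Placeholder model ID; verify the exact string in the official Gemini API docs.
MODEL = "gemini-3-flash"

def build_request(prompt: str, max_output_tokens: int = 1024) -> str:
    """Serialize a generateContent-style request body (shape assumed
    from earlier Gemini API versions, not confirmed for Gemini 3)."""
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }
    return json.dumps(body)

payload = build_request("Summarize this repository's README.")
```

The serialized payload would then be POSTed to the model's generateContent endpoint with an API key, or sent through a provider such as OpenRouter.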

What Gemini 3 Flash supports

Large Context Window

Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversation histories to be included as context.

Configurable Reasoning

Offers selectable thinking levels (minimal, low, medium, high) so developers can tune the trade-off between response latency and reasoning depth per request.
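One way to wire the selectable levels into a request is to map each level name to a token budget. The `thinkingConfig`/`thinkingBudget` field names mirror earlier Gemini API versions; the specific level-to-budget numbers below are illustrative assumptions, not official values.

```python
# Illustrative mapping of thinking levels to token budgets (assumed values);
# the field names mirror earlier Gemini API versions.
THINKING_LEVELS = {"minimal": 0, "low": 2048, "medium": 8192, "high": 24576}

def generation_config(level: str) -> dict:
    """Build a per-request config selecting one thinking level."""
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level}")
    return {"thinkingConfig": {"thinkingBudget": THINKING_LEVELS[level]}}
```

A latency-sensitive chat endpoint might default to "minimal" or "low", while a background agent step that plans multi-file edits opts into "high" for that one request.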

Multimodal Input

Accepts text, images, audio, video, and PDF files as input, producing text output from any combination of these modalities.

Tool Use & Agents

Supports function calling and tool use natively, enabling reliable multi-step agent loops and integration with external APIs or services.
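A tool is typically declared to the model as a schema it can choose to call. The sketch below uses the `functionDeclarations` shape documented for earlier Gemini versions, assumed to apply here; the weather function itself is hypothetical.

```python
# Build a tool entry in the functionDeclarations shape used by earlier
# Gemini API versions (assumed to carry over to Gemini 3).
def make_tool(name: str, description: str, parameters: dict) -> dict:
    return {"functionDeclarations": [{
        "name": name,
        "description": description,
        "parameters": parameters,
    }]}

# Hypothetical example tool for illustration only.
weather_tool = make_tool(
    "get_weather",
    "Look up current weather for a city.",
    {"type": "object",
     "properties": {"city": {"type": "string"}},
     "required": ["city"]},
)
```

In an agent loop, the model's reply either contains final text or a function-call request naming one of the declared tools; the caller executes the tool and feeds the result back as the next turn.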

Structured Output

Can return responses in structured formats such as JSON, making it straightforward to parse model outputs in automated pipelines.
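Structured output is usually requested by attaching a response schema to the generation config. The `responseMimeType`/`responseSchema` field names below follow earlier Gemini API versions and are assumed to apply; the reply string is simulated rather than fetched from the API.

```python
import json

# Structured-output config in the responseMimeType/responseSchema shape
# documented for earlier Gemini versions (assumed to apply here too).
config = {
    "responseMimeType": "application/json",
    "responseSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title"],
    },
}

# With such a schema the model's text reply is parseable JSON.
# Simulated reply for illustration:
reply = '{"title": "Release notes", "tags": ["gemini", "flash"]}'
parsed = json.loads(reply)
```

Because the output conforms to the schema, downstream pipeline stages can index into fields directly instead of regex-scraping free text.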

Context Caching

Supports automatic context caching to reduce redundant token processing across repeated or long-running agentic sessions.
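The intuition behind automatic caching is that requests sharing a long common prefix (a system prompt, a codebase dump) can reuse the processing of that prefix. This toy sketch illustrates the idea with a prefix hash; it is not the Gemini API's actual mechanism.

```python
import hashlib

# Toy illustration of prefix-based context caching: identical shared
# prefixes map to the same cache key, so repeated requests hit the cache.
# This is NOT the Gemini API's real implementation.
def cache_key(prefix: str) -> str:
    return hashlib.sha256(prefix.encode()).hexdigest()[:16]

system_prompt = "You are a code-review assistant." + " (shared context)" * 100
k1 = cache_key(system_prompt)
k2 = cache_key(system_prompt)   # same prefix -> same key -> cache hit
```

In practice the provider handles this transparently, and cached input tokens are typically billed at a reduced rate; see the official documentation for actual caching behavior and pricing.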

Low-Latency Responses

Optimized for real-time and interactive use cases, delivering responses at substantially lower latency than larger Gemini model variants.

Coding Assistance

Designed for coding tasks including code generation, debugging, and explanation, with support for long codebases via the 1M-token context window.


Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
MMLU-Pro Expert knowledge across 14 academic disciplines 88.2%
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 81.2%
LiveCodeBench Real-world coding tasks from recent competitions 79.7%
HLE (Humanity's Last Exam) Questions that challenge frontier models across many domains 14.1%
SciCode Scientific research coding and numerical methods 49.9%
SWE-bench Verified Real GitHub issues requiring multi-file code fixes 78.0%

Common questions about Gemini 3 Flash

What is the context window size for Gemini 3 Flash?

Gemini 3 Flash supports a context window of up to 1,048,576 tokens, which allows it to process very long documents, codebases, or conversation histories in a single request.

What is the training data cutoff for Gemini 3 Flash?

Based on the available metadata, the model's training data cutoff is listed as December 2025.

What input types does Gemini 3 Flash accept?

The model accepts text, images, audio, video, and PDF files as inputs, and produces text as output.

Does Gemini 3 Flash support tool use and function calling?

Yes. Gemini 3 Flash includes native support for tool use, function calling, and structured output, making it suitable for agentic workflows and automated pipelines.

What are the configurable reasoning options in Gemini 3 Flash?

The model offers selectable thinking levels — minimal, low, medium, and high — allowing developers to adjust the balance between response speed and reasoning depth depending on the use case.

How is Gemini 3 Flash priced?

Based on the listed rates, Gemini 3 Flash is priced at approximately $0.50 per 1 million input tokens and $3.00 per 1 million output tokens. For the most current and authoritative pricing, refer to the official Google Gemini API documentation.
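Those per-token rates make cost estimation a one-line calculation. The sketch below uses the rates listed on this page ($0.50/M input, $3.00/M output); verify them against the official documentation before budgeting.

```python
# Cost estimate at the rates listed on this page:
# $0.50 per 1M input tokens, $3.00 per 1M output tokens.
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.50 + output_tokens / 1e6 * 3.00

# e.g. a 200k-token codebase prompt with a 5k-token reply:
cost = estimate_cost(200_000, 5_000)   # 0.10 + 0.015 = 0.115 dollars
```

Note the asymmetry: output tokens cost six times as much as input tokens, so long-context, short-answer workloads (summarization, retrieval over a codebase) are comparatively cheap.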

What people think about Gemini 3 Flash

Community reception on Reddit has been largely positive, with users highlighting the model's benchmark results, including a reported 99.7% score on AIME and a #3 ranking on LMArena at the time of release. Its low cost relative to its reported reasoning performance has been a frequently cited point of interest.

Discussions have also focused on specific capabilities such as agentic vision features introduced in a subsequent update, and independent benchmark results including a reported high "Omniscience" score. Some threads reference deleted posts from researchers at Google DeepMind, suggesting community interest in behind-the-scenes development context.


Parameters & options

Max Temperature: 2
Max Response Size: 65,535 tokens
Thinking Budget: Off | Manual | Auto (default: Auto)
Thinking Budget Limit: 1–24,576 (must be less than Max Response Size)
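The constraints on a manual thinking budget listed above can be expressed as a simple validation. The function name here is hypothetical; the numeric bounds come directly from this page's parameters.

```python
MAX_RESPONSE_SIZE = 65_535   # tokens, from the parameters listed above

def validate_thinking_budget(budget: int,
                             max_response: int = MAX_RESPONSE_SIZE) -> int:
    """Check a manual thinking budget against the documented constraints:
    in range 1-24576 and strictly less than the max response size."""
    if not (1 <= budget <= 24_576):
        raise ValueError("thinking budget must be in 1-24576")
    if budget >= max_response:
        raise ValueError("thinking budget must be less than max response size")
    return budget
```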

Start building with Gemini 3 Flash

No API keys required. Create AI-powered workflows with Gemini 3 Flash in minutes — free.