Gemini 3 Flash
A fast, capable thinking model from Google designed for agentic workflows, coding, and multi-turn chat with near Pro-level reasoning at lower latency.
Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth.
The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.
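Since the model is reachable through OpenAI-compatible providers such as OpenRouter, a request can be sketched as a standard chat-completions payload. This is a minimal sketch only: the model identifier `google/gemini-3-flash` and the `reasoning`/`effort` field names are assumptions, so check the provider's documentation for the exact values.

```python
import json

# Hypothetical model id -- consult OpenRouter's model list for the real one.
MODEL_ID = "google/gemini-3-flash"

def build_chat_request(prompt: str, thinking_level: str = "low") -> dict:
    """Build an OpenAI-style chat-completions payload.

    The `reasoning`/`effort` field is an assumed mapping of the
    configurable thinking levels described above onto a request body;
    it is not taken from official documentation.
    """
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": thinking_level},  # assumed field name
    }

payload = build_chat_request("Summarize this document.", thinking_level="medium")
body = json.dumps(payload)  # ready to POST to the provider's chat endpoint
```

Sending `body` would additionally require an API key and an HTTP client; the sketch stops at payload construction so the shape of the request stays visible.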
What Gemini 3 Flash supports
Large Context Window
Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversation histories to be included as context.
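Before packing a codebase or document set into one request, it helps to estimate whether it fits the 1,048,576-token window. The sketch below uses the common rough heuristic of about 4 characters per token; real counts come from the provider's tokenizer, so treat this only as a pre-flight check.

```python
CONTEXT_WINDOW = 1_048_576  # tokens supported by Gemini 3 Flash

def fits_in_context(texts, chars_per_token: float = 4.0,
                    reserve: int = 8_192) -> bool:
    """Rough check that a set of documents fits in one request.

    `chars_per_token` is the ~4-chars-per-token heuristic, an
    approximation only. `reserve` leaves headroom for the prompt
    and the model's reply.
    """
    estimated = sum(len(t) for t in texts) / chars_per_token
    return estimated + reserve <= CONTEXT_WINDOW
```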
Configurable Reasoning
Offers selectable thinking levels (minimal, low, medium, high) so developers can tune the trade-off between response latency and reasoning depth per request.
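Because the level is chosen per request, an application can route different task types to different levels. The four level names below come from the feature description above; the task categories and the mapping are purely illustrative, not part of any API.

```python
THINKING_LEVELS = ("minimal", "low", "medium", "high")

def pick_thinking_level(task: str) -> str:
    """Heuristic task-type -> thinking-level routing.

    Only the four level names are from the model's documentation;
    the task labels and default are illustrative assumptions.
    """
    table = {
        "autocomplete": "minimal",  # latency-critical, shallow reasoning
        "chat": "low",
        "code_review": "medium",
        "planning": "high",         # multi-step agent planning
    }
    level = table.get(task, "low")
    assert level in THINKING_LEVELS
    return level
```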
Multimodal Input
Accepts text, images, audio, video, and PDF files as input, producing text output from any combination of these modalities.
Tool Use & Agents
Supports function calling and tool use natively, enabling reliable multi-step agent loops and integration with external APIs or services.
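The shape of such an agent loop can be sketched without the real API: the model proposes a tool call, the application executes it, appends the result to the conversation, and calls the model again until it answers. Here `fake_model` stands in for a Gemini 3 Flash call, and the tool, message format, and reply format are all illustrative assumptions.

```python
import json

def get_weather(city: str) -> str:
    """Illustrative tool the model may request."""
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stand-in for a model call: first requests a tool, then answers
    once a tool result appears in the history."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "args": {"city": "Oslo"}}}
    return {"text": "It is 21 degrees C in Oslo."}

def agent_loop(user_msg: str, model=fake_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:            # model answered directly
            return reply["text"]
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` bound is a common safeguard so a misbehaving loop cannot run indefinitely.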
Structured Output
Can return responses in structured formats such as JSON, making it straightforward to parse model outputs in automated pipelines.
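Even when a model is instructed to return JSON, pipelines benefit from defensive parsing: replies sometimes arrive wrapped in a code fence, and required fields should be checked before use. A minimal sketch (the required keys here are arbitrary examples):

```python
import json

def parse_structured(reply_text: str,
                     required: tuple = ("title", "tags")) -> dict:
    """Parse a JSON-formatted model reply and verify required keys.

    Strips a surrounding ```json fence if present and fails loudly
    on missing fields rather than propagating partial data.
    """
    cleaned = reply_text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        cleaned = cleaned.removeprefix("json").strip()
    data = json.loads(cleaned)
    missing = [k for k in required if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```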
Context Caching
Supports automatic context caching to reduce redundant token processing across repeated or long-running agentic sessions.
Low-Latency Responses
Optimized for real-time and interactive use cases, delivering responses at substantially lower latency than larger Gemini model variants.
Coding Assistance
Designed for coding tasks including code generation, debugging, and explanation, with support for long codebases via the 1M-token context window.
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 88.2% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 81.2% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 79.7% |
| HLE (Humanity's Last Exam) | Questions that challenge frontier models across many domains | 14.1% |
| SciCode | Scientific research coding and numerical methods | 49.9% |
| SWE-bench Verified | Real GitHub issues requiring multi-file code fixes | 78.0% |
Common questions about Gemini 3 Flash
What is the context window size for Gemini 3 Flash?
Gemini 3 Flash supports a context window of up to 1,048,576 tokens, which allows it to process very long documents, codebases, or conversation histories in a single request.
What is the training data cutoff for Gemini 3 Flash?
Based on the available metadata, the model's training date is listed as December 2025, which coincides with its release; consult Google's official model documentation for the exact training data cutoff.
What input types does Gemini 3 Flash accept?
The model accepts text, images, audio, video, and PDF files as inputs, and produces text as output.
Does Gemini 3 Flash support tool use and function calling?
Yes. Gemini 3 Flash includes native support for tool use, function calling, and structured output, making it suitable for agentic workflows and automated pipelines.
What are the configurable reasoning options in Gemini 3 Flash?
The model offers selectable thinking levels — minimal, low, medium, and high — allowing developers to adjust the balance between response speed and reasoning depth depending on the use case.
How is Gemini 3 Flash priced?
Based on community-reported information, Gemini 3 Flash is priced at approximately $0.50 per 1 million tokens. For the most current and authoritative pricing, refer to the official Google Gemini API documentation.
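At that community-reported flat rate, cost scales linearly with token volume. A back-of-envelope helper (the flat rate is an assumption; real pricing typically splits input and output tokens and may include caching discounts):

```python
PRICE_PER_MILLION = 0.50  # USD, community-reported figure; verify officially

def estimated_cost(total_tokens: int) -> float:
    """Rough cost estimate at a single flat per-million-token rate.

    Treat as back-of-envelope only: providers usually price input
    and output tokens differently.
    """
    return total_tokens / 1_000_000 * PRICE_PER_MILLION
```

For example, a session consuming 2 million tokens would cost roughly $1.00 under this assumption.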
What people think about Gemini 3 Flash
Community reception on Reddit has been largely positive, with users highlighting the model's benchmark results including a reported 99.7% score on AIME and a rank of #3 on LMArena at the time of release. The low cost of approximately $0.50 per 1 million tokens relative to its reported reasoning performance has been a frequently cited point of interest.
Discussions have also focused on specific capabilities such as agentic vision features introduced in a subsequent update, and independent benchmark results including a reported high "Omniscience" score. Some threads reference deleted posts from researchers at Google DeepMind, suggesting community interest in behind-the-scenes development context.
Representative thread titles:
- "Google releases Gemini 3 Flash: Ranks #3 on LMArena (above Opus 4.5), scores 99.7% on AIME and costs $0.50/1M" (with benchmarks)
- "Google introduces Agentic Vision in Gemini 3 Flash"
- "UPDATE: Independent Benchmarks for Gemini 3 Flash (Highest 'Omniscience' Score ever recorded) + Google Lead teases: 'The week is not over yet.' Gemma 4 incoming?"
- A deleted post from a research scientist at Google DeepMind