Gemini 3 Flash
A fast, capable thinking model from Google designed for agentic workflows, coding, and multi-turn chat with near Pro-level reasoning at lower latency.
Gemini 3 Flash is a text generation model developed by Google, released in December 2025 as part of the Gemini 3 family. It is designed to deliver near-frontier reasoning performance at lower latency than full-scale models, making it suitable for interactive and production-grade applications. The model accepts multimodal inputs including text, images, audio, video, and PDFs, and produces text output. A configurable reasoning system allows users to select thinking levels — minimal, low, medium, or high — to balance response speed against reasoning depth.
The model supports a context window of up to 1,048,576 tokens, enabling it to process very long documents, codebases, and extended conversation histories in a single pass. It includes built-in support for tool use, structured output, and automatic context caching, which makes it well-suited for agentic workflows and multi-step pipelines. Developers working on coding assistants, automated agents, and multi-turn chat applications are the primary intended audience. It is available via the Gemini API and through third-party providers such as OpenRouter.
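Since the model is reachable through OpenAI-compatible providers such as OpenRouter, a request can be sketched as a standard chat-completions payload. This is a minimal sketch only: the model identifier `google/gemini-3-flash` and the `reasoning`/`effort` field names are assumptions, so check the provider's documentation for the exact values.

```python
import json

# Hypothetical model id -- consult OpenRouter's model list for the real one.
MODEL_ID = "google/gemini-3-flash"

def build_chat_request(prompt: str, thinking_level: str = "low") -> dict:
    """Build an OpenAI-style chat-completions payload.

    The `reasoning`/`effort` field is an assumed mapping of the
    configurable thinking levels described above onto a request body;
    it is not taken from official documentation.
    """
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": thinking_level},  # assumed field name
    }

payload = build_chat_request("Summarize this document.", thinking_level="medium")
body = json.dumps(payload)  # ready to POST to the provider's chat endpoint
```

Sending `body` would additionally require an API key and an HTTP client; the sketch stops at payload construction so the shape of the request stays visible.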
What Gemini 3 Flash supports
Large Context Window
Processes up to 1,048,576 tokens in a single request, allowing entire codebases, long documents, or extended conversation histories to be included as context.
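Before packing a codebase or document set into one request, it helps to estimate whether it fits the 1,048,576-token window. The sketch below uses the common rough heuristic of about 4 characters per token; real counts come from the provider's tokenizer, so treat this only as a pre-flight check.

```python
CONTEXT_WINDOW = 1_048_576  # tokens supported by Gemini 3 Flash

def fits_in_context(texts, chars_per_token: float = 4.0,
                    reserve: int = 8_192) -> bool:
    """Rough check that a set of documents fits in one request.

    `chars_per_token` is the ~4-chars-per-token heuristic, an
    approximation only. `reserve` leaves headroom for the prompt
    and the model's reply.
    """
    estimated = sum(len(t) for t in texts) / chars_per_token
    return estimated + reserve <= CONTEXT_WINDOW
```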
Configurable Reasoning
Offers selectable thinking levels (minimal, low, medium, high) so developers can tune the trade-off between response latency and reasoning depth per request.
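Because the level is chosen per request, an application can route different task types to different levels. The four level names below come from the feature description above; the task categories and the mapping are purely illustrative, not part of any API.

```python
THINKING_LEVELS = ("minimal", "low", "medium", "high")

def pick_thinking_level(task: str) -> str:
    """Heuristic task-type -> thinking-level routing.

    Only the four level names are from the model's documentation;
    the task labels and default are illustrative assumptions.
    """
    table = {
        "autocomplete": "minimal",  # latency-critical, shallow reasoning
        "chat": "low",
        "code_review": "medium",
        "planning": "high",         # multi-step agent planning
    }
    level = table.get(task, "low")
    assert level in THINKING_LEVELS
    return level
```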
Multimodal Input
Accepts text, images, audio, video, and PDF files as input, producing text output from any combination of these modalities.
Tool Use & Agents
Supports function calling and tool use natively, enabling reliable multi-step agent loops and integration with external APIs or services.
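The shape of such an agent loop can be sketched without the real API: the model proposes a tool call, the application executes it, appends the result to the conversation, and calls the model again until it answers. Here `fake_model` stands in for a Gemini 3 Flash call, and the tool, message format, and reply format are all illustrative assumptions.

```python
import json

def get_weather(city: str) -> str:
    """Illustrative tool the model may request."""
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stand-in for a model call: first requests a tool, then answers
    once a tool result appears in the history."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "args": {"city": "Oslo"}}}
    return {"text": "It is 21 degrees C in Oslo."}

def agent_loop(user_msg: str, model=fake_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = model(messages)
        call = reply.get("tool_call")
        if call is None:            # model answered directly
            return reply["text"]
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` bound is a common safeguard so a misbehaving loop cannot run indefinitely.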
Structured Output
Can return responses in structured formats such as JSON, making it straightforward to parse model outputs in automated pipelines.
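Even when a model is instructed to return JSON, pipelines benefit from defensive parsing: replies sometimes arrive wrapped in a code fence, and required fields should be checked before use. A minimal sketch (the required keys here are arbitrary examples):

```python
import json

def parse_structured(reply_text: str,
                     required: tuple = ("title", "tags")) -> dict:
    """Parse a JSON-formatted model reply and verify required keys.

    Strips a surrounding ```json fence if present and fails loudly
    on missing fields rather than propagating partial data.
    """
    cleaned = reply_text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        cleaned = cleaned.removeprefix("json").strip()
    data = json.loads(cleaned)
    missing = [k for k in required if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```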
Context Caching
Supports automatic context caching to reduce redundant token processing across repeated or long-running agentic sessions.
Low-Latency Responses
Optimized for real-time and interactive use cases, delivering responses at substantially lower latency than larger Gemini model variants.
Coding Assistance
Designed for coding tasks including code generation, debugging, and explanation, with support for long codebases via the 1M-token context window.
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 88.2% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 81.2% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 79.7% |
| HLE (Humanity's Last Exam) | Questions that challenge frontier models across many domains | 14.1% |
| SciCode | Scientific research coding and numerical methods | 49.9% |
| SWE-bench Verified | Real GitHub issues requiring multi-file code fixes | 78.0% |
Common questions about Gemini 3 Flash
What is the context window size for Gemini 3 Flash?
Gemini 3 Flash supports a context window of up to 1,048,576 tokens, which allows it to process very long documents, codebases, or conversation histories in a single request.
What is the training data cutoff for Gemini 3 Flash?
Based on the available metadata, the model's training date is listed as December 2025, which coincides with its release; consult Google's official model documentation for the exact training data cutoff.
What input types does Gemini 3 Flash accept?
The model accepts text, images, audio, video, and PDF files as inputs, and produces text as output.
Does Gemini 3 Flash support tool use and function calling?
Yes. Gemini 3 Flash includes native support for tool use, function calling, and structured output, making it suitable for agentic workflows and automated pipelines.
What are the configurable reasoning options in Gemini 3 Flash?
The model offers selectable thinking levels — minimal, low, medium, and high — allowing developers to adjust the balance between response speed and reasoning depth depending on the use case.
How is Gemini 3 Flash priced?
Based on community-reported information, Gemini 3 Flash is priced at approximately $0.50 per 1 million tokens. For the most current and authoritative pricing, refer to the official Google Gemini API documentation.
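At that community-reported flat rate, cost scales linearly with token volume. A back-of-envelope helper (the flat rate is an assumption; real pricing typically splits input and output tokens and may include caching discounts):

```python
PRICE_PER_MILLION = 0.50  # USD, community-reported figure; verify officially

def estimated_cost(total_tokens: int) -> float:
    """Rough cost estimate at a single flat per-million-token rate.

    Treat as back-of-envelope only: providers usually price input
    and output tokens differently.
    """
    return total_tokens / 1_000_000 * PRICE_PER_MILLION
```

For example, a session consuming 2 million tokens would cost roughly $1.00 under this assumption.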
What people think about Gemini 3 Flash
Community reception on Reddit has been largely positive, with users highlighting the model's benchmark results including a reported 99.7% score on AIME and a rank of #3 on LMArena at the time of release. The low cost of approximately $0.50 per 1 million tokens relative to its reported reasoning performance has been a frequently cited point of interest.
Discussions have also focused on specific capabilities such as agentic vision features introduced in a subsequent update, and independent benchmark results including a reported high "Omniscience" score. Some threads reference deleted posts from researchers at Google DeepMind, suggesting community interest in behind-the-scenes development context.
Representative thread titles:
- "Google releases Gemini 3 Flash: Ranks #3 on LMArena (above Opus 4.5), scores 99.7% on AIME and costs $0.50/1M" (with benchmarks)
- "Google introduces Agentic Vision in Gemini 3 Flash"
- "UPDATE: Independent Benchmarks for Gemini 3 Flash (Highest 'Omniscience' Score ever recorded) + Google Lead teases: 'The week is not over yet.' Gemma 4 incoming?"
- A deleted post from a research scientist at Google DeepMind