Gemini 2.5 Flash
Gemini 2.5 Flash is a thinking model with strong, well-rounded capabilities, designed to balance price and performance.
Balanced thinking model with large context window
Gemini 2.5 Flash is a text generation model developed by Google, designed to balance performance and cost efficiency. It is a thinking model, meaning it applies internal reasoning steps before producing a response, which supports more deliberate outputs across a range of tasks. The model supports a context window of 1,048,576 tokens, making it suitable for processing long documents, extended conversations, and large codebases in a single request.
Gemini 2.5 Flash is well-suited for tasks that require both speed and reasoning, such as summarization, question answering, tool use, and multi-step instruction following. It supports tool integrations, allowing it to be used in agentic workflows where external functions or APIs need to be called. The model reached general availability with a training data cutoff of June 2025, and is accessible through Google's Vertex AI platform.
What Gemini 2.5 Flash supports
Extended Context Window
Processes up to 1,048,576 tokens in a single request, enabling analysis of long documents, large codebases, or extended conversation histories without truncation.
Thinking / Reasoning
Applies internal chain-of-thought reasoning before generating a final response, supporting more deliberate outputs on multi-step or complex tasks.
Tool Use
Supports structured tool and function calling, allowing the model to invoke external APIs or defined functions as part of an agentic workflow.
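As a sketch of what a function-calling request looks like, the snippet below builds a `generateContent`-style request body. The field names (`contents`, `tools`, `functionDeclarations`) follow the public Gemini API's REST schema; the `get_weather` function and its parameters are hypothetical examples, not part of the API.

```python
import json

# Sketch of a generateContent request body with one declared tool.
# get_weather is a hypothetical function the model could choose to call;
# the schema under "parameters" uses standard JSON Schema types.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "What's the weather in Paris?"}]}
    ],
    "tools": [
        {
            "functionDeclarations": [
                {
                    "name": "get_weather",
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }
            ]
        }
    ],
}

print(json.dumps(request_body, indent=2))
```

When the model decides the tool is needed, the response contains a structured function call rather than free text, which the calling application executes before sending the result back.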
Low-Latency Output
Optimized for real-time response latency, making it suitable for interactive applications and user-facing products that require timely replies.
Multimodal Input
Accepts text alongside other input modalities including images, enabling tasks like document understanding, visual question answering, and image-based reasoning.
Structured Configuration
Supports numeric and select-type parameters for controlling generation behavior, such as temperature and output length, through the API.
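A minimal sketch of such a configuration block is shown below. The `temperature` and `maxOutputTokens` fields are standard Gemini API generation parameters; `thinkingConfig.thinkingBudget` is the API's knob for capping internal reasoning tokens on 2.5-series thinking models, though its exact availability and limits should be checked against current API documentation. The specific values here are illustrative only.

```python
# Sketch of a generationConfig block for a generateContent request.
# Field names follow the public Gemini API; values are illustrative.
generation_config = {
    "temperature": 0.7,        # sampling randomness, typically 0.0-2.0
    "maxOutputTokens": 1024,   # upper bound on response length
    "thinkingConfig": {
        "thinkingBudget": 512  # cap on internal reasoning tokens (2.5 models)
    },
}

# A valid temperature stays within the documented range.
assert 0.0 <= generation_config["temperature"] <= 2.0
```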
Ready to build with Gemini 2.5 Flash?
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 80.9% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 68.3% |
| MATH-500 | Undergraduate and competition-level math problems | 93.2% |
| AIME 2024 | American Invitational Mathematics Examination problems | 50.0% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 49.5% |
| HLE | Humanity's Last Exam: questions that challenge frontier models across many domains | 5.1% |
| SciCode | Scientific research coding and numerical methods | 29.1% |
Common questions about Gemini 2.5 Flash
What is the context window size for Gemini 2.5 Flash?
Gemini 2.5 Flash supports a context window of 1,048,576 tokens, which allows it to process very long inputs such as large documents or extended conversation histories in a single request.
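To put that number in perspective, the context window is exactly 2^20 tokens. The word estimate below assumes roughly 0.75 English words per token, a common rule of thumb rather than an official figure.

```python
# The context window is exactly a power of two: 2**20 tokens.
context_window = 1_048_576
assert context_window == 2 ** 20

# Rough capacity estimate, assuming ~0.75 English words per token
# (a common heuristic, not an official conversion rate).
approx_words = int(context_window * 0.75)
print(f"{context_window:,} tokens is roughly {approx_words:,} English words")
```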
What is the training data cutoff for Gemini 2.5 Flash?
The model's training data cutoff is June 2025, meaning it has knowledge of events and information up to that date.
Does Gemini 2.5 Flash support tool or function calling?
Yes, Gemini 2.5 Flash supports tool use and function calling, making it suitable for agentic workflows where the model needs to interact with external APIs or defined functions.
What type of model is Gemini 2.5 Flash?
Gemini 2.5 Flash is a thinking model, meaning it uses internal reasoning steps before producing its final output. It is categorized as a text generation model published by Google.
Where can Gemini 2.5 Flash be accessed?
Gemini 2.5 Flash is available through Google's Vertex AI platform and can also be used via MindStudio without requiring separate API key management.
What people think about Gemini 2.5 Flash
Community discussions around Gemini 2.5 Flash reflect general interest in its image editing capabilities and the rollout of related model variants, with posts about the Flash Image model receiving hundreds of upvotes for demonstrating high-level image edits. Users have also noted Google's informal naming of a Flash image preview variant as "Nano Banana," which generated notable engagement.
A recurring concern in the community is pricing, with one widely discussed thread noting that the cost of thinking output tokens doubled from $0.15 to $0.30 after the model reached general availability. This pricing change prompted significant discussion among developers evaluating the model for cost-sensitive production use cases.
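A quick back-of-the-envelope calculation shows the impact of that doubling. The thread does not state units for the $0.15 and $0.30 figures, so the snippet below assumes USD per million thinking-output tokens purely for illustration, with a hypothetical monthly volume.

```python
old_rate = 0.15  # reported rate before general availability
new_rate = 0.30  # reported rate after general availability
# Assumption: rates are USD per million thinking-output tokens
# (units are not stated in the source thread).
tokens = 10_000_000  # hypothetical monthly thinking-output volume

old_cost = tokens / 1_000_000 * old_rate
new_cost = tokens / 1_000_000 * new_rate
print(f"before GA: ${old_cost:.2f}/mo, after GA: ${new_cost:.2f}/mo")
```

At any volume, the ratio is the same: thinking-output spend exactly doubles.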
Representative discussion threads include:

- "Google is now officially calling 'Gemini 2.5 Flash image preview', 'Nano Banana'"
- "Google doubled the price of Gemini 2.5 Flash thinking output after GA from 0.15 to 0.30 what"
- "Google's new Gemini 2.5 Flash Image model can do some very impressive high-level image edits"
Start building with Gemini 2.5 Flash
No API keys required. Create AI-powered workflows with Gemini 2.5 Flash in minutes — free.