MindStudio
Text Generation Model

Gemini 2.5 Flash Lite

Google's fastest and most efficient Gemini 2.5 model, delivering high-quality AI performance at scale with optional reasoning capabilities.

Publisher Google
Type Text
Context Window 1,000,000 tokens
Training Data July 2025
Input $0.10/MTok
Output $0.40/MTok

Google's fastest, most cost-efficient Gemini 2.5 model

Gemini 2.5 Flash Lite is Google's most cost-efficient model in the Gemini 2.5 family, designed for high-volume, latency-sensitive workloads. It supports a 1 million-token context window and includes optional reasoning capabilities that can be toggled on or off via controllable thinking budgets, allowing developers to balance speed and depth depending on the task. The model also supports Grounding with Google Search, Code Execution, and URL Context as built-in features.
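As a sketch of how these options fit together, the snippet below assembles a hypothetical generateContent-style request payload with the reasoning budget disabled. The model identifier and field names (contents, generationConfig, thinkingConfig) follow Google's public REST conventions but are assumptions here; verify them against the current Gemini API documentation before relying on them.

```python
import json

# Assumed model identifier and REST field names; verify against the
# current Gemini API documentation before use.
MODEL = "gemini-2.5-flash-lite"

def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a generateContent-style payload.

    thinking_budget=0 turns reasoning off for minimum latency;
    a positive value allows up to that many reasoning tokens.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

payload = build_request("Translate to French: 'Good morning'")
print(json.dumps(payload, indent=2))
```

Keeping the budget at zero is the natural default for the high-volume, latency-sensitive workloads this model targets; raise it only for requests that genuinely need deeper reasoning.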

Gemini 2.5 Flash Lite is well-suited for production applications that require processing large numbers of requests efficiently, such as document classification, real-time translation, content moderation, and coding assistance. Its multimodal input support and broad benchmark coverage across coding, math, science, and reasoning tasks make it a practical choice for developers building scalable AI pipelines where cost and throughput are primary constraints.

What Gemini 2.5 Flash Lite supports

Low Latency Responses

Optimized for speed-sensitive workloads, delivering responses faster than previous Flash Lite generations across a broad range of prompt types.

1M Token Context

Supports a context window of up to 1 million tokens, enabling processing of long documents, codebases, or extended conversation histories in a single request.

Optional Reasoning

Includes native reasoning that can be enabled or disabled via controllable thinking budgets, letting developers trade off latency against depth of reasoning per task.
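The speed-versus-depth trade-off described above can be sketched as a small helper that maps a thinking-budget mode to a concrete token budget. The sentinel values (0 for off, -1 for letting the model decide) and the 1–24,576 range are assumptions drawn from the parameter limits listed later on this page; confirm them against current documentation.

```python
from typing import Optional

# Limits taken from the parameter table on this page; the sentinel
# values (0 = off, -1 = model decides) are assumptions.
MAX_RESPONSE_TOKENS = 65_535
BUDGET_MIN, BUDGET_MAX = 1, 24_576

def resolve_thinking_budget(mode: str, limit: Optional[int] = None) -> int:
    """Map a thinking-budget mode ('off' / 'manual' / 'auto') to a budget."""
    if mode == "off":
        return 0   # skip reasoning entirely: lowest latency
    if mode == "auto":
        return -1  # let the model pick a budget per request
    if mode == "manual":
        if limit is None or not (BUDGET_MIN <= limit <= BUDGET_MAX):
            raise ValueError(
                f"manual budget must be in [{BUDGET_MIN}, {BUDGET_MAX}]"
            )
        return limit  # range already keeps it below MAX_RESPONSE_TOKENS
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, a document-classification pipeline might call resolve_thinking_budget("off") for routine items and resolve_thinking_budget("manual", 4096) for ambiguous ones.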

Grounding with Search

Supports Grounding with Google Search, allowing the model to anchor responses in up-to-date web information during inference.

Code Execution

Built-in Code Execution capability allows the model to write and run code as part of a response, returning computed results directly.
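A hedged sketch of how the built-in tools above might be switched on in a request config. The field names "googleSearch" and "codeExecution" are modeled on the Gemini REST API's camelCase tool fields, but treat them as assumptions and check the current API reference.

```python
def tools_config(grounding: bool = False, code_execution: bool = False) -> list:
    """Assemble the 'tools' section of a request payload.

    Field names are assumptions modeled on the Gemini REST API's
    camelCase conventions; verify against current documentation.
    """
    tools = []
    if grounding:
        tools.append({"googleSearch": {}})   # Grounding with Google Search
    if code_execution:
        tools.append({"codeExecution": {}})  # built-in Code Execution
    return tools
```

The resulting list would be attached to the request payload alongside the contents and generation config.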

Multimodal Input

Accepts text, images, video, audio, and document inputs, supporting tasks that combine multiple modalities in a single prompt.

Configurable Parameters

Exposes select and number input types for runtime configuration, enabling fine-grained control over model behavior such as thinking budget settings.

Ready to build with Gemini 2.5 Flash Lite?

Get Started Free

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark | What it tests | Score
MMLU-Pro | Expert knowledge across 14 academic disciplines | 72.4%
GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 47.4%
MATH-500 | Undergraduate and competition-level math problems | 92.6%
AIME 2024 | American math olympiad problems | 50.0%
LiveCodeBench | Real-world coding tasks from recent competitions | 40.0%
HLE | Questions that challenge frontier models across many domains | 3.7%
SciCode | Scientific research coding and numerical methods | 17.7%

Common questions about Gemini 2.5 Flash Lite

What is the context window size for Gemini 2.5 Flash Lite?

Gemini 2.5 Flash Lite supports a context window of up to 1 million tokens, which allows it to process very long documents or extended conversations in a single request.

Does Gemini 2.5 Flash Lite support reasoning?

Yes. The model includes optional reasoning capabilities that can be toggled on or off using controllable thinking budgets, so you can enable deeper reasoning for complex tasks or disable it to prioritize speed.

What is the training data cutoff for Gemini 2.5 Flash Lite?

The metadata above lists the model's training data as July 2025.

What types of inputs does Gemini 2.5 Flash Lite accept?

The model supports multimodal inputs including text, images, video, audio, and documents, in addition to configurable runtime parameters via select and number input types.

What use cases is Gemini 2.5 Flash Lite best suited for?

It is designed for high-volume, latency-sensitive tasks such as translation, classification, document processing, coding assistance, and content moderation — scenarios where throughput and cost efficiency are priorities.

What people think about Gemini 2.5 Flash Lite

Community discussion around Gemini 2.5 Flash Lite is generally positive, with users highlighting the addition of Thinking, Live Audio, and Grounding as notable features for Google's most affordable model in the 2.5 family. The thread received 139 upvotes with minimal controversy, suggesting broad approval of the feature expansion.

Commenters focused primarily on the model's cost-efficiency and the practical value of optional reasoning at a low price point, with limited discussion of limitations. The small comment count (5) indicates the announcement was well-received but did not generate significant debate.


Parameters & options

Max Temperature: 2
Max Response Size: 65,535 tokens
Thinking Budget (select): Off, Manual, Auto (default: Auto)
Thinking Budget Limit (number): range 1–24,576; must be less than Max Response Size
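These constraints can be checked client-side before a request is sent. The sketch below assumes the limits listed above (temperature at most 2, response size at most 65,535 tokens, budget in 1–24,576 and below the response size); the lower bound of 0 on temperature is an assumption, since the page only states a maximum.

```python
def validate_params(temperature: float, max_response: int, budget_limit: int) -> None:
    """Validate runtime parameters against the limits listed above.

    Raises ValueError on the first violated constraint. The lower
    temperature bound of 0 is an assumption not stated on the page.
    """
    if not (0 <= temperature <= 2):
        raise ValueError("temperature must be between 0 and 2")
    if not (1 <= max_response <= 65_535):
        raise ValueError("max response size must be between 1 and 65,535 tokens")
    if not (1 <= budget_limit <= 24_576):
        raise ValueError("thinking budget limit must be in 1-24,576")
    if budget_limit >= max_response:
        raise ValueError("thinking budget limit must be less than max response size")
```

Validating locally keeps malformed configurations from ever reaching the API, which matters in the high-throughput pipelines this model is aimed at.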

Start building with Gemini 2.5 Flash Lite

No API keys required. Create AI-powered workflows with Gemini 2.5 Flash Lite in minutes — free.