Gemini 2.5 Flash Lite
Google's fastest and most efficient Gemini 2.5 model, delivering high-quality AI performance at scale with optional reasoning capabilities.
Gemini 2.5 Flash Lite is Google's most cost-efficient model in the Gemini 2.5 family, designed for high-volume, latency-sensitive workloads. It supports a 1 million-token context window and includes optional reasoning capabilities that can be toggled on or off via controllable thinking budgets, allowing developers to balance speed and depth depending on the task. The model also supports Grounding with Google Search, Code Execution, and URL Context as built-in features.
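As a rough sketch of how the thinking budget is toggled per request: the payload below follows the shape of the public Gemini REST API's `generateContent` method, where the budget lives under `generationConfig.thinkingConfig.thinkingBudget` (a budget of 0 disables reasoning). The prompts and budget values are illustrative assumptions; check the current API reference before relying on exact field names.

```python
import json

MODEL = "gemini-2.5-flash-lite"  # model ID as listed; verify against the API's model catalog
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent payload. thinking_budget=0 turns reasoning off;
    a positive value caps the number of thinking tokens the model may spend."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# Latency-sensitive path: reasoning off entirely.
fast = build_request("Classify this support ticket: 'My invoice total is wrong.'", thinking_budget=0)

# Harder task: allow up to 1024 thinking tokens before the answer.
deep = build_request("Prove that the sum of two even integers is even.", thinking_budget=1024)

print(json.dumps(fast["generationConfig"], indent=2))
```

The same request shape is what the official SDKs serialize under the hood, so switching budgets per task is a one-field change rather than a different model call.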
Gemini 2.5 Flash Lite is well-suited for production applications that require processing large numbers of requests efficiently, such as document classification, real-time translation, content moderation, and coding assistance. Its multimodal input support and broad benchmark coverage across coding, math, science, and reasoning tasks make it a practical choice for developers building scalable AI pipelines where cost and throughput are primary constraints.
What Gemini 2.5 Flash Lite supports
Low Latency Responses
Optimized for speed-sensitive workloads, delivering responses faster than previous Flash-Lite generations across a broad range of prompt types.
1M Token Context
Supports a context window of up to 1 million tokens, enabling processing of long documents, codebases, or extended conversation histories in a single request.
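Before shipping a very long document in one request, a quick back-of-envelope check can help. The four-characters-per-token ratio below is a rough heuristic for English prose (an assumption, not an official tokenizer); for exact counts, use the API's token-counting endpoint instead.

```python
CONTEXT_WINDOW = 1_000_000  # Gemini 2.5 Flash Lite's stated context limit, in tokens

def rough_token_estimate(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def fits_in_context(text: str, reserve_for_output: int = 8_192) -> bool:
    """Return True if the estimated prompt size leaves room for the response."""
    return rough_token_estimate(text) + reserve_for_output <= CONTEXT_WINDOW

doc = "lorem ipsum " * 50_000   # ~600k characters -> roughly 150k tokens
print(fits_in_context(doc))     # comfortably under the 1M window
```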
Optional Reasoning
Includes native reasoning that can be enabled or disabled via controllable thinking budgets, letting developers trade off latency against depth of reasoning per task.
Grounding with Search
Supports Grounding with Google Search, allowing the model to anchor responses in up-to-date web information during inference.
Code Execution
Built-in Code Execution capability allows the model to write and run code as part of a response, returning computed results directly.
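A sketch of how these built-in tools attach to a request: the Gemini REST API accepts a `tools` array on `generateContent`, with one entry per enabled tool. The camelCase key names below (`googleSearch`, `codeExecution`) follow the API's JSON convention, but treat them as assumptions to confirm against the current reference.

```python
import json

def build_tool_request(prompt: str, use_search: bool = False, use_code_exec: bool = False) -> dict:
    """Assemble a generateContent payload with optional built-in tools.
    Tool key names mirror the REST API's camelCase JSON convention;
    confirm the exact names against the current API reference."""
    tools = []
    if use_search:
        tools.append({"googleSearch": {}})    # Grounding with Google Search
    if use_code_exec:
        tools.append({"codeExecution": {}})   # lets the model write and run code
    request = {"contents": [{"parts": [{"text": prompt}]}]}
    if tools:
        request["tools"] = tools
    return request

grounded = build_tool_request("What changed in the latest Python release?", use_search=True)
print(json.dumps(grounded["tools"]))
```

Because both tools are server-side features, enabling them is purely a request-payload change; no client-side orchestration is needed.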
Multimodal Input
Accepts text, images, video, audio, and document inputs, supporting tasks that combine multiple modalities in a single prompt.
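An illustrative sketch of assembling a mixed text-plus-image prompt: small media can be sent inline as a base64-encoded `inlineData` part alongside text parts, following the REST API's part shape. The filename and byte payload here are purely illustrative.

```python
import base64
import json
import mimetypes

def inline_part(filename: str, raw_bytes: bytes) -> dict:
    """Wrap raw media bytes as a base64-encoded inlineData part,
    the shape the REST API expects for small inline attachments."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return {"inlineData": {"mimeType": mime,
                           "data": base64.b64encode(raw_bytes).decode("ascii")}}

# One prompt mixing a text instruction and an image (fake 3-byte payload):
request = {
    "contents": [{
        "parts": [
            {"text": "Describe this image in one sentence."},
            inline_part("photo.png", b"\x89PN"),
        ]
    }]
}
print(json.dumps(request["contents"][0]["parts"][0]))
```

For large files, the API also offers a separate file-upload path; inline parts suit small payloads.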
Configurable Parameters
Exposes runtime parameters, such as numeric and dropdown settings, for fine-grained control over model behavior, including the thinking budget.
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 72.4% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 47.4% |
| MATH-500 | Undergraduate and competition-level math problems | 92.6% |
| AIME 2024 | American math olympiad problems | 50.0% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 40.0% |
| HLE | Questions that challenge frontier models across many domains | 3.7% |
| SciCode | Scientific research coding and numerical methods | 17.7% |
Common questions about Gemini 2.5 Flash Lite
What is the context window size for Gemini 2.5 Flash Lite?
Gemini 2.5 Flash Lite supports a context window of up to 1 million tokens, which allows it to process very long documents or extended conversations in a single request.
Does Gemini 2.5 Flash Lite support reasoning?
Yes. The model includes optional reasoning capabilities that can be toggled on or off using controllable thinking budgets, so you can enable deeper reasoning for complex tasks or disable it to prioritize speed.
What is the training data cutoff for Gemini 2.5 Flash Lite?
According to the available metadata, the model's training data is listed as current through July 2025.
What types of inputs does Gemini 2.5 Flash Lite accept?
The model supports multimodal inputs including text, images, video, audio, and documents. Runtime parameters, such as the thinking budget, can also be configured per request.
What use cases is Gemini 2.5 Flash Lite best suited for?
It is designed for high-volume, latency-sensitive tasks such as translation, classification, document processing, coding assistance, and content moderation — scenarios where throughput and cost efficiency are priorities.
What people think about Gemini 2.5 Flash Lite
Community discussion around Gemini 2.5 Flash Lite is generally positive, with users highlighting the addition of Thinking, Live Audio, and Grounding as notable features for Google's most affordable model in the 2.5 family. The announcement thread received 139 upvotes with minimal controversy, suggesting broad approval of the feature expansion.
Commenters focused primarily on the model's cost-efficiency and the practical value of optional reasoning at a low price point, with limited discussion of limitations. The small comment count (5) indicates the announcement was well-received but did not generate significant debate.
Start building with Gemini 2.5 Flash Lite
No API keys required. Create AI-powered workflows with Gemini 2.5 Flash Lite in minutes — free.