MindStudio
Text Generation Model

Gemini 2.5 Flash Lite

Google's fastest and most efficient Gemini 2.5 model, delivering high-quality AI performance at scale with optional reasoning capabilities.

Publisher Google
Type Text
Context Window 1,000,000 tokens
Training Data July 2025
Input $0.10/MTok
Output $0.40/MTok

Google's fastest, most cost-efficient Gemini 2.5 model

Gemini 2.5 Flash Lite is Google's most cost-efficient model in the Gemini 2.5 family, designed for high-volume, latency-sensitive workloads. It supports a 1 million-token context window and includes optional reasoning capabilities that can be toggled on or off via controllable thinking budgets, allowing developers to balance speed and depth depending on the task. The model also supports Grounding with Google Search, Code Execution, and URL Context as built-in features.
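As a sketch of how these options fit together, the snippet below assembles a hypothetical generateContent-style request payload with the reasoning budget disabled. The model identifier and field names (contents, generationConfig, thinkingConfig) follow Google's public REST conventions but are assumptions here; verify them against the current Gemini API documentation before relying on them.

```python
import json

# Assumed model identifier and REST field names; verify against the
# current Gemini API documentation before use.
MODEL = "gemini-2.5-flash-lite"

def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a generateContent-style payload.

    thinking_budget=0 turns reasoning off for minimum latency;
    a positive value allows up to that many reasoning tokens.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

payload = build_request("Translate to French: 'Good morning'")
print(json.dumps(payload, indent=2))
```

Keeping the budget at zero is the natural default for the high-volume, latency-sensitive workloads this model targets; raise it only for requests that genuinely need deeper reasoning.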

Gemini 2.5 Flash Lite is well-suited for production applications that require processing large numbers of requests efficiently, such as document classification, real-time translation, content moderation, and coding assistance. Its multimodal input support and broad benchmark coverage across coding, math, science, and reasoning tasks make it a practical choice for developers building scalable AI pipelines where cost and throughput are primary constraints.

What Gemini 2.5 Flash Lite supports

Low Latency Responses

Optimized for speed-sensitive workloads, delivering responses faster than previous Flash Lite generations across a broad range of prompt types.

1M Token Context

Supports a context window of up to 1 million tokens, enabling processing of long documents, codebases, or extended conversation histories in a single request.

Optional Reasoning

Includes native reasoning that can be enabled or disabled via controllable thinking budgets, letting developers trade off latency against depth of reasoning per task.
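The speed-versus-depth trade-off described above can be sketched as a small helper that maps a thinking-budget mode to a concrete token budget. The sentinel values (0 for off, -1 for letting the model decide) and the 1–24,576 range are assumptions drawn from the parameter limits listed later on this page; confirm them against current documentation.

```python
from typing import Optional

# Limits taken from the parameter table on this page; the sentinel
# values (0 = off, -1 = model decides) are assumptions.
MAX_RESPONSE_TOKENS = 65_535
BUDGET_MIN, BUDGET_MAX = 1, 24_576

def resolve_thinking_budget(mode: str, limit: Optional[int] = None) -> int:
    """Map a thinking-budget mode ('off' / 'manual' / 'auto') to a budget."""
    if mode == "off":
        return 0   # skip reasoning entirely: lowest latency
    if mode == "auto":
        return -1  # let the model pick a budget per request
    if mode == "manual":
        if limit is None or not (BUDGET_MIN <= limit <= BUDGET_MAX):
            raise ValueError(
                f"manual budget must be in [{BUDGET_MIN}, {BUDGET_MAX}]"
            )
        return limit  # range already keeps it below MAX_RESPONSE_TOKENS
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, a document-classification pipeline might call resolve_thinking_budget("off") for routine items and resolve_thinking_budget("manual", 4096) for ambiguous ones.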

Grounding with Search

Supports Grounding with Google Search, allowing the model to anchor responses in up-to-date web information during inference.

Code Execution

Built-in Code Execution capability allows the model to write and run code as part of a response, returning computed results directly.
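A hedged sketch of how the built-in tools above might be switched on in a request config. The field names "googleSearch" and "codeExecution" are modeled on the Gemini REST API's camelCase tool fields, but treat them as assumptions and check the current API reference.

```python
def tools_config(grounding: bool = False, code_execution: bool = False) -> list:
    """Assemble the 'tools' section of a request payload.

    Field names are assumptions modeled on the Gemini REST API's
    camelCase conventions; verify against current documentation.
    """
    tools = []
    if grounding:
        tools.append({"googleSearch": {}})   # Grounding with Google Search
    if code_execution:
        tools.append({"codeExecution": {}})  # built-in Code Execution
    return tools
```

The resulting list would be attached to the request payload alongside the contents and generation config.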

Multimodal Input

Accepts text, images, video, audio, and document inputs, supporting tasks that combine multiple modalities in a single prompt.

Configurable Parameters

Exposes select and number input types for runtime configuration, enabling fine-grained control over model behavior such as thinking budget settings.

Ready to build with Gemini 2.5 Flash Lite?

Get Started Free

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark | What it tests | Score
MMLU-Pro | Expert knowledge across 14 academic disciplines | 72.4%
GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 47.4%
MATH-500 | Undergraduate and competition-level math problems | 92.6%
AIME 2024 | American math olympiad problems | 50.0%
LiveCodeBench | Real-world coding tasks from recent competitions | 40.0%
HLE | Questions that challenge frontier models across many domains | 3.7%
SciCode | Scientific research coding and numerical methods | 17.7%

Common questions about Gemini 2.5 Flash Lite

What is the context window size for Gemini 2.5 Flash Lite?

Gemini 2.5 Flash Lite supports a context window of up to 1 million tokens, which allows it to process very long documents or extended conversations in a single request.

Does Gemini 2.5 Flash Lite support reasoning?

Yes. The model includes optional reasoning capabilities that can be toggled on or off using controllable thinking budgets, so you can enable deeper reasoning for complex tasks or disable it to prioritize speed.

What is the training data cutoff for Gemini 2.5 Flash Lite?

The metadata above lists the model's training data as July 2025.

What types of inputs does Gemini 2.5 Flash Lite accept?

The model supports multimodal inputs including text, images, video, audio, and documents, in addition to configurable runtime parameters via select and number input types.

What use cases is Gemini 2.5 Flash Lite best suited for?

It is designed for high-volume, latency-sensitive tasks such as translation, classification, document processing, coding assistance, and content moderation — scenarios where throughput and cost efficiency are priorities.

What people think about Gemini 2.5 Flash Lite

Community discussion around Gemini 2.5 Flash Lite is generally positive, with users highlighting the addition of Thinking, Live Audio, and Grounding as notable features for Google's most affordable model in the 2.5 family. The thread received 139 upvotes with minimal controversy, suggesting broad approval of the feature expansion.

Commenters focused primarily on the model's cost-efficiency and the practical value of optional reasoning at a low price point, with limited discussion of limitations. The small comment count (5) indicates the announcement was well-received but did not generate significant debate.


Parameters & options

Max Temperature: 2
Max Response Size: 65,535 tokens
Thinking Budget (select): Off, Manual, Auto (default: Auto)
Thinking Budget Limit (number): range 1–24,576; must be less than Max Response Size
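These constraints can be checked client-side before a request is sent. The sketch below assumes the limits listed above (temperature at most 2, response size at most 65,535 tokens, budget in 1–24,576 and below the response size); the lower bound of 0 on temperature is an assumption, since the page only states a maximum.

```python
def validate_params(temperature: float, max_response: int, budget_limit: int) -> None:
    """Validate runtime parameters against the limits listed above.

    Raises ValueError on the first violated constraint. The lower
    temperature bound of 0 is an assumption not stated on the page.
    """
    if not (0 <= temperature <= 2):
        raise ValueError("temperature must be between 0 and 2")
    if not (1 <= max_response <= 65_535):
        raise ValueError("max response size must be between 1 and 65,535 tokens")
    if not (1 <= budget_limit <= 24_576):
        raise ValueError("thinking budget limit must be in 1-24,576")
    if budget_limit >= max_response:
        raise ValueError("thinking budget limit must be less than max response size")
```

Validating locally keeps malformed configurations from ever reaching the API, which matters in the high-throughput pipelines this model is aimed at.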

Start building with Gemini 2.5 Flash Lite

No API keys required. Create AI-powered workflows with Gemini 2.5 Flash Lite in minutes — free.