Gemini 2.0 Flash Lite
Speedy, cost-effective multimodal model for high-volume applications without compromising quality.
Gemini 2.0 Flash Lite is a multimodal text generation model developed by Google, released in early 2025 as part of the Gemini 2.0 model family. It is designed specifically for high-volume, cost-sensitive applications, offering a balance between response speed and output quality. The model supports a context window of over one million tokens (1,048,576), making it suitable for processing long documents or extended conversations in a single request.
Gemini 2.0 Flash Lite is best suited for developers and organizations that need to run large numbers of inference requests without incurring high costs. Its architecture prioritizes throughput and efficiency, making it a practical choice for tasks like summarization, classification, translation, and content generation at scale. The model's training data has a cutoff of June 2024, and it is accessible through Google's Vertex AI platform, Google AI Studio, and the Gemini API.
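As a quick illustration of a typical high-volume task, here is a minimal sketch of calling the model through the google-genai Python SDK. The API key placeholder and prompt are assumptions for illustration; check the current SDK documentation for the exact surface.

```python
# Minimal text-generation sketch (pip install google-genai).
# Assumes an API key from Google AI Studio; the model ID follows
# Google's published naming for this tier.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-lite",
    contents="Summarize the following support ticket in one sentence: ...",
)
print(response.text)
```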
What Gemini 2.0 Flash Lite supports
Large Context Window
Processes up to 1,048,576 tokens in a single request, enabling analysis of long documents, codebases, or extended conversation histories without truncation.
Fast Inference
Optimized for low-latency responses, making it suitable for real-time applications and pipelines that require quick turnaround on text generation tasks.
Cost-Effective Scaling
Priced for high-volume usage, allowing developers to run large numbers of requests while keeping per-token costs low compared to larger model tiers.
Multimodal Input
Accepts text and image inputs within the same request, supporting tasks that combine visual and textual understanding such as image captioning or document analysis.
Text Generation
Generates coherent, contextually relevant text for use cases including summarization, translation, classification, and content drafting.
Structured Output
Supports JSON-mode responses, allowing developers to request structured data outputs suitable for downstream processing in applications and APIs.
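To make the structured-output item above concrete, here is a hedged sketch of requesting a JSON-mode response via the google-genai SDK. The Pydantic schema and its field names are illustrative assumptions, not part of the model's API:

```python
# Sketch: constraining output to a JSON schema (assumes google-genai SDK).
from google import genai
from google.genai import types
from pydantic import BaseModel

class TicketLabel(BaseModel):  # hypothetical schema for illustration
    category: str
    urgency: str

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-2.0-flash-lite",
    contents="Classify this ticket: 'My invoice total is wrong.'",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=TicketLabel,  # output constrained to this schema
    ),
)
print(response.text)  # JSON string, e.g. {"category": "billing", "urgency": "medium"}
```

Constraining the response to a schema lets downstream code parse model output with json.loads or TicketLabel.model_validate_json instead of brittle string parsing.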
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 72.4% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 53.5% |
| MATH-500 | Undergraduate and competition-level math problems | 87.3% |
| AIME 2024 | American math olympiad problems | 27.7% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 18.5% |
| HLE | Questions that challenge frontier models across many domains | 3.6% |
| SciCode | Scientific research coding and numerical methods | 25.0% |
Common questions about Gemini 2.0 Flash Lite
What is the context window size for Gemini 2.0 Flash Lite?
Gemini 2.0 Flash Lite supports a context window of 1,048,576 tokens, which allows it to process very long documents or extended multi-turn conversations in a single request.
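If you want to verify that a document fits before sending it, the SDK exposes a token-counting call. A minimal sketch, assuming the google-genai SDK and a placeholder file name:

```python
# Sketch: checking a document against the 1,048,576-token window.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
long_document = open("contract.txt").read()  # hypothetical file

count = client.models.count_tokens(
    model="gemini-2.0-flash-lite",
    contents=long_document,
)
print(count.total_tokens, "tokens of 1,048,576 available")
```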
What is the training data cutoff for this model?
The model's training data has a cutoff of June 2024, meaning it does not have knowledge of events or information published after that date.
How is Gemini 2.0 Flash Lite priced?
Gemini 2.0 Flash Lite is positioned as a cost-effective option within the Gemini 2.0 family, designed for high-volume workloads. Specific pricing details are available on the Google Cloud Vertex AI pricing page.
What types of inputs does Gemini 2.0 Flash Lite support?
The model supports text and image inputs, making it a multimodal model capable of handling tasks that involve both written content and visual data.
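As a sketch of mixing the two input types in a single request (the file name and prompt are illustrative; assumes the google-genai SDK):

```python
# Sketch: combining an image and a text instruction in one request.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
with open("receipt.jpg", "rb") as f:  # hypothetical image file
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-lite",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "List the line items and total shown on this receipt.",
    ],
)
print(response.text)
```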
Where can I access and deploy Gemini 2.0 Flash Lite?
Gemini 2.0 Flash Lite is available through Google's Vertex AI platform and via the Google AI Studio and Gemini API. Documentation is provided on the Google Cloud Vertex AI documentation site.
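The google-genai SDK can target either surface; here is a sketch of the two client configurations, where the project ID and region are placeholder values:

```python
# Sketch: the two access paths via the google-genai SDK.
from google import genai

# 1) Gemini API / Google AI Studio: authenticate with an API key.
studio_client = genai.Client(api_key="YOUR_API_KEY")

# 2) Vertex AI: authenticate with Application Default Credentials
#    (e.g. `gcloud auth application-default login`); project and
#    location below are placeholders.
vertex_client = genai.Client(
    vertexai=True,
    project="my-gcp-project",
    location="us-central1",
)
```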
What people think about Gemini 2.0 Flash Lite
Community discussions around the Gemini Flash family frequently highlight its cost efficiency, with one widely shared thread noting that LLMs can be dramatically cheaper than dedicated translation services like DeepL for high-volume language tasks. Users in the LocalLLaMA and singularity subreddits generally view the Flash Lite tier as a practical choice for production workloads where throughput and cost matter.
Some threads reflect broader concerns about Google's model deprecation cadence, with users noting that older Gemini models are being retired on relatively short timelines. Discussions also touch on how Flash Lite fits into the wider Gemini ecosystem as Google continues to release newer model versions.
Representative threads:
- Why nobody mentioned "Gemini Diffusion" here? It's a BIG deal
- LLMs are 800x Cheaper for Translation than DeepL
- Google Gemini 3.1 Pro Preview Soon?
- Google is depreciating these models by Nov 18th. Gemini 3 soon?
- Price performance comparison from the Gemini 2.5 Paper
Start building with Gemini 2.0 Flash Lite
No API keys required. Create AI-powered workflows with Gemini 2.0 Flash Lite in minutes — free.