
Gemini 2.0 Flash-Lite Vision

Gemini 2.0 Flash-Lite is our fastest and most cost-efficient Flash model. It's an upgrade path for 1.5 Flash users who want better quality at the same price and speed.

Publisher: Google
Type: Vision
Context Window: 1,048,576 tokens
Training Data Cutoff: June 2024
Input: $0.08/MTok
Output: $0.30/MTok
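At these rates, per-request cost is easy to estimate. A minimal sketch (the helper name and the example token counts are illustrative, not part of any SDK):

```python
# Published Gemini 2.0 Flash-Lite rates, in dollars per million tokens.
INPUT_PER_MTOK = 0.08
OUTPUT_PER_MTOK = 0.30

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request at the listed rates."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# Example: a 100,000-token document summarized into a 1,000-token answer.
print(f"${estimate_cost(100_000, 1_000):.4f}")  # → $0.0083
```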

Fast, cost-efficient vision model from Google

Gemini 2.0 Flash-Lite Vision is a multimodal model developed by Google, designed to process both visual and textual inputs. It belongs to the Gemini 2.0 Flash family and is positioned as the fastest and most cost-efficient option within that lineup. The model supports a context window of over one million tokens, making it suitable for tasks that require processing large amounts of information in a single request. It was trained on data up to June 2024.

This model is intended as an upgrade path for users of Gemini 1.5 Flash who want improved output quality without changes to cost or latency. Its vision capabilities allow it to handle image understanding tasks alongside text-based workflows. The combination of speed, large context support, and multimodal input handling makes it well-suited for applications such as document analysis, image captioning, and high-throughput pipelines where cost efficiency is a priority.

What Gemini 2.0 Flash-Lite Vision supports

Vision Understanding

Processes and interprets image inputs alongside text, enabling tasks like image captioning, visual question answering, and scene description.

Large Context Window

Supports up to 1,048,576 tokens in a single context, allowing long documents, multi-image inputs, or extended conversations to be processed together.
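A quick pre-flight check for whether a large input is likely to fit in that window can use the common rough heuristic of ~4 characters per token for English text. This is only an estimate (real counts vary by content and come from the API's token-counting endpoint); the helper below is a hedged sketch:

```python
CONTEXT_WINDOW = 1_048_576  # tokens

def fits_in_context(text: str, reserved_output: int = 8_192,
                    chars_per_token: float = 4.0) -> bool:
    """Rough check that a prompt fits in the context window.

    Uses the ~4 chars/token rule of thumb for English text and
    reserves room for the model's response. Treat the result as a
    pre-flight estimate, not an exact token count.
    """
    est_tokens = len(text) / chars_per_token
    return est_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_in_context("word " * 200_000))  # ~1M characters → True
```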

Multimodal Input

Accepts combinations of text and image inputs in a single request, enabling workflows that mix visual and textual data.
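A mixed text-and-image request is expressed as a single contents array with one part per input. The sketch below builds a request body in the shape used by the Gemini generateContent REST API (field names follow that API; the image bytes here are a placeholder, and the helper itself is illustrative):

```python
import base64

def build_request(question: str, image_bytes: bytes,
                  mime_type: str = "image/png") -> dict:
    """One user turn combining an inline image part and a text part."""
    return {
        "contents": [{
            "role": "user",
            "parts": [
                {"inlineData": {
                    "mimeType": mime_type,
                    # Inline images are sent base64-encoded.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": question},
            ],
        }],
        "generationConfig": {"temperature": 1, "maxOutputTokens": 4096},
    }

body = build_request("What is shown in this image?", b"<raw image bytes>")
```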

High-Speed Inference

Optimized for low-latency responses, making it suitable for real-time or high-throughput production applications.

Text Generation

Generates coherent text responses based on visual and textual prompts, supporting summarization, Q&A, and content extraction tasks.

Document Analysis

Can process long-form documents or multi-page inputs within its million-token context window, extracting structured information or answering questions about content.

Ready to build with Gemini 2.0 Flash-Lite Vision?

Get Started Free

Benchmark scores

Scores represent accuracy: the percentage of questions answered correctly on each test.

Benchmark What it tests Score
MMLU-Pro Expert knowledge across 14 academic disciplines 72.4%
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 53.5%
MATH-500 Undergraduate and competition-level math problems 87.3%
AIME 2024 Problems from the American Invitational Mathematics Examination 27.7%
LiveCodeBench Real-world coding tasks from recent competitions 18.5%
HLE Questions that challenge frontier models across many domains 3.6%
SciCode Scientific research coding and numerical methods 25.0%

Common questions about Gemini 2.0 Flash-Lite Vision

What is the context window size for Gemini 2.0 Flash-Lite Vision?

Gemini 2.0 Flash-Lite Vision supports a context window of 1,048,576 tokens, allowing very large inputs to be processed in a single request.

What is the knowledge cutoff date for this model?

The model's training data has a cutoff of June 2024, meaning it does not have knowledge of events or information published after that date.

What types of inputs does Gemini 2.0 Flash-Lite Vision accept?

The model accepts both image and text inputs, making it a multimodal model capable of handling visual understanding tasks alongside standard text-based prompts.

Who is this model intended for?

According to Google's description, it is designed as an upgrade path for Gemini 1.5 Flash users who want better output quality at the same price and speed.

Where can I access or deploy Gemini 2.0 Flash-Lite Vision?

The model is available through Google Cloud's Vertex AI platform. Documentation for deployment and usage can be found at the official Vertex AI documentation page.

Parameters & options

Max Temperature: 2
Max Response Size: 8,192 tokens
Temperature (number): default 1, range 0–2, step 0.1
Max Response Tokens (number): default 4096, range 1–8192, step 1
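Client code often validates these settings before sending a request. A small sketch (the helper name is illustrative) that snaps values into the ranges listed above:

```python
def clamp_params(temperature: float = 1.0, max_tokens: int = 4096) -> dict:
    """Clamp sampling settings to the model's documented limits."""
    temperature = min(max(temperature, 0.0), 2.0)
    temperature = round(temperature, 1)               # step of 0.1
    max_tokens = min(max(int(max_tokens), 1), 8192)   # 1–8192 tokens
    return {"temperature": temperature, "maxOutputTokens": max_tokens}

print(clamp_params(temperature=2.7, max_tokens=20_000))
# → {'temperature': 2.0, 'maxOutputTokens': 8192}
```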

Start building with Gemini 2.0 Flash-Lite Vision

No API keys required. Create AI-powered workflows with Gemini 2.0 Flash-Lite Vision in minutes — free.