What Is the Google AI Edge Gallery? How to Run LLMs Offline on Your iPhone
Google AI Edge Gallery is a free app for iOS and Android that runs Gemma models fully on-device with no internet required. Here's what it can do and how to get it.
Running a Local LLM on Your iPhone Is Now Surprisingly Simple
Most AI tools assume you’re online. They send your prompts to a server somewhere, process them in the cloud, and send back a response. That works fine most of the time — but it means your data leaves your device, you need a reliable connection, and you’re dependent on someone else’s infrastructure.
Google AI Edge Gallery flips that model entirely. It’s a free app that lets you run Gemma large language models directly on your iPhone or Android device, with no internet connection required. Your data never leaves your phone. There’s no API call, no server, no account needed.
This article explains what Google AI Edge Gallery is, how it works, what you can actually do with it, and how to get it running on iOS.
What Google AI Edge Gallery Actually Is
Google AI Edge Gallery is a mobile app built by Google to showcase on-device AI capabilities. It lets you download and run lightweight AI models — specifically from the Gemma family — directly on your smartphone’s hardware.
Think of it as a local AI sandbox. You download a model to your device once, and then everything runs entirely on-device. There’s no cloud backend involved in inference.
The app is part of Google’s broader AI Edge SDK ecosystem, which is a set of tools and runtimes designed to make it easier to deploy machine learning models on edge devices like phones, tablets, and embedded hardware.
Who It’s For
Google positions the app primarily for developers and AI enthusiasts who want to experiment with on-device LLMs. But you don’t need to write any code to use it — the app has a clean chat interface that anyone can pick up.
It’s useful if you:
- Want to test how Gemma models perform on mobile hardware
- Need an AI assistant that works offline (on planes, in areas with poor signal, etc.)
- Are concerned about data privacy and don’t want prompts sent to external servers
- Are a developer building apps on Google’s AI Edge stack and want to test models before deploying them
The Gemma Models Behind It
Gemma is Google’s family of open-weight language models. They’re designed to be smaller and more efficient than frontier models like Gemini 2.5 Pro, trading some raw capability for deployability. The models available in the gallery are small enough to run on consumer mobile hardware without melting your battery.
As of mid-2025, the app supports Gemma 3 variants in the 1B and 3B parameter range. These are quantized versions optimized for mobile inference, so they run faster and use less memory than the full-precision models.
What You Can Do With It
Google AI Edge Gallery isn’t just a chatbot. It includes a few distinct task types, each designed to show off different on-device AI capabilities.
Text Generation and Chat
The most straightforward use case: type a prompt, get a response. You can have a multi-turn conversation with the model, ask it to explain things, write drafts, summarize content, or work through problems.
Because everything runs locally, response speed depends entirely on your device hardware. Newer iPhones with Apple’s Neural Engine tend to perform better than older models. Expect response speeds that are slower than ChatGPT or Gemini on the web, but fast enough to be usable.
Ask Image
This feature lets you take or upload a photo and ask the model questions about it. It’s a multimodal capability — the model can interpret both visual and text inputs.
You could use it to:
- Identify objects or text in a photo
- Ask for a description of what’s in an image
- Get suggestions based on something you’ve photographed
Again, all of this happens on-device. The photo never leaves your phone.
Ask Audio (Android)
On Android, there’s an audio feature that lets you speak a prompt or upload an audio file and have the model process it. This feature was not available on iOS at initial launch, so iPhone users are limited to text and image inputs.
Prompt Lab
There’s also a Prompt Lab mode that gives you more control over how the model responds — things like system prompts, temperature settings, and top-K sampling. This is mostly useful for developers who want to understand how different parameters affect model output.
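To make those knobs concrete, here is a minimal Python sketch of how temperature and top-K reshape next-token sampling. This is illustrative only, not the app's actual code; the logits and default values are made up for the example.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=40, rng=random):
    """Pick a token index from raw logits using temperature and top-K."""
    # Keep only the top_k highest-scoring candidate tokens.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [logits[i] / temperature for i in kept]
    # Softmax over the kept candidates (shifted by the max for stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one candidate according to those probabilities.
    r = rng.random()
    cumulative = 0.0
    for idx, p in zip(kept, probs):
        cumulative += p
        if r < cumulative:
            return idx
    return kept[-1]

# With top_k=1 the choice is effectively greedy: always the top logit.
print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.1, top_k=1))  # 0
```

Lowering temperature toward zero and shrinking top-K makes output more deterministic and repetitive; raising them makes it more varied. That tradeoff is exactly what the Prompt Lab sliders let you explore.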
How to Get Google AI Edge Gallery on iPhone
The app is free and available on the App Store. Here’s how to set it up:
Step 1: Download the app
Search for “Google AI Edge Gallery” in the App Store, or follow the link from Google’s AI Edge developer page. Install it like any other app.
Step 2: Open the app and choose a task
When you first open it, you’ll see the available task types: Chat, Ask Image, and Prompt Lab. Tap the one you want to try.
Step 3: Download a model
Before you can use any feature, you need to download a model to your device. The app will prompt you to do this when you select a task. Models are several hundred MB to a few GB in size, so download over Wi-Fi.
The download happens once. After that, the model is stored locally and you can use it offline indefinitely.
Step 4: Start prompting
Once the model is downloaded, you’re ready to go. Type your prompt in the chat interface and wait for the response. The first inference on a cold start may take a few seconds longer as the model loads into memory.
Requirements and Compatibility
- Requires iOS 16 or later
- Works best on iPhone 12 or newer (older devices may experience slower inference)
- Needs several GB of free storage for model downloads
- No account, login, or internet connection required after initial setup
How On-Device AI Actually Works
Understanding what’s happening under the hood helps set realistic expectations.
When you run a model in Google AI Edge Gallery, inference happens on your device’s own silicon: the CPU, the GPU, or, on newer phones, a dedicated neural accelerator such as Apple’s Neural Engine. Google’s runtime (LiteRT, formerly TensorFlow Lite) handles the actual model execution.
The models are quantized, meaning the numerical precision of the model weights is reduced from 32-bit or 16-bit floating point down to 4-bit or 8-bit integers. This makes the model much smaller and faster without dramatically reducing quality. It’s why a 3B parameter model can fit on a phone at all.
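As a rough illustration, here is a toy Python sketch of a symmetric 8-bit quantization scheme. The scheme and the sample weights are assumptions for illustration; production runtimes use more sophisticated per-channel and 4-bit schemes.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus one shared scale factor."""
    # Scale so the largest-magnitude weight maps to +/-127.
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.82, -0.41, 0.05, -1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Each weight now needs 1 byte instead of 4 (a 4x size reduction),
# at the cost of a small rounding error per weight.
print(q)       # quantized int8 values
print(approx)  # close to, but not exactly, the original weights

# Rough storage arithmetic for a 3B-parameter model:
# 32-bit floats: 3e9 params * 4 bytes   ~ 12 GB  (won't fit on a phone)
# 4-bit ints:    3e9 params * 0.5 bytes ~ 1.5 GB (fits)
```

The error introduced per weight is bounded by half the scale factor, which is why quality degrades only modestly even as the file shrinks severalfold.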
The practical result: you get a model that’s genuinely capable of useful text and reasoning tasks, but it won’t match the output quality of much larger cloud-based models on complex tasks. For simple Q&A, summarization, drafting, and image description, it performs well. For nuanced analysis, long-form writing, or anything that benefits from a much larger context window and more parameters, you’ll notice the limits.
Privacy and Offline: The Real Advantages
The headline advantages of on-device AI are privacy and offline access.
Privacy: When your prompts never leave your device, there’s no data retention policy to worry about, no server logs, no training on your inputs. For sensitive use cases — medical notes, legal drafts, personal journaling — that’s a meaningful difference.
Offline access: Once the model is downloaded, it works anywhere. No Wi-Fi, no cellular data, no problem. That’s useful in contexts like flights, remote areas, or environments where internet access is restricted or unreliable.
Speed (sometimes): There’s no network latency, which can actually make on-device AI faster than cloud-based AI for short responses on a fast device. The bottleneck is computation, not connectivity.
The Limitations You Should Know About
On-device AI with Google AI Edge Gallery is genuinely impressive for what it is. But there are real constraints worth understanding before you rely on it.
Model capability ceiling: The Gemma 1B and 3B models are small by modern standards. They’ll struggle with complex multi-step reasoning, nuanced instruction-following, and tasks that benefit from massive pretraining scale. Don’t expect GPT-4 performance.
Context window: Small models typically support shorter context windows, meaning you can’t feed them very long documents and expect coherent responses across the whole thing.
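A back-of-envelope check makes the constraint concrete. The 2,048-token window and the tokens-per-word ratio below are assumptions for illustration, not the app's published specs.

```python
def fits_in_context(word_count, context_window=2048, tokens_per_word=1.3):
    """Rough test: will a document of this many words fit in the window?

    Rule of thumb: English text averages roughly 1.3 tokens per word.
    """
    return word_count * tokens_per_word <= context_window

print(fits_in_context(1000))  # ~1,300 tokens: fits
print(fits_in_context(5000))  # ~6,500 tokens: does not fit
```

A 1,000-word article fits comfortably, but a 5,000-word report blows past the budget, so the model would only "see" part of it.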
No tool use or web access: The model is isolated. It can’t browse the web, run code, call APIs, or take actions outside the chat interface.
Storage and download size: Model files are large. Depending on which model you download, you’re looking at 1–4 GB of storage. That’s not nothing on a device where photos and apps compete for space.
Device heat and battery: Running a model locally is computationally intensive. Extended use will warm your device and drain the battery faster than typical app usage.
iOS feature gap: Some features available on Android (like audio input) aren’t yet available on iOS.
Where Cloud-Based AI Agents Fill the Gap
On-device LLMs like those in Google AI Edge Gallery are great for privacy-first, offline, single-task use. But they’re fundamentally isolated — the model sits on your phone and can’t connect to anything else.
If you need AI that can take actions, connect to your tools, and run multi-step workflows, you need something different.
That’s where platforms like MindStudio come in. MindStudio is a no-code builder for AI agents that can actually do things — send emails, update CRM records, generate images, query databases, or run entire automated workflows triggered by a schedule or an incoming message.
The key difference: MindStudio agents don’t just respond to prompts. They act on them. You can build an agent that reads an incoming email, extracts key information, updates a Salesforce record, drafts a reply, and sends it — all without you touching it.
MindStudio also gives you access to over 200 AI models including Gemini (Google’s flagship models, not just the small on-device variants), Claude, GPT-4o, and others — all through one interface, no API keys required. So if you want the full capability of Gemini 2.5 Pro rather than Gemma 3 1B, you can get it without managing your own API credentials.
For developers building more complex systems, MindStudio’s Agent Skills Plugin lets any external AI agent — including Claude Code or custom LangChain agents — call MindStudio’s 120+ capabilities as simple method calls, handling the infrastructure layer so the AI can focus on reasoning.
On-device AI and cloud AI aren’t really competing. They serve different needs. Google AI Edge Gallery is the right choice when you need offline access and privacy. When you need power, integrations, and multi-step automation, cloud-based agents handle what local models can’t.
You can try MindStudio free at mindstudio.ai.
Google AI Edge Gallery vs. Other On-Device AI Options
Google isn’t the only player in on-device AI. Here’s how the landscape looks:
Apple Intelligence
Apple has built AI features directly into iOS 18 — writing tools, image generation, and an enhanced Siri. These run on-device for most tasks (with some cloud fallback). The difference: Apple Intelligence is baked into the OS, not a standalone app. You can’t run arbitrary prompts through a chat interface the way you can with Google AI Edge Gallery.
Ollama (Desktop)
Ollama is a popular tool for running open-source LLMs locally on your Mac or PC. It supports a much wider range of models than Google AI Edge Gallery (Llama, Mistral, Phi, Gemma, and many others), and it’s generally the go-to for serious local AI experimentation on desktop. But it doesn’t run on mobile.
LM Studio (Desktop)
Similar to Ollama, LM Studio provides a polished desktop interface for running local models. Again, desktop-only.
Jan.ai
Another desktop option for running models locally, with a clean interface. Mobile support is limited.
The short version: for on-device AI specifically on iPhone, Google AI Edge Gallery is currently one of the most accessible options. Apple Intelligence is more integrated but less flexible. Desktop tools like Ollama offer much more model variety but require a computer.
Frequently Asked Questions
Is Google AI Edge Gallery free?
Yes. The app is free to download and use. There are no subscriptions, in-app purchases, or usage limits. The models themselves are free to download as well, since they’re open-weight models from the Gemma family.
Does Google AI Edge Gallery work without internet?
Yes — after the initial model download, everything runs entirely on-device with no internet connection required. You can use it on a plane in airplane mode, in areas without cell coverage, or with Wi-Fi disabled entirely.
What’s the difference between Gemma and Gemini?
Gemma and Gemini are both AI model families from Google, but they serve different purposes. Gemini (including Gemini 1.5 Pro, Gemini 2.0, and Gemini 2.5) is Google’s flagship family of large models, designed for maximum capability and accessed via cloud APIs. Gemma models are smaller, open-weight versions designed for efficiency — they can run on consumer hardware like phones and laptops. Google AI Edge Gallery uses Gemma models because they’re small enough to fit and run on a mobile device.
Is my data private when using Google AI Edge Gallery?
Yes. Because inference runs entirely on your device, your prompts and any images you analyze never leave your phone. There’s no server receiving your input. This is one of the primary advantages of on-device AI over cloud-based AI assistants.
What iPhone models are compatible with Google AI Edge Gallery?
The app requires iOS 16 or later. It works on older devices, but performance will vary. Devices with Apple’s more recent chips and their faster Neural Engine (iPhone 12 and later) handle model inference significantly better. For the smoothest experience, iPhone 14 or newer is recommended.
How does on-device AI compare to ChatGPT or Gemini?
On-device models in Google AI Edge Gallery are much smaller than the models powering ChatGPT or Gemini’s web interface. They’re capable for many everyday tasks — drafting, summarizing, Q&A, image description — but they’ll fall short on complex reasoning, very long documents, or tasks that benefit from larger training scale. The tradeoff is privacy, offline access, and no dependency on external services.
Key Takeaways
- Google AI Edge Gallery is a free iOS and Android app that runs Gemma language models entirely on your device — no internet, no account, no cloud.
- It supports text chat, multimodal image analysis (Ask Image), and a Prompt Lab for parameter experimentation.
- Setting it up takes minutes: download the app, download a model, start prompting.
- The main advantages are privacy, offline access, and zero dependency on external services.
- The main limitations are model capability, no tool use or integrations, and higher storage and battery demands.
- For AI that can connect to tools, take actions, and run multi-step workflows, cloud-based platforms like MindStudio cover what on-device AI can’t — including access to full-scale Gemini models, 1,000+ integrations, and no-code agent building. You can get started free.