How to Run Gemma 4 Locally on Your Phone or Laptop With the Google AI Edge Gallery
Google AI Edge Gallery lets you download and run Gemma 4 models locally on Android and iOS with no cloud connection. Here's how to set it up in minutes.
What the Google AI Edge Gallery Actually Does
Running a large language model locally used to mean a dedicated GPU, a Linux terminal full of dependencies, and a weekend of troubleshooting. Google’s AI Edge Gallery changes that. It’s a free Android app that lets you download and run Gemma 4 models directly on your device — no internet connection required, no API keys, no cloud costs.
This matters for a few reasons. Your prompts stay on your device. You can use the model on a plane or in a basement with no signal. And once the model is downloaded, inference is completely free, forever.
This guide covers everything you need to get Gemma 4 running locally: what the app supports, how to install it, which model to pick for your hardware, and how to get the most out of it — including options for running Gemma 4 on a laptop if you want more power than a phone can offer.
What Is Gemma 4?
Gemma 4 is Google’s latest generation of open-weight language models, released at Google I/O 2025. Unlike the Gemini family, which runs in Google’s cloud, Gemma models are designed to be downloaded and run locally on consumer hardware.
The Gemma 4 family includes several size variants:
- Gemma 4 1B — Tiny model optimized for phones and edge devices. Fast inference, lower accuracy.
- Gemma 4 4B — Sweet spot for most modern Android phones with 8GB+ RAM.
- Gemma 4 12B — Better reasoning, better at complex tasks. Needs a capable phone or a laptop with 16GB+ RAM.
- Gemma 4 27B — Near-frontier quality. Requires a laptop with a solid GPU or 32GB+ RAM.
Gemma 4 also introduces multimodal capabilities in some variants — meaning you can pass an image along with your text prompt and the model can reason about both. This is a big step up from earlier Gemma generations, which were text-only.
Google released Gemma 4 under an open license that allows personal and commercial use, with some restrictions. You can check the Gemma terms of use on Google’s model page for the specifics.
What Is the Google AI Edge Gallery?
The Google AI Edge Gallery is an open-source Android app built by Google’s AI Edge team. It serves two purposes:
- A demo app for developers to see what on-device AI can do
- A functional tool for anyone who wants to run Gemma models locally
The app is built on top of Google’s AI Edge SDK, which handles the low-level work of loading models, managing memory, and running inference efficiently on mobile hardware. Under the hood, it uses LiteRT (formerly TensorFlow Lite) and MediaPipe to optimize model execution for the GPU or NPU on your device.
The app currently works on Android (version 10 or later). A version for iOS is in active development but as of mid-2025 is not yet available on the App Store. For iPhone users, alternatives like Ollama running on a laptop connected to the same network, or the MLC Chat app, fill the gap in the meantime.
For laptops, Google AI Edge Gallery doesn’t have a native desktop app — but you have solid alternatives for running Gemma 4 locally, which are covered in the laptop section below.
Before You Start: Hardware and Storage Requirements
Local AI models are large files, and running them uses real resources. Before you download anything, check that your setup meets the minimums.
Android Phone Requirements
| Model Size | Minimum RAM | Storage Needed | Recommended Device |
|---|---|---|---|
| Gemma 4 1B | 4 GB | ~1.5 GB | Most Android phones from 2021+ |
| Gemma 4 4B | 6 GB | ~3–4 GB | Pixel 7 series, Samsung S23+ |
| Gemma 4 12B | 12 GB | ~8–10 GB | Samsung S24 Ultra, Pixel 9 Pro |
| Gemma 4 27B | Not recommended | — | Use a laptop instead |
Phones with a dedicated NPU (Neural Processing Unit) — like Qualcomm Snapdragon 8 Gen 2 or newer, or Google’s Tensor chips — will run inference noticeably faster and with better battery efficiency.
Laptop Requirements
For running Gemma 4 on a laptop via tools like Ollama:
- Gemma 4 4B: Works fine on most modern laptops with 8GB RAM. CPU-only inference is slow but usable.
- Gemma 4 12B: Needs 16GB RAM. GPU acceleration (Apple Silicon, NVIDIA, AMD) makes a big difference.
- Gemma 4 27B: 32GB RAM minimum. You’ll want GPU acceleration here.
Apple Silicon Macs (M1/M2/M3/M4) are particularly well-suited because they share RAM between CPU and GPU, and tools like Ollama use Metal acceleration by default.
How to Install and Run Gemma 4 on Android
Step 1: Get the App
The Google AI Edge Gallery is not currently on the Google Play Store; you install it by sideloading an APK. Here's how:
- On your Android device, go to Settings → Apps → Special App Access → Install Unknown Apps.
- Find your browser (Chrome or Firefox) and toggle “Allow from this source.”
- Visit the Google AI Edge Gallery GitHub releases page in your mobile browser.
- Download the latest .apk file.
- Once downloaded, tap the file in your notifications or file manager and follow the prompts to install.
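If you have a computer nearby, sideloading over USB with adb is an alternative to tapping the file on the phone. This is a sketch, not part of the official instructions: it assumes USB debugging is enabled on the phone, that Android platform-tools are installed on the computer, and it uses a placeholder filename in place of whatever the releases page actually gives you.

```shell
# Alternative sideload path: install the downloaded APK over USB with adb.
# Assumes USB debugging is enabled and Android platform-tools are installed.
# The filename below is a placeholder; use the one from the releases page.
APK="ai-edge-gallery.apk"

if command -v adb >/dev/null 2>&1; then
  adb install "$APK" || echo "adb install failed (is a device connected?)"
else
  echo "adb not found; install Android platform-tools first"
fi
```

Either route ends the same way: the app appears in your launcher once the install finishes.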
This should take about two minutes on a decent connection.
Step 2: Open the App and Accept Permissions
When you first open AI Edge Gallery, it will ask for storage permissions to save model files. Grant these. Without storage access, the app can’t download or load models.
The main screen shows several categories:
- Ask Image — Vision tasks (describe an image, answer questions about a photo)
- AI Chat — Standard text conversation
- Summarize — Paste text and get a summary
- Smart Reply — Generate reply suggestions
For most users, AI Chat is where you’ll spend the most time.
Step 3: Download a Gemma 4 Model
- Tap AI Chat.
- The app will prompt you to select and download a model. Tap Get Models.
- You’ll see a list of available models with their sizes. Gemma 4 models are labeled clearly.
- Select the model that fits your device. If you have 8GB RAM, start with Gemma 4 4B.
- Tap Download. The download runs in the background — larger models will take several minutes on a fast connection.
You can download multiple models and switch between them. Each downloaded model persists on your device so you don’t have to re-download it.
Step 4: Start Chatting
Once the download completes:
- Tap the model name to load it. First load takes 10–30 seconds depending on model size and device.
- Type your message in the chat box and hit send.
- The model generates a response entirely on your device. No data leaves your phone.
You’ll notice the first response is often the slowest — the model warms up as it runs. Subsequent messages in the same session are usually faster.
Step 5: Try the Vision Features (Gemma 4 Multimodal)
If you downloaded a multimodal variant of Gemma 4:
- Go back to the main menu and tap Ask Image.
- Select or take a photo.
- Type a question about the image (e.g., “What’s in this image?” or “Is there any text I should read?”).
- The model analyzes the image locally and responds.
This works without any internet connection and without your image being sent to any server.
How to Run Gemma 4 on a Laptop
The Google AI Edge Gallery doesn’t have a native desktop app, but Ollama is the fastest way to run Gemma 4 locally on a Mac, Windows PC, or Linux machine. The setup takes about five minutes.
Install Ollama
- Go to ollama.com and download the installer for your OS.
- Run the installer. On Mac, drag it to your Applications folder. On Windows, run the .exe. On Linux, use the install script from the site.
- Ollama runs as a background service — no command line needed after initial setup, unless you prefer it.
Pull the Gemma 4 Model
Open your terminal (or use the Ollama desktop app if available):
ollama pull gemma4:4b
Replace 4b with 12b or 27b if your machine can handle it. Ollama downloads the model and stores it locally. The command ollama list shows everything you’ve downloaded.
Run It
ollama run gemma4:4b
This opens an interactive chat session in your terminal. Type your prompt and hit Enter. To exit, type /bye.
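Ollama also exposes a local HTTP API (port 11434 by default), which is useful if you want to script against the model rather than chat in the terminal. A minimal sketch, assuming the gemma4:4b tag from the pull step above; the check first confirms the server is actually running:

```shell
# Query the local Ollama HTTP API (default port 11434) instead of the
# interactive terminal chat. "gemma4:4b" matches the tag pulled above.
PROMPT="Explain on-device inference in one sentence."

# Only send the request if the Ollama server is reachable.
if curl -s --max-time 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"gemma4:4b\", \"prompt\": \"$PROMPT\", \"stream\": false}"
else
  echo "Ollama is not running on localhost:11434"
fi
```

Everything still happens on your machine; the "API" here is just a local port, not a cloud service.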
If you prefer a browser-based UI, tools like Open WebUI (formerly Ollama WebUI) wrap Ollama in a clean chat interface that runs locally in your browser. It takes about five minutes to set up with Docker.
Laptop Performance Tips
- Apple Silicon Macs: Ollama uses Metal by default. Performance on M2/M3/M4 chips is excellent, even for 12B models.
- NVIDIA GPUs: Ollama uses CUDA automatically if a compatible GPU is detected. Make sure your NVIDIA drivers are up to date.
- CPU-only inference: Works, but expect slow responses on larger models. The 4B model is the practical ceiling for CPU-only setups.
- Close other applications: Free up RAM before loading large models. Each billion parameters needs roughly 500MB–1GB of RAM.
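That last rule of thumb is easy to sanity-check before you commit to a download. A quick back-of-envelope calculation using the 500 MB to 1 GB per billion parameters range above:

```shell
# Rough RAM estimate using the 500 MB - 1 GB per billion parameters
# rule of thumb (actual usage varies with quantization and context length).
params_b=12                      # e.g. Gemma 4 12B
low_mb=$(( params_b * 500 ))
high_mb=$(( params_b * 1000 ))
echo "A ${params_b}B model needs roughly ${low_mb}-${high_mb} MB of free RAM"
```

For the 12B model that works out to roughly 6 to 12 GB, which is why 16 GB machines are the sensible floor for it.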
Choosing the Right Model for Your Task
Not every model is right for every job. Here’s a quick guide:
- Gemma 4 1B — Best for simple Q&A, basic summarization, and quick lookups. Not great for complex reasoning or long contexts. Use it if battery life and speed matter more than output quality.
- Gemma 4 4B — Best for most everyday tasks: writing help, coding assistance, research questions, summarizing articles. Good balance of speed and quality on a modern phone or laptop.
- Gemma 4 12B — Best for more nuanced reasoning, longer documents, better code generation, and tasks where you notice the 4B model making errors. Requires a capable device.
- Gemma 4 27B — Best for tasks where you'd normally reach for a frontier model like GPT-4 or Claude. On a capable laptop with a GPU, quality approaches cloud model territory. Not practical on phones.
Common Issues and Fixes
The model download keeps failing. Check your storage space first — models need free space plus some headroom. Try downloading on Wi-Fi rather than mobile data. If the app crashes during download, delete the partial file from your Downloads folder and try again.
The model loads but responses are very slow. This is normal for larger models on slower devices. Try the next size down. Also close background apps to free up RAM. On Android, clearing cached apps can help.
The app crashes when loading the model. Your device may not have enough RAM. Try a smaller model. On phones with 6GB RAM, Gemma 4 4B is usually the practical maximum.
The model gives odd or repetitive outputs. Try clearing the chat history and starting a new session. If the issue persists across sessions, delete and re-download the model — corrupted downloads can cause strange behavior.
The vision features aren’t available. Not all Gemma 4 variants support images. Check the model description in the app — multimodal variants are labeled. You may need to download a different model file.
Where MindStudio Fits If You Want More Than Local Inference
Running Gemma 4 locally is great for private, offline use. But if you want to build something with an AI model — an actual tool, workflow, or agent — a local setup creates friction fast. You’re limited to one device, you can’t connect to external data sources easily, and sharing what you build requires others to replicate your setup.
That’s where MindStudio is worth knowing about. It’s a no-code platform for building AI agents and automated workflows, and it gives you access to 200+ models — including the full Gemini family, Claude, GPT-4o, and others — without managing any infrastructure yourself. No API keys, no installs, no model management.
If you’ve been experimenting with Gemma 4 locally and want to turn that into something repeatable — like an AI tool that processes documents, answers questions from a knowledge base, or connects to your existing business tools — MindStudio lets you build that in an afternoon. The average agent takes 15 minutes to an hour to set up using the visual builder.
You can try MindStudio free at mindstudio.ai. If you’re already thinking about building your own AI assistant or automated workflow, the platform removes the infrastructure overhead entirely so you can focus on what the agent should actually do.
For teams that need Gemini models specifically — MindStudio includes Gemini 1.5 Pro, Gemini 2.0, and the latest Gemini 2.5 models, all accessible through the same no-code interface. If you’re comparing Gemini models for your use case, MindStudio lets you test them side by side without separate API accounts.
Frequently Asked Questions
Is the Google AI Edge Gallery available on iPhone?
Not yet as of mid-2025. The app is Android-only. Google has indicated an iOS version is in development, but there’s no confirmed release date. iPhone users who want to run Gemma 4 locally can use Ollama on a nearby Mac and access it over their local network, or wait for an iOS-native solution.
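The local-network approach above is a one-variable change on the Mac side. A sketch, assuming Ollama's default port; the phone then needs any client app that can talk to an Ollama endpoint:

```shell
# By default Ollama only listens on localhost. Setting OLLAMA_HOST to
# 0.0.0.0 makes the server reachable from other devices on the same
# Wi-Fi network (default port 11434).
export OLLAMA_HOST=0.0.0.0:11434

# With that set, start (or restart) the server on the Mac:
#   ollama serve
# Then point a client on the phone at http://<mac-local-ip>:11434
echo "Ollama will listen on $OLLAMA_HOST"
```

Note the privacy tradeoff: prompts now travel over your local Wi-Fi between phone and Mac, though they still never leave your network.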
Does running Gemma 4 locally use a lot of battery?
Yes, inference is computationally intensive and will drain your battery faster than typical app use. Devices with dedicated NPUs handle this more efficiently — a Pixel 9 Pro running Gemma 4 4B will use noticeably less battery than a device running pure CPU inference. If you plan to use it for extended sessions, plug in or expect shorter battery life.
Are my conversations private when using AI Edge Gallery?
Yes. The model runs entirely on your device. No prompts, no responses, and no images you share with the app are sent to Google or any server. The only network activity is the initial model download. After that, you can put your phone in airplane mode and the model will still work.
Can I use Gemma 4 for commercial projects if I run it locally?
Gemma 4 is released under Google’s Gemma Terms of Use, which allows commercial use for most users. There are some restrictions — particularly for large-scale deployments. Read the terms on Google’s AI developer site before building a product on top of the model.
How does Gemma 4 compare to running ChatGPT or Claude on my phone?
ChatGPT and Claude require a live internet connection and send your data to their servers. Gemma 4 runs entirely on-device after the model is downloaded. In terms of raw capability, frontier cloud models like GPT-4o or Claude 3.5 Sonnet still outperform Gemma 4 27B on most benchmarks — but they’re not free and they’re not private. For many everyday tasks, Gemma 4 4B or 12B is good enough, and the privacy and offline benefits are real.
What’s the difference between Google AI Edge Gallery and Google AI Studio?
These are completely separate products. Google AI Studio (aistudio.google.com) is a web-based tool for accessing Gemini models through the cloud — it’s fast and capable but requires an internet connection and sends data to Google’s servers. Google AI Edge Gallery is a local inference app — models run on your device, offline. They serve different use cases and can coexist.
Key Takeaways
- Google AI Edge Gallery is a free Android app that runs Gemma 4 models fully on-device — no internet, no API costs, no data leaving your phone.
- Gemma 4 comes in 1B, 4B, 12B, and 27B sizes. The 4B model is the best starting point for most modern Android phones.
- For laptops, Ollama is the easiest path to running Gemma 4 locally. Apple Silicon Macs and NVIDIA GPU machines get the best performance.
- Multimodal variants of Gemma 4 can analyze images locally — a genuinely useful capability for offline, private use.
- The tradeoff is real: local models offer privacy and no ongoing cost, but setup takes a few steps, and quality at smaller sizes is below frontier cloud models.
- If you want to build with AI rather than just run raw inference, MindStudio gives you access to the full Gemini family (and 200+ other models) through a no-code builder at mindstudio.ai — no model management required.