Google Gemini AI Glasses Explained: Audio Version vs Display Version and What's Actually Shipping
Google's Gemini AI glasses come in two versions. Here's what the audio-only pair launching this fall can do and what the display version offers.
Two Very Different Products Under One Name
Google Gemini AI glasses aren’t one thing — they’re two. And mixing them up leads to real confusion about what you can actually buy and when.
At Google I/O 2025, Google announced both a near-term wearable product and a longer-horizon augmented reality device, both branded around Gemini. The near-term product is an audio-focused pair of smart glasses without a display. The longer-term device is a pair of Android XR glasses with a heads-up display built into the lens.
If you’ve seen headlines about Google’s AI glasses and weren’t sure which version was being discussed, this article breaks down the differences clearly — hardware, capabilities, availability, and what Gemini actually does in each one.
What Google Announced at I/O 2025
Google used its 2025 developer conference to outline a two-track approach to AI wearables.
The first track is immediate: stylish, camera-equipped glasses running Gemini in audio-only mode. Google announced partnerships with eyewear brands Warby Parker and Gentle Monster to design the frames. These glasses look like regular eyewear. There’s no display — just speakers, a microphone, a camera, and a Gemini-powered assistant listening and responding through audio.
The second track is further out: Android XR glasses with a transparent display built into one lens. These are closer to what most people picture when they imagine AR glasses — a small overlay you can see while still seeing the world. This product is more technically ambitious and doesn’t have a firm public release window.
Both products are powered by Project Astra, Google’s multimodal AI framework, which processes what you see, hear, and say in near real time.
The Audio-Only Version: What’s Actually Shipping
The audio glasses are the near-term product. Google has been fairly specific about what these do.
Hardware and Design
The frames are designed to look like normal glasses. Google partnered with Warby Parker and Gentle Monster specifically to avoid the “tech product” aesthetic that has hurt previous smart glasses — most famously Google Glass.
The hardware includes:
- A small camera for visual input
- Microphones to capture voice queries and ambient audio
- Speakers built into the arms of the frame
- Bluetooth connectivity to a paired phone
- A battery designed to last through a day of typical use
There’s no display. Nothing is projected onto the lens. Everything Gemini returns (directions, answers, translations) comes through audio only.
What Gemini Can Do in Audio Mode
The audio glasses act as an always-available Gemini assistant you can talk to hands-free. Key capabilities include:
Live translation — Gemini can translate conversations in real time, speaking the translation through the glasses speakers as the other person talks. This is one of the clearest use cases for glasses specifically, since it keeps your eyes and hands free.
Contextual Q&A — You can ask Gemini questions about what you’re looking at. Point the glasses at a menu, a sign, a storefront, or a product, and ask a question. Gemini processes the camera feed along with your voice query.
Navigation — Turn-by-turn directions delivered through audio, without needing to look at a phone screen.
Reminders and tasks — Standard assistant functions: setting reminders, sending messages, checking calendar events, all via voice.
Summarization — If you’re in a meeting or listening to a lecture, Gemini can summarize what was said.
Availability
Google has indicated the audio glasses will be available in fall 2025. The Warby Parker collaboration suggests a retail presence alongside prescription and non-prescription lens options. Pricing hasn’t been officially confirmed, but the expectation from analyst coverage is in the $300–$500 range — positioning them near Meta’s Ray-Ban smart glasses rather than above them.
The Display Version: Android XR Glasses
The display version is the more ambitious product, and Google is being careful not to overpromise on it.
What Android XR Is
Android XR is Google’s operating system for extended reality devices — both VR headsets and AR glasses. Samsung’s Project Moohan headset is the first device running Android XR. The glasses version extends the same platform to a wearable form factor.
The key difference from the audio glasses is a transparent waveguide display built into one lens. This lets Gemini surface visual information as a heads-up overlay: text, navigation arrows, translated subtitles, notifications, and other elements you can see while still looking at the real world.
What the Display Adds
The display version unlocks capabilities that audio simply can’t replicate:
- Subtitles in real time — Live translated or transcribed text floating in your field of view during conversations
- Navigation overlays — Arrows and waypoints appearing in your environment as you walk
- Contextual cards — When you look at a landmark, product, or person you’ve shared contact info with, relevant information can appear in the display
- Persistent notifications — Glanceable information without pulling out a phone
These are compelling features. They’re also harder to ship well. Display brightness, battery life, latency, field of view, and social comfort are all significantly more complex engineering problems than placing a speaker in a glasses arm.
Timeline and Availability
Google has not announced a specific release date for the Android XR glasses. At I/O 2025, the display version was shown in prototype form, and Google shared early information about developer access. The realistic expectation is consumer availability in 2026, though developer hardware could arrive earlier.
How the Two Versions Compare
Here’s a side-by-side view of what each version offers:
| Feature | Audio Glasses | Android XR Display Glasses |
|---|---|---|
| Visual display | No | Yes (one-lens waveguide) |
| Camera | Yes | Yes |
| Gemini integration | Yes | Yes |
| Live translation | Audio only | Audio + on-screen subtitles |
| Navigation | Audio directions | Visual overlay + audio |
| Design partners | Warby Parker, Gentle Monster | TBD |
| Battery expectations | Full day | Unknown (likely shorter) |
| Shipping timeline | Fall 2025 | 2026+ |
| Estimated price | ~$300–$500 | Unknown (likely higher) |
The audio version is a real product with a real launch window. The display version is a real product in development — but the timeline is more open.
What Makes These Different From Meta Ray-Bans
Meta’s Ray-Ban smart glasses launched in 2023 and were updated in 2024 with Meta AI integration. They’re the closest existing product to what Google is launching in the audio category.
The core comparison:
Meta Ray-Ban glasses use Meta AI, which is backed by Llama models. They support voice queries, live video streaming to Instagram, and photo capture. Meta AI’s capabilities have improved significantly, including real-time visual assistance.
Google’s audio glasses use Gemini, which brings stronger integration with Google’s ecosystem — Search, Maps, Calendar, Gmail, Workspace. If you live in Google’s tools, the integration is tighter. Gemini also benefits from Google’s lead in multilingual translation, which matters for the live translation feature.
Neither has a clear overall capability advantage at this point. The differentiator is ecosystem. Google’s glasses will likely feel more natural for Android users deep in Google’s apps. Meta’s will feel natural for people who use Facebook, Instagram, and WhatsApp heavily.
The display comparison is less direct. Meta has no consumer AR display glasses currently available. Apple Vision Pro is an immersive VR/AR headset, not a glasses form factor. Google’s Android XR display glasses are competing in a space where there are few shipping products at any price point.
What Gemini Brings to Wearables Specifically
Gemini isn’t just a chatbot pasted into glasses hardware. The capabilities that make it useful in a wearable context are different from what makes it useful in a browser.
Multimodal Real-Time Processing
Project Astra — the system powering these glasses — is built for continuous, real-time input. It doesn’t just respond to discrete queries. It can maintain context over a conversation that spans several minutes, track what you’ve looked at with the camera, and connect those visual inputs to what you’re asking.
This matters because most wearable AI interactions are short and context-dependent. You’re walking down a street and ask about a restaurant you’re passing. You’re in a conversation and want a translated phrase. You look at a train schedule and ask when the next one leaves. These are all quick, contextual moments — not the kind of extended research sessions where a desktop AI shines.
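One way to picture this kind of short-lived context is a rolling buffer that keeps only recent observations and attaches them to the next spoken query. The sketch below is purely illustrative: Google has not published how Project Astra stores context, and the names `Observation`, `RollingContext`, and the three-minute window are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Observation:
    timestamp: float  # seconds since the session started
    source: str       # hypothetical label, e.g. "camera" or "voice"
    text: str         # e.g. a caption describing what the camera saw


class RollingContext:
    """Keeps only observations from the last `window_s` seconds (assumed value)."""

    def __init__(self, window_s: float = 180.0):
        self.window_s = window_s
        self._items = deque()

    def add(self, obs: Observation) -> None:
        self._items.append(obs)
        self._evict(obs.timestamp)

    def _evict(self, now: float) -> None:
        # Drop the oldest entries once they fall outside the time window.
        while self._items and now - self._items[0].timestamp > self.window_s:
            self._items.popleft()

    def build_prompt(self, query: str, now: float) -> str:
        # Combine the surviving observations with the new spoken query.
        self._evict(now)
        lines = [f"[{o.source}] {o.text}" for o in self._items]
        lines.append(f"[query] {query}")
        return "\n".join(lines)
```

With a window like this, a question such as "when does the next one leave?" is sent along with the train schedule you looked at a moment ago, while a sign you passed several minutes earlier has already dropped out.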
Persistent Context
Gemini can maintain a thread of context throughout the day across interactions. If you mention in the morning that you have a meeting in a new part of the city, and later ask for directions, Gemini can connect those. That kind of continuity is more useful in a glasses form factor than on a phone, where you’re more likely to switch apps and lose the thread.
On-Device vs. Cloud Processing
Google hasn’t published full technical details on how much processing happens on-device versus in the cloud for the glasses. Latency is a known challenge in real-time AI wearables. For live translation and navigation to feel fluid rather than lagged, processing needs to be fast. Google has been working on Gemini Nano — the on-device version of Gemini — for exactly this kind of use case.
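The trade-off can be sketched as a simple latency-budget check: prefer one processing route when it fits the budget, and fall back to the fastest option when nothing does. This is a hypothetical illustration, not Google's routing logic. The `Route` type, the preference order, and every millisecond figure in the usage note are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    compute_ms: float  # assumed model inference time
    network_ms: float  # assumed network round trip (0 for on-device)

    @property
    def total_ms(self) -> float:
        return self.compute_ms + self.network_ms


def pick_route(routes_by_preference, budget_ms):
    """Return the first preferred route that fits the budget.

    If no route fits, fall back to the fastest one available.
    """
    for route in routes_by_preference:
        if route.total_ms <= budget_ms:
            return route
    return min(routes_by_preference, key=lambda r: r.total_ms)
```

For instance, with an assumed cloud route of 120 ms compute plus 250 ms network and an assumed on-device route of 200 ms, a 500 ms budget selects the cloud route when it is listed first, while a 300 ms budget falls through to on-device.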
Building Gemini-Powered Workflows With MindStudio
The Gemini AI glasses represent Google’s vision for ambient AI — intelligence that’s present in your environment rather than something you have to open an app to access. That’s compelling for real-world moments, but for most business workflows, there’s still a meaningful gap between what the glasses can do hands-free and what teams actually need to automate.
That’s where MindStudio fits in.
MindStudio is a no-code platform for building AI agents and automated workflows. It gives you access to Gemini models (along with 200+ others) without needing a separate API key or Google Cloud account. You can build agents that use Gemini’s multimodal capabilities — analyzing images, processing documents, generating content — and connect them to the tools your team already uses.
For example, you could build a Gemini-powered agent that:
- Takes photos or video stills captured by smart glasses or phones and runs them through Gemini for product identification, quality checks, or documentation
- Integrates with Google Workspace to automatically log notes from a Gemini voice interaction into a CRM like HubSpot or Salesforce
- Triggers real-time translation workflows for support teams, routing transcripts from multilingual conversations into the right queue
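As a rough idea of what the routing step in the last example involves, here is a minimal sketch in plain Python. It does not use MindStudio's or Gemini's actual APIs. The queue names, the `language` and `text` fields, and the assumption that each transcript already carries a detected language code are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from detected language code to a support queue.
QUEUE_BY_LANGUAGE = {"es": "support-es", "ja": "support-ja"}
DEFAULT_QUEUE = "support-general"


def route_transcripts(transcripts):
    """Group transcript texts by destination queue based on their language code."""
    queues = defaultdict(list)
    for transcript in transcripts:
        queue = QUEUE_BY_LANGUAGE.get(transcript["language"], DEFAULT_QUEUE)
        queues[queue].append(transcript["text"])
    return dict(queues)
```

In a real workflow the language code would come from an upstream detection or translation step, and each queue would map to a ticketing or messaging integration rather than an in-memory list.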
The average MindStudio agent takes 15 minutes to an hour to build, and you can connect it to 1,000+ pre-built integrations without writing code. If you’re thinking about how Gemini’s capabilities map to actual operational work — not just casual voice queries — MindStudio is a practical starting point.
You can try MindStudio free at mindstudio.ai.
Frequently Asked Questions
When will Google Gemini AI glasses be available to buy?
The audio-only version, developed in partnership with Warby Parker and Gentle Monster, is expected to ship in fall 2025. The Android XR display glasses are still in development with no confirmed consumer release date — 2026 is the most widely cited expectation based on Google’s statements.
Do Google’s Gemini glasses have a camera?
Yes. Both the audio version and the display version include a camera that feeds visual input to Gemini. This enables features like contextual Q&A about what you’re looking at, live translation, and object recognition. The camera is built into the frame.
What’s the difference between the audio glasses and the Android XR glasses?
The audio glasses have no screen, so all output comes through speakers. The Android XR display glasses have a transparent display built into one lens, allowing Gemini to show visual overlays like subtitles, navigation arrows, and contextual information in your field of view. The audio version is shipping sooner and is expected to be more affordable.
How does Gemini work in the glasses?
Gemini processes input from the glasses’ microphone and camera in near real time using Google’s Project Astra framework. You speak a query or the glasses pick up a conversation, Gemini processes the audio and visual context together, and it responds through the glasses’ speakers (or the display, in the XR version). Google hasn’t detailed the split, but processing is expected to combine on-device and cloud computing.
How do Google’s glasses compare to Meta Ray-Ban smart glasses?
Both are camera-equipped smart glasses with AI assistants and no display. Meta Ray-Bans use Meta AI. Google’s glasses use Gemini. The main difference is ecosystem: Google’s glasses integrate more naturally with Android, Google Maps, Calendar, and Search. Meta’s integrate with Instagram, Facebook, and WhatsApp. Neither has a clear feature advantage overall as of mid-2025.
Will the glasses work without a phone?
Google hasn’t published full standalone specs for the audio glasses. Based on similar products and the hardware description, they’ll likely require a paired Android phone for processing-heavy tasks and connectivity. Some lightweight functions may work without a phone nearby, but full Gemini functionality will depend on a connected device or strong Wi-Fi.
Key Takeaways
- Google’s Gemini AI glasses are two distinct products: an audio-only version shipping fall 2025, and an Android XR display version still in development.
- The audio glasses look like regular eyewear (Warby Parker and Gentle Monster frames) and use Gemini for voice queries, live translation, contextual Q&A, and navigation — all through audio.
- The display version adds a transparent heads-up display for visual overlays, real-time subtitles, and navigation arrows, but has a longer development timeline.
- Both versions are powered by Project Astra, Google’s real-time multimodal AI framework.
- The audio glasses compete directly with Meta Ray-Ban smart glasses. The differentiator is ecosystem: Gemini vs. Meta AI, Google’s tools vs. Meta’s social platforms.
- If you want to build workflows that use Gemini’s capabilities in a business context — processing images, connecting to CRMs, automating multilingual communication — MindStudio lets you do that without code and without API setup.
