
Google IO 2026 Leaks: 8 Codenames and Features That Surfaced Before the Announcement

Ajax, Hercules, Hector, Orpheus in arena tests. Team Food memory. Nano Banana in AI Studio. Here are 8 leaked signals ahead of Google IO 2026.

MindStudio Team

Eight Signals That Google IO 2026 Is Going to Be Bigger Than Anyone Expected

Something unusual is happening in the weeks before Google IO 2026. Codenames are leaking. Arena tests are surfacing unfamiliar model names. Features are showing up in Google AI Studio before any announcement. And the pattern of what’s appearing — Ajax, Hercules, Hector, Orpheus in blind arena tests; a memory feature called Team Food; a visual model push under the name Spark Robin — suggests Google has been building quietly while OpenAI has been doing its victory lap.

You don’t usually get this much signal before a major developer conference. Here are eight specific things that have surfaced, what they suggest, and why the combination of them matters.


The Arena Tests Are Showing Names Nobody Recognizes

Start with the most concrete evidence: blind arena tests — the kind where users rate model outputs without knowing which model produced them — are now surfacing codenames that weren’t there before. Ajax. Hercules. Hector. Orpheus. Four names, none of them attached to any publicly announced model.


Arena tests are a reliable early-warning system. When a new name appears in the blind pool, it means the model is real enough to be tested against real users, even if it hasn’t been announced. The fact that four new names appeared in close proximity suggests Google (or someone adjacent to Google) is running a serious pre-IO evaluation sprint. This kind of pre-announcement arena activity has precedent — it’s how several frontier model releases have been telegraphed before their official reveal, and it’s worth tracking the same way you’d track early benchmark leaks ahead of a major model drop.

The wrinkle: at least one commenter watching the arena results closely believes Ajax might not be a Google model at all — it could be an Apple model. That’s unverified, but it’s not implausible. Apple has been quietly building its own foundation model infrastructure, and blind arena tests don’t require the submitting lab to identify itself. If Ajax is Apple’s, that’s a story on its own. If it’s Google’s, it’s one of four new models in testing simultaneously. Either way, something is moving.


Team Food: The Memory Feature With the Strangest Name

Alongside the arena codenames, a separate leak points to a feature called Team Food — a new memory system aimed at improving how Gemini uses past conversations and long-term context.

The name tells you nothing about the function, which is presumably the point of a codename. But the function matters a lot. Current Gemini models, like most frontier models, treat each conversation as largely self-contained. Long-term context — the kind that would let a model remember that you prefer concise answers, or that you’re working on a specific project, or that you made a decision three weeks ago that affects today’s question — is either absent or shallow.

Team Food appears to be Google’s attempt to fix that. The leak pairs it with a broader description of Gemini Ultra evolving into a “memory-heavy long-context system for consistent multi-step workflows.” That framing is significant. It’s not just about remembering your name. It’s about maintaining coherent state across extended, multi-session work — the kind of thing that would make Gemini genuinely useful for ongoing projects rather than one-off queries.
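None of the implementation is public, so take this as illustration only: a minimal TypeScript sketch of what a session-memory layer like the one described could look like. Every type and function name here is invented for the example, not taken from anything Google has shipped or leaked.

```typescript
// Hypothetical sketch of a session-memory layer. None of these types or
// functions are Google's; they just illustrate the "memory-heavy,
// long-context" idea the Team Food leak describes.

interface MemoryEntry {
  sessionId: string;
  timestamp: number;
  kind: "preference" | "decision" | "fact";
  content: string; // e.g. "prefers concise answers", "chose Postgres for the CRM project"
}

interface MemoryStore {
  append(entry: MemoryEntry): Promise<void>;
  // Retrieve entries relevant to the current query, ranked by relevance,
  // so only a small slice of long-term memory spends context tokens.
  recall(query: string, limit: number): Promise<MemoryEntry[]>;
}

// Before each model call, fold recalled memory into the prompt so the model
// can stay consistent across sessions without replaying every transcript.
async function buildPrompt(store: MemoryStore, userMessage: string): Promise<string> {
  const memories = await store.recall(userMessage, 10);
  const memoryBlock = memories.map((m) => `- [${m.kind}] ${m.content}`).join("\n");
  return `Known context from earlier sessions:\n${memoryBlock}\n\nUser: ${userMessage}`;
}
```

The hard problems hide inside append and recall: deciding what is worth remembering and what is worth retrieving. That is presumably where most of the actual Team Food work lives.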

This is also where the Gemini Ultra positioning starts to make sense as a deliberate product strategy, not just a capability upgrade. If you’re building a two-track model lineup — one track optimized for speed, one for deep memory and context — you need the memory track to actually have memory. Team Food is presumably the infrastructure that makes that possible.


Spark Robin: A Visual Push That Goes Beyond Image Generation

The codename Spark Robin is less specific than Team Food, but the direction it points is clear: a strong push on image and video capabilities as a unified effort, not two separate tracks.

Google’s current visual model story is fragmented. Imagen handles image generation. The Veo series handles video. Gemini can process visual inputs but its generation capabilities are more limited. Spark Robin, based on what’s leaked, suggests Google is trying to consolidate or at least coordinate these efforts — treating image and video as a single capability domain rather than separate product lines.


Why does that matter? Because the most capable visual AI systems right now aren’t just good at one modality. They move fluidly between them. The ability to generate an image, then animate it, then edit specific elements, then export frames — that kind of pipeline is currently stitched together from multiple models. If Spark Robin represents a more unified approach, it would close a real gap. The broader question of how image generation fits into developer workflows is one that’s been evolving quickly — GPT Image 2’s approach to native transparency and asset generation is a useful comparison point for what Google is presumably trying to match or exceed.


Nano Banana: Already Live, Already Limited

This one isn’t a leak — it’s already in Google AI Studio. Nano Banana is integrated and generating custom image assets for apps as they’re built, with a redesigned edit tool that gives you visual control over specific components.

The comparison to OpenAI’s Codex image generation is instructive. Codex can generate transparent assets natively — useful for UI elements, icons, overlays, anything that needs to sit on top of other content without a background. Nano Banana currently can’t do this. No native transparency support.

That’s not a fatal limitation, but it’s a specific one, and it matters for the use case Nano Banana is clearly aimed at: in-context app building where you want generated assets to integrate cleanly into a UI. Transparency is table stakes for that workflow. The fact that it’s missing suggests Nano Banana is either an early version of something more complete, or it’s a different product than it appears to be.
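In the meantime, the standard workaround is to generate the asset on a known flat background and knock that background out afterward. Here's a rough TypeScript sketch of the crude version, operating on raw RGBA pixel data rather than on any particular image API:

```typescript
// Minimal post-processing sketch: knock out a flat background color from
// RGBA pixel data when the generator can't emit transparency natively.
// `pixels` is flat RGBA (4 bytes per pixel), e.g. from a canvas ImageData.

function knockOutBackground(
  pixels: Uint8ClampedArray,
  bg: { r: number; g: number; b: number },
  tolerance = 12
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(pixels); // copy, don't mutate the input
  for (let i = 0; i < out.length; i += 4) {
    const dr = out[i] - bg.r;
    const dg = out[i + 1] - bg.g;
    const db = out[i + 2] - bg.b;
    // Pixels close to the background color become fully transparent.
    if (Math.sqrt(dr * dr + dg * dg + db * db) <= tolerance) {
      out[i + 3] = 0;
    }
  }
  return out;
}
```

Real asset pipelines use matting models to get clean soft edges rather than a hard color cutoff, which is exactly why native transparency in the generator itself is worth having.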

The redesigned edit tool is the more interesting piece. Visual control over specific components — the ability to annotate and update individual elements rather than regenerating the whole thing — is a meaningful workflow improvement. It’s the difference between “AI generates something and you take it or leave it” and “AI generates something and you can steer it.” That’s the direction the whole field is moving, and it’s good to see it showing up in a live product rather than a demo.

For builders thinking about how AI-generated assets fit into production workflows, this is worth watching. Remy takes a related approach at the code layer: you write an annotated markdown spec where prose carries intent and annotations carry precision, and a complete TypeScript application — backend, database, auth, and deployment — gets compiled from it. The spec is the source of truth; the generated code is derived output. The parallel to Nano Banana’s edit tool is real: in both cases, the goal is a source artifact you can control, not just an output you accept.


The Omni Model: Native Audio In and Out

One of the more technically significant leaks is the hint of a new Omni model in testing. The signal came from a specific UI element: a “video UI powered by Omni,” which suggests a model capable of deeper multimodal integration than current Gemini versions.

The word “Omni” has a specific meaning in this context. When OpenAI launched GPT-4o, the “o” stood for Omni — a model that could natively intake and output multiple modalities, not just process them through separate pipelines. GPT-4o’s multimodal capabilities were real but constrained: it could handle audio and video inputs, but native audio output was limited and native video output was locked.


Google’s Gemini models have actually led in some multimodal areas. Gemini 1.5 and later versions can natively process video inputs in ways that GPT-4o still can’t. If the new Omni model extends this to native audio output — not text-to-speech bolted on afterward, but audio generated as a first-class output — that would be a meaningful capability jump. The architecture questions here aren’t trivial; the way Google has approached mixture-of-experts scaling in recent Gemini generations gives some indication of how they might handle the compute demands of true native multimodal output, and the tradeoffs in that architecture are worth understanding if you’re trying to read the tea leaves on what Omni can actually do.

The leak doesn’t confirm native audio output, but the framing (“deeper multimodal capability”) points in that direction. Combined with the expected new video model — described as going beyond current Veo systems with better quality and control, possibly including native video output inside Gemini — you start to see a coherent picture: Google trying to build a model that treats all modalities as equally native, not as add-ons.
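As a conceptual sketch only (this reflects no actual Google API), the difference between bolted-on and native output shows up at the interface level:

```typescript
// Conceptual sketch: what "all modalities as equally native" means at the
// interface level. Every name here is hypothetical.

type Modality = "text" | "image" | "audio" | "video";

interface Part {
  modality: Modality;
  data: string | Uint8Array; // text, or raw/encoded media bytes
}

// A truly omni model accepts any mix of parts in and returns any mix out.
// Audio and video are outputs in their own right, not a TTS or rendering
// step bolted on after text generation.
interface OmniTurn {
  inputs: Part[];
  outputs: Part[];
}
```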


Gemini 3.2 and 3.5: The Speed Track

Separate from the Omni model and the Ultra memory push, there are reports of Gemini 3.2 and 3.5 in testing, with a focus on speed and efficiency rather than capability expansion.

This is the other half of the two-track strategy. If Ultra is going deep on memory and long context, the 3.x series appears to be going in the opposite direction: faster responses, lower latency, more efficient inference. The use cases are different. Ultra is for the kind of sustained, multi-step work where you need the model to remember what happened three sessions ago. The 3.x series is for the kind of quick, iterative work where you need an answer in under a second.
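To make the two-track idea concrete, here's a toy TypeScript router. The model IDs are placeholders, not real endpoints; the point is only that something upstream has to decide whether a request deserves the memory-heavy track or the fast one.

```typescript
// Illustrative router for a two-track lineup: a fast model for quick,
// stateless turns and a memory-heavy model for sustained, multi-session
// work. The model IDs below are placeholders, not real Google endpoints.

type Track = "fast" | "deep";

interface RequestSignal {
  expectsFollowUps: boolean;   // part of an ongoing project?
  needsPriorSessions: boolean; // must recall earlier decisions?
  latencyBudgetMs: number;     // how long the caller will wait
}

function pickTrack(s: RequestSignal): Track {
  if (s.needsPriorSessions || s.expectsFollowUps) return "deep";
  return "fast"; // default to the cheap track when nothing demands memory
}

const MODEL_BY_TRACK: Record<Track, string> = {
  fast: "gemini-speed-placeholder", // hypothetical 3.x-style model
  deep: "gemini-ultra-placeholder", // hypothetical memory-heavy model
};

// Example: a sub-second autocomplete-style request stays on the fast track.
const model = MODEL_BY_TRACK[
  pickTrack({ expectsFollowUps: false, needsPriorSessions: false, latencyBudgetMs: 400 })
];
console.log(model); // "gemini-speed-placeholder"
```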

The honest concern with current Gemini models — and this is a real one, not a benchmark complaint — is that they can feel reluctant. They don’t always want to produce a lot of output at once. They hedge. They summarize when you wanted the full thing. Speed improvements are welcome, but they’re not the same as quality improvements. The question is whether 3.2/3.5 addresses the underlying tendency toward brevity and hedging, or just delivers the same behavior faster.


The DeepMind Diffusion Paper: A Veo 4 Signal

This one requires a bit of reading between the lines, but it’s worth paying attention to. Google DeepMind recently published a paper on diffusion models that addresses a fundamental trade-off: the tension between the information content of the latent representation and the reconstruction quality of the output.

In plain terms: when you compress an image or video into a latent space (the internal representation a diffusion model works with), you lose some information. When you reconstruct the output from that latent, you’re working with that compressed representation. The paper apparently offers a systematic framework for navigating this trade-off — essentially a map for how to make better decisions about where to compress and where to preserve detail.
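The leak doesn't include the paper's actual math, but the textbook way to write this trade-off down is the beta-weighted autoencoder objective already used to train the latent spaces that latent diffusion models operate in:

```latex
% Standard framing of the trade-off (not necessarily the paper's formulation):
% a beta-weighted autoencoder objective for the latent space a diffusion
% model works in.
\mathcal{L}(\theta,\phi)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\lVert x - \hat{x}_\theta(z) \rVert^2\right]}_{\text{reconstruction quality}}
  \;+\; \beta \,
    \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\Vert\, p(z)\right)}_{\text{information budget of the latent}}
```

Turn the weight up and the latent carries less information, which makes generation cheaper but reconstruction blurrier; turn it down and detail survives at the cost of a heavier latent. A "systematic framework" for the trade-off presumably amounts to principled choices of that weighting, the latent size, and where detail gets preserved.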

The reason this is being read as a Veo 4 signal is that video generation is where this trade-off is most painful. Current video models struggle with fine detail over time — faces drift, textures lose coherence, motion artifacts accumulate. A better framework for managing the latent/reconstruction trade-off would directly address these failure modes. If Veo 4 is coming, this paper is plausibly the research foundation it’s built on.


The Voice Model Situation: Not Google, But Adjacent

This one isn’t a Google IO leak, but it’s happening in the same week and it’s directly relevant to where Google is also moving. xAI has released a voice cloning model with what they’re calling “rich natural emotion,” and it’s already live in the Grok voice API — no enterprise plan required.

The demo that’s been circulating is genuinely unsettling. Two voices played back to back — one real, one cloned — and a public poll of thousands of listeners came in nearly 50/50 on which was which. That’s not “pretty good voice cloning.” That’s indistinguishable voice cloning, available through a standard API, today.

Google recently released their own voice model, described as “very instructable” — meaning you can give it specific direction about tone, pacing, emotion, and it follows those instructions with more precision than typical TTS systems. The two releases together — xAI’s cloning capability and Google’s instructability — suggest the voice model space is moving faster than most people are tracking. By the time IO happens, Google’s voice story may be more developed than the current public release suggests.

For builders thinking about how to chain voice models with other AI capabilities — routing audio inputs to the right model, combining voice output with retrieval or memory systems — MindStudio is one way to handle that orchestration without writing the plumbing from scratch. It supports 200+ models and 1,000+ integrations, which matters when you’re trying to combine a voice model with a memory layer or a business tool inside a single workflow.


What Eight Signals Add Up To

The honest read on all of this is that Google has been building in parallel across more dimensions than is immediately obvious. A new Omni model. A new video model. Two Gemini version tracks (speed and memory). A memory feature (Team Food). A visual model push (Spark Robin). A live image generation tool (Nano Banana). And four new model codenames in arena tests, at least one of which might not even be Google’s.

That’s a lot of surface area for a single developer conference. The question isn’t whether Google has things to announce — clearly they do. The question is whether the things they announce will survive actual use. Benchmarks are one thing. The everyday experience of using a model that hedges, or a video generator that drifts, or a memory system that misremembers — that’s the test that matters.

Google IO is a few weeks out. The codenames will get real names. The leaks will either hold up or they won’t. But the pattern of what’s surfacing — the breadth of it, the specificity of the codenames, the fact that Nano Banana is already live — suggests this isn’t a conference where Google shows up with one big thing. It looks more like a conference where they show up with everything they’ve been sitting on.

That’s either a sign of real momentum or a sign of a company trying to cover too much ground at once. We’ll find out which one soon enough.

Presented by MindStudio
