What Is Thinking Machines Labs? Mira Murati's Real-Time AI Interaction Model

Mira Murati Is Building Something Different

When Mira Murati left OpenAI in September 2024, people expected her to build another frontier model lab — another competitor to the company she helped shape as CTO. What she’s building with Thinking Machines Labs is more specific than that, and arguably more interesting.

Thinking Machines Labs is focused on a distinct problem: how AI systems interact with humans and tools in real time. Not just faster responses, but fundamentally different interaction patterns — real-time translation that preserves voice characteristics, simultaneous tool calls instead of sequential ones, and agents that are aware of time as an active variable. The demos they’ve released have drawn attention from researchers and developers who recognize how different this approach is from the current mainstream.

This article breaks down what Thinking Machines Labs is actually building, why the interaction model matters, and what it means for how AI agents will work in practice.

Who Founded Thinking Machines Labs

Mira Murati is one of the most recognized figures in AI. She joined OpenAI in 2018, eventually becoming CTO — a role that put her at the center of some of the field’s most significant product launches, including ChatGPT and GPT-4.

In November 2023, she was briefly named interim CEO at the start of the five-day board crisis that nearly ended OpenAI. She returned to the CTO role after Sam Altman came back, but departed permanently in September 2024.

Murati founded Thinking Machines Labs shortly after leaving. She brought in other experienced researchers and engineers, and the company reportedly raised substantial early funding. The team has kept a relatively low public profile compared to other high-profile AI startups, but the technical direction they’ve signaled is clearly defined.

The core thesis: current AI interactions are too sequential, too latency-sensitive, and too context-blind. Thinking Machines Labs is building toward something that feels less like a query-response system and more like a continuous, context-aware AI presence.

What Makes the Real-Time Interaction Model Different

Most AI systems today operate in discrete turns. You send input, the model processes it, returns output, and waits. Even “streaming” responses are still fundamentally sequential — one token at a time, one step at a time.

Thinking Machines Labs is working on a different architecture. Their interaction model treats the AI as something closer to a persistent co-processor: always listening, processing in parallel streams, and capable of acting on multiple inputs or tools at once.

Why Sequential Models Hit a Wall

Sequential processing has served well enough for text chat, but it starts to break down in more demanding scenarios:

Live conversations — Human speech doesn’t wait for a turn to end. Real-time translation or live captioning requires processing that overlaps with input.
Complex agentic tasks — An agent researching a topic might need to search three sources simultaneously, not one after another.
Long-running sessions — A multi-hour interaction requires the model to track what happened an hour ago with the same fidelity as what happened a second ago.

These aren’t niche edge cases. They’re the exact scenarios where AI agents need to work reliably to be genuinely useful in business and research contexts.

The Real-Time Translation Demo

One of the most striking things Thinking Machines Labs has publicly demonstrated is real-time language translation that preserves the speaker’s voice characteristics.

This is harder than it sounds. Most translation systems work in phases: transcribe speech to text, translate the text, synthesize new audio. Each step adds latency, and the synthesized voice typically sounds generic or robotic — nothing like the original speaker.

The Thinking Machines Labs approach compresses or eliminates the gaps between those phases and aims to keep the speaker’s voice identity intact across the translation. The result is translation that feels more like simultaneous human interpreting than a machine pipeline.
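To make the latency argument concrete, here is a toy model in Python. It compares the processing delay before the first translated audio in a cascaded pipeline, where transcription and translation finish the whole utterance first, against a chunked pipeline where each audio chunk flows through all stages as soon as it is ready. The stage costs, the `Stage` class, and the function names are illustrative assumptions, not measurements or details of the Thinking Machines Labs system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    ms_per_chunk: int  # hypothetical processing cost for one audio chunk


# Illustrative numbers only; not measurements of any real system.
STAGES = [
    Stage("transcribe", 300),
    Stage("translate", 200),
    Stage("synthesize", 400),
]


def cascaded_first_audio_ms(stages: list[Stage], chunks: int) -> int:
    """Transcription and translation finish the whole utterance before
    synthesis starts, so the first audio waits for all upstream work."""
    *upstream, last = stages
    return sum(s.ms_per_chunk * chunks for s in upstream) + last.ms_per_chunk


def streaming_first_audio_ms(stages: list[Stage]) -> int:
    """Each chunk moves through every stage as soon as it is ready, so the
    first audio only waits for one chunk to pass through the pipeline."""
    return sum(s.ms_per_chunk for s in stages)


cascaded_ms = cascaded_first_audio_ms(STAGES, chunks=10)
streaming_ms = streaming_first_audio_ms(STAGES)
print(f"cascaded: {cascaded_ms} ms, streaming: {streaming_ms} ms")
```

With these made-up costs and a ten-chunk utterance, the cascaded delay grows with the length of the utterance while the chunked delay does not. The model ignores a real difficulty: translation often needs later words for context, so chunks cannot always be processed independently.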

Why Voice Preservation Matters

For practical applications — international business calls, multilingual customer service, live interviews — a translation that sounds like the original speaker is far more natural and easier to trust. It reduces the “uncanny valley” effect that makes automated translation feel off.

It also signals something important about the underlying model architecture: it’s processing audio, semantic meaning, and synthesis in a much more integrated way than traditional pipelines.

Simultaneous Tool Calls: Running in Parallel

Many agent implementations today still call tools one at a time, even though several major model APIs can already request more than one tool call in a single turn. If an agent needs to check a calendar, search the web, and look up a contact to answer a question, a sequential loop does one, waits for the result, then does the next.

Thinking Machines Labs is building toward simultaneous tool calls — the ability to dispatch multiple tool requests in parallel and integrate the results as they come in.
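A minimal sketch of the "dispatch in parallel, integrate as results arrive" pattern, using Python's standard `asyncio` library. The `fake_tool` function and its delays are hypothetical stand-ins for real tool calls; nothing here reflects how Thinking Machines Labs implements the idea.

```python
import asyncio


async def fake_tool(name: str, delay: float) -> tuple[str, str]:
    """Hypothetical stand-in for a real tool call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay)
    return name, f"{name} result"


async def collect_as_they_arrive() -> list[str]:
    # Start all three calls immediately rather than awaiting each in turn.
    tasks = [
        asyncio.create_task(fake_tool("calendar", 0.10)),
        asyncio.create_task(fake_tool("web_search", 0.02)),
        asyncio.create_task(fake_tool("contacts", 0.06)),
    ]
    arrival_order = []
    # as_completed yields awaitables in the order the calls finish.
    for finished in asyncio.as_completed(tasks):
        name, _result = await finished
        # A real agent would fold the result into its working context here.
        arrival_order.append(name)
    return arrival_order


arrival_order = asyncio.run(collect_as_they_arrive())
print(arrival_order)
```

Results come back in completion order, not dispatch order, so the agent can start reasoning over the fastest result while slower calls are still in flight.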

Why This Changes Agent Performance

The difference isn’t just speed, though speed matters. Sequential tool use makes latency add up: if each tool call takes two seconds, five calls take ten seconds. When the calls don’t depend on one another, running them in parallel could bring that down to roughly two seconds total, the time of the slowest call.
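The arithmetic can be checked with a small simulation. The sketch below scales the two-second call down to 0.05 seconds so it runs quickly; `fake_tool_call` is a hypothetical placeholder for any independent tool request.

```python
import asyncio
import time

CALL_SECONDS = 0.05  # scaled-down stand-in for a two-second tool call
NUM_CALLS = 5


async def fake_tool_call(i: int) -> int:
    """Hypothetical tool call that only waits, then returns its index."""
    await asyncio.sleep(CALL_SECONDS)
    return i


async def run_sequential() -> float:
    start = time.perf_counter()
    for i in range(NUM_CALLS):
        await fake_tool_call(i)  # each call waits for the previous one
    return time.perf_counter() - start


async def run_parallel() -> float:
    start = time.perf_counter()
    # gather schedules all calls at once and waits for every one to finish.
    await asyncio.gather(*(fake_tool_call(i) for i in range(NUM_CALLS)))
    return time.perf_counter() - start


sequential_s = asyncio.run(run_sequential())
parallel_s = asyncio.run(run_parallel())
print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

Sequential time is roughly the sum of the call durations, while parallel time is roughly the longest single call. That only holds when no call needs another call's output; dependent calls still have to run in order.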

But beyond speed, parallel tool use enables a different kind of reasoning. An agent that can hold multiple threads of investigation open simultaneously can synthesize richer answers and catch contradictions across sources in ways that sequential search can’t.

For multi-agent workflows specifically, this is significant. When one agent is orchestrating several sub-agents, the ability to run them in parallel rather than in sequence changes what’s actually feasible at scale.

Time-Aware AI Agents

One of the less-discussed but genuinely interesting aspects of the Thinking Machines Labs model is time awareness.

Current models have a training cutoff and, unless given explicit context, don’t have a reliable sense of when “now” is. They can be told the date, but that’s passive — it doesn’t affect how they reason about recency, relevance, or the aging of information.

A time-aware agent understands that something that happened thirty minutes ago in a session is more recent than something that happened three hours ago, and reasons about that difference appropriately. It can prioritize recent information, flag when something it learned earlier in a session might be superseded, and behave differently based on the time context it’s operating in.

What This Enables

Persistent memory management — An agent that knows how old a piece of information is can decide whether to verify it before acting on it.
Scheduled and event-driven reasoning — An agent aware of temporal context can reason about deadlines, time zones, and sequence of events more reliably.
Longitudinal sessions — For agents running over hours or days, temporal awareness prevents the kind of context collapse where the model treats everything as equally current.
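One simple way to approximate this behavior with current tools is to timestamp everything the agent learns and check its age before acting. The sketch below is an assumption-laden illustration: the `Observation` class, `needs_verification`, and the one-hour threshold are hypothetical names and values, not part of any Thinking Machines Labs design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Observation:
    text: str
    learned_at: datetime  # when the agent learned this fact

    def age(self, now: datetime) -> timedelta:
        return now - self.learned_at


def needs_verification(obs: Observation, now: datetime, max_age: timedelta) -> bool:
    """True when the observation is older than max_age and should be re-checked."""
    return obs.age(now) > max_age


def most_recent_first(observations: list[Observation]) -> list[Observation]:
    return sorted(observations, key=lambda o: o.learned_at, reverse=True)


# Fixed "now" so the example is reproducible.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
session = [
    Observation("meeting is at 3pm", now - timedelta(hours=3)),
    Observation("meeting moved to 4pm", now - timedelta(minutes=30)),
]

ordered = most_recent_first(session)
stale = [o.text for o in session if needs_verification(o, now, timedelta(hours=1))]
print(ordered[0].text, stale)
```

Here the thirty-minute-old observation is ranked ahead of the three-hour-old one, and only the older one is flagged for re-checking. A model with built-in time awareness would presumably do this kind of weighting internally instead of relying on an external wrapper.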

This connects directly to how autonomous background agents need to behave — they’re not responding to a single prompt but operating over extended time periods with changing conditions.

The Broader Vision: Always-On AI Interaction

Thinking Machines Labs isn’t just optimizing existing AI patterns. The underlying vision is a shift from AI as a tool you invoke to AI as a persistent presence that’s continuously processing context.

Think about the difference between a calculator and a human assistant. The calculator waits for you to press a button. The assistant is already thinking about your meeting in an hour while they respond to your current question.

Current AI systems are mostly calculators. The Thinking Machines Labs model is aimed at something closer to the assistant pattern — context is continuous, processing is parallel, and time is an active variable.

This is a technically demanding goal. It requires rethinking memory architecture, attention mechanisms, and latency targets all at once. But if achieved, it changes what’s realistic for AI-powered applications in domains like healthcare, legal research, and real-time operations.

How MindStudio Fits Into the Multi-Agent Picture

The capabilities Thinking Machines Labs is developing — parallel tool calls, time-aware reasoning, real-time processing — are frontier model research. Building that kind of model requires a dedicated research organization.

But many of the practical applications those capabilities enable are things developers and teams need to build today, with current models. That’s where a platform like MindStudio comes in.

MindStudio’s no-code agent builder lets you create AI agents that call multiple tools, run workflows in parallel, and connect to over 1,000 business integrations — without writing the model infrastructure from scratch. You can connect 200+ AI models including Claude, GPT-4, and Gemini, and build agents that handle email triggers, scheduled tasks, API calls, and more.

For teams that want to implement parallel-style agentic workflows now — running sub-agents in parallel, integrating results, operating on schedules — MindStudio provides the infrastructure layer so you can focus on the logic, not the plumbing.

You can try MindStudio free at mindstudio.ai.

Frequently Asked Questions

What is Thinking Machines Labs?

Thinking Machines Labs is an AI research and product company founded by Mira Murati, former CTO of OpenAI, after she left OpenAI in September 2024. The company is focused on building AI systems capable of real-time, parallel, and time-aware interactions, a different architecture from the sequential query-response pattern most current AI systems use.

Who is Mira Murati?

Mira Murati is an AI researcher and executive who served as CTO of OpenAI from 2022 until September 2024. She was also briefly interim CEO during the OpenAI board crisis in November 2023. She oversaw the development and launch of major OpenAI products including ChatGPT and GPT-4 before founding Thinking Machines Labs.

What is real-time AI translation and why does it matter?

Real-time AI translation processes and translates speech as it happens, rather than waiting for a complete utterance to finish before beginning the translation process. Thinking Machines Labs has demonstrated translation that also preserves the original speaker’s voice characteristics, making it sound significantly more natural than traditional text-to-speech translation pipelines. This matters for live business communication, multilingual collaboration, and accessibility tools.

What are simultaneous tool calls in AI?

Simultaneous tool calls (also called parallel tool use) allow an AI agent to dispatch multiple external requests — web searches, database lookups, API calls — at the same time and integrate results as they arrive. This contrasts with the current common approach where agents make one tool call, wait for the result, then make the next. Parallel tool calls dramatically reduce latency for complex agent tasks and enable richer synthesis across multiple information sources.

How is Thinking Machines Labs different from OpenAI or Anthropic?

While OpenAI and Anthropic focus broadly on building capable general-purpose AI models, Thinking Machines Labs is specifically focused on the interaction architecture — how AI systems process information in real time, use tools in parallel, and maintain temporal awareness. The differentiation is less about model capability benchmarks and more about how the model experiences and operates within a continuous flow of time and context.

Is Thinking Machines Labs releasing products publicly?

As of early 2025, Thinking Machines Labs has shared technical demos publicly but has not launched a general-purpose consumer or enterprise product. The company has signaled intent to build products on top of their model research, but has maintained a relatively measured pace of public announcements compared to other high-profile AI startups.

Key Takeaways

Thinking Machines Labs was founded by ex-OpenAI CTO Mira Murati after she departed in September 2024, with a focus on real-time AI interaction models rather than just general-purpose model capability.
Real-time translation is one of their flagship demos, showing speech translation that preserves the original speaker’s voice — reducing latency and improving naturalness compared to traditional pipeline approaches.
Simultaneous tool calls allow AI agents to run multiple external queries in parallel rather than sequentially, reducing compounding latency and enabling richer reasoning across data sources.
Time-aware agents represent a shift toward AI that actively reasons about recency and temporal context — crucial for agents operating over extended sessions or handling time-sensitive information.
The broader vision is a move from AI as an invoked tool to AI as a persistent, context-aware presence — which has significant implications for how agentic workflows are designed.

If you want to build practical multi-agent workflows today — with parallel processing, tool integrations, and scheduled execution — MindStudio lets you get started without building model infrastructure from scratch.
