What Is Sakana Fugu? The Multi-Model Orchestrator Explained
Sakana Fugu is an AI orchestrator that routes prompts to the best model automatically. Learn how it works, its two tiers, and real benchmark results.
Why Routing Prompts to a Single Model Is Leaving Performance on the Table
Every AI team eventually hits the same wall: no single model wins at everything. A frontier model such as GPT-4o handles complex reasoning well but costs more per query. A smaller model handles simple classification cheaply but fumbles nuanced tasks. Most teams pick one model and live with the tradeoff, or they build fragile custom routing logic that breaks every time a new model ships.
Sakana Fugu is a different answer to that problem. It’s a multi-model orchestrator from Sakana AI that automatically decides which language model should handle each incoming prompt. The result is a system that targets the performance of top-tier models while running a meaningful portion of queries on cheaper, faster alternatives.
This article explains what Sakana Fugu is, how its two-tier routing works, what the benchmark results actually show, and when it makes sense to use a system like this in production.
What Is Sakana AI?
Sakana AI is a Tokyo-based AI research company founded in 2023 by Llion Jones, a co-author of the “Attention Is All You Need” paper that introduced the Transformer architecture, along with David Ha, formerly a research scientist at Google Brain.
The company focuses on building efficient, nature-inspired AI systems. The name “Sakana” means “fish” in Japanese, and their model names often follow aquatic themes — Fugu being the Japanese word for puffer fish, a creature famous for both its danger and its delicacy.
Rather than competing head-to-head with frontier labs on raw parameter count, Sakana AI pursues approaches like evolutionary model merging and adaptive routing — techniques designed to get more out of existing compute rather than simply scaling up.
What Is Sakana Fugu?
Sakana Fugu is a multi-model LLM orchestration system. At its core, it’s a learned router that sits in front of a pool of language models and decides, for each incoming query, which model should handle it.
The key insight behind Fugu is that most queries don’t need the most powerful model available. A simple summarization request, a factual lookup, or a basic classification task can be handled just as well by a smaller, cheaper model. The expensive model should be reserved for genuinely hard problems — complex reasoning, ambiguous instructions, multi-step tasks.
Fugu operationalizes this idea automatically. You don’t write routing rules by hand. The system learns to classify query difficulty and model-fit, then dispatches accordingly.
The Two-Tier Design
Fugu uses a two-tier architecture:
- Tier 1 (fast lane): A lightweight, low-cost model handles queries the router predicts are within its capability. This tier is optimized for speed and cost efficiency.
- Tier 2 (capable lane): A more powerful model handles queries the router predicts require higher capability — complex reasoning, long-context understanding, nuanced generation.
The router itself is a trained classifier. It takes the incoming prompt as input and outputs a routing decision: send this to Tier 1 or escalate to Tier 2. The classifier is trained on examples where model performance differences are measurable, so it learns to identify the signals that predict when the cheaper model will underperform.
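The dispatch step described above can be sketched in a few lines. This is a minimal illustration, not Sakana AI's implementation: `route`, `RoutingDecision`, and `toy_scorer` are hypothetical names, and the scorer is a crude stand-in for a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutingDecision:
    tier: int     # 1 = fast lane, 2 = capable lane
    score: float  # estimated probability that Tier 1 is sufficient


def route(prompt: str,
          tier1_sufficiency: Callable[[str], float],
          threshold: float = 0.5) -> RoutingDecision:
    """Send the prompt to Tier 1 if the score clears the threshold, else escalate."""
    score = tier1_sufficiency(prompt)
    tier = 1 if score >= threshold else 2
    return RoutingDecision(tier=tier, score=score)


def toy_scorer(prompt: str) -> float:
    """Hypothetical stand-in for a trained router: shorter prompts look easier."""
    return max(0.0, 1.0 - len(prompt.split()) / 50)


decision = route("Classify this review as positive or negative.", toy_scorer)
print(decision)
```

Raising `threshold` makes the sketch more conservative (more queries escalate to Tier 2), and lowering it makes it more aggressive.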
This design mirrors how well-run support teams work: frontline agents handle the majority of tickets, and escalation to specialists happens only when genuinely needed. The difference is that Fugu makes this decision in milliseconds, invisibly, on every query.
How the Router Learns to Route
The routing classifier isn’t just a keyword filter or a prompt-length heuristic. It’s trained on actual model outputs.
The training process typically involves:
- Running a large set of queries through both Tier 1 and Tier 2 models.
- Labeling which queries the Tier 1 model handled adequately versus where it fell short relative to Tier 2.
- Training a lightweight classifier on the input prompts to predict these outcome labels.
This means the router is grounded in empirical performance differences, not proxy signals. It learns patterns in the prompt itself — structure, vocabulary, task type, apparent complexity — that correlate with whether the smaller model will be sufficient.
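As a rough sketch of the three training steps, the example below labels a tiny evaluation log and fits a logistic-regression router in plain Python. Everything here is an assumption made for illustration: the prompts, the quality scores, the `tolerance` used for labeling, and the three hand-picked features. A real router would use far more data and richer prompt representations.

```python
import math

# Hypothetical evaluation log: (prompt, tier1_quality, tier2_quality), scores in [0, 1].
EVAL_LOG = [
    ("What is the capital of France?", 0.95, 0.96),
    ("Classify this email as spam or not spam.", 0.92, 0.94),
    ("Summarize this paragraph in one sentence.", 0.90, 0.93),
    ("Prove that the sum of two odd numbers is even, then generalize step by step.", 0.40, 0.90),
    ("Design a migration plan for a sharded database and explain each tradeoff step by step.", 0.35, 0.88),
    ("Derive the time complexity of this recursive algorithm and justify each step.", 0.45, 0.91),
]


def label(tier1_quality, tier2_quality, tolerance=0.05):
    """1 if Tier 1 was adequate (within `tolerance` of Tier 2), else 0."""
    return 1 if tier1_quality >= tier2_quality - tolerance else 0


def features(prompt):
    """Toy features: bias term, scaled word count, and a multi-step cue."""
    text = prompt.lower()
    return [1.0, len(text.split()) / 20.0, 1.0 if "step" in text else 0.0]


def predict(weights, prompt):
    """Estimated probability that Tier 1 is sufficient for this prompt."""
    z = sum(w * x for w, x in zip(weights, features(prompt)))
    return 1.0 / (1.0 + math.exp(-z))


def train(log, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient ascent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for prompt, t1, t2 in log:
            error = label(t1, t2) - predict(weights, prompt)
            weights = [w + lr * error * x for w, x in zip(weights, features(prompt))]
    return weights


weights = train(EVAL_LOG)
for prompt, t1, t2 in EVAL_LOG:
    print(label(t1, t2), round(predict(weights, prompt), 2), prompt[:40])
```

The labels come from comparing measured outputs, while the classifier only ever sees the prompt. That is what lets it run before either model is called.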
What the Router Looks For
While the exact features vary, routing classifiers generally pick up on:
- Task type signals — Instructions that imply multi-step reasoning vs. single-step retrieval.
- Linguistic complexity — Sentence structure, domain-specific vocabulary, ambiguity in the instruction.
- Output requirement signals — Requests for structured output, code, or long-form content often route to the higher tier.
- Context length — Longer, more complex contexts tend to escalate.
The router doesn’t know which answer is “correct” at routing time; it predicts based on query characteristics alone. That’s what makes it practically deployable: it adds minimal latency (typically milliseconds) and does not require running either model first.
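To make the signal categories above concrete, here is a hedged sketch of a hand-written feature extractor. The cue lists and the `prompt_signals` function are invented for illustration. A learned router would typically derive such signals from data rather than from fixed keyword lists.

```python
import re

# Hypothetical cue lists, chosen only to illustrate each signal category.
MULTI_STEP_CUES = ("step by step", "then", "prove", "derive", "compare", "plan")
STRUCTURED_OUTPUT_CUES = ("json", "table", "code", "function", "essay", "report")


def count_cues(text, cues):
    """Count how many cues appear in the text as whole words or phrases."""
    return sum(bool(re.search(rf"\b{re.escape(cue)}\b", text)) for cue in cues)


def prompt_signals(prompt):
    """Map a prompt to rough numeric signals a router might learn from."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "multi_step_cues": count_cues(text, MULTI_STEP_CUES),                # task type
        "avg_sentence_len": len(words) / max(len(sentences), 1),             # linguistic complexity
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),           # vocabulary proxy
        "structured_output_cues": count_cues(text, STRUCTURED_OUTPUT_CUES),  # output requirements
        "context_words": len(words),                                         # context length
    }


print(prompt_signals("What is the capital of France?"))
print(prompt_signals("Write a Python function that parses JSON. Then explain it step by step."))
```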
Benchmark Results: What the Numbers Show
Sakana AI has published benchmark comparisons showing Fugu’s performance across standard LLM evaluation tasks.
The headline result is that Fugu achieves performance close to exclusively using the top-tier model while running a substantial share of queries — often the majority — on the cheaper Tier 1 model. The exact split depends on the routing threshold you set: a conservative threshold routes more to Tier 2 for higher accuracy; an aggressive threshold routes more to Tier 1 for lower cost.
The Cost-Performance Tradeoff Curve
The practical output of Fugu’s benchmarking is a tradeoff curve showing accuracy vs. average cost per query. A few observations from this curve:
- At the high-accuracy end, Fugu matches or approaches full Tier 2 performance while still routing 20–40% of queries to Tier 1, generating meaningful cost savings.
- At the cost-optimized end, Fugu can route 70–80% of queries to Tier 1 with acceptable accuracy loss on most standard benchmarks.
- The sweet spot for most production use cases sits in the middle — capturing significant cost reduction with minimal accuracy degradation.
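A tradeoff curve like this can be traced by sweeping the routing threshold over logged evaluation data. The sketch below uses made-up records and made-up relative costs, so its numbers illustrate the mechanics only and are not Fugu's published results.

```python
# Hypothetical per-query records: (router_score, tier1_correct, tier2_correct).
RECORDS = [
    (0.95, True, True), (0.90, True, True), (0.85, True, True), (0.80, True, True),
    (0.70, True, True), (0.60, False, True), (0.50, True, True), (0.40, False, True),
    (0.30, False, True), (0.10, False, False),
]
TIER1_COST, TIER2_COST = 1.0, 10.0  # assumed relative cost units per query


def evaluate(threshold):
    """Route queries scoring >= threshold to Tier 1 and report the resulting tradeoff."""
    to_tier1 = [r for r in RECORDS if r[0] >= threshold]
    to_tier2 = [r for r in RECORDS if r[0] < threshold]
    correct = sum(r[1] for r in to_tier1) + sum(r[2] for r in to_tier2)
    cost = len(to_tier1) * TIER1_COST + len(to_tier2) * TIER2_COST
    n = len(RECORDS)
    return {
        "threshold": threshold,
        "tier1_share": len(to_tier1) / n,
        "accuracy": correct / n,
        "avg_cost": cost / n,
    }


# 0.0 sends everything to Tier 1; 1.01 sends everything to Tier 2.
for threshold in (0.0, 0.45, 0.65, 1.01):
    print(evaluate(threshold))
```

In this toy data, a threshold of 0.65 keeps the all-Tier-2 accuracy while sending half the queries to Tier 1, which is the kind of operating point the curve is meant to expose.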
Comparison to Naive Baselines
Fugu compares favorably against three obvious baselines:
- Always use Tier 2: Maximum accuracy, maximum cost. Fugu matches or approaches this at lower average cost.
- Always use Tier 1: Minimum cost, lower accuracy. Fugu significantly outperforms this on complex query subsets.
- Random routing: Fugu consistently outperforms random allocation at every cost target.
The learned routing is doing real work — it’s not just averaging between two options.
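One way to see why random routing is the natural yardstick: its expected accuracy is just a weighted average of the two tiers' accuracies. The sketch below computes that baseline and the gain a learned router would need to show over it. All numbers are hypothetical placeholders.

```python
def random_routing_accuracy(tier1_share, tier1_accuracy, tier2_accuracy):
    """Expected accuracy when a fixed share of queries goes to Tier 1 at random."""
    return tier1_share * tier1_accuracy + (1.0 - tier1_share) * tier2_accuracy


def gain_over_random(learned_accuracy, tier1_share, tier1_accuracy, tier2_accuracy):
    """Accuracy gained by learned routing over random routing at the same Tier 1 share."""
    baseline = random_routing_accuracy(tier1_share, tier1_accuracy, tier2_accuracy)
    return learned_accuracy - baseline


# Hypothetical numbers: Tier 1 alone scores 0.60, Tier 2 alone scores 0.90,
# and a learned router reaches 0.88 while sending half the traffic to Tier 1.
baseline = random_routing_accuracy(0.5, 0.60, 0.90)
gain = gain_over_random(0.88, 0.5, 0.60, 0.90)
print(f"random baseline: {baseline:.2f}, gain from learned routing: {gain:.2f}")
```

A router whose gain is near zero at a given Tier 1 share is doing no better than a coin flip at the same cost.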
Why Multi-Model Orchestration Matters for Production AI
Fugu isn’t just an academic exercise. It addresses real problems that teams building production AI applications run into.
Token Economics at Scale
At small query volumes, model cost differences are negligible. At scale — thousands or millions of queries per day — they compound fast. A 40% reduction in average cost per query might represent tens of thousands of dollars monthly for a large deployment.
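The arithmetic is easy to check for your own workload. The figures below (query volume, per-query prices, and Tier 1 share) are assumptions picked for illustration, not real pricing, and the sketch ignores the router's own small cost.

```python
def average_cost(tier1_share, tier1_cost, tier2_cost):
    """Blended cost per query for a given share of traffic on Tier 1."""
    return tier1_share * tier1_cost + (1.0 - tier1_share) * tier2_cost


def monthly_spend(queries_per_day, cost_per_query, days=30):
    """Total spend over a month at a flat per-query cost."""
    return queries_per_day * days * cost_per_query


# Hypothetical workload and prices (USD per query).
QUERIES_PER_DAY = 1_000_000
TIER1_COST = 0.0005
TIER2_COST = 0.005
TIER1_SHARE = 0.45

baseline = monthly_spend(QUERIES_PER_DAY, TIER2_COST)
routed = monthly_spend(QUERIES_PER_DAY, average_cost(TIER1_SHARE, TIER1_COST, TIER2_COST))
savings = baseline - routed
print(f"baseline ${baseline:,.0f}, routed ${routed:,.0f}, "
      f"saved ${savings:,.0f} ({savings / baseline:.1%})")
```

Under these assumed inputs the reduction works out to roughly 40%. Halve the daily volume and the absolute savings halve with it, which is why the case for routing strengthens with scale.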
Latency Variability
Tier 1 models are typically faster, often significantly so. Routing simpler queries to faster models reduces average response latency and smooths out the latency distribution. Users get faster responses on the queries where speed matters most (quick lookups, simple tasks) while complex queries take the time they need.
Model Specialization
As the LLM ecosystem matures, the best model for one task isn’t the best model for all tasks. A routing layer lets you use different models for different problem types without exposing that complexity to your users or your application logic.
Future-Proofing
When a new model ships that’s better or cheaper, you can swap it into Tier 1 or Tier 2 without rewriting application logic. The routing layer abstracts the model layer from the application layer.
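A small sketch of that abstraction: application code asks a pool for "Tier 1" or "Tier 2" and never names a concrete model. `TieredPool` and the lambda "models" are hypothetical stand-ins. In practice each callable would wrap a real provider client.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]  # takes a prompt, returns a completion


class TieredPool:
    """Holds one model per tier so callers depend on tiers, not model names."""

    def __init__(self, tier1: ModelFn, tier2: ModelFn) -> None:
        self._tiers: Dict[int, ModelFn] = {1: tier1, 2: tier2}

    def swap(self, tier: int, model: ModelFn) -> None:
        """Replace the model behind a tier without touching application code."""
        if tier not in self._tiers:
            raise ValueError(f"unknown tier: {tier}")
        self._tiers[tier] = model

    def complete(self, prompt: str, tier: int) -> str:
        return self._tiers[tier](prompt)


# Placeholder models for illustration only.
pool = TieredPool(
    tier1=lambda p: f"[small-v1] {p}",
    tier2=lambda p: f"[large-v1] {p}",
)
before = pool.complete("hello", tier=1)
pool.swap(1, lambda p: f"[small-v2] {p}")  # a cheaper or better model ships
after = pool.complete("hello", tier=1)
print(before, "->", after)
```

Swapping a model this way still leaves the router calibrated against the old model, so a swap is usually a cue to re-evaluate routing quality.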
Limitations and Honest Tradeoffs
Fugu is useful, but it’s not universally better than picking one model.
Router Accuracy Isn’t Perfect
The routing classifier makes mistakes. Some queries the router sends to Tier 1 would have benefited from Tier 2. Some queries escalated to Tier 2 could have been handled cheaply. The aggregate performance is better than naive approaches, but individual query routing isn’t infallible.
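The two kinds of mistake can be tallied separately from an evaluation log, since they cost different things: under-escalation costs quality and over-escalation costs money. The record format and the sample data below are assumptions for illustration.

```python
from collections import Counter


def routing_errors(records):
    """Tally routing outcomes from (routed_tier, tier1_was_adequate) pairs."""
    counts = Counter()
    for routed_tier, tier1_was_adequate in records:
        if routed_tier == 1 and not tier1_was_adequate:
            counts["under_escalated"] += 1  # quality risk: should have gone to Tier 2
        elif routed_tier == 2 and tier1_was_adequate:
            counts["over_escalated"] += 1   # wasted spend: Tier 1 would have sufficed
        else:
            counts["correct"] += 1
    return counts


# Hypothetical evaluation log.
sample = [(1, True), (1, True), (1, False), (2, False), (2, True), (2, False)]
summary = routing_errors(sample)
print(dict(summary))
```

Tracking the two counts separately shows which way the threshold should move: many under-escalations argue for a more conservative threshold, many over-escalations for a more aggressive one.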
Latency Overhead
The router adds a classification step before model inference. In most implementations this is small — milliseconds — but it’s not zero. For latency-critical applications, this overhead matters.
Domain Dependency
A router trained on general benchmarks may not generalize well to niche domains. If your application handles highly specialized content — legal documents, medical records, obscure technical topics — the routing accuracy on your specific queries may differ from published benchmarks.
Maintenance Burden
As models in your pool update, the router’s calibration may drift. Retraining the router requires fresh data from both models and careful evaluation. This is manageable but not free.
How MindStudio Approaches Multi-Model Access
If you’re thinking about multi-model routing for your own applications, it’s worth knowing there are ways to access multiple models without building routing infrastructure from scratch.
MindStudio provides access to 200+ AI models — including GPT-4o, Claude, Gemini, and others — from a single platform, without separate API keys or accounts. When building AI agents in MindStudio, you can designate different models for different steps in a workflow, effectively implementing manual multi-model routing through the visual builder.
This works well for structured workflows where the task type at each step is predictable. You know that the summarization step doesn’t need the same model as the complex analysis step, so you assign models accordingly in the workflow design.
For teams that want the benefits of multi-model orchestration — cost control, appropriate model selection — without building a routing classifier from scratch, this kind of step-level model assignment is a practical starting point. You can build and test AI agents in MindStudio’s no-code environment, selecting the right model for each step, and iterate quickly.
It’s a different approach from Fugu’s learned routing — more manual, more transparent — but useful for teams who want control over model assignment without the overhead of training and maintaining a routing classifier.
You can try MindStudio free at mindstudio.ai.
For teams building more sophisticated multi-agent AI systems, the question of which model handles which task becomes central to system design — and understanding approaches like Fugu helps inform those decisions.
When to Use a System Like Fugu
Fugu-style orchestration makes the most sense in specific scenarios:
High-volume inference workloads. If you’re running tens of thousands of queries per day, even a modest cost reduction per query adds up. The break-even on building or adopting routing infrastructure comes faster at scale.
Mixed-complexity query distributions. If your application receives a mix of simple and complex queries, routing earns its keep by handling the easy ones cheaply. If nearly all your queries are genuinely hard, a Tier 1 model won’t help much anyway.
Cost-sensitive products. Applications where inference cost directly affects unit economics — consumer products, API services with per-query pricing — benefit more from routing than internal tools where cost is less visible.
Acceptable accuracy tradeoffs. If your use case tolerates occasional routing mistakes (the router sends something to Tier 1 that would have been better on Tier 2), routing is viable. If every query must get the best possible answer, the cost savings may not justify the accuracy risk.
Stable query distributions. Routing classifiers work best when the distribution of query types is reasonably stable. If your query distribution shifts frequently (new product features, changing user behavior), router performance may degrade faster.
Sakana Fugu vs. Other Routing Approaches
Fugu isn’t the only multi-model routing system. A few other approaches worth knowing:
RouteLLM (LMSys): An open-source routing framework from the team behind Chatbot Arena, which uses human preference data to train routing classifiers. Strong benchmark performance and open-weights models available.
Martian: A commercial LLM routing service that routes across providers based on cost and capability targets.
Unify: Another commercial routing service focused on routing across model providers with cost and latency optimization.
Manual step-level routing: Implemented in workflow builders like MindStudio, where different steps in a workflow use different models based on task type — no classifier needed, but no automation either.
What distinguishes Fugu is its origin from a research lab with deep model-level expertise, and its tight integration with Sakana AI’s broader work on efficient AI systems. The routing isn’t bolted on as a cost-cutting measure — it’s central to how Sakana thinks about deploying models.
Frequently Asked Questions
What does Sakana Fugu do?
Sakana Fugu is a multi-model LLM orchestrator. It routes incoming prompts to one of multiple language models — typically a cheaper Tier 1 model or a more capable Tier 2 model — based on a learned classifier’s prediction of which model is appropriate for the query. The goal is to match the performance of always using the top-tier model while reducing average cost and latency.
How does Fugu decide which model to use?
Fugu uses a trained routing classifier that takes the input prompt and predicts which model tier should handle it. The classifier is trained on examples where the performance difference between the two model tiers is measurable, so it learns to identify prompt-level signals that predict when the cheaper model is sufficient and when escalation is needed.
What are Fugu’s two tiers?
Fugu’s two-tier architecture consists of a lightweight, lower-cost Tier 1 model and a more capable, higher-cost Tier 2 model. Simple, well-defined queries route to Tier 1. Complex, ambiguous, or reasoning-heavy queries escalate to Tier 2. The routing decision is automatic and adds minimal latency.
How does Sakana Fugu compare to always using the best model?
Benchmark results show Fugu approaches the performance of exclusively using the top-tier model while routing a significant share of queries — often a majority — to the cheaper Tier 1 model. The exact accuracy-cost tradeoff depends on the routing threshold. At conservative thresholds, Fugu closely matches full Tier 2 performance with lower average cost. At aggressive thresholds, cost savings are higher but with some accuracy degradation.
Is Sakana Fugu open source?
Sakana AI has published research and details on the Fugu system, but full open-source availability depends on what Sakana AI has released at any given time. Check Sakana AI’s official site for the current status of model and code releases.
Can I build my own multi-model routing system?
Yes. Open-source projects like RouteLLM from LMSys provide frameworks for training routing classifiers. The core components are: a set of models at different capability/cost tiers, a dataset of queries labeled by which tier is appropriate, a classifier trained on that dataset, and an inference layer that routes based on classifier output. Commercial services like Martian and Unify offer managed routing without the build overhead.
Key Takeaways
- Sakana Fugu is a multi-model LLM orchestrator that routes each prompt to the most appropriate model tier automatically, using a learned classifier.
- Its two-tier design separates fast, cheap model inference from capable, expensive inference — and routes queries based on predicted difficulty.
- Benchmark results show meaningful cost savings relative to always using the top-tier model, with accuracy that approaches full Tier 2 performance at reasonable routing thresholds.
- Fugu works best at scale: the cost and latency benefits compound with query volume, and the routing classifier performs best on stable query distributions that mix simple and complex queries.
- Alternatives exist — from open-source frameworks like RouteLLM to commercial routing services to manual model assignment in tools like MindStudio.
- The core concept is sound: for most real-world applications, a significant share of queries don’t need the most powerful model available. Smart routing captures that efficiency without sacrificing quality where it matters.
If you’re building AI applications and want to experiment with multi-model workflows without building routing infrastructure first, MindStudio’s no-code builder lets you assign different models to different workflow steps and test the results — a practical way to develop intuition for model selection before investing in automated routing.


