Text Generation Model

Sonar

Perplexity's lightweight, real-time AI search model that delivers fast, citation-backed answers by connecting directly to the live web.

Publisher: Perplexity
Type: Text
Context Window: 128,000 tokens
Training Data: January 2025
Input: $1.00/MTok
Output: $1.00/MTok

Real-time web search with citation-backed answers

Sonar is Perplexity AI's in-house text generation model, built on Meta's Llama 3.3 70B and optimized for web-grounded question answering. Released in January 2025, it retrieves live internet data at query time rather than relying solely on static training knowledge, and every response includes inline source citations for transparency. It supports a 128,000-token context window and runs at approximately 121 tokens per second using Cerebras wafer-scale inference.

Sonar is designed for developers and businesses that need to embed fast, factual, and source-backed search capabilities into their own applications. It offers three search depth modes — High, Medium, and Low — allowing teams to balance thoroughness against response speed depending on their use case. On the SimpleQA benchmark, Sonar achieved an F-score of 0.773, reflecting its focus on factual accuracy. It is particularly well-suited for high-volume applications such as sales research tools, medical information platforms, and real-time in-meeting search features.
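
The sketch below shows what a minimal web-grounded query looks like in practice. It assumes Perplexity's OpenAI-compatible chat-completions endpoint at api.perplexity.ai and the model identifier "sonar"; both should be checked against the current Sonar API reference, and the API key is a placeholder.

```python
import requests

# Minimal sketch of a web-grounded query against the Sonar API.
# Assumptions: Perplexity's OpenAI-compatible chat-completions endpoint
# and the model identifier "sonar"; the API key below is a placeholder.
API_KEY = "YOUR_PERPLEXITY_API_KEY"

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "sonar",
        "messages": [
            {"role": "user", "content": "What changed in EU AI regulation this month?"}
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```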

What Sonar supports

Real-Time Web Search

Grounds every response in live internet data retrieved at query time, rather than relying on static training knowledge alone.

Inline Source Citations

Automatically includes inline citations with each answer, linking responses directly to their source URLs for verifiability; a short parsing sketch follows this feature list.

128K Token Context

Supports a 128,000-token context window, enabling extended conversations and analysis of long documents within a single request.

High-Speed Inference

Achieves approximately 121 tokens per second using Cerebras wafer-scale inference, enabling sub-second response times for high-volume workloads.

Adjustable Search Depth

Offers High, Medium, and Low search depth modes so developers can tune the balance between answer thoroughness and response latency.

API Integration

Available via the Sonar API, allowing developers to embed generative search directly into their own products without building retrieval infrastructure.
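
Picking up the citation feature above, here is a small sketch of pairing an answer with its sources. It assumes the response JSON carries a top-level "citations" array of source URLs alongside the standard chat-completion "choices", which is how Perplexity's API has returned them; confirm against the current API reference.

```python
# Sketch: print an answer followed by its numbered sources, assuming the
# response JSON includes a top-level "citations" array of source URLs
# (verify against the current Sonar API reference).
def print_answer_with_sources(data: dict) -> None:
    print(data["choices"][0]["message"]["content"])
    for i, url in enumerate(data.get("citations", []), start=1):
        print(f"[{i}] {url}")
```

Called with the parsed JSON from the request sketched earlier, this prints the answer followed by a numbered source list that typically matches the inline [1], [2] markers in Sonar's responses.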

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark       What it tests                                                  Score
MMLU-Pro        Expert knowledge across 14 academic disciplines                68.9%
GPQA Diamond    PhD-level science questions (biology, physics, chemistry)      47.1%
MATH-500        Undergraduate and competition-level math problems              81.7%
AIME 2024       American Invitational Mathematics Examination problems         48.7%
LiveCodeBench   Real-world coding tasks from recent competitions               29.5%
HLE             Questions that challenge frontier models across many domains    7.3%
SciCode         Scientific research coding and numerical methods               22.9%

Common questions about Sonar

What is the context window size for Sonar?

Sonar supports a context window of 128,000 tokens, which allows for extended conversations and analysis of lengthy documents in a single request.

Does Sonar have a knowledge cutoff date?

Sonar retrieves live web data at query time, so its answers are not limited to a static training cutoff. The model itself was launched in January 2025, and its underlying Llama 3.3 70B base has its own training data cutoff, but real-time search supplements this with current information.

How is Sonar priced?

Sonar is priced at $1.00 per million input tokens and $1.00 per million output tokens, as listed above; full pricing details for the Sonar API are available on Perplexity's official API overview page at sonar.perplexity.ai. Sonar also powers Perplexity's free consumer tier.

What model is Sonar built on?

Sonar is built on Meta's Llama 3.3 70B and has been optimized by Perplexity AI for web-grounded, real-time question answering with citation support.

How accurate is Sonar on factual questions?

On the SimpleQA benchmark, which tests factual accuracy in language models, Sonar achieved an F-score of 0.773.
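
For readers unfamiliar with the metric, the sketch below shows how an F-score of this kind is computed. It assumes the SimpleQA-style approach of taking the harmonic mean of precision (correct answers among those attempted) and recall (correct answers among all questions); the exact grading rules live in the benchmark's paper, and the counts used here are purely hypothetical.

```python
# Illustrative sketch of an F-score in the SimpleQA style: the harmonic
# mean of precision (correct / attempted) and recall (correct / total).
# The grading details are an assumption here; see the benchmark's paper.
def f_score(correct: int, attempted: int, total: int) -> float:
    precision = correct / attempted
    recall = correct / total
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 780 correct out of 990 attempted on 1,000 questions
# gives about 0.784, in the neighborhood of Sonar's reported 0.773.
print(round(f_score(780, 990, 1000), 3))
```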

What people think about Sonar

Community discussion around Sonar on Reddit is limited in volume, but the available thread reflects interest in how it performs relative to other search-augmented models on structured benchmarks like FRAMES. Users in the LocalLLaMA community engaged with comparisons between Sonar and competing search models, generating 792 upvotes and 75 comments.

The primary concern raised in community threads is whether Sonar's benchmark performance holds up against open-source alternatives, with at least one highly upvoted post highlighting an open-source search repository that outperformed Sonar Reasoning Pro on the FRAMES benchmark. Practical use cases discussed include developer integrations and real-time search applications.

Parameters & options

Max Temperature: 1.9
Max Response Size: 32,768 tokens

Return Citations (select)

Determines whether or not a request to an online model should return citations.

Default: false. Options: No, Yes.

Return Images (select)

Determines whether or not a request to an online model should return images.

Default: false. Options: No, Yes.

Search Context Size (select)

Controls how much web information is retrieved. Higher context provides more comprehensive results but costs more per request.

Default: low. Options: Low (fastest, cheapest), Medium (balanced), High (best for research).
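
Taken together, the options above map onto a request body along these lines. The field names (temperature, max_tokens, return_citations, return_images, and web_search_options.search_context_size) reflect Perplexity's published API parameters, but treat this as a sketch to verify against the current API reference rather than a definitive contract.

```python
import requests

# Sketch of a request exercising the parameters listed above. Field names
# are assumptions drawn from Perplexity's API docs; verify against the
# current reference before relying on them.
payload = {
    "model": "sonar",
    "messages": [{"role": "user", "content": "Summarize today's top AI news."}],
    "temperature": 0.7,        # this model accepts values up to 1.9
    "max_tokens": 1024,        # response size is capped at 32,768 tokens
    "return_citations": True,  # default: false
    "return_images": False,    # default: false
    "web_search_options": {"search_context_size": "medium"},  # low | medium | high
}

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_PERPLEXITY_API_KEY"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
```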

Start building with Sonar

No API keys required. Create AI-powered workflows with Sonar in minutes — free.