Text Generation Model

Sonar

Perplexity's lightweight, real-time AI search model that delivers fast, citation-backed answers by connecting directly to the live web.

Publisher: Perplexity
Type: Text
Context Window: 128,000 tokens
Training Data: January 2025
Input: $1.00/MTok
Output: $1.00/MTok

Real-time web search with citation-backed answers

Sonar is Perplexity AI's in-house text generation model, built on Meta's Llama 3.3 70B and optimized for web-grounded question answering. Released in January 2025, it retrieves live internet data at query time rather than relying solely on static training knowledge, and every response includes inline source citations for transparency. It supports a 128,000-token context window and runs at approximately 121 tokens per second using Cerebras wafer-scale inference.

Sonar is designed for developers and businesses that need to embed fast, factual, and source-backed search capabilities into their own applications. It offers three search depth modes — High, Medium, and Low — allowing teams to balance thoroughness against response speed depending on their use case. On the SimpleQA benchmark, Sonar achieved an F-score of 0.773, reflecting its focus on factual accuracy. It is particularly well-suited for high-volume applications such as sales research tools, medical information platforms, and real-time in-meeting search features.
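
The sketch below shows what a minimal web-grounded query looks like in practice. It assumes Perplexity's OpenAI-compatible chat-completions endpoint at api.perplexity.ai and the model identifier "sonar"; both should be checked against the current Sonar API reference, and the API key is a placeholder.

```python
import requests

# Minimal sketch of a web-grounded query against the Sonar API.
# Assumptions: Perplexity's OpenAI-compatible chat-completions endpoint
# and the model identifier "sonar"; the API key below is a placeholder.
API_KEY = "YOUR_PERPLEXITY_API_KEY"

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "sonar",
        "messages": [
            {"role": "user", "content": "What changed in EU AI regulation this month?"}
        ],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```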

What Sonar supports

Real-Time Web Search

Grounds every response in live internet data retrieved at query time, rather than relying on static training knowledge alone.

Inline Source Citations

Automatically includes inline citations with each answer, linking responses directly to their source URLs for verifiability; a short parsing sketch follows this feature list.

128K Token Context

Supports a 128,000-token context window, enabling extended conversations and analysis of long documents within a single request.

High-Speed Inference

Achieves approximately 121 tokens per second using Cerebras wafer-scale inference, enabling sub-second response times for high-volume workloads.

Adjustable Search Depth

Offers High, Medium, and Low search depth modes so developers can tune the balance between answer thoroughness and response latency.

API Integration

Available via the Sonar API, allowing developers to embed generative search directly into their own products without building retrieval infrastructure.
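
Picking up the citation feature above, here is a small sketch of pairing an answer with its sources. It assumes the response JSON carries a top-level "citations" array of source URLs alongside the standard chat-completion "choices", which is how Perplexity's API has returned them; confirm against the current API reference.

```python
# Sketch: print an answer followed by its numbered sources, assuming the
# response JSON includes a top-level "citations" array of source URLs
# (verify against the current Sonar API reference).
def print_answer_with_sources(data: dict) -> None:
    print(data["choices"][0]["message"]["content"])
    for i, url in enumerate(data.get("citations", []), start=1):
        print(f"[{i}] {url}")
```

Called with the parsed JSON from the request sketched earlier, this prints the answer followed by a numbered source list that typically matches the inline [1], [2] markers in Sonar's responses.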

Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark       What it tests                                                  Score
MMLU-Pro        Expert knowledge across 14 academic disciplines                68.9%
GPQA Diamond    PhD-level science questions (biology, physics, chemistry)      47.1%
MATH-500        Undergraduate and competition-level math problems              81.7%
AIME 2024       American Invitational Mathematics Examination problems         48.7%
LiveCodeBench   Real-world coding tasks from recent competitions               29.5%
HLE             Questions that challenge frontier models across many domains    7.3%
SciCode         Scientific research coding and numerical methods               22.9%

Common questions about Sonar

What is the context window size for Sonar?

Sonar supports a context window of 128,000 tokens, which allows for extended conversations and analysis of lengthy documents in a single request.

Does Sonar have a knowledge cutoff date?

Sonar retrieves live web data at query time, so its answers are not limited to a static training cutoff. The model itself was launched in January 2025, and its underlying Llama 3.3 70B base has its own training data cutoff, but real-time search supplements this with current information.

How is Sonar priced?

Sonar is priced at $1.00 per million input tokens and $1.00 per million output tokens, as listed above; full pricing details for the Sonar API are available on Perplexity's official API overview page at sonar.perplexity.ai. Sonar also powers Perplexity's free consumer tier.

What model is Sonar built on?

Sonar is built on Meta's Llama 3.3 70B and has been optimized by Perplexity AI for web-grounded, real-time question answering with citation support.

How accurate is Sonar on factual questions?

On the SimpleQA benchmark, which tests factual accuracy in language models, Sonar achieved an F-score of 0.773.
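
For readers unfamiliar with the metric, the sketch below shows how an F-score of this kind is computed. It assumes the SimpleQA-style approach of taking the harmonic mean of precision (correct answers among those attempted) and recall (correct answers among all questions); the exact grading rules live in the benchmark's paper, and the counts used here are purely hypothetical.

```python
# Illustrative sketch of an F-score in the SimpleQA style: the harmonic
# mean of precision (correct / attempted) and recall (correct / total).
# The grading details are an assumption here; see the benchmark's paper.
def f_score(correct: int, attempted: int, total: int) -> float:
    precision = correct / attempted
    recall = correct / total
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 780 correct out of 990 attempted on 1,000 questions
# gives about 0.784, in the neighborhood of Sonar's reported 0.773.
print(round(f_score(780, 990, 1000), 3))
```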

What people think about Sonar

Community discussion around Sonar on Reddit is limited in volume, but the available thread reflects interest in how it performs relative to other search-augmented models on structured benchmarks like FRAMES. Users in the LocalLLaMA community engaged with comparisons between Sonar and competing search models, generating 792 upvotes and 75 comments.

The primary concern raised in community threads is whether Sonar's benchmark performance holds up against open-source alternatives, with at least one highly upvoted post highlighting an open-source search repository that outperformed Sonar Reasoning Pro on the FRAMES benchmark. Practical use cases discussed include developer integrations and real-time search applications.

Parameters & options

Max Temperature: 1.9
Max Response Size: 32,768 tokens

Return Citations (select)

Determines whether or not a request to an online model should return citations.

Default: false. Options: No, Yes.

Return Images (select)

Determines whether or not a request to an online model should return images.

Default: false. Options: No, Yes.

Search Context Size (select)

Controls how much web information is retrieved. Higher context provides more comprehensive results but costs more per request.

Default: low. Options: Low (fastest, cheapest), Medium (balanced), High (best for research).
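
Taken together, the options above map onto a request body along these lines. The field names (temperature, max_tokens, return_citations, return_images, and web_search_options.search_context_size) reflect Perplexity's published API parameters, but treat this as a sketch to verify against the current API reference rather than a definitive contract.

```python
import requests

# Sketch of a request exercising the parameters listed above. Field names
# are assumptions drawn from Perplexity's API docs; verify against the
# current reference before relying on them.
payload = {
    "model": "sonar",
    "messages": [{"role": "user", "content": "Summarize today's top AI news."}],
    "temperature": 0.7,        # this model accepts values up to 1.9
    "max_tokens": 1024,        # response size is capped at 32,768 tokens
    "return_citations": True,  # default: false
    "return_images": False,    # default: false
    "web_search_options": {"search_context_size": "medium"},  # low | medium | high
}

response = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_PERPLEXITY_API_KEY"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
```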

Start building with Sonar

No API keys required. Create AI-powered workflows with Sonar in minutes — free.