DeepSeek V3.1
DeepSeek-V3.1 is a powerful 671B parameter hybrid AI model that seamlessly switches between fast conversational responses and deep step-by-step reasoning, with significantly improved tool use and agent capabilities.
671B hybrid model with switchable reasoning modes
DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters per token. It supports a 128,000-token context window and was trained on data through August 2025. Its enhanced base model was built with a two-phase long-context extension process: 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks.
What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.
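As a concrete illustration of the mode switch, here is a minimal sketch against DeepSeek's OpenAI-compatible API. The base URL and model names (`deepseek-chat` for the non-thinking mode, `deepseek-reasoner` for the thinking mode) follow DeepSeek's public API documentation at the time of writing; verify them against the current reference before relying on them.

```python
# Minimal sketch: toggling DeepSeek-V3.1's two modes via the
# OpenAI-compatible API. The API key is a placeholder.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
    base_url="https://api.deepseek.com",
)

# Fast, non-thinking conversational mode.
chat = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(chat.choices[0].message.content)

# Slower step-by-step reasoning mode (same underlying V3.1 weights).
reasoned = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
)
print(reasoned.choices[0].message.content)
```

Both requests hit the same V3.1 weights; only the serving mode differs, so switching between speed and depth is a one-line change rather than a migration to a different model.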
What DeepSeek V3.1 supports
Hybrid Thinking Mode
Switches between fast conversational responses and deep step-by-step reasoning within a single model, controlled by how the model is prompted rather than by selecting a separate endpoint.
Long Context Window
Supports up to 128,000 tokens of context, enabling analysis of long documents, extended codebases, or multi-turn conversations without truncation.
Tool Use & Agents
Handles multi-step agentic workflows including external API calls, web search, and code execution, with post-training improvements specifically targeting tool-calling reliability (see the sketch after this list).
Code Generation
Generates, explains, and debugs code across multiple programming languages, with the option to invoke thinking mode for complex algorithmic problems.
Mathematical Reasoning
Solves multi-step math problems using the model's thinking mode, which produces intermediate reasoning steps before arriving at a final answer.
Mixture-of-Experts Architecture
Uses a MoE design with 671 billion total parameters but only 37 billion activated per forward pass, allowing large model capacity with more efficient inference.
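To make the tool-use feature concrete, the sketch below runs a single tool-calling round trip using the standard OpenAI-style function schema, which DeepSeek's API is compatible with. The `get_weather` function, its return values, and the API key are hypothetical placeholders, not part of DeepSeek's API.

```python
# Hedged sketch of one tool-calling turn against DeepSeek's
# OpenAI-compatible API. get_weather is a hypothetical local function.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

def get_weather(city: str) -> str:
    """Hypothetical stand-in for a real weather API call."""
    return json.dumps({"city": city, "temp_c": 21, "conditions": "clear"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(
    model="deepseek-chat", messages=messages, tools=tools
)
call = response.choices[0].message.tool_calls[0]

# Execute the requested tool locally, then hand the result back to the model.
result = get_weather(**json.loads(call.function.arguments))
messages += [
    response.choices[0].message,
    {"role": "tool", "tool_call_id": call.id, "content": result},
]
final = client.chat.completions.create(model="deepseek-chat", messages=messages)
print(final.choices[0].message.content)
```

In a real agent loop this request/execute/respond cycle repeats until the model stops emitting tool calls, which is the multi-step workflow the post-training improvements target.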
Ready to build with DeepSeek V3.1?
Get Started Free
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 83.3% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 73.5% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 57.7% |
| HLE (Humanity's Last Exam) | Frontier-difficulty questions across many domains | 6.3% |
| SciCode | Scientific research coding and numerical methods | 36.7% |
Common questions about DeepSeek V3.1
What is the context window for DeepSeek-V3.1?
DeepSeek-V3.1 supports a context window of 128,000 tokens, suitable for long documents, large codebases, and extended multi-turn conversations.
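A quick way to check whether a document fits in that window is to count tokens with the tokenizer shipped alongside the open weights. This is a rough sketch assuming the `transformers` library and access to the Hugging Face repo; the exact budget also depends on the chat template, system prompt, and expected output length.

```python
# Rough sketch: does this document fit in the 128K context window?
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1", trust_remote_code=True)

with open("long_report.txt") as f:    # hypothetical input file
    text = f.read()

n_tokens = len(tok.encode(text))
print(f"{n_tokens} tokens; fits in 128K window: {n_tokens <= 128_000}")
```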
How many parameters does DeepSeek-V3.1 have?
The model has 671 billion total parameters. Due to its Mixture-of-Experts architecture, only 37 billion parameters are activated during any single forward pass.
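The following toy sketch (not DeepSeek's actual implementation, and with made-up sizes) shows the top-k routing idea behind that gap between total and active parameters: a router scores every expert, but each token is processed by only a few of them.

```python
# Toy top-k MoE routing: large total capacity, small per-token compute.
import torch

n_experts, top_k, d = 8, 2, 16                    # toy sizes, not V3.1's
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
router = torch.nn.Linear(d, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    scores = torch.softmax(router(x), dim=-1)     # score every expert
    weights, idx = scores.topk(top_k)             # but keep only the top-k
    weights = weights / weights.sum()             # renormalize the gate
    # Only the selected experts actually run, so per-token compute scales
    # with top_k, not with the total expert count.
    return sum(w * experts[int(i)](x) for w, i in zip(weights, idx))

print(moe_forward(torch.randn(d)).shape)          # torch.Size([16])
```

Scaled up, the same principle is why a 671B-parameter model can run with roughly 37B parameters' worth of compute per token.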
What is the knowledge cutoff for DeepSeek-V3.1?
DeepSeek-V3.1 was trained on data through August 2025, which represents the model's approximate knowledge cutoff.
How does the hybrid thinking mode work?
DeepSeek-V3.1 can operate in a fast non-thinking conversational mode or a slower step-by-step reasoning mode. The mode is selected through prompting rather than by choosing a different model or endpoint.
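When running the open weights yourself, the Hugging Face model card indicates that the chat template carries this switch via a `thinking` flag. The sketch below is based on that description and should be checked against the repo's current template before use.

```python
# Sketch: selecting the mode through the chat template when self-hosting.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1", trust_remote_code=True)
messages = [{"role": "user", "content": "Is 1013 prime?"}]

fast = tok.apply_chat_template(messages, tokenize=False,
                               add_generation_prompt=True, thinking=False)
deep = tok.apply_chat_template(messages, tokenize=False,
                               add_generation_prompt=True, thinking=True)
# The two rendered prompts differ only in the control tokens that
# switch step-by-step reasoning on or off.
```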
Is the model available for local deployment?
The model weights for both DeepSeek-V3.1 and DeepSeek-V3.1-Base are available on Hugging Face, making local or self-hosted deployment possible for those with sufficient hardware resources.
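As a minimal sketch of fetching the weights with the `huggingface_hub` package (note that the full checkpoint occupies hundreds of gigabytes, so plan storage accordingly):

```python
# Download the open weights for self-hosting.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/DeepSeek-V3.1")
print(f"Weights downloaded to {local_dir}")
```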
What people think about DeepSeek V3.1
Community reception on r/LocalLLaMA has been broadly positive, with multiple threads accumulating hundreds of upvotes and active discussion around the model's release and base weights. Users have praised the hybrid reasoning capability and the availability of the base model weights on Hugging Face.
A notable portion of community discussion involves direct comparisons with other models and questions about practical local deployment given the model's large parameter count. Hardware requirements for running a 671B MoE model locally are a recurring concern in the threads.
deepseek-ai/DeepSeek-V3.1-Base · Hugging Face
GPT 4.5 vs DeepSeek V3.1
DeepSeek v3.1
deepseek-ai/DeepSeek-V3.1 · Hugging Face
Start building with DeepSeek V3.1
No API keys required. Create AI-powered workflows with DeepSeek V3.1 in minutes — free.