MindStudio
Text Generation Model

DeepSeek V3.1

DeepSeek-V3.1 is a powerful 671B parameter hybrid AI model that seamlessly switches between fast conversational responses and deep step-by-step reasoning, with significantly improved tool use and agent capabilities.

Publisher DeepSeek
Type Text
Context Window 128,000 tokens
Training Data August 2025
Input $0.27/MTok
Output $1.00/MTok
Provider DeepInfra
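The listed rates translate directly into per-request cost estimates. A minimal sketch, with the rates hard-coded from the table above (actual billing is determined by the provider):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate: float = 0.27, output_rate: float = 1.00) -> float:
    """Estimate request cost in USD. Rates are USD per million tokens,
    taken from the DeepInfra pricing listed above."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

# e.g. a 100K-token prompt with a 20K-token response:
# 0.1 * 0.27 + 0.02 * 1.00 = 0.047 USD
```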

671B hybrid model with switchable reasoning modes

DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek. It uses a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters per forward pass, supports a 128,000-token context window, and was trained on data through August 2025. Its enhanced base model was built with a two-phase long-context extension process: 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks.

What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.

What DeepSeek V3.1 supports

Hybrid Thinking Mode

Switches between fast conversational responses and deep step-by-step reasoning within a single model, controlled by how the model is prompted rather than by selecting a separate endpoint.
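In practice, the mode toggle shows up as a choice in the request itself. A minimal sketch: on DeepSeek's first-party API the two modes are exposed as separate model ids ("deepseek-chat" for non-thinking, "deepseek-reasoner" for thinking); other OpenAI-compatible providers may use a flag or a single model id instead, so treat the mapping below as one possible convention rather than a universal API.

```python
def build_request(prompt: str, thinking: bool = False) -> dict:
    # Selects the reasoning mode by model id, following the convention of
    # DeepSeek's first-party API; check your provider's docs for its exact
    # toggle mechanism.
    return {
        "model": "deepseek-reasoner" if thinking else "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
    }
```

The resulting payload can be sent to any OpenAI-compatible chat-completions endpoint with your API key.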

Long Context Window

Supports up to 128,000 tokens of context, enabling analysis of long documents, extended codebases, or multi-turn conversations without truncation.
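A quick way to check whether a prompt will fit is a character-based token estimate. This is a rough sketch only; the ~4-characters-per-token ratio is a heuristic for English text, and an exact count requires the model's own tokenizer.

```python
import math

MAX_CONTEXT_TOKENS = 128_000

def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    # Crude heuristic (~4 characters per token for English prose).
    return math.ceil(len(text) / chars_per_token)

def fits_in_context(prompt: str, reserved_for_output: int = 8_000) -> bool:
    # Leave headroom for the response; the max response size listed on
    # this page is 8,000 tokens.
    return rough_token_count(prompt) + reserved_for_output <= MAX_CONTEXT_TOKENS
```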

Tool Use & Agents

Handles multi-step agentic workflows including external API calls, web search, and code execution, with post-training improvements specifically targeting tool-calling reliability.
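The workflow above typically uses an OpenAI-style tools schema: the client declares available functions, the model returns a structured tool call, and the client executes it and feeds the result back. A minimal sketch; the tool name and fields here are illustrative, not part of any DeepSeek API.

```python
import json

# Hypothetical tool declaration in the common OpenAI-compatible format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> str:
    # Stub; a real agent would call a weather API here.
    return f"Weather for {city}: sunny"

def dispatch_tool_call(tool_call: dict) -> str:
    # Execute a tool call returned by the model, producing the string
    # that would be sent back as a "tool"-role message.
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_weather":
        return get_weather(**args)
    raise ValueError(f"unknown tool: {name}")
```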

Code Generation

Generates, explains, and debugs code across multiple programming languages, with the option to invoke thinking mode for complex algorithmic problems.

Mathematical Reasoning

Solves multi-step math problems using the model's thinking mode, which produces intermediate reasoning steps before arriving at a final answer.

Mixture-of-Experts Architecture

Uses a MoE design with 671 billion total parameters but only 37 billion activated per forward pass, allowing large model capacity with more efficient inference.
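The core of MoE inference is a router that scores all experts per token but runs only the top-k. A toy sketch of that selection step (real routers are learned networks; this just illustrates why most parameters stay idle):

```python
def top_k_experts(router_logits: list[float], k: int) -> list[int]:
    # Indices of the k experts with the highest router scores; only these
    # experts execute for this token, the rest are skipped entirely.
    return sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:k]

# DeepSeek-V3.1's ratio of active to total parameters:
# 37 / 671 ≈ 5.5% of weights touched per forward pass.
```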


Benchmark scores

Scores represent accuracy — the percentage of questions answered correctly on each test.

Benchmark What it tests Score
MMLU-Pro Expert knowledge across 14 academic disciplines 83.3%
GPQA Diamond PhD-level science questions (biology, physics, chemistry) 73.5%
LiveCodeBench Real-world coding tasks from recent competitions 57.7%
HLE Questions that challenge frontier models across many domains 6.3%
SciCode Scientific research coding and numerical methods 36.7%

Common questions about DeepSeek V3.1

What is the context window for DeepSeek-V3.1?

DeepSeek-V3.1 supports a context window of 128,000 tokens, suitable for long documents, large codebases, and extended multi-turn conversations.

How many parameters does DeepSeek-V3.1 have?

The model has 671 billion total parameters. Due to its Mixture-of-Experts architecture, only 37 billion parameters are activated during any single forward pass.

What is the knowledge cutoff for DeepSeek-V3.1?

DeepSeek-V3.1's training data extends through August 2025, which represents the model's approximate knowledge cutoff.

How does the hybrid thinking mode work?

DeepSeek-V3.1 can operate in a fast non-thinking conversational mode or a slower step-by-step reasoning mode. The mode is selected through prompting rather than by choosing a different model or endpoint.

Is the model available for local deployment?

The model weights for both DeepSeek-V3.1 and DeepSeek-V3.1-Base are available on Hugging Face, making local or self-hosted deployment possible for those with sufficient hardware resources.
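For sizing those hardware requirements, a back-of-envelope estimate of weight storage is a useful starting point. A rough sketch only: it counts weights alone, ignoring KV cache and activations, and the bytes-per-parameter values are common conventions (the released V3.1 weights are natively FP8).

```python
def weight_memory_gb(total_params_billion: float = 671,
                     bytes_per_param: float = 1.0) -> float:
    """Approximate memory needed just to hold the weights.
    bytes_per_param: 1.0 for FP8, 2.0 for BF16, ~0.5 for 4-bit quantization."""
    return total_params_billion * bytes_per_param
```

So the full model needs on the order of 671 GB at FP8 and roughly double that at BF16, before any inference overhead.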

What people think about DeepSeek V3.1

Community reception on r/LocalLLaMA has been broadly positive, with multiple threads accumulating hundreds of upvotes and active discussion around the model's release and base weights. Users have praised the hybrid reasoning capability and the availability of the base model weights on Hugging Face.

Much of the discussion centers on direct comparisons with other models and on the practicalities of local deployment given the model's size; hardware requirements for running a 671B MoE model locally are a recurring concern across the threads.


Parameters & options

Max Temperature 1
Max Response Size 8,000 tokens

Start building with DeepSeek V3.1

No API keys required. Create AI-powered workflows with DeepSeek V3.1 in minutes — free.