DeepSeek V3.1
DeepSeek-V3.1 is a powerful 671B parameter hybrid AI model that seamlessly switches between fast conversational responses and deep step-by-step reasoning, with significantly improved tool use and agent capabilities.
671B hybrid model with switchable reasoning modes
DeepSeek-V3.1 is a 671-billion parameter large language model developed by DeepSeek, using a Mixture-of-Experts (MoE) architecture that activates 37 billion parameters per token. It supports a 128,000-token context window and was trained on data through August 2025. Its enhanced base model was built with a two-phase long-context extension process: 630 billion tokens at the 32K phase and 209 billion tokens at the 128K phase. The model accepts text input and produces text output across a wide range of general-purpose tasks.
What distinguishes DeepSeek-V3.1 from earlier versions is its hybrid thinking design: a single model that can operate in a fast conversational mode or a slower step-by-step reasoning mode, selectable through prompting rather than requiring a separate model. Post-training improvements have also focused on tool use and agentic workflows, including multi-step API calls, web search, and code execution. This makes it well-suited for coding, mathematical reasoning, long-document analysis, and complex multi-turn agent tasks.
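As a concrete illustration of the mode switch, here is a minimal sketch against DeepSeek's OpenAI-compatible API. The base URL and model names (`deepseek-chat` for the non-thinking mode, `deepseek-reasoner` for the thinking mode) follow DeepSeek's public API documentation at the time of writing; verify them against the current reference before relying on them.

```python
# Minimal sketch: toggling DeepSeek-V3.1's two modes via the
# OpenAI-compatible API. The API key is a placeholder.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
    base_url="https://api.deepseek.com",
)

# Fast, non-thinking conversational mode.
chat = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(chat.choices[0].message.content)

# Slower step-by-step reasoning mode (same underlying V3.1 weights).
reasoned = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
)
print(reasoned.choices[0].message.content)
```

Both requests hit the same V3.1 weights; only the serving mode differs, so switching between speed and depth is a one-line change rather than a migration to a different model.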
What DeepSeek V3.1 supports
Hybrid Thinking Mode
Switches between fast conversational responses and deep step-by-step reasoning within a single model, controlled by how the model is prompted rather than by selecting a separate endpoint.
Long Context Window
Supports up to 128,000 tokens of context, enabling analysis of long documents, extended codebases, or multi-turn conversations without truncation.
Tool Use & Agents
Handles multi-step agentic workflows including external API calls, web search, and code execution, with post-training improvements specifically targeting tool-calling reliability (see the sketch after this list).
Code Generation
Generates, explains, and debugs code across multiple programming languages, with the option to invoke thinking mode for complex algorithmic problems.
Mathematical Reasoning
Solves multi-step math problems using the model's thinking mode, which produces intermediate reasoning steps before arriving at a final answer.
Mixture-of-Experts Architecture
Uses a MoE design with 671 billion total parameters but only 37 billion activated per forward pass, allowing large model capacity with more efficient inference.
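To make the tool-use feature concrete, the sketch below runs a single tool-calling round trip using the standard OpenAI-style function schema, which DeepSeek's API is compatible with. The `get_weather` function, its return values, and the API key are hypothetical placeholders, not part of DeepSeek's API.

```python
# Hedged sketch of one tool-calling turn against DeepSeek's
# OpenAI-compatible API. get_weather is a hypothetical local function.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

def get_weather(city: str) -> str:
    """Hypothetical stand-in for a real weather API call."""
    return json.dumps({"city": city, "temp_c": 21, "conditions": "clear"})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(
    model="deepseek-chat", messages=messages, tools=tools
)
call = response.choices[0].message.tool_calls[0]

# Execute the requested tool locally, then hand the result back to the model.
result = get_weather(**json.loads(call.function.arguments))
messages += [
    response.choices[0].message,
    {"role": "tool", "tool_call_id": call.id, "content": result},
]
final = client.chat.completions.create(model="deepseek-chat", messages=messages)
print(final.choices[0].message.content)
```

In a real agent loop this request/execute/respond cycle repeats until the model stops emitting tool calls, which is the multi-step workflow the post-training improvements target.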
Ready to build with DeepSeek V3.1?
Get Started Free
Benchmark scores
Scores represent accuracy — the percentage of questions answered correctly on each test.
| Benchmark | What it tests | Score |
|---|---|---|
| MMLU-Pro | Expert knowledge across 14 academic disciplines | 83.3% |
| GPQA Diamond | PhD-level science questions (biology, physics, chemistry) | 73.5% |
| LiveCodeBench | Real-world coding tasks from recent competitions | 57.7% |
| HLE (Humanity's Last Exam) | Frontier-difficulty questions across many domains | 6.3% |
| SciCode | Scientific research coding and numerical methods | 36.7% |
Common questions about DeepSeek V3.1
What is the context window for DeepSeek-V3.1?
DeepSeek-V3.1 supports a context window of 128,000 tokens, suitable for long documents, large codebases, and extended multi-turn conversations.
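A quick way to check whether a document fits in that window is to count tokens with the tokenizer shipped alongside the open weights. This is a rough sketch assuming the `transformers` library and access to the Hugging Face repo; the exact budget also depends on the chat template, system prompt, and expected output length.

```python
# Rough sketch: does this document fit in the 128K context window?
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1", trust_remote_code=True)

with open("long_report.txt") as f:    # hypothetical input file
    text = f.read()

n_tokens = len(tok.encode(text))
print(f"{n_tokens} tokens; fits in 128K window: {n_tokens <= 128_000}")
```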
How many parameters does DeepSeek-V3.1 have?
The model has 671 billion total parameters. Due to its Mixture-of-Experts architecture, only 37 billion parameters are activated during any single forward pass.
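The following toy sketch (not DeepSeek's actual implementation, and with made-up sizes) shows the top-k routing idea behind that gap between total and active parameters: a router scores every expert, but each token is processed by only a few of them.

```python
# Toy top-k MoE routing: large total capacity, small per-token compute.
import torch

n_experts, top_k, d = 8, 2, 16                    # toy sizes, not V3.1's
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
router = torch.nn.Linear(d, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    scores = torch.softmax(router(x), dim=-1)     # score every expert
    weights, idx = scores.topk(top_k)             # but keep only the top-k
    weights = weights / weights.sum()             # renormalize the gate
    # Only the selected experts actually run, so per-token compute scales
    # with top_k, not with the total expert count.
    return sum(w * experts[int(i)](x) for w, i in zip(weights, idx))

print(moe_forward(torch.randn(d)).shape)          # torch.Size([16])
```

Scaled up, the same principle is why a 671B-parameter model can run with roughly 37B parameters' worth of compute per token.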
What is the knowledge cutoff for DeepSeek-V3.1?
DeepSeek-V3.1 was trained on data through August 2025, which represents the model's approximate knowledge cutoff.
How does the hybrid thinking mode work?
DeepSeek-V3.1 can operate in a fast non-thinking conversational mode or a slower step-by-step reasoning mode. The mode is selected through prompting rather than by choosing a different model or endpoint.
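When running the open weights yourself, the Hugging Face model card indicates that the chat template carries this switch via a `thinking` flag. The sketch below is based on that description and should be checked against the repo's current template before use.

```python
# Sketch: selecting the mode through the chat template when self-hosting.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1", trust_remote_code=True)
messages = [{"role": "user", "content": "Is 1013 prime?"}]

fast = tok.apply_chat_template(messages, tokenize=False,
                               add_generation_prompt=True, thinking=False)
deep = tok.apply_chat_template(messages, tokenize=False,
                               add_generation_prompt=True, thinking=True)
# The two rendered prompts differ only in the control tokens that
# switch step-by-step reasoning on or off.
```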
Is the model available for local deployment?
The model weights for both DeepSeek-V3.1 and DeepSeek-V3.1-Base are available on Hugging Face, making local or self-hosted deployment possible for those with sufficient hardware resources.
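As a minimal sketch of fetching the weights with the `huggingface_hub` package (note that the full checkpoint occupies hundreds of gigabytes, so plan storage accordingly):

```python
# Download the open weights for self-hosting.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/DeepSeek-V3.1")
print(f"Weights downloaded to {local_dir}")
```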
What people think about DeepSeek V3.1
Community reception on r/LocalLLaMA has been broadly positive, with multiple threads accumulating hundreds of upvotes and active discussion around the model's release and base weights. Users have praised the hybrid reasoning capability and the availability of the base model weights on Hugging Face.
A notable portion of community discussion involves direct comparisons with other models and questions about practical local deployment given the model's large parameter count. Hardware requirements for running a 671B MoE model locally are a recurring concern in the threads.
deepseek-ai/DeepSeek-V3.1-Base · Hugging Face
GPT 4.5 vs DeepSeek V3.1
DeepSeek v3.1
deepseek-ai/DeepSeek-V3.1 · Hugging Face
Start building with DeepSeek V3.1
No API keys required. Create AI-powered workflows with DeepSeek V3.1 in minutes — free.