MindStudio

LLMs & Models Articles

Browse 420 articles about LLMs & Models.

Kimi K2 Runs 300 Sub-Agents Across 4,000 Steps on 4x H100s — The Story Hermes Found That Everyone Missed

Hermes's content ideation agent surfaced Kimi K2: an open-source system orchestrating 300 sub-agents across 4,000 coordinated steps on 4x H100 GPUs.

Multi-Agent LLMs & Models Automation

OpenAI's Goblin Problem: How RL Training in Codex Infected GPT-5.4 with Creature References Across Model Generations

GPT started mentioning goblins and gremlins in responses. The cause: RL 'nerdy personality' training in Codex scored creature references highly and bled…

GPT & OpenAI LLMs & Models AI Concepts

Scott Aaronson's 2029 Warning: Why the World's Top Quantum Skeptic Is Now Sounding the Alarm

Scott Aaronson — historically skeptical of quantum timelines — now says fault-tolerant quantum computers capable of breaking crypto are expected by ~2029.

Security & Compliance AI Concepts LLMs & Models

How to Use a Smart Orchestrator Model to Direct Cheaper Sub-Agent Models in Claude Code

Use Claude Opus as an orchestrator to plan and review while DeepSeek or Gemma handle heavy lifting—cutting token costs by 5-10x without losing quality.

Multi-Agent Workflows LLMs & Models

What Is the Mistral Medium 3.5 Model? Open-Weight AI Built for Agent Harnesses

Mistral Medium 3.5 is a 128B open-weight model combining reasoning, coding, and instruction-following for agent harnesses like OpenClaw and Hermes.

LLMs & Models Multi-Agent AI Concepts

AI Model Orchestration: How to Use a Smart Model to Direct Cheaper Sub-Agents

Use a frontier model as orchestrator and cheaper models like DeepSeek for heavy lifting. Learn how to build a cost-efficient multi-model agent pipeline.

Multi-Agent LLMs & Models Workflows

Andrej Karpathy on DeepSeek's OCR Paper: Why Pixels May Beat Tokens as AI Inputs

Karpathy called DeepSeek's Oct 2025 OCR paper — 10x text compression, 97% accuracy — a sign that tokenizers are on the way out.

LLMs & Models AI Concepts Optimization

Andrej Karpathy's Verifiability Thesis: Why AI Is Superhuman at Code and Fails at Car Washes

Karpathy's Sequoia talk explains AI's jagged profile: RL only trains where outputs are verifiable. That's why Opus 4.7 refactors codebases but tells you to…

AI Concepts LLMs & Models Prompt Engineering

How to Build a Local AI Stack from Scratch: Ollama to vLLM, Step by Step

From Ollama for daily use to vLLM for serving to TensorRT-LLM for production — here's the complete local AI runtime stack and when to use each layer.

LLMs & Models Workflows Optimization

China Blocks Meta's $2B Manus Acquisition: 4 Reasons the Unwinding Problem Has No Clear Solution

China blocked Meta's $2B Manus deal after employees moved into Meta offices and capital was transferred. There's no clear legal mechanism to unwind it.

Enterprise AI AI Concepts LLMs & Models

Claude Mythos and GPT-5.5 Pass the 'Last Ones' Cyberattack Benchmark: 6 Things You Need to Know

AISI's 32-step corporate network attack sim took human experts 20 hours. Claude Mythos completed it 3 times out of 10. Here's what that means.

Security & Compliance Claude AI Concepts

Cursor SDK + GPT-5.5 Scores 87.2% vs Native Codex's 61.5% — The Harness Is the Bottleneck

Switching GPT-5.5 from Codex's native harness to Cursor's SDK raised its functionality score from 61.5% to 87.2%, a gain of roughly 26 points from the harness alone.

GPT & OpenAI Comparisons Optimization

DeepSeek V4 Launch: 5 Specs That Threaten Closed Frontier Labs

DeepSeek V4 dropped with 1M token context, open weights, and pricing that undercuts GPT-5.5 by nearly 9x on output tokens.

LLMs & Models AI Concepts GPT & OpenAI

DeepSeek V4 Vision: 10x Cheaper Multimodal AI for Your Workflows

DeepSeek V4's vision model uses about 90 KV-cache entries per image vs about 870 for Claude, roughly 10x fewer. Learn how to use it in your AI workflows and agents.

LLMs & Models Workflows AI Concepts

DeepSeek V4 Vision Model: 10x KV-Cache Efficiency and 67% Maze Navigation vs GPT-5.4's 50%

DeepSeek's vision variant uses ~90 KV-cache entries per image vs Claude Sonnet 4.6's ~870 — and beats GPT-5.4 on maze navigation 67% to 50%.

LLMs & Models AI Concepts Comparisons

Google AI Co-clinician vs GPT-5.4 Thinking: Which Medical AI Do Physicians Actually Prefer?

In blind physician evaluations, Google's AI Co-clinician beat GPT-5.4 thinking with search 63% to 30%. Here's what drove the gap.

Comparisons LLMs & Models GPT & OpenAI

Google DeepMind AI Co-clinician: 6 Benchmark Results That Redefine Medical AI in 2026

Preferred by physicians 67% of the time, zero critical errors in 97/98 cases, and beating GPT-5.4 thinking 63% to 30% — here's what the numbers actually show.

LLMs & Models AI Concepts Use Cases

Google DeepMind's AI Co-clinician Tops the RXQA Drug Knowledge Benchmark — Beating Every Frontier Model

On RXQA — open FDA drug data, open-ended questions — Google's AI Co-clinician surpassed every other frontier AI system including GPT-5.4 and Claude.

LLMs & Models Comparisons AI Concepts

How to Use OpenRouter with Claude Code: Run Cheaper Models as a Backend

Use OpenRouter to swap Claude's backend for DeepSeek or other models at 2–5% of the cost. A step-by-step guide to setting up the free-claude-code proxy.

Claude LLMs & Models Workflows

Karpathy's Sequoia Talk: 5 Predictions About Agentic Engineering That Should Change How You Work

Karpathy named December 2025 as the inflection point for agentic coding and says he can't remember the last time he corrected the model.

AI Concepts Productivity LLMs & Models