LLMs & Models Articles

Browse 482 articles about LLMs & Models.

Google DeepMind's AI Co-Clinician: 4 Benchmark Results That Surprised Even the Evaluators

AI Co-clinician beat GPT-5.4 63% to 30%, hit zero critical errors in 97 of 98 queries, and matched physicians in 68 of 140 consultation dimensions.

Gemini, LLMs & Models, AI Concepts

Harvard and Stanford Physicians Built the Toughest Medical AI Benchmark Yet — Here's How AI Co-Clinician Scored

DeepMind's evaluation used 140 consultation dimensions, 20 synthetic clinical scenarios, and 10 real physicians as role-playing patients. Here are the results.

Gemini, LLMs & Models, AI Concepts

Kimi K2 Runs 300 Sub-Agents Across 4,000 Steps on 4x H100s — The Story Hermes Found That Everyone Missed

Hermes's content ideation agent surfaced Kimi K2: an open-source system orchestrating 300 sub-agents across 4,000 coordinated steps on 4x H100 GPUs.

Multi-Agent, LLMs & Models, Automation

OpenAI's Goblin Problem: How RL Training in Codex Infected GPT-5.4 with Creature References Across Model Generations

GPT started mentioning goblins and gremlins in responses. The cause: RL 'nerdy personality' training in Codex scored creature references highly and bled…

GPT & OpenAI, LLMs & Models, AI Concepts

Scott Aaronson's 2029 Warning: Why the World's Top Quantum Skeptic Is Now Sounding the Alarm

Scott Aaronson — historically skeptical of quantum timelines — now says fault-tolerant quantum computers capable of breaking crypto are expected by ~2029.

Security & Compliance, AI Concepts, LLMs & Models

How to Use a Smart Orchestrator Model to Direct Cheaper Sub-Agent Models in Claude Code

Use Claude Opus as an orchestrator to plan and review while DeepSeek or Gemma handle heavy lifting—cutting token costs by 5-10x without losing quality.

Multi-Agent, Workflows, LLMs & Models

What Is the Mistral Medium 3.5 Model? Open-Weight AI Built for Agent Harnesses

Mistral Medium 3.5 is a 128B open-weight model combining reasoning, coding, and instruction-following for agent harnesses like OpenClaw and Hermes.

LLMs & Models, Multi-Agent, AI Concepts

AI Model Orchestration: How to Use a Smart Model to Direct Cheaper Sub-Agents

Use a frontier model as orchestrator and cheaper models like DeepSeek for heavy lifting. Learn how to build a cost-efficient multi-model agent pipeline.

Multi-Agent, LLMs & Models, Workflows

Andrej Karpathy on DeepSeek's OCR Paper: Why Pixels May Beat Tokens as AI Inputs

Karpathy called DeepSeek's Oct 2025 OCR paper — 10x text compression, 97% accuracy — a sign that tokenizers are on the way out.

LLMs & Models, AI Concepts, Optimization

Andrej Karpathy's Verifiability Thesis: Why AI Is Superhuman at Code and Fails at Car Washes

Karpathy's Sequoia talk explains AI's jagged profile: RL only trains where outputs are verifiable. That's why Opus 4.7 refactors codebases but tells you to…

AI Concepts, LLMs & Models, Prompt Engineering

How to Build a Local AI Stack from Scratch: Ollama to vLLM, Step by Step

From Ollama for daily use to vLLM for serving to TensorRT-LLM for production — here's the complete local AI runtime stack and when to use each layer.

LLMs & Models, Workflows, Optimization

China Blocks Meta's $2B Manus Acquisition: 4 Reasons the Unwinding Problem Has No Clear Solution

China blocked Meta's $2B Manus deal after employees moved into Meta offices and capital was transferred. There's no clear legal mechanism to unwind it.

Enterprise AI, AI Concepts, LLMs & Models

Claude Mythos and GPT-5.5 Pass the 'Last Ones' Cyberattack Benchmark: 6 Things You Need to Know

AISI's 32-step corporate network attack sim took human experts 20 hours. Claude Mythos completed it 3 times out of 10. Here's what that means.

Security & Compliance, Claude, AI Concepts

Cursor SDK + GPT-5.5 Scores 87.2% vs Native Codex's 61.5% — The Harness Is the Bottleneck

Switching GPT-5.5 from Codex's native harness to Cursor's SDK raised functionality from 61.5% to 87.2%, a 26-point gain from the harness alone.

GPT & OpenAI, Comparisons, Optimization

DeepSeek V4 Launch: 5 Specs That Threaten Closed Frontier Labs

DeepSeek V4 dropped with 1M token context, open weights, and pricing that undercuts GPT-5.5 by nearly 9x on output tokens.

LLMs & Models, AI Concepts, GPT & OpenAI

DeepSeek V4 Vision: 10x Cheaper Multimodal AI for Your Workflows

DeepSeek V4's vision model uses 90 KV cache entries vs 870 for Claude—10x cheaper. Learn how to use it in your AI workflows and agents.

LLMs & Models, Workflows, AI Concepts

DeepSeek V4 Vision Model: 10x KV-Cache Efficiency and 67% Maze Navigation vs GPT-5.4's 50%

DeepSeek's vision variant uses ~90 KV-cache entries per image vs Claude Sonnet 4.6's ~870 — and beats GPT-5.4 on maze navigation 67% to 50%.

LLMs & Models, AI Concepts, Comparisons

Google AI Co-clinician vs GPT-5.4 Thinking: Which Medical AI Do Physicians Actually Prefer?

In blind physician evaluations, Google's AI Co-clinician beat GPT-5.4 thinking with search 63% to 30%. Here's what drove the gap.

Comparisons, LLMs & Models, GPT & OpenAI

Google DeepMind AI Co-clinician: 6 Benchmark Results That Redefine Medical AI in 2026

Preferred by physicians 67% of the time, zero critical errors in 97/98 cases, and beating GPT-5.4 thinking 63% to 30% — here's what the numbers actually show.

LLMs & Models, AI Concepts, Use Cases

Google DeepMind's AI Co-clinician Tops the RXQA Drug Knowledge Benchmark — Beating Every Frontier Model

On RXQA — open FDA drug data, open-ended questions — Google's AI Co-clinician surpassed every other frontier AI system including GPT-5.4 and Claude.

LLMs & Models, Comparisons, AI Concepts