Local & Open-Weight Models
Deployment-focused content for open-weight models: running Gemma, Qwen, and similar models locally on phones, laptops, and edge devices. Covers setup guides, hardware requirements, and deployment patterns. Single-model reviews and explainers go under AI Model Reviews & Comparisons instead.
How to Run Local AI on AMD: ROCm, LM Studio, Ollama, and ComfyUI Setup
AMD's ROCm platform now supports PyTorch, Ollama, LM Studio, and ComfyUI out of the box. Here's how to set up a full local AI stack on AMD hardware.
Running Local AI on AMD: ROCm, Ollama, and LM Studio Performance in 2026
AMD's ROCm platform now supports PyTorch, Ollama, LM Studio, and ComfyUI out of the box. Learn what's possible with a 32GB Radeon GPU for local AI workloads.
What Is ROCm? AMD's Open Compute Platform for AI and Deep Learning
ROCm is AMD's answer to CUDA—and it's finally production-ready. Learn how ROCm enables LLM inference, fine-tuning, and image generation on AMD GPUs.
Local AI vs Cloud AI in 2026: When to Run Models on Your Own Hardware
Open-weight models trail frontier models by roughly 3–6 months. Learn when local AI makes sense for cost, privacy, and agentic workloads vs paying for cloud APIs.
How to Run Open-Weight AI Models Locally with Ollama and LM Studio
Run Qwen 3.6, Gemma, and DeepSeek locally with Ollama and LM Studio. This guide covers setup, quantization, and performance on consumer hardware.
How to Add Vision to a Local AI Agent Without Blowing Your VRAM
Use a small vision model like MiniCPM-V as a specialized sub-agent to handle screenshots and PDFs without loading a full multimodal LLM.
What Is MiniCPM-V 4.6? The 1.3B Vision Model Built for Local AI Agents
MiniCPM-V 4.6 is a 1.3B parameter vision model that beats larger models on token efficiency. Here's how to use it in local agent workflows.
How to Add Vision Capabilities to a Local AI Agent Without Blowing Your VRAM
Running a small LLM locally but need vision? Learn how to pair a lightweight vision model like MiniCPM-V with your text agent to handle screenshots and PDFs.
What Is MiniCPM-V 4.6? A 1.3B Vision Model Built for Local AI Agents
MiniCPM-V 4.6 is a 1.3B parameter vision model that beats larger models on visual reasoning benchmarks. Learn why it's ideal for local agentic vision tasks.
What Is Mercury 2? The Diffusion-Based Language Model That Runs 5x Faster Than Claude Haiku
Mercury 2 from Inception Labs applies image diffusion methods to language generation, producing outputs 5x faster than Claude Haiku. Here's how it works.
What Is Mercury 2? The Diffusion-Based Language Model That Runs 5x Faster
Mercury 2 from Inception Labs uses a diffusion process instead of autoregressive token generation, claiming 5x faster speeds than Claude Haiku.
How to Use Free Alternatives to Claude Code: OpenRouter, NVIDIA NIM, and Ollama
Run Claude Code's interface with DeepSeek, GLM-4.7, or local models via a free proxy. Get 80–90% of Opus quality at 2–5% of the cost.
DeepSeek's 'Thinking with Visual Primitives': 5 Technical Breakthroughs in the Paper That Briefly Disappeared
DeepSeek's vision paper was published then pulled. Here are 5 key technical details — including inline bounding-box tokens and a 7,000x compression ratio.
DeepSeek V4 Flash vs Claude Sonnet 4.6: Which Model Is Best for AI Agent Workflows?
Compare DeepSeek V4 Flash and Claude Sonnet 4.6 on cost, speed, and quality for agentic coding, automation, and multi-step workflows.
DeepSeek Vision's 7,000x Image Compression Pipeline: From 756px Input to 81 KV Cache Entries
DeepSeek's vision model compresses a 756x756 image through four stages down to 81 KV cache entries — a ~7,000x total compression ratio. Here's each step.
DeepSeek Vision Beats GPT-5.4 by 17 Points on Maze Navigation — The Topological Reasoning Benchmark Explained
On maze navigation, DeepSeek's vision model scores 67% vs. GPT-5.4's 50% — a 17-point gap driven by inline bounding-box spatial reasoning.
DeepSeek Vision vs. Claude Sonnet 4.6 vs. Gemini Flash 3: Which Vision Model Uses 10x Less KV Cache?
DeepSeek's vision model uses ~90 KV cache entries per image vs. ~870 for Sonnet 4.6 and ~1,000 for Gemini Flash 3. Here's what that means for cost.
How to Use Free Claude Code Alternatives: OpenRouter, NVIDIA NIM, and Ollama Setup Guide
Run Claude Code with DeepSeek, GLM, or Gemma models via OpenRouter, NVIDIA NIM, or Ollama to cut costs by up to 99% with the free-claude-code proxy.
What Is the Mistral Medium 3.5 Model? Open-Weight AI Built for Agent Harnesses
Mistral Medium 3.5 is a 128B open-weight model combining reasoning, coding, and instruction-following for agent harnesses like OpenClaw and Hermes.
How to Build a Local AI Stack from Scratch: Ollama to vLLM, Step by Step
From Ollama for daily use to vLLM for serving to TensorRT-LLM for production — here's the complete local AI runtime stack and when to use each layer.
DeepSeek V4 Launch: 5 Specs That Threaten Closed Frontier Labs
DeepSeek V4 launched with a 1M-token context window, open weights, and pricing that undercuts GPT-5.5 by nearly 9x on output tokens.
DeepSeek V4 Vision: 10x Cheaper Multimodal AI for Your Workflows
DeepSeek V4's vision model uses ~90 KV cache entries per image vs ~870 for Claude Sonnet 4.6, roughly 10x fewer. Learn how to use it in your AI workflows and agents.
DeepSeek V4 Vision Model: 10x KV-Cache Efficiency and 67% Maze Navigation vs GPT-5.4's 50%
DeepSeek's vision variant uses ~90 KV-cache entries per image vs Claude Sonnet 4.6's ~870 — and beats GPT-5.4 on maze navigation 67% to 50%.
Mac mini M4 Pro vs RTX 5090 vs DGX Spark: Which Local AI Hardware Is Right for You in 2026?
Mac mini M4 Pro at 64GB, RTX 5090 at 32GB GDDR7, or DGX Spark at 128GB unified memory — here's the honest hardware comparison for running AI models locally.