AI Models

Anthropic (14)

Claude 5 Opus

Anthropic's most intelligent model, setting new standards in agentic coding, complex reasoning, and professional knowledge work.

Text

Anthropic's next-generation Sonnet model — the best combination of speed and intelligence, featuring a 1M token context window, adaptive thinking by default, and major gains in coding and agentic tasks.

Text

Claude 5 Fable

Anthropic's most intelligent model, setting new standards in agentic coding, complex reasoning, and professional knowledge work.

Text

Claude 4.8 Opus

Anthropic's most intelligent model, setting new standards in agentic coding, complex reasoning, and professional knowledge work.

Text

Claude 4.7 Opus

Anthropic's most intelligent model, setting new standards in agentic coding, complex reasoning, and professional knowledge work.

Text

Claude 4.6 Sonnet

Anthropic's most capable Sonnet model yet, delivering near-Opus-level intelligence for coding, agents, and computer use with a 1M token context window.

Text

Claude 4.6 Opus

Anthropic's most intelligent model, setting new standards in agentic coding, complex reasoning, and professional knowledge work.

Text

Claude 4.5 Opus

Anthropic's most powerful flagship model, setting the state of the art in coding, agentic workflows, and real-world computer use.

Text

Claude 4.5 Haiku

Anthropic's fastest and most efficient model, delivering frontier-level coding performance at a fraction of the cost and more than twice the speed of its predecessors.

Text

Claude 4.5 Sonnet

Anthropic's most intelligent model, leading the world in coding, computer use, and complex agentic tasks.

Text

Claude 4.1 Opus

Anthropic's most advanced model, purpose-built for complex coding, agentic tasks, and deep research requiring sustained multi-step reasoning.

Text

Claude 3 Sonnet

Ideal balance of intelligence and speed for enterprise workloads, offering cost-effective performance.

Text

Claude 3 Haiku

Fast, affordable model with strong vision capabilities and performance for diverse enterprise applications.

Text

Claude Instant (Deprecated)

OpenAI (39)

Text

GPT 5.5

OpenAI's most capable and efficient frontier model for professional work, combining powerful reasoning with reliable agentic execution at scale.

Text

GPT 5.4 Pro

OpenAI's most capable and efficient frontier model for professional work, combining powerful reasoning with reliable agentic execution at scale.

Text

GPT 5.4

OpenAI's most capable and efficient frontier model for professional work, combining powerful reasoning with reliable agentic execution at scale.

Text

GPT-5.2 Pro

The most capable model series yet for professional knowledge work.

Text

GPT-5.1

The best model for coding and agentic tasks across domains

Text

o3-pro

OpenAI's most powerful reasoning model, designed to tackle the hardest problems with deeper thinking, multimodal understanding, and autonomous tool use.

Text

GPT-5 Chat

GPT-5 model used in ChatGPT

Text

GPT-5 nano

Fastest, most cost-efficient version of GPT-5

Text

GPT-5 mini

A faster, more cost-efficient version of GPT-5 for well-defined tasks

Text

GPT-5

The best model for coding and agentic tasks across domains

Text

GPT OSS 120B

OpenAI's flagship open-weight reasoning model with 117B parameters, built for powerful coding, math, and agentic tasks under the Apache 2.0 license.

Text

GPT OSS 20B

OpenAI's compact open-weight reasoning model that delivers powerful AI capabilities on consumer hardware, running within just 16GB of memory.

Text

o3

OpenAI's most powerful reasoning model, built to tackle complex problems in coding, math, science, and visual analysis with exceptional depth and accuracy.

Text

o1-pro

OpenAI's most powerful reasoning model, built to tackle the hardest problems with extended thinking time and enhanced compute.

Text

o3-mini

OpenAI's compact reasoning model optimized for fast, cost-efficient problem-solving with exceptional performance in math, coding, and science.

Text

o1

Small cost-efficient reasoning model that’s optimized for coding, math, and science, and supports tools and Structured Outputs.

Vision

GPT-4o Mini Vision

Low-cost, fast model surpassing GPT-3.5 Turbo in textual intelligence and multimodal reasoning.

Vision

GPT-4o Vision

A GPT-4o variant with vision capabilities, processing both text and image inputs.

Vision

GPT-4 Turbo Vision

A variant of GPT-4 with vision capabilities, processing both text and image inputs.

Image

GPT Image 2

OpenAI's latest image generation and editing model, offering state-of-the-art visual quality, precise instruction following, and support for large-scale batch processing.

Image

GPT Image Latest

OpenAI's latest image generation and editing model, offering state-of-the-art visual quality, precise instruction following, and support for large-scale batch processing.

Image

GPT Image 1.5

OpenAI's flagship image generation model featuring precise editing, face preservation, and 4x faster generation speeds.

Image

GPT Image 1

OpenAI's flagship image generation model that creates and edits images with precise instruction-following, fast generation, and detail-preserving edits.

Video

Sora 2 Pro

OpenAI's most powerful text-to-video model, delivering up to 25-second cinematic clips with synchronized audio, lifelike physics, and unprecedented creative control.

Video

Sora 2

OpenAI's Sora 2 is a groundbreaking AI video generation model that creates realistic, physics-aware videos with synchronized audio from text prompts.

Transcription

Whisper

General-purpose speech recognition model

Text to Speech

GPT-4o-mini TTS

A lightweight, cost-efficient text-to-speech model from OpenAI that converts written text into natural-sounding audio.

Text to Speech

TTS HD

Text-to-speech model (TTS-1-HD) optimized for quality, converting text to natural-sounding speech.

Text to Speech

TTS

OpenAI's fast text-to-speech model, optimized for low-latency voice generation across a wide range of applications.

Text

GPT-4.5 (Deprecated)

Increased capacity and nuance compared to predecessors, offering more accurate text generation.

Text

o1-preview (Deprecated)

Early preview model using broad general knowledge to reason about hard problems.

Text

o1-mini (Deprecated)

Faster, cheaper version of o1 adept at coding, math, and science tasks without extensive general knowledge.

Text

GPT-4o Mini (Deprecated)

Low-cost, low-latency model surpassing GPT-3.5 Turbo in textual intelligence and multimodal reasoning.

Text

GPT-4o (Deprecated)

Accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output, for more natural interaction.

Text

GPT-3.5 (Deprecated)

A versatile model excelling in coherent text generation, conversation, and problem-solving.

Transcription

GPT-4o mini Transcribe (Deprecated)

Speech-to-text model powered by GPT-4o mini

Transcription

GPT-4o Transcribe (Deprecated)

Speech-to-text model powered by GPT-4o

Instruct

GPT-3 (Deprecated)

Enhanced language understanding and generation for detailed, context-relevant responses.

Instruct

GPT-3.5 Instruct (Deprecated)

Variant tuned to better understand and adhere to user prompts for precise, instruction-following responses.

Google (34)

Text

Gemini 3.5 Flash

Gemini 3.5 Flash provides sustained frontier-level intelligence optimized for real-world tasks at a higher speed and lower cost. Designed for the agentic era, it excels at sub-agent deployment, multi-step workflows, and long-horizon tasks at scale. This model is particularly effective for rapid agentic loops involving complex coding cycles and iterations.

Text

Gemini 3.1 Pro

Google's frontier reasoning model delivering enhanced software engineering, agentic reliability, and multimodal intelligence across a 1M-token context window.

Text

Gemini 3 Flash

A fast, capable thinking model from Google designed for agentic workflows, coding, and multi-turn chat with near Pro-level reasoning at lower latency.

Text

Gemini 2.5 Flash Lite

Google's fastest and most efficient Gemini 2.5 model, delivering high-quality AI performance at scale with optional reasoning capabilities.

Text

Gemini 2.5 Flash

Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance.

Text

Gemini 2.5 Pro

State-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.

Text

Gemma 3.2

Google's most capable open multimodal model, combining image and text understanding with support for 140+ languages and a massive 128K context window.

Vision

Gemini 2.5 Pro Vision

Gemini 2.5 Pro is Google's most advanced reasoning Gemini model, capable of solving complex problems.

Vision

Gemini 2.5 Flash Vision

Gemini 2.5 Flash is a thinking model that offers great, well-rounded capabilities. It is designed to offer a balance between price and performance.

Image

Gemini 3.1 Flash Lite Image

Google's fastest and cheapest Gemini image model, engineered for velocity and scale where speed and cost are the primary operational constraints. Not optimized for multiple reference inputs or multi-turn sequential editing.

Image

Gemini 3.1 Flash Image

Google's top-ranked image generation and editing model, delivering world-class visual quality with strong text rendering, multi-subject consistency, and search-grounded creativity.

Image

Gemini 3 Pro Image

Google's most advanced image generation and editing model, built on Gemini 3 Pro, delivering professional-grade visual synthesis with multimodal reasoning.

Image

Gemini 2.5 Flash Image

Google's state-of-the-art image generation and editing model that combines powerful creative control with Gemini's world knowledge.

Image

Imagen 3

Google's Imagen 3 is a premium text-to-image model delivering photorealistic quality with exceptional text rendering precision and natural language understanding.

Image

Imagen 3 Fast

A fast variant of Google's Imagen 3 that generates high-quality, realistic images from text prompts with lower latency.

Image

Imagen 4 Ultra

Google's highest quality text-to-image generation model, delivering exceptional fidelity and detail for professional-grade visuals.

Image

Imagen 4 Fast

Google's highest quality image generation model, optimized for speed without sacrificing detail or visual fidelity.

Video

Gemini Omni Flash

Google's high-performance multimodal model for fast video generation, conversational editing, and cinematic control with native audio synthesis.

Video

Veo 3.1 Fast

A fast video generation model from Google designed for quick, high-quality AI video creation via Vertex AI and the Gemini API.

Video

Veo 3.1

Google's Veo 3.1 video generation model, the stable production release succeeding the preview version for high-quality AI video creation.

Video

Veo 3 Fast

Google's fast variant of the Veo 3 video generation model, optimized for speed while delivering high-quality AI-generated video from text prompts.

Video

Veo 3

Google's stable, production-ready video generation model that succeeds the Veo 3.0 preview, offering high-quality AI video synthesis via the Vertex AI platform.

Video

Veo 2

Google's generally available text- and image-to-video model capable of generating detailed, physically realistic, and artistically nuanced videos.

Text to Speech

Gemini 3.1 Flash TTS

The Gemini 3.1 Flash TTS Preview model provides powerful, low-latency speech generation with natural outputs, steerable prompts, and new expressive audio tags for precise narration control.

Text

Gemini 3 (Deprecated)

The best model in the world for multimodal understanding, and Google's most powerful agentic and vibe-coding model yet, delivering richer visuals and deeper interactivity, all built on a foundation of state-of-the-art reasoning.

Text

Gemini 2.0 Flash Thinking (Deprecated)

Combining speed and performance, 2.0 Flash Thinking Experimental excels in science and math, showing its thinking to solve complex problems.

Text

Gemini 2.0 Pro (Deprecated)

An experimental update to Gemini 2.0 for coding and complex prompts.

Text

Gemini 1.5 Flash (Deprecated)

Speedy, cost-effective multimodal model for high-volume applications without compromising quality.

Text

Gemini 1.5 Pro (Deprecated)

Proficient at multimodal tasks and content creation from image, audio, and video inputs.

Text

Gemini 1.0 Pro (Deprecated)

Handles both text and image inputs for content generation and problem-solving.

Text

PaLM 2 (Deprecated)

Advanced language model with high efficiency and accuracy for complex language tasks and creative content generation.

Vision

Gemini 1.5 Flash Vision (Deprecated)

Fast, cost-effective multimodal model for quality applications at high volume.

Vision

Gemini 1.5 Pro Vision (Deprecated)

Adept at processing visual and text inputs for multimodal tasks and content creation.

Vision

Gemini 1.0 Pro Vision (Deprecated)

Handles both text and image inputs for content generation and problem-solving.

Alibaba (1)

Video

HappyHorse 1.0

Alibaba's unified video generation model supporting text-to-video, image-to-video, reference-to-video with up to 9 reference images, and natural-language video editing with realistic dynamic rendering.

Amazon (3)

Text

Amazon Nova Pro

Highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks.

Text

Amazon Nova Lite

Very low cost multimodal model that is lightning fast for processing image, video, and text inputs.

Text

Amazon Nova Micro

Text-only model that delivers the lowest latency responses at very low cost.

Black Forest Labs (12)

Image

FLUX.2 [klein] 9B

FLUX.2 [klein] 9B with LoRA support is a high-quality 9B-parameter text-to-image model offering enhanced realism, crisper text generation, and fast LoRA customization.

Image

FLUX.2 [turbo]

FLUX 1.1 Pro is Black Forest Labs' latest AI image generation model, delivering six times faster generation speeds and superior image quality with enhanced prompt accuracy.

Image

FLUX.1 [dev] Ultra-Fast

FLUX 1.1 Pro is Black Forest Labs' latest AI image generation model, delivering six times faster generation speeds and superior image quality with enhanced prompt accuracy.

Image

FLUX.1 [schnell] LoRA

FLUX 1.1 Pro is Black Forest Labs' latest AI image generation model, delivering six times faster generation speeds and superior image quality with enhanced prompt accuracy.

Image

FLUX.1 [dev] LoRA

FLUX.1 [dev] with LoRA support for fast, high-quality, personalized image generation using pre-trained style adapters.

Image

FLUX.2 [max]

Black Forest Labs' most powerful image generation and editing model, delivering professional-grade output with real-time web context and unmatched visual consistency.

Image

FLUX.2 [dev] LoRA

FLUX.2 [dev] with LoRA support brings personalized, studio-quality text-to-image generation by letting you inject custom styles, characters, and brand identities through stackable adapter modules.

Image

FLUX.2 [pro]

FLUX.2 [pro] is Black Forest Labs' production-grade image generation model delivering state-of-the-art photorealism, multi-reference consistency, and precise brand control at scale.

Image

FLUX.1 Kontext [max]

FLUX.1 Kontext [max] is a frontier image generation and editing model with advanced text rendering and contextual understanding.

Image

FLUX.1 Kontext [pro]

FLUX.1 Kontext [pro] is a powerful image generation and editing model that enables precise, context-aware modifications to existing images using natural language prompts.

Image

FLUX 1.1 [pro] Ultra

A state-of-the-art AI image generation model that produces stunning high-resolution images from text descriptions with exceptional speed and creative control.

Image

FLUX 1.1 [pro]

FLUX 1.1 Pro is Black Forest Labs' latest AI image generation model, delivering six times faster generation speeds and superior image quality with enhanced prompt accuracy.

ByteDance (11)

Image

Seedream 5.0 Pro

Image

Seedream 4.0

Image

Seedream 5.0 Lite

ByteDance's lightweight Seedream 5.0 image editing model, delivering fast, high-fidelity transformations with exceptional face preservation and multi-reference support.

Image

Seedream 4.5

ByteDance's unified AI model that seamlessly combines high-fidelity image generation and intelligent image editing in a single architecture, with standout text rendering and multi-image composition capabilities.

Video

Seedance 2.0 Fast Turbo

Seedance 2.0 generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.

Video

Seedance 2.0 Fast

Seedance 2.0 generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.

Video

Seedance 2.0

Seedance 2.0 generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.

Video

DreamActor V2

ByteDance DreamActor V2 is a motion transfer model that animates any character — human, cartoon, or creature — by applying movements from a reference video to a static image.

Video

Seedance 1.5 Pro

ByteDance's cinematic image-to-video AI model that generates high-quality 1080p videos with natively synchronized audio from static images in a single pass.

Lip Sync

LatentSync

LatentSync is an end-to-end audio-conditioned lip-sync framework that generates perfectly synchronized talking-head videos from any source video and target audio.

Lip Sync

OmniHuman 1.5

ByteDance's OmniHuman 1.5 transforms static images into emotionally expressive, audio-driven digital humans using cognitive simulation and multimodal AI.

Cohere (2)

Text

Command R+

Cohere's most powerful Command model, optimized for complex enterprise workloads requiring advanced reasoning, RAG, and multi-step tool use.

Text

Command R

Cohere's scalable enterprise LLM optimized for retrieval-augmented generation and tool use with a 128K context window.

DeepSeek (8)

Text

DeepSeek V4 Pro

DeepSeek V4 Pro is an MoE model with 1.6T total parameters (49B active) and a 1M-token context window. It's built for advanced reasoning, coding, and long-running agent tasks, and performs well on knowledge, math, and software engineering benchmarks.

Text

DeepSeek V4 Flash

DeepSeek V4 Flash is an efficiency-focused MoE model with 284B total parameters (13B active) and a 1M-token context window. It's tuned for fast inference and high-throughput use cases while still holding up on reasoning and coding tasks.

Text

DeepSeek V3.2

DeepSeek-V3.2 is a reasoning-first open large language model that combines efficient sparse attention with advanced agentic capabilities, delivering GPT-5-level performance.

Text

DeepSeek V3.1

DeepSeek-V3.1 is a powerful 671B parameter hybrid AI model that seamlessly switches between fast conversational responses and deep step-by-step reasoning, with significantly improved tool use and agent capabilities.

Text

DeepSeek R1 Turbo

DeepSeek-R1-Turbo is a fast, efficient variant of DeepSeek's powerful R1 reasoning model, optimized for speed without sacrificing strong analytical performance.

Text

DeepSeek-R1

Reasoning LLM from Chinese AI company DeepSeek utilizing Chain of Thought.

Text

DeepSeek-V3

General-purpose LLM from DeepSeek.

Text

DeepSeek-V3 (Deprecated)

General-purpose LLM from Chinese AI company DeepSeek.

ElevenLabs (4)

Transcription

Scribe v2

ElevenLabs' state-of-the-art speech recognition model, delivering highly accurate transcription across 90+ languages with advanced features like speaker diarization, entity detection, and precise word-level timestamps.

Transcription

Scribe v1

ElevenLabs' first-generation speech-to-text transcription model, offering accurate audio transcription across multiple languages.

Text to Speech

ElevenLabs TTS

ElevenLabs is a leading voice AI platform that generates emotionally rich, human-like speech across 70+ languages with advanced voice cloning and real-time conversational AI capabilities.

Music

ElevenLabs Music

music_v1 is an AI model designed for music-related generation or analysis tasks.

HeyGen (1)

Lip Sync

HeyGen Video Translate

HeyGen Video Translate is an AI-powered tool that automatically translates and lip-syncs videos into multiple languages with a user-friendly interface.

Ideogram (9)

Image

Ideogram V4

Ideogram's flagship image generation model with industry-leading text rendering, high-resolution 2K outputs, and powerful remix capabilities.

Image

Ideogram V3 Remix

Ideogram V3 Remix is an AI-powered image editing tool that lets you transform and create variations of existing images while preserving the elements that matter most.

Image

Ideogram V3

Ideogram's most advanced text-to-image model, delivering ultra-realistic imagery with exceptional in-image text rendering and style reference control.

Image

Ideogram Upscale

Ideogram's AI-powered image upscaler that enhances photos and generated images to stunning 8K resolution with sharp, production-ready detail.

Image

Ideogram V2 Remix

An AI image remixing model that transforms existing images into creative variations while preserving their core elements through text prompt guidance.

Image

Ideogram V1 Remix

Ideogram V1 Remix is an AI image generation model designed to creatively reimagine and transform existing images with new styles, concepts, and artistic directions.

Image

Ideogram V2

Ideogram's second-generation text-to-image model, renowned for best-in-class text rendering, typography, and graphic design capabilities.

Image

Ideogram V1

Ideogram V1 is an AI image generation model recognized for its exceptional ability to render accurate, legible text within images.

Vision

Ideogram Vision (Deprecated)

Ideogram Vision is a multimodal AI model capable of understanding and analyzing images alongside text prompts.

Kling (10)

Image

Kling Image O3

Kling AI's first image generation model, delivering high-quality text-to-image and image-to-image results with exceptional text rendering and up to 4K resolution support.

Image

Kling Image O1

Kling Omni Image O1 is Kuaishou's multi-reference image generation model that maintains stunning visual consistency across characters, styles, and scenes using up to 10 reference images at once.

Video

Kling 3.0 Pro

Kling

Video

Kling 3.0

Kling

Video

Kling 2.6

Kling

Video

Kling 3.0 Motion Control

Kling 3.0 Standard Motion Control transfers motion from reference videos to animate still images.

Video

Kling O3

Kling Video O3 is Kuaishou's most advanced omni-video model, built for reference-driven, multi-shot cinematic storytelling with consistent characters, native audio, and precise creative control.

Video

Kling 2.6 Pro Motion Control

Kling V2.6 Pro Motion Control transforms static character images into dynamically animated videos by precisely extracting and transferring motion from real reference video clips.

Video

Kling O1

Kling Video O1 is a unified multimodal AI video model that seamlessly combines text, images, and video to generate, edit, and extend footage with remarkable consistency.

Lip Sync

AI Avatar Standard

Kling AI Avatar transforms a single portrait photo into a natural talking-head video driven by any audio track, with precise lip-sync and stable identity preservation.

Krea (1)

Image

Krea 2 Large

Krea's fast, high-fidelity text-to-image model designed for creative professionals, supporting flexible aspect ratios, creativity control, and optional style reference images.

Lightricks (3)

Video

LTX-2.3 LoRA

LTX-2.3 LoRA is a fine-tuning extension for Lightricks' LTX-2.3 video generation model, enabling custom character, style, and camera control in AI-generated videos.

Video

LTX-2.3

LTX-2.3 is Lightricks' open-source 22-billion-parameter multimodal model that generates synchronized audio and video in a single pass at up to 4K resolution.

Video

LTX-2 19B

LTX-2 19B is Lightricks' open-source AI model that simultaneously generates cinematic 4K video and synchronized audio from text or image inputs in a single unified workflow.

Luma Labs (6)

Image

UNI 1.1 Max

A reasoning model that interprets intent before it generates.

Image

UNI 1.1

A reasoning model that interprets intent before it generates.

Image

Photon 1 Flash

Photon 1 Flash is a fast, efficient AI model designed for high-speed inference and responsive applications.

Image

Photon 1

Luma Labs' photorealistic AI image generator built on a proprietary Universal Transformer architecture, delivering studio-quality 1080p images with exceptional prompt understanding and character consistency.

Video

Ray 2

Luma AI's flagship video generation model that creates strikingly realistic, physically accurate video clips from text or image inputs.

Video

Ray Flash 2

Ray Flash 2 is Luma AI's fast video generation model designed for efficient, high-quality text-to-video and image-to-video creation.

MeiGen (1)

Lip Sync

InfiniteTalk

An audio-driven avatar generation model that transforms a single photo or silent video into a lifelike talking or singing video with precise lip sync, natural body movement, and support for videos up to 10 minutes long.

Meta (9)

Text

Muse Spark 1.1

Meta's multimodal model for agentic and coding work, supporting text, image, video, and PDF input with a 1M+ token context window.

Text

Llama 4 Scout

Llama 4 Scout is a powerful multimodal model with 17 billion active parameters, offering state-of-the-art performance in its class.

Text

Llama 4 Scout

Llama 4 Scout is a powerful multimodal model with 17 billion active parameters, offering state-of-the-art performance in its class.

Text

Llama 4 Maverick

Llama 4 Maverick is a high-performance multimodal model with 17 billion active parameters, supporting 12 languages and offering industry-leading image and text understanding.

Text

Llama 3 8B (Deprecated)

Text

Llama 3 70B (Deprecated)

Text

Code Llama (Deprecated)

Tailored for code comprehension, generation, and debugging with an instructive design.

Text

Llama-2 70B Chat (Deprecated)

Provides depth and complexity in language understanding for sophisticated content creation.

Text

Llama-2 13B Chat (Deprecated)

Balanced model for detailed language processing, offering advanced understanding and generation.

MiniMax (3)

Video

Hailuo 2.3 Pro

MiniMax Hailuo 2.3 Pro is a video generation model for ultra-clear 1080P output and physics-aware scenes with responsive rendering.

Text to Speech

MiniMax Speech 2.8 HD

MiniMax Speech 2.8 HD is a studio-quality text-to-speech model that delivers broadcast-ready, emotionally expressive audio rivaling professional voice actors.

Music

MiniMax Music 2.5

MiniMax Music 2.5 is an AI music generation model that delivers professional-grade, structurally precise songs with strikingly realistic sound — no recording studio required.

Mistral (17)

Text

Ministral 3 3B

Designed for edge deployment, it delivers high performance across diverse hardware, including local setups.

Text

Ministral 3 8B

Built for edge deployment, it delivers high performance across diverse hardware, including local setups.

Text

Ministral 3 14B

Optimized for local deployment, it delivers high performance across diverse hardware, including local setups.

Text

Mistral Large 3

Mistral Large 3 is Mistral’s first mixture-of-experts model since the seminal Mixtral series, and represents a substantial step forward in pretraining at Mistral.

Text

Mistral Medium 3

A versatile AI model designed for professional applications with particular strengths in coding and multimodal understanding. The model can be deployed across various environments, making it ideal for enhancing customer service, personalizing business processes, and analyzing complex datasets across industries.

Text

Mistral Small 3.1 (25.03)

Single-node inference model with 128k context window supporting dozens of languages and 80+ coding languages.

Text

Mixtral 8x7B Instruct

High-quality, efficient sparse model outperforming larger models in speed and benchmarks.

Text

Mistral Small 24.02

Single-node inference model with 128k context window supporting dozens of languages and 80+ coding languages.

Text

Mistral Large 24.07

Single-node inference model with 128k context window supporting dozens of languages and 80+ coding languages.

Text

Mistral Large 24.02

Single-node inference model with 128k context window supporting dozens of languages and 80+ coding languages.

Text

Mistral 7B Instruct

Focused on instruction-based tasks, providing clear, concise responses adhering to user instructions.

Text

Mistral Codestral

Open-weight code generation model assisting developers in 80+ programming languages.

Text

Mistral Nemo

Model with state-of-the-art reasoning, world knowledge, and coding accuracy, designed for global, multilingual applications.

Text

Mixtral 8x7B (Deprecated)

Mixtral 8x7B is a high-performance mixture-of-experts language model from Mistral AI, offering a 32K token context window with efficient, fast inference.

Text

Mixtral 8x22B Instruct (Deprecated)

High-performance, cost-efficient sparse model using 39B active parameters out of 141B.

Text

Mixtral 8x7B Instruct (Deprecated)

High-quality, efficient sparse model outperforming larger models in speed and benchmarks.

Text

Mistral 7B Instruct (Deprecated)

Focused on instruction-based tasks, providing clear, concise responses adhering to user instructions.

Moonshot (4)

Text

Kimi K3

Text

Kimi K2.7 Code

Moonshot AI's coding-focused agentic model built on Kimi K2.6, delivering stronger long-horizon software engineering performance with roughly 30% lower thinking-token usage.

Text

Kimi K2.6

Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

Text

Kimi K2.5

Moonshot AI's most powerful open-source model — a 1T-parameter native multimodal agent capable of visual understanding, advanced coding, and coordinated multi-agent task execution.

Nvidia (2)

Text

Nemotron 3 Nano 30B

NVIDIA's compact open-weight hybrid MoE model with 30B total / 3.5B active parameters, delivering strong reasoning and coding performance up to 1M context.

Text

Nemotron 3 Super 120B

NVIDIA's open-weight hybrid Mamba-Transformer MoE with 120B total / 12B active parameters, excelling in agentic reasoning, coding, and long-context tasks up to 1M tokens.

Perplexity (9)

Text

Sonar Deep Research

Perplexity's exhaustive deep research model that autonomously searches hundreds of sources to deliver expert-level analysis and comprehensive reports.

Text

Sonar Reasoning Pro

Perplexity's premier reasoning model combining DeepSeek R1-powered Chain-of-Thought reasoning with real-time web search for comprehensive, citation-rich answers.

Text

Sonar

Perplexity's lightweight, real-time AI search model that delivers fast, citation-backed answers by connecting directly to the live web.

Text

Sonar Pro

Perplexity's advanced search model delivering deeper reasoning, richer citations, and multi-step query handling for complex research tasks.

Text

Sonar Reasoning (Deprecated)

Lightweight reasoning offering powered by models trained with DeepSeek R1.

Text

Sonar Large Online (Deprecated)

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text

Sonar Large Chat (Deprecated)

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text

Sonar Small Online (Deprecated)

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

Text

Sonar Small Chat (Deprecated)

Perplexity's latest model family surpassing earlier versions in cost-efficiency, speed, and performance.

PixVerse (1)

Video

PixVerse V5.6

PixVerse V5.6 is an AI video generation model that gives you precise creative control over your videos, including the unique ability to define both the starting and ending frames of every scene.

Qwen (11)

Text

Qwen3.5 Omni Flash

Qwen's latest omni-modal model supporting text, image, video, and audio understanding with 10+ hours of audio comprehension, 60+ input languages, and fast, fluent multimodal interaction.

Text

Qwen3.6 Flash

Alibaba's fast, cost-efficient vision-language model with a 1M token context window, excelling at agentic coding, deep reasoning, and spatial visual understanding.

Text

Qwen3.7 Max

Alibaba's flagship Qwen3.7 model built for the agent-centric era, combining deep thinking, a 1M token context window, and best-in-class agentic capabilities for programming, productivity, and long-term autonomous execution.

Text

Qwen3.7 Plus

Alibaba's cost-effective Qwen3.7 flagship combining strong text generation, upgraded vision-language understanding, and agent-level intelligence for coding, GUI interaction, and tool use.

Text

Qwen3.6-35B-A3B

Qwen3.6-35B-A3B is Alibaba's latest flagship Mixture-of-Experts model.

Text

Qwen3 235B

Alibaba's updated 235B Mixture-of-Experts model delivering top-tier reasoning, coding, and long-context understanding while only activating 22B parameters at a time.

Image

Qwen Image Edit Plus

Qwen Image 2.0 is an advanced text-to-image model with enhanced image quality and improved prompt understanding.

Image

Z Image Turbo Controlnet

Alibaba's Tongyi Lab Z-Image is a 6-billion-parameter image generation foundation model that delivers state-of-the-art visual quality and prompt coherence using an efficient single-stream diffusion transformer architecture.

Image

Qwen 2 Pro

Alibaba's unified image generation and editing model that excels at accurate text rendering, native 2K resolution output, and ranks #1 on AI Arena's blind human evaluation leaderboard.

Image

Qwen Image

Qwen-Image is a state-of-the-art image generation and editing model with exceptional text rendering capabilities, including complex Chinese character generation.

Image

Z Image Turbo

Alibaba's Tongyi Lab Z-Image is a 6-billion-parameter image generation foundation model that delivers state-of-the-art visual quality and prompt coherence using an efficient single-stream diffusion transformer architecture.

Reka (3)

Text

Reka Edge (Deprecated)

Lightweight 7B model outperforming larger models while delivering outsized value.

Text

Reka Flash (Deprecated)

Fast and capable 21B model outperforming larger models while delivering outsized value.

Text

Reka Core (Deprecated)

Largest, most capable model for complex tasks, performing competitively on benchmarks and human evaluations.

Runway (1)

Video

Gen-4 Turbo

Runway's fast and efficient video generation model, delivering high-quality AI-generated video at speed for professional content creation workflows.

Stability (5)

Image

SDXL LoRA

Stability AI's SDXL LoRA is a powerful text-to-image model combining a 3.5-billion-parameter architecture with flexible LoRA customization for professional-grade, high-resolution image generation.

Image

SDXL

Stability AI's flagship text-to-image model, delivering dramatically improved image quality, composition, and detail over previous Stable Diffusion versions.

Image

Stable Diffusion 3

Stable Diffusion 3 is Stability AI's most advanced text-to-image model, featuring a new Multimodal Diffusion Transformer architecture that delivers superior prompt adherence, typography, and visual quality.

Image

Stable Image Ultra

Stability AI's most advanced text-to-image model, delivering exceptional photorealism and creative detail from short text prompts.

Image

Stable Image Core

Stable Image Core is Stability AI's text-to-image generation model designed to produce high-quality visuals from descriptive text prompts.

Wan (12)

Image

Wan 2.7 Pro

Alibaba's Wan2.7 image model supporting text-to-image, sequential image series, image editing, and multi-image reference generation with enhanced text rendering and subject consistency.

Image

Wan 2.7

Alibaba's Wan2.7 image model supporting text-to-image, sequential image series, image editing, and multi-image reference generation with enhanced text rendering and subject consistency.

Image

Wan 2.7 Pro

Alibaba's powerful multimodal AI model that generates cinematic 1080p video with native audio synchronization, multi-shot storytelling, and advanced image creation.

Image

Wan 2.7

Alibaba's powerful multimodal AI model that generates cinematic 1080p video with native audio synchronization, multi-shot storytelling, and advanced image creation.

Image

Wan 2.6

Alibaba's powerful multimodal AI model that generates cinematic 1080p video with native audio synchronization, multi-shot storytelling, and advanced image creation.

Image

Wan 2.5

Alibaba's open-source AI video model that generates cinematic 1080p video clips with fully synchronized audio — including dialogue, ambient sound, and music — all in a single step.

Video

Wan 2.7

Alibaba's unified video generation model supporting text-to-video, image-to-video, reference-to-video, and natural-language video editing — with multi-shot storytelling, native audio, and up to 15 seconds of 1080p output.

Video

Wan 2.7

Seedance 2.0 generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.

Video

Wan 2.6

Alibaba's open-source video generation model that creates cinematic 1080p videos up to 15 seconds with native audio and lip sync from a single prompt.

Video

Wan 2.5

Alibaba's open-source AI video model that generates cinematic 1080p clips with fully synchronized audio — dialogue, ambient sound, and music — all in a single step.

Video

Wan 2.2

Wan 2.2 is an open-source video generation model from Alibaba's Tongyi Lab that uses a pioneering Mixture-of-Experts architecture to deliver cinematic-quality text-to-video and image-to-video results.

Image

Wan 2.2 (Deprecated)

Wavespeed (6)

Image

Chroma

Chroma is an 8.9-billion-parameter, uncensored text-to-image model built on the FLUX.1-schnell architecture, designed for unrestricted creative expression with professional-grade output quality.

3D Generation

Hunyuan3D V2 Multi-View

Generate a 3D model from front, back, and side reference images with optional textured output.

3D Generation

Tripo3D v2.5

X.ai (19)

Text

Grok 4.5

Text

Grok Build 0.1

Text

Grok 4.3

Text

Grok 4.20 Reasoning

A beta reasoning variant of xAI's Grok 4.20, designed for enhanced step-by-step thinking and complex problem-solving.

Text

Grok 4.20

A fast, tool-focused variant of Grok 4.20 Beta with reasoning disabled for lower-latency agentic workloads and a massive 2M-token context window.

Text

Grok 4.1 Fast Reasoning

A fast, reasoning-capable AI model built for agentic tool-calling workflows with a massive 2-million-token context window.

Text

Grok 4.1 Fast

xAI's ultra-fast, non-reasoning variant of Grok 4.1 Fast, built for real-time agentic tasks with a massive 2 million token context window.

Text

Grok 4 Fast Reasoning

A fast, cost-efficient reasoning model from xAI that delivers near-flagship performance with dramatically fewer thinking tokens and a massive 2 million token context window.

Text

Grok 4 Fast

xAI's cost-efficient reasoning model that delivers frontier-level intelligence with exceptional speed and token efficiency.

Text

Grok 4

xAI's most powerful reasoning model, trained with massive-scale reinforcement learning to achieve world-leading performance on the hardest academic and scientific benchmarks.

Text

Grok 3 Mini Fast

A lightweight, high-speed reasoning model from xAI that delivers fast, intelligent responses with function calling, web search, and extended thinking capabilities.

Text

Grok 3 Mini

A lightweight, cost-efficient reasoning model from xAI that thinks before it responds, excelling at math, logic puzzles, and quantitative problem-solving.

Text

Grok 3 Fast

A high-speed variant of xAI's Grok 3, optimized for rapid responses while retaining strong language understanding and tool-use capabilities.

Text

Grok 3

xAI's flagship large language model, delivering state-of-the-art reasoning, coding, and creative writing capabilities on par with the best models in the world.

Vision

Grok 4.3 Vision

Vision

Grok 2 Vision

A multimodal vision model from xAI that combines strong image understanding with text reasoning, multilingual support, and enhanced instruction-following.

Image

Grok Imagine Pro

xAI's premium image generation model, delivering higher-quality visuals than the standard grok-imagine-image.

Image

Grok Imagine

xAI's fast and accessible text-to-image generation model, designed to turn written prompts into high-quality images at scale.

Video

Grok Imagine

xAI's fast, native text-to-video and image-to-video generation model with built-in audio, multiple aspect ratios, and flexible creative modes.

Z.ai (6)

Text
