Catalog

Search the API model catalog directly.

Use the catalog when you already know the constraints you care about and just need a fast shortlist.
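Constraint-based shortlisting of the kind described above can be sketched in a few lines. This is an illustrative sketch only, not the catalog's actual API: the `Model` fields, the `shortlist` helper, and the sample entries (taken from the listing below) are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    input_usd_per_1m: float   # USD per 1M input tokens
    output_usd_per_1m: float  # USD per 1M output tokens
    context_tokens: int
    latency_ms: int

# A few entries from the catalog below, for illustration.
CATALOG = [
    Model("DeepSeek Chat", 0.28, 0.42, 128_000, 980),
    Model("Gemini 2.5 Flash", 0.30, 2.50, 1_000_000, 520),
    Model("Llama 3.3 70B Versatile", 0.59, 0.79, 131_100, 290),
    Model("Claude Sonnet 4", 3.00, 15.00, 200_000, 1200),
]

def shortlist(models, max_input_price=None, min_context=None, max_latency_ms=None):
    """Keep only models that satisfy every constraint that was given."""
    out = []
    for m in models:
        if max_input_price is not None and m.input_usd_per_1m > max_input_price:
            continue
        if min_context is not None and m.context_tokens < min_context:
            continue
        if max_latency_ms is not None and m.latency_ms > max_latency_ms:
            continue
        out.append(m)
    return out

# Example: cheap-input models that still answer quickly.
fast_and_cheap = shortlist(CATALOG, max_input_price=1.00, max_latency_ms=600)
print([m.name for m in fast_and_cheap])  # ['Gemini 2.5 Flash', 'Llama 3.3 70B Versatile']
```

Unset constraints are simply skipped, so the same helper covers a single-filter query and a fully constrained one.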

11 models matched. Showing page 1 of 1.
DeepSeek · DeepSeek Chat — Cheapest

Low-cost reasoning-oriented API model with strong price-performance for coding workloads.

Pricing: Official (live) · Benchmarks: Third-Party · Synced within 7d · Coverage: full (applies to every entry below)

Input: $0.28/1M · Output: $0.42/1M · Context: 128K tokens · Latency: 980 ms
Google Gemini · Gemini 2.5 Flash — Cheapest

Google’s high-speed API model for lower-latency product flows and agents.

Input: $0.30/1M · Output: $2.50/1M · Context: 1M tokens · Latency: 520 ms
xAI · Grok 3 Mini — Cheapest

Cost-efficient xAI reasoning model positioned for speed and general agent tasks.

Input: $0.30/1M · Output: $0.50/1M · Context: 131.1K tokens · Latency: 610 ms
OpenAI · GPT-4.1 mini — Cheapest

Cost-efficient OpenAI model for high-throughput agent and app workloads.

Input: $0.40/1M · Output: $1.60/1M · Context: 1M tokens · Latency: 780 ms
Mistral · Mistral Large — Cheapest

High-capability Mistral model for enterprise-grade reasoning and multilingual output.

Input: $0.50/1M · Output: $1.50/1M · Context: 128K tokens · Latency: 920 ms
Groq · Llama 3.3 70B Versatile — Fastest

Groq-hosted fast serving tier for Meta’s large open model family.

Input: $0.59/1M · Output: $0.79/1M · Context: 131.1K tokens · Latency: 290 ms
Anthropic · Claude Haiku 3.5 — Cheapest

Fast Claude model for lightweight app flows and retrieval-heavy workloads.

Input: $0.80/1M · Output: $4.00/1M · Context: 200K tokens · Latency: 640 ms
Google Gemini · Gemini 2.5 Pro — Smartest

Google’s top-tier reasoning and multimodal-capable API model for complex tasks.

Input: $1.25/1M · Output: $10.00/1M · Context: 1M tokens · Latency: 1500 ms
OpenAI · GPT-4.1 — Smartest

Flagship general-purpose model tuned for production-grade reasoning and tool use.

Input: $2.00/1M · Output: $8.00/1M · Context: 1M tokens · Latency: 1400 ms
Cohere · Command A — Fastest

Cohere’s enterprise-oriented generation model tuned for agentic workflows and RAG.

Input: $2.50/1M · Output: $10.00/1M · Context: 128K tokens · Latency: 1010 ms
Anthropic · Claude Sonnet 4 — Coding

Balanced Claude model optimized for production coding and long-context tasks.

Input: $3.00/1M · Output: $15.00/1M · Context: 200K tokens · Latency: 1200 ms
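All prices in the catalog are quoted per 1M tokens, so the cost of a single request is simple arithmetic over its input and output token counts. A minimal sketch, using the listed GPT-4.1 mini rates ($0.40 in / $1.60 out); the token counts and the `request_cost` helper are illustrative assumptions, not part of the catalog.

```python
def request_cost(input_tokens, output_tokens, in_price_per_1m, out_price_per_1m):
    """Cost in USD for one request at the given per-1M-token rates."""
    return (input_tokens * in_price_per_1m + output_tokens * out_price_per_1m) / 1_000_000

# Hypothetical request: 20K prompt tokens in, 1K completion tokens out,
# at GPT-4.1 mini's listed rates of $0.40/1M input and $1.60/1M output.
cost = request_cost(20_000, 1_000, 0.40, 1.60)
print(f"${cost:.4f}")  # $0.0096
```

Note how input volume dominates here despite the 4x higher output rate, which is why prompt-heavy workloads tend to sort the catalog by input price first.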