Token cost library

Model pricing guides

Each guide breaks down a single model’s token pricing, realistic workloads, and FAQs. Use them to educate stakeholders or benchmark multiple providers before you ship.

12 of 37 guides are launch-ready.

How to use this library

Deep dives for finance, product, and engineering

Every guide packages scenario math, cache assumptions, FAQs, and pricing citations for a single model.

Share them when stakeholders need qualitative context—not just the spreadsheet.

Inside each guide you’ll find:

Scenario presets with per-request, monthly, and yearly totals.
Use cases that explain where the model shines (or falls short).
FAQ + pricing citations so legal/finance can audit the numbers.
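
As a rough illustration of how a scenario preset's per-request, monthly, and yearly totals relate, here is a minimal sketch. It uses the $1/$5 per-million-token rates listed for Claude Haiku 4.5 in this library; the token counts, request volume, and the names `scenario_totals` and `ScenarioTotals` are hypothetical, and the yearly figure simply assumes twelve identical months. The guides themselves may apply additional factors, such as caching, that this sketch ignores.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class ScenarioTotals:
    per_request: float  # USD for a single request
    monthly: float      # USD for one month of traffic
    yearly: float       # USD for twelve identical months (assumption)


def scenario_totals(input_tokens, output_tokens,
                    input_rate, output_rate, requests_per_month):
    """Estimate spend from token counts and per-million-token rates (USD)."""
    per_request = (
        input_tokens * input_rate + output_tokens * output_rate
    ) / TOKENS_PER_MILLION
    monthly = per_request * requests_per_month
    return ScenarioTotals(per_request, monthly, monthly * 12)


# Hypothetical workload: 2,000 input and 500 output tokens per request,
# 100,000 requests per month, at $1 input / $5 output per million tokens.
totals = scenario_totals(2_000, 500, 1.00, 5.00, 100_000)
print(f"per request: ${totals.per_request:.4f}")
print(f"monthly:     ${totals.monthly:,.2f}")
print(f"yearly:      ${totals.yearly:,.2f}")
```

Under these assumptions the workload comes to about $0.0045 per request, $450 per month, and $5,400 per year.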

Need a model that isn’t listed yet? Email hello@tokentally.net and we’ll queue it up.

Launch set

QA’d and ready for prime time. We’ll add the remaining guides once their copy has been reviewed.

Anthropic

Claude Opus 4.6

Claude Opus 4.6 Pricing Guide

Last check: Mar 13, 2026

Anthropic

Claude Sonnet 4.6

Claude Sonnet 4.6 Cost Planner

Last check: Mar 13, 2026

DeepSeek

DeepSeek Chat V3.2

DeepSeek Chat Pricing Guide

Last check: Mar 13, 2026

Google

Gemini 2.5 Flash

Gemini 2.5 Flash Pricing

Last check: Mar 13, 2026

Google

Gemini 2.5 Pro

Gemini 2.5 Pro Pricing Guide

Last check: Mar 13, 2026

OpenAI

GPT-4.1

GPT-4.1 Pricing Guide

Last check: Mar 13, 2026

OpenAI

GPT-4o Mini

GPT-4o Mini Cost Breakdown

Last check: Mar 13, 2026

OpenAI

GPT-5

GPT-5 Pricing Guide

Last check: Mar 13, 2026

OpenAI

GPT-5.4

GPT-5.4 Pricing Guide

Last check: Mar 13, 2026

Alibaba Cloud

Qwen3-Max (Global)

Qwen3-Max Pricing Guide

Last check: Mar 13, 2026

Perplexity

Sonar

Perplexity Sonar Pricing Guide

Last check: Mar 13, 2026

Perplexity

Sonar Pro

Perplexity Sonar Pro Pricing Guide

Last check: Mar 13, 2026

Alibaba Cloud

3 guides

Alibaba Cloud

Qwen3-Max (Global)

Last check: Mar 13, 2026

Qwen3-Max global pricing

Model Alibaba Cloud’s flagship Qwen3-Max spend for long-context research, ops copilots, and strategy assistants.

Read guide →

Alibaba Cloud

Qwen3.5-Flash (Global)

Last check: Mar 13, 2026

Qwen3.5-Flash pricing

Forecast Qwen3.5-Flash spend for mega-scale assistants, notifications, and automation.

Read guide →

Alibaba Cloud

Qwen3.5-Plus (Global)

Last check: Mar 13, 2026

Qwen3.5-Plus token costs

Plan Qwen3.5-Plus deployments that need multimodal inputs and million-token context without premium rates.

Read guide →

Anthropic

5 guides

Anthropic

Claude 3.7 Haiku

Last check: Mar 13, 2026

Claude 3.7 Haiku pricing

Plan Claude Haiku usage for instant support bots, workflows, and summarizers.

Read guide →

Anthropic

Claude 3.7 Sonnet

Last check: Mar 13, 2026

Claude 3.7 Sonnet token costs

Detailed Claude Sonnet pricing with budget examples for reasoning-heavy workflows.

Read guide →

Anthropic

Claude Haiku 4.5

Last check: Mar 13, 2026

Haiku 4.5 cost planner

Forecast ultra-fast Claude Haiku 4.5 usage at $1/$5 per million tokens.

Read guide →

Anthropic

Claude Opus 4.6

Last check: Mar 13, 2026

Opus 4.6 token costs

Budget Anthropic’s flagship Opus 4.6 for agentic workflows, coding, and extended reasoning.

Read guide →

Anthropic

Claude Sonnet 4.6

Last check: Mar 13, 2026

Sonnet 4.6 pricing

Plan Sonnet 4.6 deployments that balance speed, intelligence, and the 1M-token context beta.

Read guide →

DeepSeek

2 guides

DeepSeek

DeepSeek Chat V3.2

Last check: Mar 13, 2026

DeepSeek Chat token costs

Break down DeepSeek Chat's low-cost tiers and estimate cache-hit vs cache-miss spend.

Read guide →

DeepSeek

DeepSeek Reasoner V3.2

Last check: Mar 13, 2026

DeepSeek Reasoner token costs

Plan DeepSeek Reasoner usage for complex reasoning, math, and code automation.

Read guide →

Google

8 guides

Google

Gemini 1.5 Flash

Last check: Mar 13, 2026

Gemini Flash pricing

Find the sweet spot between context length and price with Gemini Flash.

Read guide →

Google

Gemini 1.5 Pro

Last check: Mar 13, 2026

Gemini 1.5 Pro token costs

Budget Gemini 1.5 Pro for long-context video, audio, and document understanding.

Read guide →

Google

Gemini 2.5 Flash

Last check: Mar 13, 2026

Gemini 2.5 Flash costs

Budget Gemini 2.5 Flash at $0.30/$2.50 per million tokens for hybrid reasoning workloads.

Read guide →

Google

Gemini 2.5 Flash-Lite

Last check: Mar 13, 2026

Gemini 2.5 Flash-Lite costs

Model Gemini 2.5 Flash-Lite at $0.10/$0.40 per million tokens for massive scale.

Read guide →

Google

Gemini 2.5 Pro

Last check: Mar 13, 2026

Gemini 2.5 Pro costs

Model Gemini 2.5 Pro at $1.25/$10 per million tokens with 1M context support.

Read guide →

Google

Gemini 3 Flash Preview

Last check: Mar 13, 2026

Gemini 3 Flash costs

Plan Gemini 3 Flash preview deployments at $0.50/$3.00 per million tokens.

Read guide →

Google

Gemini 3.1 Flash-Lite Preview

Last check: Mar 13, 2026

Gemini 3.1 Flash-Lite costs

Budget ultra-low-cost Gemini 3.1 Flash-Lite preview workloads at $0.25/$1.50 per million tokens.

Read guide →

Google

Gemini 3.1 Pro Preview

Last check: Mar 13, 2026

Gemini 3.1 Pro Preview costs

Model the $2/$12 per-million token rates (and long-context uplifts) for Gemini 3.1 Pro Preview.

Read guide →

Model MiniMax M2’s $0.30/$1.20 Bedrock rates for coding copilots and long-context agents.

Read guide →

MiniMax

MiniMax M2.1

Last check: Mar 13, 2026

MiniMax M2.1 pricing

Forecast MiniMax M2.1 workloads that need higher-throughput coding and agent loops at the same token rate as M2.

Read guide →

Mistral

1 guide

Mistral

Mistral Large 2

Last check: Mar 13, 2026

Mistral Large 2 pricing

Forecast spend for European-hosted enterprise copilots using Mistral Large 2.

Read guide →

OpenAI

11 guides

OpenAI

GPT-4.1

Last check: Mar 13, 2026

GPT-4.1 cost planner

Understand GPT-4.1’s premium pricing and plan for reasoning-heavy workloads.

Read guide →

OpenAI

GPT-4.1 Mini

Last check: Mar 13, 2026

GPT-4.1 mini pricing

Forecast GPT-4.1 mini spend for drafting, lightweight agents, and experimentation.

Read guide →

OpenAI

GPT-4o

Last check: Mar 13, 2026

GPT-4o pricing explained

Up-to-date GPT-4o token costs plus real-world scenarios for support, creative, and analytics workloads.

Read guide →

OpenAI

GPT-4o Mini

Last check: Mar 13, 2026

GPT-4o mini token costs

See how GPT-4o mini keeps prompt spend down for high-volume assistants and automations.

Read guide →

OpenAI

GPT-5

Last check: Mar 13, 2026

GPT-5 pricing

Model baseline GPT-5 usage (same pricing as GPT-5.1) for broad deployments.

Read guide →

OpenAI

GPT-5 Mini

Last check: Mar 13, 2026

GPT-5 Mini pricing

Budget GPT-5 Mini for cost-sensitive assistants at $0.25/$2 per million tokens.

Read guide →

OpenAI

GPT-5 Nano

Last check: Mar 13, 2026

GPT-5 Nano pricing

Forecast ultra-low-cost GPT-5 Nano usage at $0.05/$0.40 per million tokens.

Read guide →

OpenAI

GPT-5.1

Last check: Mar 13, 2026

GPT-5.1 cost breakdown

Budget GPT-5.1 usage for large copilots, data agents, and enterprise chat flows.

Read guide →

OpenAI

GPT-5.2

Last check: Mar 13, 2026

GPT-5.2 cost planner

Plan GPT-5.2 deployments that need premium reasoning at $1.75/$14 per million tokens.

Read guide →

OpenAI

GPT-5.4

Last check: Mar 13, 2026

GPT-5.4 cost planner

Model OpenAI’s flagship GPT-5.4 across both 1.05M-token long-context work and short-context inference.

Read guide →

OpenAI

GPT-5.4 Pro

Last check: Mar 13, 2026

GPT-5.4 Pro pricing

Budget OpenAI’s highest-tier GPT-5.4 Pro runs for mission-critical reasoning workloads.

Read guide →

Perplexity

2 guides

Perplexity

Sonar

Last check: Mar 13, 2026

Sonar cost planner

Map Sonar’s $1/$1 token rates plus the low-cost search-context fees for grounded Q&A flows.

Read guide →

Perplexity

Sonar Pro

Last check: Mar 13, 2026

Sonar Pro token costs

Budget Sonar Pro’s deeper research runs, including Pro Search request tiers and premium output pricing.

Read guide →

xAI

2 guides

xAI

Grok 4.1 Fast

Last check: Mar 13, 2026

Grok 4.1 Fast pricing

Map out Grok 4.1 Fast usage for realtime assistants, dashboards, and bulk automation.

Read guide →

xAI

Grok 4.20 Beta (Reasoning)

Last check: Mar 13, 2026

Grok 4.20 Beta token costs

Forecast Grok 4.20 Beta spend for multi-agent research, planning, and production copilots.

Read guide →

FAQ

How often are guides updated?

Each time the pricing dataset syncs, we flag guides whose source models changed and refresh their scenarios + citations.
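
One way to picture the flagging step is a comparison of two dataset snapshots. The sketch below is not the library's actual pipeline: the function name `flag_stale_guides`, the snapshot shape (model name mapped to an input/output rate pair), and the placeholder model names and rates are all assumptions made for illustration.

```python
def flag_stale_guides(previous, current):
    """Return models whose pricing entry is new or differs between snapshots.

    Both arguments map a model name to an (input_rate, output_rate) tuple.
    This hypothetical structure stands in for the real pricing dataset.
    """
    return sorted(
        model
        for model, rates in current.items()
        if previous.get(model) != rates
    )


# Placeholder snapshots: model-b changed price and model-c is newly added.
previous = {"model-a": (1.00, 5.00), "model-b": (0.30, 2.50)}
current = {
    "model-a": (1.00, 5.00),
    "model-b": (0.25, 2.00),
    "model-c": (2.00, 12.00),
}

flagged = flag_stale_guides(previous, current)
print(flagged)  # ['model-b', 'model-c']
```

A model that disappears from the newer snapshot is not flagged here; handling retirements would need a separate check.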

Can I request a custom guide?

Yes—send workloads, traffic assumptions, or compliance concerns to hello@tokentally.net.

Audit trail

Each guide links back to provider docs and the master methodology so finance/legal teams can verify the math.

We keep copies of every change (citations + timestamps) in version control. If you need supporting evidence for an audit, reach out.
