Token cost library
Model pricing guides
Each guide breaks down a single model’s token pricing, realistic workloads, and FAQs. Use them to educate stakeholders or benchmark multiple providers before you ship.
12 of 37 guides are launch-ready.
How to use this library
Deep dives for finance, product, and engineering
Every guide packages scenario math, cache assumptions, FAQs, and pricing citations for a single model.
Share them when stakeholders need qualitative context—not just the spreadsheet.
Inside each guide you’ll find:
- Scenario presets with per-request, monthly, and yearly totals.
- Use cases that explain where the model shines (or falls short).
- FAQ + pricing citations so legal/finance can audit the numbers.
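As a rough illustration of the scenario math behind those presets, the sketch below turns a per-million-token rate pair into per-request, monthly, and yearly totals. It is a minimal sketch under stated assumptions, not the library's actual calculator. It assumes a quoted pair such as $1.25/$10 means input/output price per million tokens. The names `Rates`, `request_cost`, and `scenario_totals` and the workload numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per one million tokens (assumed order: input, then output)."""
    input_per_m: float
    output_per_m: float


def request_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the given rates."""
    return (input_tokens * rates.input_per_m
            + output_tokens * rates.output_per_m) / 1_000_000


def scenario_totals(rates: Rates, input_tokens: int, output_tokens: int,
                    requests_per_month: int) -> dict:
    """Per-request, monthly, and yearly totals for one workload preset."""
    per_request = request_cost(rates, input_tokens, output_tokens)
    monthly = per_request * requests_per_month
    return {
        "per_request": per_request,
        "monthly": monthly,
        "yearly": monthly * 12,  # assumes flat traffic across 12 months
    }


# Hypothetical workload: 2,000 input tokens and 500 output tokens per request,
# 100,000 requests per month, at the $1.25/$10 rate quoted for Gemini 2.5 Pro.
totals = scenario_totals(Rates(1.25, 10.0), 2_000, 500, 100_000)
for label, usd in totals.items():
    print(f"{label}: ${usd:,.4f}")
```

With these assumed inputs the totals come to about $0.0075 per request, $750 per month, and $9,000 per year. Cache discounts, long-context uplifts, and per-request search fees are left out of this sketch and would need their own terms.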
Need a model that isn’t listed yet? Email hello@tokentally.net and we’ll queue it up.
Launch set
QA’d and ready for prime time. We’ll add the remaining guides once their copy is reviewed.
Anthropic
Claude Opus 4.6
Claude Opus 4.6 Pricing Guide
Last check: Mar 13, 2026
Anthropic
Claude Sonnet 4.6
Claude Sonnet 4.6 Cost Planner
Last check: Mar 13, 2026
DeepSeek
DeepSeek Chat V3.2
DeepSeek Chat Pricing Guide
Last check: Mar 13, 2026
Gemini 2.5 Flash
Gemini 2.5 Flash Pricing
Last check: Mar 13, 2026
Gemini 2.5 Pro
Gemini 2.5 Pro Pricing Guide
Last check: Mar 13, 2026
OpenAI
GPT-4.1
GPT-4.1 Pricing Guide
Last check: Mar 13, 2026
OpenAI
GPT-4o Mini
GPT-4o Mini Cost Breakdown
Last check: Mar 13, 2026
OpenAI
GPT-5
GPT-5 Pricing Guide
Last check: Mar 13, 2026
OpenAI
GPT-5.4
GPT-5.4 Pricing Guide
Last check: Mar 13, 2026
Alibaba Cloud
Qwen3-Max (Global)
Qwen3-Max Pricing Guide
Last check: Mar 13, 2026
Perplexity
Sonar
Perplexity Sonar Pricing Guide
Last check: Mar 13, 2026
Perplexity
Sonar Pro
Perplexity Sonar Pro Pricing Guide
Last check: Mar 13, 2026
Alibaba Cloud
3 guides
Alibaba Cloud
Qwen3-Max (Global)
Last check: Mar 13, 2026
Qwen3-Max global pricing
Model Alibaba Cloud’s flagship Qwen3-Max spend for long-context research, ops copilots, and strategy assistants.
Read guide →
Alibaba Cloud
Qwen3.5-Flash (Global)
Last check: Mar 13, 2026
Qwen3.5-Flash pricing
Forecast Qwen3.5-Flash spend for mega-scale assistants, notifications, and automation.
Read guide →
Alibaba Cloud
Qwen3.5-Plus (Global)
Last check: Mar 13, 2026
Qwen3.5-Plus token costs
Plan Qwen3.5-Plus deployments that need multimodal inputs and million-token context without premium rates.
Read guide →
Anthropic
5 guides
Anthropic
Claude 3.7 Haiku
Last check: Mar 13, 2026
Claude 3.7 Haiku pricing
Plan Claude Haiku usage for instant support bots, workflows, and summarizers.
Read guide →
Anthropic
Claude 3.7 Sonnet
Last check: Mar 13, 2026
Claude 3.7 Sonnet token costs
Detailed Claude Sonnet pricing with budget examples for reasoning-heavy workflows.
Read guide →
Anthropic
Claude Haiku 4.5
Last check: Mar 13, 2026
Haiku 4.5 cost planner
Forecast ultra-fast Claude Haiku 4.5 usage at $1/$5 per million tokens.
Read guide →
Anthropic
Claude Opus 4.6
Last check: Mar 13, 2026
Opus 4.6 token costs
Budget Anthropic’s flagship Opus 4.6 for agentic workflows, coding, and extended reasoning.
Read guide →
Anthropic
Claude Sonnet 4.6
Last check: Mar 13, 2026
Sonnet 4.6 pricing
Plan Sonnet 4.6 deployments that balance speed, intelligence, and the 1M-token context beta.
Read guide →
DeepSeek
2 guides
DeepSeek
DeepSeek Chat V3.2
Last check: Mar 13, 2026
DeepSeek Chat token costs
Break down DeepSeek Chat’s low-cost tiers and estimate cache-hit vs cache-miss spend.
Read guide →
DeepSeek
DeepSeek Reasoner V3.2
Last check: Mar 13, 2026
DeepSeek Reasoner token costs
Plan DeepSeek Reasoner usage for complex reasoning, math, and code automation.
Read guide →
Gemini 1.5 Flash
Last check: Mar 13, 2026
Gemini Flash pricing
Find the sweet spot between context length and price with Gemini Flash.
Read guide →
Gemini 1.5 Pro
Last check: Mar 13, 2026
Gemini 1.5 Pro token costs
Budget Gemini 1.5 Pro for long-context video, audio, and document understanding.
Read guide →
Gemini 2.5 Flash
Last check: Mar 13, 2026
Gemini 2.5 Flash costs
Budget Gemini 2.5 Flash at $0.30/$2.50 per million tokens for hybrid reasoning workloads.
Read guide →
Gemini 2.5 Flash-Lite
Last check: Mar 13, 2026
Gemini 2.5 Flash-Lite costs
Model Gemini 2.5 Flash-Lite at $0.10/$0.40 per million tokens for massive scale.
Read guide →
Gemini 2.5 Pro
Last check: Mar 13, 2026
Gemini 2.5 Pro costs
Model Gemini 2.5 Pro at $1.25/$10 per million tokens with 1M context support.
Read guide →
Gemini 3 Flash Preview
Last check: Mar 13, 2026
Gemini 3 Flash costs
Plan Gemini 3 Flash preview deployments at $0.50/$3.00 per million tokens.
Read guide →
Gemini 3.1 Flash-Lite Preview
Last check: Mar 13, 2026
Gemini 3.1 Flash-Lite costs
Budget ultra-low-cost Gemini 3.1 Flash-Lite preview workloads at $0.25/$1.50 per million tokens.
Read guide →
Gemini 3.1 Pro Preview
Last check: Mar 13, 2026
Gemini 3.1 Pro Preview costs
Model the $2/$12 per-million token rates (and long-context uplifts) for Gemini 3.1 Pro Preview.
Read guide →
MiniMax
2 guides
MiniMax
MiniMax M2
Last check: Mar 13, 2026
MiniMax M2 cost planner
Model MiniMax M2’s $0.30/$1.20 Bedrock rates for coding copilots and long-context agents.
Read guide →
MiniMax
MiniMax M2.1
Last check: Mar 13, 2026
MiniMax M2.1 pricing
Forecast MiniMax M2.1 workloads that need higher throughput coding and agent loops at the same token rate as M2.
Read guide →
OpenAI
11 guides
OpenAI
GPT-4.1
Last check: Mar 13, 2026
GPT-4.1 cost planner
Understand GPT-4.1’s premium pricing and plan for reasoning-heavy workloads.
Read guide →
OpenAI
GPT-4.1 Mini
Last check: Mar 13, 2026
GPT-4.1 mini pricing
Forecast GPT-4.1 mini spend for drafting, lightweight agents, and experimentation.
Read guide →
OpenAI
GPT-4o
Last check: Mar 13, 2026
GPT-4o pricing explained
Up-to-date GPT-4o token costs plus real-world scenarios for support, creative, and analytics workloads.
Read guide →
OpenAI
GPT-4o Mini
Last check: Mar 13, 2026
GPT-4o mini token costs
See how GPT-4o mini keeps prompt spend down for high-volume assistants and automations.
Read guide →
OpenAI
GPT-5
Last check: Mar 13, 2026
GPT-5 pricing
Model baseline GPT-5 usage (same pricing as GPT-5.1) for broad deployments.
Read guide →
OpenAI
GPT-5 Mini
Last check: Mar 13, 2026
GPT-5 Mini pricing
Budget GPT-5 Mini for cost-sensitive assistants at $0.25/$2 per million tokens.
Read guide →
OpenAI
GPT-5 Nano
Last check: Mar 13, 2026
GPT-5 Nano pricing
Forecast ultra-low-cost GPT-5 Nano usage at $0.05/$0.40 per million tokens.
Read guide →
OpenAI
GPT-5.1
Last check: Mar 13, 2026
GPT-5.1 cost breakdown
Budget GPT-5.1 usage for large copilots, data agents, and enterprise chat flows.
Read guide →
OpenAI
GPT-5.2
Last check: Mar 13, 2026
GPT-5.2 cost planner
Plan GPT-5.2 deployments that need premium reasoning at $1.75/$14 per million tokens.
Read guide →
OpenAI
GPT-5.4
Last check: Mar 13, 2026
GPT-5.4 cost planner
Model OpenAI’s flagship GPT-5.4 across both 1.05M-token context work and short-context inference.
Read guide →
OpenAI
GPT-5.4 Pro
Last check: Mar 13, 2026
GPT-5.4 Pro pricing
Budget OpenAI’s highest-tier GPT-5.4 Pro runs for mission-critical reasoning workloads.
Read guide →
Perplexity
2 guides
Perplexity
Sonar
Last check: Mar 13, 2026
Sonar cost planner
Map Sonar’s $1/$1 token rates plus the low-cost search-context fees for grounded Q&A flows.
Read guide →
Perplexity
Sonar Pro
Last check: Mar 13, 2026
Sonar Pro token costs
Budget Sonar Pro’s deeper research runs, including Pro Search request tiers and premium output pricing.
Read guide →
xAI
2 guides
xAI
Grok 4.1 Fast
Last check: Mar 13, 2026
Grok 4.1 Fast pricing
Map out Grok 4.1 Fast usage for realtime assistants, dashboards, and bulk automation.
Read guide →
xAI
Grok 4.20 Beta (Reasoning)
Last check: Mar 13, 2026
Grok 4.20 Beta token costs
Forecast Grok 4.20 Beta spend for multi-agent research, planning, and production copilots.
Read guide →
FAQ
How often are guides updated?
Each time the pricing dataset syncs, we flag guides whose source models changed and refresh their scenarios + citations.
Can I request a custom guide?
Yes—send workloads, traffic assumptions, or compliance concerns to hello@tokentally.net.
Audit trail
Each guide links back to provider docs and the master methodology so finance/legal teams can verify the math.
We keep copies of every change (citations + timestamps) in version control. If you need supporting evidence for an audit, reach out.