TokenTally · prompt cost intelligence

Budget every AI feature before the bill arrives.

TokenTally ingests live pricing from OpenAI, Anthropic, Google, Meta, xAI, Alibaba, Perplexity, MiniMax, and more. We document the math, publish transparent scenarios, and keep finance + engineering on the same spreadsheet-free page.

Launch live calculator · Compare providers · See methodology

Dataset refresh

Mar 13 2026

Models tracked

48 providers

Scenario templates

1,200+

Guides published

30+

Default scenario

Baseline GPT‑4.1 workload (auditable + editable)

A typical advanced assistant request uses 400 prompt tokens + 250 completion tokens. At 5,000 monthly requests, that’s 2M prompt tokens and 1.25M completion tokens. TokenTally keeps the math visible so finance, PM, and engineering can all gut-check it.

Per request

$0.027

Per month

$135

Annualized

$1,620

Total tokens / mo

3.25M

Need your own numbers? Paste real prompts into the calculator, toggle cache-hit assumptions, and share a permalink with your team.
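The baseline arithmetic above can be sketched in a few lines of Python. The per-million-token rates, the cache-hit ratio, and the cached-token discount are placeholder parameters for illustration, not values from TokenTally's dataset, and the function names are hypothetical; substitute the rates you actually pay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    prompt_tokens: int       # tokens sent per request
    completion_tokens: int   # tokens generated per request
    monthly_requests: int    # requests per month


def monthly_tokens(s: Scenario) -> tuple[int, int]:
    """Return (prompt, completion) token volume per month."""
    return (s.prompt_tokens * s.monthly_requests,
            s.completion_tokens * s.monthly_requests)


def monthly_cost(s: Scenario, input_rate: float, output_rate: float,
                 cache_hit_ratio: float = 0.0,
                 cached_discount: float = 0.0) -> float:
    """Monthly cost in dollars; rates are dollars per 1M tokens.

    cache_hit_ratio: assumed fraction of prompt tokens served from cache.
    cached_discount: assumed fraction taken off the input rate for cached tokens.
    """
    prompt, completion = monthly_tokens(s)
    cached = prompt * cache_hit_ratio
    uncached = prompt - cached
    input_cost = (uncached * input_rate
                  + cached * input_rate * (1 - cached_discount)) / 1_000_000
    output_cost = completion * output_rate / 1_000_000
    return input_cost + output_cost


# The default scenario from the text: 400 + 250 tokens, 5,000 requests a month.
baseline = Scenario(prompt_tokens=400, completion_tokens=250, monthly_requests=5_000)
prompt_total, completion_total = monthly_tokens(baseline)
```

Calling `monthly_cost(baseline, input_rate, output_rate)` with your provider's rates gives the monthly figure; multiply by 12 to annualize, and pass a non-zero `cache_hit_ratio` to see how a cache assumption moves it.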

What you can plan in minutes

Support & success

Tier-1 chat, CRM copilots, deflection bots, multilingual queues, and cache-aware macros.

Analytics & ops

KPI copilots, anomaly explainers, notebooks, and multi-agent ETL orchestrators.

Creative & RAG

Research assistants, localization sweeps, marketing briefers, and doc-heavy RAG flows.

Latest pricing notes

What changed this month

OpenAI GPT‑5.4 tiers. Added long-context surcharges (2× input / 1.5× output once you cross 272K tokens) with side-by-side comparisons to GPT‑4.1 and GPT‑4o Mini.
Anthropic Claude 4.6 family. New cache pricing, 1M-token beta notes, and scenario presets for Sonnet vs Opus tradeoffs.
Google Gemini 3.x previews. Documented Search/Maps grounding fees and audio token uplifts so you can defend forecasted spend.
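As a rough illustration of the long-context surcharge in the OpenAI note above (2× input / 1.5× output past 272K tokens), the sketch below prices a single request. Two points are assumptions rather than anything stated here: the threshold is checked against the prompt token count, and once it is crossed the multipliers apply to the whole request rather than only to the tokens above the threshold. The base rates are hypothetical parameters; confirm the actual billing rule in the provider's documentation.

```python
LONG_CONTEXT_THRESHOLD = 272_000  # tokens, from the pricing note above
INPUT_MULTIPLIER = 2.0            # applied to the input rate past the threshold
OUTPUT_MULTIPLIER = 1.5           # applied to the output rate past the threshold


def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost of one request in dollars; rates are dollars per 1M tokens.

    Assumption: the surcharge is triggered by prompt size alone and
    then covers every token in the request.
    """
    long_context = prompt_tokens > LONG_CONTEXT_THRESHOLD
    in_mult = INPUT_MULTIPLIER if long_context else 1.0
    out_mult = OUTPUT_MULTIPLIER if long_context else 1.0
    return (prompt_tokens * input_rate * in_mult
            + completion_tokens * output_rate * out_mult) / 1_000_000
```

Comparing `request_cost` just below and just above the threshold shows the step change a forecast has to absorb when prompts grow past 272K tokens.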

How we verify

Every refresh is logged in the methodology page with citations to provider docs, console screenshots, or billing emails. We store evidence before UI changes ship so auditors can retrace the math.

TokenTally never stores pasted prompts from the calculator; token counts are computed in-browser to keep sensitive context private.

Featured guides

Deep dives worth bookmarking

GPT‑4.1 Pricing Guide: Advanced RAG & orchestration workloads

Break down compliance copilots, code refactor agents, and analyst loops with transparent per-request math and cache-hit scenarios.

Claude 3.7 Sonnet token costs: Legal + strategy copilots

Model long-form briefs, policy reviews, and retrieval-heavy flows while staying under Anthropic’s long-context surcharges.

Gemini 2.5 Flash deployment playbook: 1M-context assistants with Search grounding

Understand grounding fees, TPM caps, and when to upgrade to Pro vs. keep Flash for notification digests and analytics bots.

Trust & readiness

Who we are

TokenTally is an operator-led project documenting real LLM budgets. Read the About page for the roadmap and team values.

Policies

Privacy, Terms, and financial disclaimers live on dedicated pages (Privacy, Disclaimer). We spell out how data is handled before ads run.

Contact

Questions, partnerships, or pricing corrections? Email hello@tokentally.net or use the form on the Contact page.

Engineer + finance handshake

Share or bookmark your scenario

Every calculator state is linkable: https://tokentally.net/?model=openai:gpt-4.1&prompt=400&completion=250&rpm=5000. Send it to finance for approvals or embed it in launch docs so costs are never a surprise.
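A permalink like the one above can be generated or read back from a script, for example when embedding scenarios in launch docs. This is a minimal sketch that assumes only the four query parameters visible in the example URL; the helper names are hypothetical, `rpm` is taken to carry the monthly request count because its value matches the 5,000-request default, and nothing here reflects how the site itself parses links.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://tokentally.net/"


def build_permalink(model: str, prompt: int, completion: int, rpm: int) -> str:
    """Build a calculator link using the parameter names from the example URL."""
    query = urlencode(
        {"model": model, "prompt": prompt, "completion": completion, "rpm": rpm},
        safe=":",  # keep the provider:model colon unescaped, as in the example
    )
    return f"{BASE_URL}?{query}"


def parse_permalink(url: str) -> dict:
    """Read the scenario back out of a link; numeric fields become ints."""
    params = parse_qs(urlsplit(url).query)
    return {
        "model": params["model"][0],
        "prompt": int(params["prompt"][0]),
        "completion": int(params["completion"][0]),
        "rpm": int(params["rpm"][0]),
    }


link = build_permalink("openai:gpt-4.1", prompt=400, completion=250, rpm=5000)
scenario = parse_permalink(link)
```

Round-tripping through `parse_permalink` is a cheap way to check that a link pasted into a doc still encodes the scenario that was approved.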

Model my workload · Browse all guides