How to stop an AI agent from running up your token bill
Limit Claude Code token usage with a budget that actually kills the session, plus an approval gate so a loop cannot run for hours unwatched.
To limit how many tokens an AI agent like Claude Code burns, set a daily budget that stops the session when the cost crosses it, and put an approval gate in front of the expensive steps so the agent cannot loop for hours while you are away. A warning flag tells you the number is high after the fact. A hard stop and a human gate keep it from climbing in the first place.
Key takeaways
- A max-budget flag is a warning, not a brake. The session keeps going.
- Cost guard reads the running cost from the agent's transcript and kills the session at a daily budget you set.
- An approval gate stops a loop from spending unwatched, because the agent has to wait for you on the steps that matter.
Why the built-in flags do not cap the bill
Most agents expose some setting that looks like a budget: a max-tokens value, a context limit, a warning when a run gets expensive. These are useful, but each one is either a bound on a single request or a notice you read afterward. None of them watches a whole session and pulls the plug.
The bill comes from the thing those flags do not cover: a long unattended run. You hand the agent a task, walk away, and it loops. It retries, re-reads the same files, tries a slightly different approach, and keeps calling the model. By the time you look, the run is hours deep and the cost is already spent. Nothing in the run could force it to stop.
Step 1: set a budget that actually kills the session
Pushary's cost guard reads the cost of a session straight from the agent's transcript. Agents like Claude Code write a transcript as they work, and it carries the token usage and cost of the run. Cost guard tracks that running total against a daily budget you configure, and when the session's cost crosses the budget, cost guard stops the session.
That stop is the same mechanism behind the kill switch. The difference is the trigger. A kill switch is you, on your phone, pulling the plug. Cost guard is a number deciding for you, so a runaway loop cannot burn the whole day before you check. There is a fuller walkthrough in the post on killing a session at a daily budget.
Two honest notes on scope. First, this works only where the agent writes a transcript Pushary can read, which means the CLI-hook agents: Claude Code, Codex, Gemini CLI, Cursor, and Hermes. Second, cost guard stops the session; it does not refund spend that has already happened. It caps the worst case, nothing more.
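The budget check described in this step can be sketched in a few lines of Python. This is a minimal illustration, not Pushary's implementation: it assumes a JSONL transcript in which each entry may carry a numeric `cost_usd` field, and that field name, along with the function names `running_cost` and `over_budget`, is hypothetical. Real transcripts differ between agents.

```python
import json
from typing import Iterable


def running_cost(transcript_lines: Iterable[str]) -> float:
    """Sum the per-entry cost found in a JSONL transcript.

    Assumes each entry may carry a numeric "cost_usd" field
    (hypothetical name, used here for illustration only).
    """
    total = 0.0
    for line in transcript_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a line the agent has only partially written
        if not isinstance(entry, dict):
            continue
        cost = entry.get("cost_usd", 0.0)
        if isinstance(cost, (int, float)):
            total += cost
    return total


def over_budget(transcript_lines: Iterable[str], daily_budget_usd: float) -> bool:
    """Return True once the running cost reaches the daily budget."""
    return running_cost(transcript_lines) >= daily_budget_usd
```

A watcher would re-run `over_budget` as the transcript grows and stop the session the first time it returns True. The sketch treats reaching the budget as crossing it; whether the real comparison is strict is not something the article specifies.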
Step 2: gate the expensive steps so a loop cannot run unwatched
A budget catches a runaway only after it has spent up to the limit. The earlier control is an approval gate. With a permission policy, the agent stops and asks before it runs the steps you care about, and waits for your yes or no. While it waits, it is not looping and not spending.
That gate is also what keeps an agent from grinding for hours on its own. If a real task needs a deploy, a destructive command, or anything you flagged, the agent pauses there and pushes the question to your phone. You answer from the lock screen. The token meter does not move until you decide.
The policy matches on tool arguments, not just the tool name, so cheap, safe work runs free and you gate only the steps that matter. Reads stay automatic: a read-only safe floor auto-approves proven read-only commands like cd, ls, cat, and git status. That floor was derived from 1,721 real production questions, so the gate does not nag you about the cheap stuff. The policies docs cover how exact, prefix, and tool rules resolve.
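To make the idea of matching on arguments concrete, here is a small Python sketch of how exact, prefix, and tool rules might resolve. Everything in it is an assumption for illustration: the `Rule` shape, the `decide` function, the tool name "Bash", and the precedence order (exact, then longest prefix, then tool, then the read-only floor, then ask) are not taken from Pushary. The policies docs are the authority on the actual behavior.

```python
from dataclasses import dataclass

# Read-only commands named in the article; the real floor may be larger.
READ_ONLY_FLOOR = ("cd", "ls", "cat", "git status")
# Characters that can turn a "read" into a write or chain another command.
SHELL_METACHARS = set(";|&><$`\n")


@dataclass(frozen=True)
class Rule:
    kind: str      # "exact", "prefix", or "tool" (hypothetical rule kinds)
    pattern: str   # full command, command prefix, or tool name
    decision: str  # "allow" or "ask"


def is_read_only(command: str) -> bool:
    """Deliberately simple check: a known read-only command, no shell tricks."""
    command = command.strip()
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    return any(command == c or command.startswith(c + " ") for c in READ_ONLY_FLOOR)


def decide(tool: str, command: str, rules: list[Rule]) -> str:
    """Resolve a tool call to "allow" or "ask" under the assumed precedence."""
    command = command.strip()
    for rule in rules:
        if rule.kind == "exact" and rule.pattern == command:
            return rule.decision
    prefixes = [r for r in rules if r.kind == "prefix" and command.startswith(r.pattern)]
    if prefixes:
        # The most specific (longest) prefix wins.
        return max(prefixes, key=lambda r: len(r.pattern)).decision
    for rule in rules:
        if rule.kind == "tool" and rule.pattern == tool:
            return rule.decision
    if tool == "Bash" and is_read_only(command):
        return "allow"
    return "ask"  # anything unmatched waits for a human
```

With a rule set that asks on `git push origin main` and allows the prefix `npm test`, a call to `git status` falls through to the read-only floor and runs without a prompt, while the push pauses for approval. Defaulting to "ask" keeps an unmatched, possibly expensive step from running unwatched.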
Step 3: keep the receipt
Every stop and every approval lands in your history with a reason. When cost guard kills a session, that is an event you can read later. When you deny an expensive step, that decision is recorded too, along with where you answered it. If you answer to anyone on spend, the reason a run cost what it cost is in the record, not in your memory of the afternoon. This sits in the agent control panel, next to the policies and the approval flow.
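As a rough picture of what such a record could hold, the Python sketch below serializes one history event to JSON. The field names (`kind`, `reason`, `answered_from`, `at`) and the `make_event` helper are hypothetical, chosen only to mirror what this section describes: what happened, why, and where you answered. They are not Pushary's schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HistoryEvent:
    kind: str                     # e.g. "budget_stop", "approved", "denied"
    reason: str                   # why the stop or decision happened
    answered_from: Optional[str]  # where a human answered, if anyone did
    at: str                       # ISO 8601 timestamp in UTC


def make_event(kind: str, reason: str,
               answered_from: Optional[str] = None,
               now: Optional[datetime] = None) -> str:
    """Build one event and return it as a JSON line for an append-only log."""
    moment = now or datetime.now(timezone.utc)
    event = HistoryEvent(kind, reason, answered_from, moment.isoformat())
    return json.dumps(asdict(event), sort_keys=True)
```

A budget stop would carry no `answered_from`, because a number made the call; a denied deploy would record the device you answered on.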
Common questions
Does a token limit work with Claude Code specifically?
Yes. Claude Code connects through the Pushary CLI hook, which is what reads its transcript for cost and enforces the budget stop. The Claude Code guide covers the one-time setup. Codex, Gemini CLI, Cursor, and Hermes work the same way.
Will the budget cut off a run mid-edit?
It stops the session when the tracked cost crosses your daily budget. That is a hard line you set on purpose. If you want a softer control, the approval gate is the better fit, because it pauses the agent at specific steps instead of ending the run.
What about Claude Desktop?
The Claude Desktop connector is paste-one-URL and can only notify and ask. It has no hooks, so it cannot enforce a budget stop. Enforced limits are a CLI-hook feature, so use one of the CLI-hook agents for that.
Set the budget once and gate the steps that spend, and the bill stops being a surprise. You can see what the paid agent plans include on pricing, or read the broader case for putting a control layer in front of your AI agents.