
How to stop an AI agent from running up your token bill

Limit Claude Code token usage with a budget that actually kills the session, plus an approval gate so a loop cannot run for hours unwatched.

Aadil Ghani
Founder, Pushary
Jun 15, 2026 · 4 min read

To limit how many tokens an AI agent like Claude Code burns, set a daily budget that stops the session when the cost crosses it, and put an approval gate in front of the expensive steps so the agent cannot loop for hours while you are away. A warning flag tells you the number is high after the fact. A hard stop and a human gate keep it from climbing in the first place.

Key takeaways

  • A max-budget flag is a warning, not a brake. The session keeps going.
  • Cost guard reads the running cost from the agent's transcript and kills the session at a daily budget you set.
  • An approval gate stops a loop from spending unwatched, because the agent has to wait for you on the steps that matter.

Why the built-in flags do not cap the bill

Most agents expose some setting that looks like a budget. A max tokens value, a context limit, a warning when a run gets expensive. These are useful, but they are bounds on a single request or a notice you read afterward. None of them watch a whole session and pull the plug.

The bill comes from the thing those flags do not cover: a long unattended run. You hand the agent a task, walk away, and it loops. It retries, re-reads the same files, tries a slightly different approach, and keeps calling the model. By the time you look, the run is hours deep and the cost is already spent. Nothing in the run could force it to stop.

Step 1: set a budget that actually kills the session

Pushary's cost guard reads the cost of a session straight from the agent's transcript. Agents like Claude Code write a transcript as they work, and it carries the token usage and cost of the run. Cost guard tracks that running total against a daily budget you configure, and when the session crosses the budget, it is stopped.
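
As a rough illustration of that running-total check, here is a minimal Python sketch. It assumes a JSON Lines transcript in which some records carry a `cost_usd` field. That field name and the functions `running_cost` and `over_budget` are hypothetical, chosen for the example; they are not Pushary's or any agent's actual format or API.

```python
import json
from typing import Iterable


def running_cost(transcript_lines: Iterable[str]) -> float:
    """Sum the cost of every transcript record that reports one.

    Assumption: each line is a JSON object and cost-bearing records
    expose a numeric "cost_usd" field (a hypothetical name).
    """
    total = 0.0
    for line in transcript_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # The agent may still be writing the last line; skip it.
            continue
        if not isinstance(record, dict):
            continue
        cost = record.get("cost_usd")
        if isinstance(cost, (int, float)) and not isinstance(cost, bool):
            total += cost
    return total


def over_budget(transcript_lines: Iterable[str], daily_budget_usd: float) -> bool:
    """True once the running total reaches the configured daily budget."""
    return running_cost(transcript_lines) >= daily_budget_usd
```

A watcher built this way would re-read the transcript as it grows and trigger the stop the first time `over_budget` returns true. Treating "crosses" as greater-than-or-equal is a choice made for the sketch.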

That stop is the same mechanism behind the kill switch. The difference is the trigger. A kill switch is you, on your phone, pulling the plug. Cost guard is a number deciding for you, so a runaway loop cannot burn the whole day before you check. There is a fuller walkthrough in the post on killing a session at a daily budget.
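
The "same mechanism, different trigger" idea can be sketched in a few lines of Python. Everything here is hypothetical (the `Session` class, the `kill_switch` and `cost_guard` functions, the reason strings); it only shows one stop path being reached from two triggers and is not how Pushary is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical stand-in for a running agent session."""
    running: bool = True
    stop_reason: Optional[str] = None

    def stop(self, reason: str) -> None:
        # Single stop path: the first trigger to fire wins.
        if self.running:
            self.running = False
            self.stop_reason = reason


def kill_switch(session: Session) -> None:
    """Trigger 1: a person decides to stop the session."""
    session.stop("kill_switch")


def cost_guard(session: Session, spent_usd: float, daily_budget_usd: float) -> None:
    """Trigger 2: the running cost reaching the budget decides."""
    if spent_usd >= daily_budget_usd:
        session.stop("daily_budget")
```

Because both triggers end in the same `stop` call, the recorded reason is the only thing that distinguishes a manual stop from a budget stop.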

Two honest notes on scope. First, this works only where the agent writes a transcript Pushary can read: the CLI-hook agents (Claude Code, Codex, Gemini CLI, Cursor, Hermes). Second, cost guard stops the session; it does not refund spend that already happened. It caps the worst case. It cannot claw back what already ran.

Step 2: gate the expensive steps so a loop cannot run unwatched

A budget catches the runaway only after the money, up to the limit you set, is already spent. The earlier control is an approval gate. With a permission policy, the agent stops and asks before it runs the steps you care about, and waits for your yes or no. While it waits, it is not looping and not spending.

That gate is also what keeps an agent from grinding for hours on its own. If a real task needs a deploy, a destructive command, or anything you flagged, the agent pauses there and pushes the question to your phone. You answer from the lock screen. The token meter does not move until you decide.
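
To make the "waits for your yes or no" behavior concrete, here is a minimal Python sketch of a blocking gate. The `ApprovalGate` class and `run_gated_step` function are hypothetical names for illustration. Treating a timeout as a denial is an assumption of this sketch; the article only says the agent waits.

```python
import queue
from typing import Callable, Optional


class ApprovalGate:
    """Hypothetical gate: the agent blocks in ask() until answer() is called."""

    def __init__(self) -> None:
        self._decisions: "queue.Queue[bool]" = queue.Queue()

    def answer(self, approved: bool) -> None:
        # Called from wherever the human responds (for example, a phone).
        self._decisions.put(approved)

    def ask(self, question: str, timeout: Optional[float] = None) -> bool:
        # Blocks the calling thread, so no work runs while waiting.
        # In a real system, `question` would be pushed to the human here.
        try:
            return self._decisions.get(timeout=timeout)
        except queue.Empty:
            # Assumption: no answer within the timeout counts as "no".
            return False


def run_gated_step(
    gate: ApprovalGate,
    description: str,
    step: Callable[[], None],
    timeout: Optional[float] = None,
) -> bool:
    """Run `step` only after an explicit yes. Returns whether it ran."""
    if gate.ask(f"Allow: {description}?", timeout=timeout):
        step()
        return True
    return False
```

The point of the design is that the step itself sits behind the blocking call, so a denied or unanswered question means the expensive work never starts.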

The policy matches on tool arguments, not just the tool name, so cheap, safe work runs freely and you gate only the expensive or risky steps. Reads stay automatic: a read-only safe floor auto-approves proven read-only commands like cd, ls, cat, and git status. That floor was decided from 1,721 real production questions, so the gate does not nag you about the cheap stuff. The policies docs cover how exact, prefix, and tool rules resolve.
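
As a sketch of what matching on arguments can look like, the Python below resolves exact, prefix, and tool rules against a command. The `Rule` shape, the precedence (exact over prefix over tool), the explicit-rules-before-floor order, and the default of asking are all assumptions made for this example. The policies docs define how the real rules resolve.

```python
from dataclasses import dataclass
from typing import Sequence

# Commands named in the article as part of the read-only safe floor.
READ_ONLY_FLOOR = ("cd", "ls", "cat", "git status")
# Assumption: anything that chains or redirects is not treated as read-only.
SHELL_METACHARS = set(";|&><$`")


@dataclass(frozen=True)
class Rule:
    kind: str      # "exact", "prefix", or "tool"
    pattern: str   # full command, command prefix, or tool name
    action: str    # "allow" or "ask"


# Assumed precedence: the most specific matching rule wins.
PRECEDENCE = {"exact": 0, "prefix": 1, "tool": 2}


def _matches(rule: Rule, tool: str, command: str) -> bool:
    if rule.kind == "exact":
        return command == rule.pattern
    if rule.kind == "prefix":
        return command.startswith(rule.pattern)
    if rule.kind == "tool":
        return tool == rule.pattern
    return False


def is_read_only(command: str) -> bool:
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    return any(command == c or command.startswith(c + " ") for c in READ_ONLY_FLOOR)


def decide(rules: Sequence[Rule], tool: str, command: str) -> str:
    """Return "allow" or "ask" for one tool call."""
    matching = [r for r in rules if _matches(r, tool, command)]
    if matching:
        return min(matching, key=lambda r: PRECEDENCE[r.kind]).action
    if is_read_only(command):
        return "allow"
    return "ask"  # assumed default for anything unrecognized
```

With a tool-level allow for a shell tool and a prefix rule that asks on `git push`, ordinary commands run without a prompt while a push still waits for an answer.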

Step 3: keep the receipt

Every stop and every approval lands in your history with a reason. When cost guard kills a session, that is an event you can read later. When you deny an expensive step, that decision is recorded too, along with where you answered it. If you answer to anyone on spend, the reason a run cost what it cost is in the record, not in your memory of the afternoon. This sits in the agent control panel, next to the policies and the approval flow.
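
For a sense of what such a record might hold, here is a minimal Python sketch of a history event. The `HistoryEvent` fields and the helper functions are hypothetical, picked to mirror what the paragraph describes (a reason, a decision, and where it was answered); they are not Pushary's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HistoryEvent:
    """Hypothetical shape for one entry in the history."""
    kind: str                            # "stop" or "approval"
    session_id: str
    reason: str
    at: str                              # ISO 8601 timestamp
    decision: Optional[str] = None       # "approve" / "deny" for approvals
    answered_from: Optional[str] = None  # e.g. "lock screen"


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def budget_stop(session_id: str, spent_usd: float, budget_usd: float,
                at: Optional[str] = None) -> HistoryEvent:
    reason = f"cost ${spent_usd:.2f} crossed daily budget ${budget_usd:.2f}"
    return HistoryEvent("stop", session_id, reason, at or _now())


def denial(session_id: str, step: str, answered_from: str,
           at: Optional[str] = None) -> HistoryEvent:
    return HistoryEvent("approval", session_id, f"denied step: {step}",
                        at or _now(), decision="deny",
                        answered_from=answered_from)


def to_json_line(event: HistoryEvent) -> str:
    """Serialize one event as a single JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True)
```

Storing the reason as text alongside the decision is what lets someone answer "why did this run cost what it cost" later without reconstructing it from memory.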

Common questions

Does a token limit work with Claude Code specifically?

Yes. Claude Code connects through the Pushary CLI hook, which is what reads its transcript for cost and enforces the budget stop. The Claude Code guide covers the one-time setup. Codex, Gemini CLI, Cursor, and Hermes work the same way.

Will the budget cut off a run mid-edit?

It stops the session when the tracked cost crosses your daily budget. That is a hard line you set on purpose. If you want a softer control, the approval gate is the better fit, because it pauses the agent at specific steps instead of ending the run.

What about Claude Desktop?

The Claude Desktop connector is paste-one-URL and can only notify and ask. It has no hooks, so it cannot enforce a budget stop. Enforced limits are a CLI-hook feature, so use one of the CLI-hook agents for that.

Set the budget once and gate the steps that spend, and the bill stops being a surprise. You can see what the paid agent plans include on pricing, or read the broader case for putting a control layer in front of your AI agents.

