To limit how many tokens an AI agent like Claude Code burns, set a daily budget that stops the session when the cost crosses it, and put an approval gate in front of the expensive steps so the agent cannot loop for hours while you are away. A warning flag tells you the number is high after the fact. A hard stop and a human gate keep it from climbing in the first place.

Key takeaways

A max-budget flag is a warning, not a brake. The session keeps going.
Cost guard reads the running cost from the agent's transcript and kills the session at a daily budget you set.
An approval gate stops a loop from spending unwatched, because the agent has to wait for you on the steps that matter.

Why the built-in flags do not cap the bill

Most agents expose some setting that looks like a budget: a max tokens value, a context limit, a warning when a run gets expensive. These are useful, but each one is either a bound on a single request or a notice you read afterward. None of them watches a whole session and pulls the plug.

The bill comes from the thing those flags do not cover: a long unattended run. You hand the agent a task, walk away, and it loops. It retries, re-reads the same files, tries a slightly different approach, and keeps calling the model. By the time you look, the run is hours deep and the cost is already spent. Nothing in the run could force it to stop.

Step 1: set a budget that actually kills the session

Pushary's cost guard reads the cost of a session straight from the agent's transcript. Agents like Claude Code write a transcript as they work, and it carries the token usage and cost of the run. Cost guard tracks that running total against a daily budget you configure, and when the session crosses the budget, it is stopped.
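As a rough illustration of the idea, here is a minimal sketch of a running-cost check. It is not Pushary's implementation, and the transcript shape is an assumption: it pretends each transcript line is a JSON object with a hypothetical `cost_usd` field. Real transcripts differ by agent and may record token counts instead of dollars.

```python
import json
from typing import Iterable


def running_cost(transcript_lines: Iterable[str]) -> float:
    """Sum the cost recorded on each transcript line.

    Assumption: one JSON object per line, with a hypothetical
    `cost_usd` field on the entries that called the model.
    """
    total = 0.0
    for line in transcript_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # The agent may be mid-write; skip lines that do not parse yet.
            continue
        if isinstance(entry, dict):
            cost = entry.get("cost_usd", 0.0)
            if isinstance(cost, (int, float)):
                total += cost
    return total


def over_budget(transcript_lines: Iterable[str], daily_budget_usd: float) -> bool:
    """True once the running total reaches the configured daily budget."""
    return running_cost(transcript_lines) >= daily_budget_usd
```

A watcher would re-run a check like this as the transcript grows and stop the session the first time it returns true. The check itself is cheap. What makes the cap real is that something acts on the answer.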

That stop is the same mechanism behind the kill switch. The difference is the trigger. A kill switch is you, on your phone, pulling the plug. Cost guard is a number deciding for you, so a runaway loop cannot burn the whole day before you check. There is a fuller walkthrough in the post on killing a session at a daily budget.

Two honest notes on scope. First, this works where the agent writes a transcript Pushary can read: the CLI-hook agents (Claude Code, Codex, Gemini CLI, Cursor, Hermes). Second, cost guard stops the session; it does not refund spend that already happened. It caps the worst case. It cannot claw back what already ran.

Step 2: gate the expensive steps so a loop cannot run unwatched

A budget catches the runaway after it has already spent most of the money. The earlier control is an approval gate. With a permission policy, the agent stops and asks before it runs the steps you care about, and waits for your yes or no. While it waits, it is not looping and not spending.

That gate is also what keeps an agent from grinding for hours on its own. If a real task needs a deploy, a destructive command, or anything you flagged, the agent pauses there and pushes the question to your phone. You answer from the lock screen. The token meter does not move until you decide.
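The shape of that gate can be sketched in a few lines. This is an illustration under stated assumptions, not the product's code: `Step`, `run_with_gate`, and the three callbacks are hypothetical names, and `ask_human` stands in for whatever blocks until you answer the push.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """A single action the agent wants to take (hypothetical shape)."""
    tool: str
    command: str


def run_with_gate(
    step: Step,
    needs_approval: Callable[[Step], bool],
    ask_human: Callable[[Step], bool],
    execute: Callable[[Step], None],
) -> str:
    """Run a step, pausing for a human decision on gated steps.

    `ask_human` is assumed to block until the person answers, so no
    model calls happen while the question is pending.
    """
    if needs_approval(step) and not ask_human(step):
        return "denied"
    execute(step)
    return "ran"
```

The point of the design is the blocking call. Ungated steps pass straight through, and a gated step cannot proceed or retry until a person has answered.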

The policy matches on tool arguments, not just the tool name, so cheap, safe work runs free and you only gate the steps that are worth interrupting for. Reads stay automatic: a read-only safe floor auto-approves proven read-only commands like cd, ls, cat, and git status. That floor was decided from 1,721 real production questions, so the gate does not nag you about the cheap stuff. The policies docs cover how exact, prefix, and tool rules resolve.
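To make argument-level matching concrete, here is a small sketch of a resolver with exact, prefix, and tool rules plus a read-only floor. The precedence used here (exact, then longest prefix, then the safe floor, then the tool rule, then ask) is an assumption for illustration. The policies docs are the authority on how the real rules resolve, and every name below is hypothetical.

```python
from typing import Dict

# Read-only commands named in the text above.
SAFE_FLOOR = {"cd", "ls", "cat", "git status"}


def is_read_only(command: str) -> bool:
    """Conservative check: a known read-only command with no shell chaining."""
    tokens = command.split()
    if not tokens:
        return False
    # Metacharacters could chain or redirect, so never auto-approve them.
    if any(ch in command for ch in ";|&><`$"):
        return False
    if tokens[0] == "git":
        return " ".join(tokens[:2]) in SAFE_FLOOR
    return tokens[0] in SAFE_FLOOR


def resolve(
    tool: str,
    command: str,
    exact: Dict[str, str],
    prefix: Dict[str, str],
    by_tool: Dict[str, str],
) -> str:
    """Return "allow", "ask", or "deny" for one tool call (assumed precedence)."""
    if command in exact:
        return exact[command]
    matches = [p for p in prefix if command.startswith(p)]
    if matches:
        return prefix[max(matches, key=len)]  # most specific prefix wins
    if is_read_only(command):
        return "allow"
    return by_tool.get(tool, "ask")  # unknown work waits for a human
```

With a prefix rule that allows `git ` and a longer one that asks on `git push`, a `git log` runs free while `git push origin main` waits for you. That is the difference between matching on arguments and matching on the tool name alone.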

Step 3: keep the receipt

Every stop and every approval lands in your history with a reason. When cost guard kills a session, that is an event you can read later. When you deny an expensive step, that decision is recorded too, along with where you answered it. If you answer to anyone on spend, the reason a run cost what it cost is in the record, not in your memory of the afternoon. This sits in the agent control panel, next to the policies and the approval flow.
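For a sense of what such a record might hold, here is a minimal sketch of a history entry. The field names and the `record` helper are hypothetical, chosen to mirror the three things the paragraph above mentions: what happened, why, and where it was answered.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HistoryEvent:
    """One entry in the history (hypothetical field names)."""
    kind: str                     # e.g. "budget_stop", "approved", "denied"
    reason: str                   # why the stop or decision happened
    answered_from: Optional[str]  # where a human answered, if one did
    at: str                       # ISO 8601 timestamp in UTC


def record(
    kind: str,
    reason: str,
    answered_from: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one event as a JSON line suitable for an append-only log."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(asdict(HistoryEvent(kind, reason, answered_from, stamp)))
```

A budget stop has no human answer, so `answered_from` stays empty. A denial from the lock screen would carry the device or channel it came from.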
