The four levels of AI agent oversight: notify, approve, policy, audit

The four levels of human oversight for AI agents: notify, approve, policy, audit. Each one defined, and why policy and audit are the durable layer.

Aadil Ghani
Founder, Pushary
Jun 6, 2026 | 6 min read

Human oversight of AI agents comes in four levels, each doing a job the one below it cannot. Notify tells you what happened. Approve stops the agent and waits for your yes or no. Policy decides routine cases without you. Audit records every decision so you can prove what ran and why. Notify is now table stakes because most agents ship it natively. Policy and audit are the layer that lasts.

Key takeaways

  • The four levels are notify (awareness), approve (a gate in the moment), policy (rules that decide without you), and audit (a durable record after the fact).
  • They stack. Each level assumes the one below it. Audit is only worth reading if approve and policy made real decisions worth recording.
  • Notify is parity now that agents ping natively. The work that holds value is policy plus audit, which is why Pushary leads with those.

Level 1: notify

Notify is the floor. The agent finishes a task, hits an error, or reaches a step it wants to flag, and you get a push notification on your phone. No decision is asked of you. You just know.

This level answers two questions: is the agent still working, and did anything go wrong? For a long run that you walked away from, that is most of what you want. You get a ping when the build passes, another when it crashes, and one more if it sits idle too long.

Notify used to be the whole product for a lot of tools. It is not anymore. Claude Code, Codex, Cursor, and others ship native notifications now, so a bare ping is parity, not an edge. The reason to still care about delivery is reliability and reach. A notification should reach your phone with the app closed, survive a flaky delivery channel, and land in Slack or the native iOS and Android app too. Pushary handles that part, but notify on its own is not where the value sits.
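To make "survive a flaky delivery channel" concrete, here is a minimal sketch of delivery with retries and fallback. It is an illustration under stated assumptions, not Pushary's API: the `deliver` function, the `Channel` type, and the two stand-in channels are all hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical channel type: takes a message, returns True on confirmed delivery.
Channel = Callable[[str], bool]


def deliver(message: str,
            channels: List[Tuple[str, Channel]],
            retries: int = 2) -> Optional[str]:
    """Try each channel in order, retrying failures.

    Returns the name of the channel that delivered, or None if all failed.
    """
    for name, send in channels:
        for _ in range(retries):
            try:
                if send(message):
                    return name
            except Exception:
                # A channel that raises should not break the fallback chain.
                continue
    return None


# Illustrative stand-ins for real channels (assumptions, not real integrations).
def flaky_push(message: str) -> bool:
    raise ConnectionError("push service unreachable")


def slack(message: str) -> bool:
    return True


delivered_via = deliver("build passed", [("push", flaky_push), ("slack", slack)])
```

Here the push channel fails on every attempt, so the message lands through the second channel instead of being lost.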

Level 2: approve

Approve adds a gate. The agent reaches a step that needs a human, stops, and waits. You get a question on your phone with the context, and the agent does not move until you answer. Yes runs it. No blocks it. With Pushary you can also deny with a reason, and the agent reads that reason back and adjusts instead of just failing.

This is human-in-the-loop in the literal sense. When the agent wants to force push, drop a table, send an email, or spend money, the decision pauses for you. Approval cards flag the dangerous asks with a risk signal so you can tell a routine confirm from something that needs a careful read.

Approve is where enforcement matters. A notification you can ignore. A gate has to actually hold the agent. That requires a hook in the agent that intercepts the tool call before it runs, which Pushary installs for Claude Code, Codex, Gemini CLI, Cursor, and Hermes through a CLI. Claude Desktop has no hook to intercept tools, so its paste-one-URL connector can notify and ask but cannot enforce a gate. That is a real limit, not a detail to gloss over: enforced gating lives where there is a hook.

Approve only works if the gate is real. A tool that "notifies you for approval" but cannot actually block the action is still level 1 with extra words. Check whether the agent waits, or just tells you after.
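The difference between a real gate and a ping can be sketched in a few lines. This is a simplified illustration, assuming a hook that wraps the tool call; `gated_call`, `Decision`, and `ToolResult` are hypothetical names, not the hook interface of any specific agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    approved: bool
    reason: Optional[str] = None  # optional note the agent can read on a deny


@dataclass
class ToolResult:
    ran: bool
    output: str


def gated_call(command: str,
               ask_human: Callable[[str], Decision],
               run: Callable[[str], str]) -> ToolResult:
    """Intercept a tool call and wait for the human before anything runs."""
    decision = ask_human(command)  # blocks until the human answers
    if not decision.approved:
        # Hand the reason back so the agent can adjust instead of just failing.
        reason = decision.reason or "no reason given"
        return ToolResult(ran=False, output=f"denied: {reason}")
    return ToolResult(ran=True, output=run(command))


executed: List[str] = []


def fake_run(command: str) -> str:
    # Stand-in for real execution; records what actually ran.
    executed.append(command)
    return "ok"


def reviewer(command: str) -> Decision:
    # Stand-in for a person answering on their phone.
    if "--force" in command:
        return Decision(False, "use --force-with-lease on a feature branch")
    return Decision(True)


blocked = gated_call("git push --force origin main", reviewer, fake_run)
allowed = gated_call("git push origin feature", reviewer, fake_run)
```

The point of the shape is that `run` is only reachable after the decision returns. A tool that calls `run` first and asks afterward is notify, whatever it is called.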

Level 3: policy

Approving every action by hand does not scale. An agent runs hundreds of commands in a session, and most of them are reads. If each one pings you, you stop reading the pings. Policy fixes that by deciding the routine cases without you.

A policy is a set of rules that resolve a tool call to allow, ask, or deny before it reaches you. The useful version matches on tool arguments, not just the tool name, and applies the most specific rule first: an exact match beats a prefix match, which beats a tool-wide rule. That lets you auto-allow git status while still gating git push, even though both are git. Pushary ships a read-only safe floor, on by default, that auto-approves proven read-only commands like cd, ls, cat, git status, git log, git diff, and grep. That floor was not guessed. It came from 1,721 real production questions agents sent to humans, keeping only the commands that cannot mutate anything.
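A small resolver shows how argument-level matching can separate git status from git push. This is a sketch, not Pushary's rule engine: the `Rule` shape and `resolve` function are hypothetical, and the precedence order (exact, then prefix, then tool) is an assumed reading of the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    kind: str     # "exact", "prefix", or "tool"
    pattern: str  # full command, command prefix, or bare tool name
    verdict: str  # "allow", "ask", or "deny"


# Assumed precedence: the most specific kind of rule wins.
PRECEDENCE = ("exact", "prefix", "tool")


def matches(rule: Rule, command: str) -> bool:
    if rule.kind == "exact":
        return command == rule.pattern
    if rule.kind == "prefix":
        # Match on a word boundary so "git status" does not match "git statusx".
        return command == rule.pattern or command.startswith(rule.pattern + " ")
    if rule.kind == "tool":
        return command.split(" ", 1)[0] == rule.pattern
    return False


def resolve(command: str, rules: List[Rule], default: str = "ask") -> str:
    """Return the verdict of the first matching rule, most specific kind first."""
    for kind in PRECEDENCE:
        for rule in rules:
            if rule.kind == kind and matches(rule, command):
                return rule.verdict
    return default  # anything unmatched goes to a human


RULES = [
    Rule("tool", "git", "ask"),
    Rule("prefix", "git status", "allow"),
    Rule("prefix", "git push", "ask"),
    Rule("exact", "git push --force origin main", "deny"),
]
```

With these rules, `resolve("git status --short", RULES)` is allowed without a ping, `resolve("git push origin main", RULES)` asks, and the exact force-push command is denied outright. Defaulting to "ask" keeps unknown commands in front of a person rather than silently allowing them.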

Policy is the difference between oversight you can sustain and oversight you abandon. With a good policy, the only things that reach your phone are the ones that genuinely need a person. Permission Autopilot mines your own approve and deny history into one-tap rule suggestions, so the policy gets tighter the more you use it. This is the first level where you are managing the agent at scale instead of babysitting it. The permission policy is where that lives.
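One simple way to turn decision history into rule suggestions is to look for command prefixes that were always answered the same way. The sketch below is an assumption about the general idea, not how Permission Autopilot is implemented; `suggest_rules`, the two-token prefix, and the `min_count` threshold are all illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def suggest_rules(history: List[Tuple[str, bool]],
                  min_count: int = 3) -> Dict[str, str]:
    """Suggest a verdict for each command prefix with a consistent history.

    history is a list of (command, approved) pairs from past decisions.
    """
    by_prefix: Dict[str, List[bool]] = defaultdict(list)
    for command, approved in history:
        prefix = " ".join(command.split()[:2])  # e.g. "npm test", "git push"
        by_prefix[prefix].append(approved)

    suggestions: Dict[str, str] = {}
    for prefix, answers in by_prefix.items():
        if len(answers) < min_count:
            continue  # not enough evidence yet
        if all(answers):
            suggestions[prefix] = "allow"
        elif not any(answers):
            suggestions[prefix] = "deny"
        # Mixed answers stay on "ask": the human still decides case by case.
    return suggestions


HISTORY = [
    ("npm test", True),
    ("npm test -- --watch", True),
    ("npm test", True),
    ("git push origin main", True),
    ("git push --force origin main", False),
    ("git push origin dev", True),
    ("rm -rf build", True),
]

suggested = suggest_rules(HISTORY)
```

In this sample only "npm test" earns a suggestion. The git push answers are mixed, and a single approval of rm -rf is not enough evidence to automate.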

Level 4: audit

The first three levels happen in the moment. Audit is what is left afterward. Every decision, who or what made it, what the agent did, and what changed, recorded so you can answer questions later.

A receipt per session with structured tool-action metadata. A what-changed view. An answer-source record that says whether each decision came from a phone, the web, or Slack. Exportable history, a Team digest, a weekly recap email. This is the level that matters when someone asks what happened, when you need to review an overnight run you slept through, or when a reviewer wants proof that a human approved the risky step.
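As a rough picture of what one entry in such a trail could hold, here is a minimal record type with a JSON Lines export. The field names are assumptions chosen for illustration, not Pushary's schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AuditRecord:
    session_id: str
    timestamp: str      # ISO 8601, UTC
    tool: str
    arguments: str
    outcome: str        # "allowed" or "denied"
    decided_by: str     # "policy" or "human"
    answer_source: str  # "phone", "web", or "slack"; empty when policy decided
    reason: str = ""


def export_jsonl(records: List[AuditRecord]) -> str:
    """Serialize records one JSON object per line, suitable for export."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def human_approved(records: List[AuditRecord], session_id: str) -> List[AuditRecord]:
    """Answer the reviewer's question: which steps did a person clear?"""
    return [
        r for r in records
        if r.session_id == session_id
        and r.decided_by == "human"
        and r.outcome == "allowed"
    ]


RECORDS = [
    AuditRecord("s1", "2026-06-06T02:14:00Z", "bash", "git status",
                "allowed", "policy", ""),
    AuditRecord("s1", "2026-06-06T02:15:30Z", "bash", "git push origin main",
                "allowed", "human", "phone"),
]

exported = export_jsonl(RECORDS)
cleared = human_approved(RECORDS, "s1")
```

Keeping `decided_by` and `answer_source` as separate fields is what lets you distinguish "a rule allowed this" from "a person approved this on their phone" after the session is over.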

Audit is the level vendors are slowest to build, because it has no payoff in the demo. It pays off weeks later. It is also the layer that turns "the agent did something" into "here is exactly what it did, when, and who cleared it." For teams and anyone with a compliance question, that record is the point. To be straight about scope: Pushary is GDPR-aligned but self-assessed, with no SOC 2 or ISO certification. The audit trail gives you a record of what was captured. It does not stand in for a certification.

How the levels stack

The four levels are not alternatives. They build on each other.

| Level | Question it answers | When it acts |
| --- | --- | --- |
| Notify | What is the agent doing | After the fact, no decision |
| Approve | Should this run | In the moment, blocks |
| Policy | Which cases need me at all | Before the moment, automatically |
| Audit | What happened and who decided | After, durable record |

Notify without approve is a feed you watch. Approve without policy is a job you cannot keep up with. And policy with no audit means the decisions still happen, but nothing remembers them. You want all four, with policy and audit carrying the weight once notify is parity and approve is wired in.
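The stacking can be shown as a single decision path: policy goes first, a human is asked only when policy says "ask", and every outcome is both announced and recorded. This is a sketch with hypothetical names (`oversee` and its callables), meant to show the ordering rather than any real integration.

```python
from typing import Callable, Dict, List


def oversee(command: str,
            policy: Callable[[str], str],
            ask_human: Callable[[str], bool],
            notify: Callable[[str], None],
            audit_log: List[Dict[str, str]]) -> bool:
    """Return True if the command may run."""
    verdict = policy(command)          # level 3: decide routine cases
    decided_by = "policy"
    if verdict == "ask":
        allowed = ask_human(command)   # level 2: block on a person
        decided_by = "human"
    else:
        allowed = verdict == "allow"
    status = "allowed" if allowed else "blocked"
    notify(f"{status}: {command}")     # level 1: awareness
    audit_log.append({                 # level 4: durable record
        "command": command,
        "outcome": status,
        "decided_by": decided_by,
    })
    return allowed


# Stand-ins for the real pieces (assumptions for illustration).
pings: List[str] = []
log: List[Dict[str, str]] = []


def simple_policy(command: str) -> str:
    return "allow" if command.startswith("git status") else "ask"


def always_no(command: str) -> bool:
    return False


first = oversee("git status", simple_policy, always_no, pings.append, log)
second = oversee("git push origin main", simple_policy, always_no, pings.append, log)
```

The read-only command never reaches the human, the push does, and both end up in the log with who decided.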

Common questions

What are the levels of human oversight for AI agents?

Four. Notify gives you awareness with no decision. Approve stops the agent and waits for your yes or no. Policy uses rules to decide routine cases without you. Audit records every decision as a durable trail. They stack, so each level assumes the one beneath it.

Is a notification enough oversight on its own?

No. A notification tells you something happened but cannot stop it. For anything risky or irreversible you need approve, a real gate that blocks the action until a human answers. Notify is awareness, approve is control, and most agents now ship notify natively, so it is no longer the differentiator.

Why are policy and audit the durable layer?

Because notify became parity once agents started shipping native notifications. Policy is what makes oversight scale past a handful of approvals a day, and audit is the record that survives the session. Both keep their value as agents get more capable, while a bare ping does not.

Can every agent enforce all four levels?

No. Enforced approval needs a hook to intercept the tool call, which Pushary provides for Claude Code, Codex, Gemini CLI, Cursor, and Hermes through a CLI. Claude Desktop has no hooks, so its connector can notify and ask but cannot enforce a gate. Notify and audit work broadly; enforced approve and policy need the hook.

If you want all four levels in one place, that is what Pushary is built around: notify and approve on your phone, a permission policy that decides the routine cases, and an audit trail behind every run. Start from the quickstart, see what it covers across every agent you run, and check pricing when you are ready.

