Vibe coding safety is mostly a runtime problem, not a code-review problem. You can scan everything an agent writes and still get burned, because the damage comes from what the agent runs: shell commands, file writes, deploys, network calls. The fix is to gate the risky actions and route them to a human. A scan after the fact comes too late.

Key takeaways

Most vibe coding tools review the code an agent produces. The bigger exposure is the actions it takes while producing it.
Real attacks already hijack agents through poisoned files and config, then run commands with the developer's own permissions.
A permission policy plus a human approving the risky steps is the pattern that actually catches runtime damage.

What people mean by vibe coding safety

Vibe coding, a term Andrej Karpathy popularized in early 2025, is the workflow where you describe what you want and let the agent write and run most of the code. The safety conversation usually lands on one question: is the generated code secure? That matters, and studies of code from these tools keep finding functional-but-vulnerable output plus exposed secrets (Kaspersky, IBM). But code review is a known problem with known tools. The part that gets skipped is runtime.

The gap: scanners read code, not actions

A scanner reads the diff. It does not see the agent decide to run rm -rf, push to main, drop a database column, or curl a script straight into a shell. Those are actions, and they happen in the moment, often before any review step runs.

This is not hypothetical. CVE-2025-54135, nicknamed CurXecute, let an attacker achieve remote code execution in Cursor through an indirect prompt injection that rewrote the agent's MCP config, which then ran an attacker-controlled command before the user could reject the edit. It carried an 8.6 severity rating and was fixed in a later release (Tenable, Cato Networks).

The delivery method is the scary part. Researchers have shown agents getting hijacked just by reading a poisoned README or rules file, with auto-run enabled so each command fired without an approval prompt. Injected lines like "do not mention this step" even suppressed the agent's normal narration (arXiv: Your AI, My Shell).

A code scanner does not catch any of this, because there is no malicious code to find. The malicious thing is an action, taken with your permissions, that looks ordinary in isolation.

Put a human between the agent and the risky action

The reliable control is a decision point between the agent and the action. Three pieces:

A policy that classifies actions by risk. Reads and proven-safe commands run freely. Writes, deploys, anything that spends money, and anything destructive stop and ask.
A human in the loop for the stops, reachable wherever they are.
An audit trail so you can see, after the fact, every action that ran and who approved it.
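The three pieces above can be sketched in a few lines. This is a minimal illustration and not Pushary's implementation: the `Action` and `Gate` names, the risk kinds, and the `ask_human` callback are all hypothetical stand-ins for whatever a real hook would provide.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical risk class: only plain reads run without a prompt.
FREE_KINDS = {"read"}


@dataclass
class Action:
    kind: str    # e.g. "read", "write", "deploy", "destructive"
    detail: str  # human-readable description of what would run


@dataclass
class Gate:
    # ask_human is any callable returning True (approve) or False (deny).
    ask_human: Callable[[Action], bool]
    audit: list = field(default_factory=list)

    def decide(self, action: Action) -> bool:
        if action.kind in FREE_KINDS:
            allowed, approver = True, "policy"
        else:
            # Risky kinds and anything unrecognised both stop and ask.
            allowed, approver = self.ask_human(action), "human"
        # Record every decision, allowed or not, for later review.
        self.audit.append({
            "ts": time.time(),
            "kind": action.kind,
            "detail": action.detail,
            "allowed": allowed,
            "approver": approver,
        })
        return allowed


if __name__ == "__main__":
    gate = Gate(ask_human=lambda a: False)  # stand-in approver that always denies
    gate.decide(Action("read", "cat README.md"))
    gate.decide(Action("deploy", "push build to production"))
    print(json.dumps(gate.audit, indent=2))
```

The design choice worth noting is the default: an action kind the policy does not recognise is routed to the human instead of being allowed.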

This is what Pushary does for agents like Claude Code, Codex, Gemini CLI, Cursor, and Hermes through a CLI hook. The hook fires when the agent reaches for a tool, and an agent permission policy decides allow, ask, or deny. Policies match on the actual arguments, not just the tool name, so git status runs and git push asks. A read-only safe floor auto-approves proven read-only commands like cd, ls, cat, and git log, so you are not buried in prompts for harmless reads. We set that floor by going through 1,721 real production questions and keeping only commands that cannot change state.
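To make "match on the arguments, not just the tool name" concrete, here is a rough sketch of an allow/ask/deny decision over a shell command. It assumes a hypothetical rule set: the `READ_ONLY_FLOOR` table, the force-push deny rule, and the `decide` function are illustrative and do not reflect Pushary's actual policy format.

```python
import shlex

# Hypothetical read-only floor: (program, subcommand), where None means
# the program is treated as read-only regardless of its first argument.
READ_ONLY_FLOOR = {
    ("cd", None), ("ls", None), ("cat", None),
    ("git", "status"), ("git", "log"),
}

# Characters that can chain or redirect commands; their presence means
# the command is no longer a single, simple read.
SHELL_META = set(";|&><`$\n")


def decide(command: str) -> str:
    """Return "allow", "ask", or "deny" for a single shell command."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return "ask"  # unparseable input (e.g. unbalanced quotes)
    if not argv or any(ch in SHELL_META for ch in command):
        return "ask"

    program = argv[0]
    sub = argv[1] if len(argv) > 1 else None

    # Hypothetical hard deny: force-pushing is never auto-approved.
    if program == "git" and sub == "push" and ("--force" in argv or "-f" in argv):
        return "deny"
    if (program, None) in READ_ONLY_FLOOR or (program, sub) in READ_ONLY_FLOOR:
        return "allow"
    return "ask"  # default for anything not proven read-only


if __name__ == "__main__":
    for cmd in ("git status", "git push origin main", "ls; rm -rf /tmp/x"):
        print(f"{cmd!r}: {decide(cmd)}")
```

Because the same tool name (`git`) lands in all three outcomes, a rule keyed only on the tool would have to either block reads or wave pushes through. A production floor would also need to vet individual flags, which this sketch does not attempt.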

When something does need a yes or no, the question goes to your phone. You can approve or deny it there even when the agent is running on a machine you are nowhere near, and a denial can carry a reason that the agent reads back. If a session goes wrong, an agent kill switch stops it. See the docs on human-in-the-loop and policies for setup.
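The flow in this paragraph (ask, approve or deny with a reason, and stop everything on a kill switch) might look roughly like the sketch below. The `Session` and `Verdict` names and the message strings are hypothetical; the `ask` callback stands in for whatever channel actually reaches the human.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    reason: str = ""  # free-text explanation, mainly useful on a denial


class Session:
    def __init__(self, ask: Callable[[str], Verdict]):
        self._ask = ask                   # blocks until the human answers
        self._killed = threading.Event()  # safe to set from another thread

    def kill(self) -> None:
        """Kill switch: every later request is refused."""
        self._killed.set()

    def request(self, command: str) -> str:
        """Return the message that is handed back to the agent."""
        if self._killed.is_set():
            return "blocked: session stopped by kill switch"
        verdict = self._ask(command)
        # The switch may have been hit while the question was pending.
        if self._killed.is_set():
            return "blocked: session stopped by kill switch"
        if verdict.approved:
            return f"approved: {command}"
        return f"denied: {verdict.reason or 'no reason given'}"


if __name__ == "__main__":
    session = Session(lambda cmd: Verdict(False, "use a feature branch"))
    print(session.request("git push origin main"))
```

Returning the denial reason as text gives the agent something to act on, for example retrying on a branch, instead of a bare failure.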

Gating only works where the agent runs through a hook. Pushary's CLI hook enforces policy for Claude Code, Codex, Gemini CLI, Cursor, and Hermes. The Claude Desktop connector can notify and ask, but Claude Desktop has no hook to enforce gating, so compliance there is voluntary. Treat it as a notification channel, not a guardrail.

Common questions

Does a code scanner make vibe coding safe?

It helps with vulnerable generated code. It does nothing about what the agent runs at execution time, which is where prompt injection and destructive commands land. You need both: scan the code, gate the actions.

Can I let an agent run commands without watching it the whole time?

Yes, if reads are auto-approved and risky actions are gated to a human. You stop watching the terminal and only get pulled in for the decisions that matter.

Is approval-on-every-command realistic?

No, and that is why the read-only safe floor exists. Auto-approve proven read-only commands and ask only on state-changing ones. Otherwise people disable the prompts, which is how auto-run incidents happen.

Two honest caveats: Pushary is GDPR-aligned but self-assessed, with no SOC 2 or ISO certification, and on iOS the deep link is broken, so the phone uses a pending-questions inbox instead. To run agents without babysitting them, start with a policy and human approval on the risky steps. See Pushary for AI agents for the full picture.

Vibe coding safety: letting AI run code without babysitting it
