
Spin Up Sandboxed Subagents with the Antigravity CLI
Google's Antigravity 2.0 CLI lets you delegate work to sandboxed subagents. A practical walkthrough of wiring it into a real dev workflow.
Key takeaways
- The agy CLI ships terminal sandboxing, credential masking, and git policy, using sandbox-exec on macOS, nsjail on Linux, and AppContainer on Windows.
- You write a goal and the orchestrator decides how to split it, spawning parallel subagents with isolated context rather than hand-written definition files.
- Review with /diff and /tasks before accepting, since the sandbox blocks bad system commands but does nothing about a subagent writing plausible but wrong code.
- Google cut Antigravity quotas several times with multi-day lockouts, so stress-test your usage limits before relying on it for heavy daily work.
At I/O on May 19, Google shipped Antigravity 2.0 and, alongside it, the Antigravity CLI. The headline from the developer keynote was that you can now spin up specialized subagents to tackle complex workflows straight from your terminal, with cross-platform terminal sandboxing, credential masking, and hardened git policies baked in. That last part is what got my attention. Agent tooling that runs shell commands has been around for a while, but most of it asks you to trust it with your machine and your secrets and then hope for the best. Antigravity's pitch is that the containment is the product, not an afterthought.
I spent a few evenings running the CLI against a real project to see whether the safety story holds up in practice. This is a walkthrough of how I wired it into an actual workflow, what the sandbox actually does, and where the rough edges are. I will be honest about the launch too, because it has not been smooth.
The terminal binary is called agy. If you have used the Antigravity editor, the CLI is the same agent harness without the GUI, which means subagents, the permission model, and the MCP plumbing all carry over. A quick note before we start: this is a new product and the surface is moving fast, so where I am not certain about an exact flag I will say so rather than invent one. Check the docs at antigravity.google/docs for the current syntax.
What you'll need
- A machine running macOS, Linux, or Windows. Sandboxing is implemented differently on each (more on that below), but all three are supported.
- An Antigravity account. The free tier works for trying this out, though you will hit quota walls quickly. I will come back to that.
- A real repository to point it at. Agents are far easier to reason about when you give them a concrete task in a codebase you know, rather than a toy example.
- Git installed and configured, since the git policy features only make sense in a repo.
Installing the CLI
On macOS or Linux, the install is a single piped script:
curl -fsSL https://antigravity.google/cli/install.sh | bash
On Windows, the PowerShell equivalent:
irm https://antigravity.google/cli/install.ps1 | iex
After the binary lands, run agy install once to set up your PATH and shell integration. I am normally wary of curl | bash, and you should be too, so if that pattern bothers you, download the script, read it, and run it yourself. The behavior is the same.
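If you go the read-it-first route, a minimal sketch looks like the following. The URL is the one from the install command above; the helper name fetch_installer and the local file name are my own choices, not anything the installer expects.

```shell
# Save the installer to a file instead of piping it straight into bash.
# fetch_installer is a hypothetical helper name used only for this sketch.
fetch_installer() {
  curl -fsSL https://antigravity.google/cli/install.sh -o "$1"
}

# Usage, left commented so nothing runs when you paste the block:
#   fetch_installer ./agy-install.sh   # download only
#   less ./agy-install.sh              # read what it will do
#   bash ./agy-install.sh              # run it once you are satisfied
```

The point of the split is only that the script you read is the same file you run, which a pipe into bash cannot promise.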
Confirm the install and check which models are available to your account:
agy --version
agy models
You authenticate on first launch. Running agy with no arguments drops you into an interactive session and walks you through sign-in if you have not done it yet.
Defining a subagent for a task
Here is the part that trips people up coming from other agent tools, so it is worth being precise. In Antigravity, you do not hand-write a subagent definition file the way you might configure a persona elsewhere. The orchestrator does the decomposition. You give it a goal, it reads that goal, decides how to split the work, and spawns subagents with their own isolated context windows to handle the pieces in parallel.
So "defining a subagent" really means writing a clear enough prompt that the orchestrator picks a sensible split. For my test I used a workflow I run by hand all the time: fix a failing test, then update the docs that reference the changed behavior. That naturally decomposes into two tracks, one agent on the fix and one on the docs, which is exactly the kind of thing the parallel model is built for.
I started the session pointed at the goal:
agy "the auth middleware test in tests/auth.spec.ts is failing after the token refactor. Fix the implementation so it passes, then update any docs in docs/ that describe the old token shape."
You can also start a bare interactive session with agy and type the goal once you are in. Either way, the orchestrator comes back with a plan before it touches anything, and you can watch it decide to fan the work out. To see and manage the subagents it creates, open the agent panel with the /agents slash command inside the session. That panel is where you review what each subagent is doing and approve or reject the actions they propose.
A couple of keyboard moves that I leaned on constantly: Ctrl+J jumps focus to a subagent that is waiting on you, and Ctrl+K approves a pending action. When a tool call is proposed, plain y and n accept or reject it. Once you have two or three agents running, the ability to teleport straight to whichever one needs attention is the difference between staying in control and losing the plot.
Running it in the sandbox
This is the reason to use the CLI over a free-for-all agent. Sandboxing is enabled per session, and the implementation is native to each OS rather than a Docker layer:
- On macOS it uses sandbox-exec profiles.
- On Linux it uses namespace isolation via nsjail.
- On Windows it runs commands inside AppContainer sessions.
You can run a session with the restrictions enabled using the sandbox flag:
agy --sandbox "run the test suite and fix whatever is red"
There is also a settings file that controls this more durably. On my machine it lives under the Antigravity config directory, and the relevant keys looked like this:
{
  "enableTerminalSandbox": true,
  "toolPermission": "request-review"
}
The toolPermission value maps to a set of presets. The default, request-review, asks you before each meaningful action. There are looser modes (one that proceeds automatically inside the sandbox, one that always proceeds) and a stricter one. I would not run anything but request-review or stricter until you have watched a given workflow behave a few times. You can switch presets mid-session with /permissions if a task turns out to be more hands-on or less than you expected.
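Because presets can be switched mid-session, it is worth confirming that the file on disk still says what you think it says before a long run. Here is a small sketch of that check, assuming the two keys shown above. The function name check_agy_settings is hypothetical, and the path is passed in as an argument because the settings file location may differ on your machine.

```shell
# Succeeds only if the given settings file enables the terminal sandbox
# and keeps the request-review preset. It greps for the two keys as they
# appear in the example above; it is not a full JSON parser.
check_agy_settings() {
  settings_file="$1"
  grep -Eq '"enableTerminalSandbox"[[:space:]]*:[[:space:]]*true' "$settings_file" &&
    grep -Eq '"toolPermission"[[:space:]]*:[[:space:]]*"request-review"' "$settings_file"
}

# Usage (substitute the real path to your settings file):
#   check_agy_settings /path/to/settings.json || echo "sandbox settings have drifted"
```

The check only accepts request-review. If you move to the stricter preset, adjust the second pattern to match whatever name the docs give it.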
One honest caveat: the sandbox is containment, not a guarantee. It restricts what shell commands can reach, but an agent operating inside it can still make a wrong edit to a file it is allowed to touch. The sandbox protects your wider system. It does not protect you from a subagent confidently doing the wrong thing inside the directory you pointed it at. Keep that distinction in mind.
Reviewing what the agents did
When the subagents finish, you review before anything is final. Inside the session, /diff shows a unified diff of every file that changed, which is the first thing I check. I read the diff the same way I would read a teammate's pull request, because functionally that is what it is. The /tasks command shows the shell execution log, so if a subagent ran your test runner or a build step you can see the actual commands and their output rather than taking the agent's summary on faith.
For my fix-and-document run, the implementation subagent produced a clean patch to the middleware, and the docs subagent caught two references to the old token shape that I would probably have missed by hand. It also rewrote a third doc paragraph that did not need touching, which is exactly why you read the diff. I rejected that hunk and kept the rest.
The safety angle: credential masking and git policy
The two features I was most curious about are the ones that are hardest to demo in a screenshot, so here is what I actually observed.
Credential masking means that when the agent reads environment variables, logs, or command output that contains things shaped like secrets, those values get redacted before they land in the agent's context. The practical effect is that an API key sitting in your shell environment does not get quietly hoovered into a prompt and shipped off to a model. I could not find a documented command to toggle this granularly, so treat the exact configuration surface as a check-the-docs item, but the masking behavior was on by default in my sessions.
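To make the idea concrete, here is a toy illustration of redacting secret-shaped values from a stream of text. This is not how Antigravity implements masking, which I have not seen documented. The function name mask_secrets and the list of name fragments it looks for (KEY, TOKEN, SECRET, PASSWORD) are assumptions made for the sketch.

```shell
# Toy redactor: replaces the value of any NAME=value pair whose name
# contains KEY, TOKEN, SECRET, or PASSWORD with a fixed placeholder.
# Illustrative only; real secret detection is broader than this pattern.
mask_secrets() {
  sed -E 's/([A-Za-z_]*(KEY|TOKEN|SECRET|PASSWORD)[A-Za-z_]*=)[^[:space:]]+/\1[REDACTED]/g'
}

# Usage: pipe any text through it before it leaves your machine.
#   env | mask_secrets
```

A name-based pattern like this misses secrets stored under innocuous variable names, which is one reason not to treat any masking layer as a substitute for keeping credentials out of the agent's reach in the first place.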
The git policy side is more concrete. You can express rules about what the agent is allowed to run, and the natural place to harden things is a deny list for destructive shell patterns. The permissions block I used looked like this:
{
  "permissions": {
    "allow": ["command(git)", "command(npm test)"],
    "deny": ["command(rm -rf)"]
  }
}
The advice that came with this, which matches my experience, is to keep deny lists specific to your stack. It is tempting to blanket-block anything that looks dangerous, but denying all of curl will break legitimate debugging, and denying all of git defeats the point. Denying git push --force, on the other hand, is probably exactly what you want. Spend ten minutes thinking about what your repo genuinely never needs the agent to do, and block exactly that.
For git specifically, the thing I care about is that the agent cannot rewrite history or force-push behind my back. Putting the destructive git verbs in the deny list and keeping the permission preset at request-review got me there. Verify the exact rule syntax against the docs, since the rules file format is the kind of thing likely to change between releases.
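When drafting a deny list, it helps to reason about which commands a given set of entries would actually catch. The sketch below models that locally under one loud assumption: that a rule matches when the command starts with the listed text. I have not confirmed that this is how Antigravity evaluates rules, and is_denied is a hypothetical helper, not part of the CLI.

```shell
# Hypothetical local model of a deny list. The first argument is the
# command to test; the remaining arguments are denied prefixes.
# ASSUMPTION: prefix matching. The real rule semantics may differ.
is_denied() {
  candidate="$1"
  shift
  for prefix in "$@"; do
    case "$candidate" in
      "$prefix"*) return 0 ;;   # command begins with a denied prefix
    esac
  done
  return 1                      # nothing matched
}

# Usage:
#   is_denied "git push --force origin main" "rm -rf" "git push --force" && echo blocked
```

Under that prefix assumption, an entry for git push --force would not catch the short form git push -f, so list each spelling you care about and confirm the behavior against the real CLI rather than trusting the model.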
Gotchas
A few things to go in with your eyes open about.
The quota situation is the big one, and it is the most common complaint I have seen since launch. Between late 2025 and this spring Google cut Antigravity's usage quotas several times without much warning. Developers who were running hundreds of millions of input tokens a week on the Pro plan reported hitting weekly caps at a tiny fraction of that, and the caps came with multi-day lockouts rather than the quick hourly resets you might expect. If you are evaluating this for real work, do not assume the free or even Pro quota will carry a heavy day. The pricing is also published as relative multipliers rather than concrete request counts, so budgeting precisely is genuinely hard right now. This is not a hypothetical; it is the single thing most likely to bite you.
There is also the migration angle. Google is moving Gemini CLI users over to Antigravity CLI, and the new CLI is closed source where the old one was open. If that matters to your team's tooling philosophy, factor it in.
On the sandbox, the failures I hit were mundane and fixable: network allowlist issues when a tool needed to reach something the sandbox blocked, and on macOS a couple of profile permission conflicts that traced back to System Integrity Protection. Neither was a dead end, but both cost time, so budget for some setup friction the first time you enable containment on a new machine.
Last, the obvious one that no amount of sandboxing solves: do not trust agent output blindly. The containment keeps a bad command from wrecking your system. It does nothing about a subagent that writes plausible, wrong code. Read every diff. Run the tests yourself. The whole point of the review step is that you stay the one who decides what ships.
Wrapping up
The Antigravity CLI is a reasonable piece of engineering with a frustrating rollout around it. The sandboxing, credential masking, and git policy features are real and they change the calculus of letting an agent run shell commands, which I appreciate. The subagent orchestration is genuinely useful once you get the hang of writing goals that decompose cleanly. What I cannot recommend yet is leaning on it for heavy daily work without first stress-testing your quota, because that is where people are getting stung.
If you want to try it, install it, point it at a small fix-and-document task in a repo you know, keep the sandbox on and the permissions at review, and read the diff before you accept anything. That is enough to form your own opinion, and it is the way I would introduce it to a team before trusting it with anything bigger.