Why CLI Tools, Not Raw API Calls

Part 2 of 9 — the deterministic tool belt, and why it is the load-bearing wall of the whole design.

The tempting shortcut

An obvious objection arrives early: modern agents can already make HTTP requests. Why not just hand the agent the API documentation and a credential and let it call the endpoints directly? No tools to build, no scripts to maintain.

You can do this. It works in a demo and fails in production, for reasons that are worth understanding because they explain the entire architecture.

What goes wrong with raw API access

The alternative: small tools as a contract

Instead, you wrap each meaningful action in a small command-line program with a stable name, documented arguments, and predictable output:

ebay-list-orders --status awaiting-shipment --json
ebay-get-message --id 88231
ebay-reply-message --id 88231 --body-file reply.txt
ebay-refund --order 4471 --amount 18.00 --reason item-not-received

Now the division of labour is clean. The agent decides that order 4471 should be refunded, and for how much. The tool decides how to call eBay correctly, enforces that the amount is within policy, requires a reason code, refuses to run if a required flag is missing, and prints exactly what it did. The agent reasons; the tool executes deterministically.
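To make that split concrete, here is a minimal sketch of what ebay-refund might look like in Python. The policy ceiling, the list of reason codes, and the issue_refund placeholder are assumptions made up for illustration; the real eBay call is stubbed out.

```python
import argparse
import json
import sys
from decimal import Decimal, InvalidOperation

# Hypothetical policy values; a real deployment would load these from config.
MAX_REFUND = Decimal("50.00")
ALLOWED_REASONS = ["buyer-cancelled", "item-damaged", "item-not-received"]


def parse_amount(text):
    """Accept only a positive, finite amount with at most two decimal places."""
    try:
        amount = Decimal(text)
        ok = (
            amount.is_finite()
            and amount > 0
            and amount == amount.quantize(Decimal("0.01"))
        )
    except InvalidOperation:
        ok = False
    if not ok:
        raise argparse.ArgumentTypeError(f"not a valid amount: {text!r}")
    return amount


def build_parser():
    parser = argparse.ArgumentParser(
        prog="ebay-refund",
        description="Refund an order. WRITE tool: this moves money.",
    )
    parser.add_argument("--order", required=True)
    parser.add_argument("--amount", required=True, type=parse_amount)
    parser.add_argument("--reason", required=True, choices=ALLOWED_REASONS)
    return parser


def issue_refund(order, amount, reason):
    """Placeholder for the real eBay call (assumed, not shown here)."""
    return {"order": order, "amount": str(amount), "reason": reason,
            "status": "refunded"}


def main(argv=None):
    # argparse exits with status 2 if a required flag is missing or invalid.
    args = build_parser().parse_args(argv)
    if args.amount > MAX_REFUND:
        # The hard limit lives in the tool, not in the agent's prompt.
        error = {"error": "amount-exceeds-policy", "limit": str(MAX_REFUND)}
        print(json.dumps(error), file=sys.stderr)
        return 1
    print(json.dumps(issue_refund(args.order, args.amount, args.reason)))
    return 0

# A real script would end with: raise SystemExit(main())
```

The agent never sees the HTTP layer. It supplies three values, and the only outcomes are a JSON receipt or a non-zero exit with a reason it can read.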

This is the oldest good idea in computing wearing new clothes. It is the Unix philosophy — small programs that do one thing well and compose through text — applied to an agent instead of a shell pipeline. The agent is the shell, except it can read a manual and make judgment calls.

The properties a good operator tool has

Property | Why it matters to an agent
Single purpose | One verb, one object. ebay-refund, not ebay-manage-order with a mode flag. Easy for the agent to choose correctly.
Self-documenting | --help explains what it does, its arguments, and its side effects, in plain language the agent can read.
Structured output | A --json mode so the agent parses results reliably instead of scraping prose.
Read/write honesty | The name and help text make clear whether the tool only reads or also changes the world. Read tools are safe to run freely; write tools are not.
Dry-run for writes | --dry-run prints what would happen without doing it, so the agent (and you) can preview before committing.
Policy enforcement | Hard limits live in the tool, not in a hope that the agent behaves. The tool refuses out-of-policy requests with a clear error.
Loud, structured errors | On failure it exits non-zero and prints why, so the agent can react instead of assuming success.
Idempotency where possible | Running "acknowledge order 4471" twice should not send two messages. Tools that can be safely retried are tools an agent can use confidently.
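Two of these properties, dry-run and idempotency, are easier to see in code than in a table. The sketch below is one possible shape for the "acknowledge order" example. The ledger file and the send callback are hypothetical stand-ins, not part of any real eBay tooling.

```python
import json
import tempfile
from pathlib import Path


def acknowledge_order(order_id, ledger_path, dry_run=False, send=print):
    """Send an acknowledgement at most once per order.

    ledger_path is a hypothetical JSON file listing orders already handled;
    send stands in for whatever actually delivers the message.
    """
    ledger = Path(ledger_path)
    done = set(json.loads(ledger.read_text())) if ledger.exists() else set()

    if order_id in done:
        # Idempotent: a retry is a no-op, reported honestly.
        return {"order": order_id, "action": "none",
                "reason": "already-acknowledged"}
    if dry_run:
        # Dry run: describe the effect without causing it.
        return {"order": order_id, "action": "would-send"}

    send(f"Order {order_id} acknowledged")
    done.add(order_id)
    ledger.write_text(json.dumps(sorted(done)))
    return {"order": order_id, "action": "sent"}


# Demo: preview, send, then retry against a throwaway ledger.
with tempfile.TemporaryDirectory() as tmp:
    demo_ledger = Path(tmp) / "acknowledged.json"
    print(acknowledge_order("4471", demo_ledger, dry_run=True))
    print(acknowledge_order("4471", demo_ledger))
    print(acknowledge_order("4471", demo_ledger))
```

This version records the order after sending, so a crash between the two steps could still produce a duplicate on retry. A production tool would need to close that window; the sketch only shows the basic shape.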

Why CLI specifically, and not, say, MCP?

A fair question, since the agent ecosystem has richer integration points than shelling out. CLI tools earn their place for three reasons:

Build note. None of this forbids also exposing the tools through MCP or a function-calling interface later — a CLI and an MCP server can share the same underlying library. The point is that the contract should be expressible as plain commands first. If you can drive your operator from a terminal, any agent can drive it too.
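As a rough illustration of that build note, the sketch below puts one library function behind two front ends: the ebay-list-orders command and a generic function-calling handler. The handler is deliberately not written against any particular MCP SDK, and the order data is a hard-coded stand-in; both are assumptions for the example.

```python
import argparse
import json

# Stand-in data; a real library would query eBay here.
_FAKE_ORDERS = [
    {"id": "4471", "status": "awaiting-shipment"},
    {"id": "4472", "status": "shipped"},
]


def list_orders(status):
    """Core library function shared by every front end."""
    return [order for order in _FAKE_ORDERS if order["status"] == status]


def cli_main(argv):
    """Front end 1: the ebay-list-orders command."""
    parser = argparse.ArgumentParser(prog="ebay-list-orders")
    parser.add_argument("--status", required=True)
    parser.add_argument("--json", action="store_true",
                        help="print machine-readable JSON")
    args = parser.parse_args(argv)
    orders = list_orders(args.status)
    if args.json:
        print(json.dumps(orders))
    else:
        for order in orders:
            print(f"{order['id']}\t{order['status']}")
    return 0


def handle_tool_call(name, arguments):
    """Front end 2: a generic function-calling handler (hypothetical shape)."""
    if name != "ebay_list_orders":
        raise ValueError(f"unknown tool: {name}")
    return json.dumps(list_orders(arguments["status"]))
```

Because both wrappers are thin, the behaviour worth reviewing lives in list_orders, and the terminal remains the simplest place to exercise it.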

The mental model to carry forward

The tool belt is the fence. Everything the agent can do to the outside world, it does through a tool you wrote. If a capability isn't in the belt, the agent doesn't have it. Adding a tool is a deliberate act of granting power.
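One way to picture the fence is as an allowlist in whatever harness launches commands on the agent's behalf. The following is a minimal sketch under that assumption; the TOOL_BELT set, the NotInBelt exception, and the run_tool helper are illustrative names, not an existing API.

```python
import subprocess

# The belt: every command the agent is allowed to run, and nothing else.
TOOL_BELT = frozenset({
    "ebay-list-orders",
    "ebay-get-message",
    "ebay-reply-message",
    "ebay-refund",
})


class NotInBelt(Exception):
    """Raised when the agent asks for a command nobody granted it."""


def build_command(name, args):
    """Return the argv list for a belt tool, or refuse."""
    if name not in TOOL_BELT:
        raise NotInBelt(name)
    return [name, *args]


def run_tool(name, args, runner=subprocess.run):
    """Run a belt tool and hand back its result for the agent to inspect."""
    command = build_command(name, args)
    # List form with no shell, so arguments cannot smuggle in extra commands.
    return runner(command, capture_output=True, text=True, check=False)
```

With the real tools on the PATH, run_tool("ebay-list-orders", ["--status", "awaiting-shipment", "--json"]) would return a CompletedProcess whose returncode and stdout the agent can read. Granting a new power means adding one name to TOOL_BELT, which is a change a human can review.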

This is what makes the whole pattern safe enough to trust. The agent's autonomy is bounded not by its good intentions but by the finite, reviewable set of levers in front of it. In Part 3 we look at the most important property those levers carry: whose identity they act under.