Part 2 of 9 — the deterministic tool belt, and why it is the load-bearing wall of the whole design.
An obvious objection arrives early: modern agents can already make HTTP requests. Why not just hand the agent the API documentation and a credential and let it call the endpoints directly? No tools to build, no scripts to maintain.
You can do this. It works in a demo and fails in production, for reasons that are worth understanding because they explain the entire architecture.
Instead, you wrap each meaningful action in a small command-line program with a stable name, documented arguments, and predictable output:
ebay-list-orders --status awaiting-shipment --json
ebay-get-message --id 88231
ebay-reply-message --id 88231 --body-file reply.txt
ebay-refund --order 4471 --amount 18.00 --reason item-not-received
Now the division of labour is clean. The agent decides that order 4471 should be refunded and how much. The tool decides how to call eBay correctly, enforces that the amount is within policy, requires a reason code, refuses if a flag is missing, and prints exactly what it did. The agent reasons; the tool executes deterministically.
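Here is a minimal Python sketch of what ebay-refund might look like on the inside. Several pieces are hypothetical: the policy ceiling (MAX_REFUND), the reason codes other than item-not-received, the issue_refund stub, and the RF- refund id format. The stub stands in for the real eBay call, which is not shown. For brevity the sketch always emits JSON rather than offering a separate --json mode.

```python
import argparse
import json
import sys
from decimal import Decimal, InvalidOperation

# Hypothetical policy values; real ones come from your own refund policy.
MAX_REFUND = Decimal("50.00")
REASON_CODES = ("item-not-received", "item-not-as-described", "buyer-cancelled")


def parse_amount(text: str) -> Decimal:
    """Parse a refund amount, rejecting anything that is not a positive finite number."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise argparse.ArgumentTypeError(f"not a valid amount: {text!r}") from None
    if not value.is_finite() or value <= 0:
        raise argparse.ArgumentTypeError(f"amount must be a positive number: {text!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="ebay-refund",
        description="Refund part or all of an eBay order. WRITES: sends money to the buyer.",
    )
    parser.add_argument("--order", required=True, help="order number to refund")
    parser.add_argument("--amount", required=True, type=parse_amount, help="refund amount, e.g. 18.00")
    parser.add_argument("--reason", required=True, choices=REASON_CODES, help="reason code")
    parser.add_argument("--dry-run", action="store_true", help="show what would happen; change nothing")
    return parser


def issue_refund(order: str, amount: Decimal, reason: str) -> str:
    """Hypothetical stand-in for the real eBay refund call; returns a refund id."""
    return f"RF-{order}"


def run(argv: list[str]) -> tuple[int, dict]:
    """Validate arguments, enforce policy, then act unless --dry-run is set.

    Returns (exit status, JSON-serialisable result). Bad or missing arguments
    make argparse exit with status 2 before anything else happens.
    """
    args = build_parser().parse_args(argv)
    request = {"order": args.order, "amount": str(args.amount), "reason": args.reason}
    if args.amount > MAX_REFUND:
        # Hard limit enforced by the tool, not by hoping the agent behaves.
        return 3, {"ok": False, "error": f"amount exceeds policy limit of {MAX_REFUND}", "request": request}
    if args.dry_run:
        return 0, {"ok": True, "dry_run": True, "would_refund": request}
    refund_id = issue_refund(args.order, args.amount, args.reason)
    return 0, {"ok": True, "dry_run": False, "refunded": request, "refund_id": refund_id}


def main() -> int:
    """Entry point: print the result as JSON; failures go to stderr."""
    status, result = run(sys.argv[1:])
    print(json.dumps(result), file=sys.stdout if status == 0 else sys.stderr)
    return status
```

Installed behind a console-script entry point that calls sys.exit(main()), a request like ebay-refund --order 4471 --amount 500 --reason item-not-received would be refused with exit status 3 before anything reaches eBay, and a call that omits --reason would be stopped by argparse with status 2. The agent never has to remember the policy; it only has to read the error.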
This is the oldest good idea in computing wearing new clothes. It is the Unix philosophy — small programs that do one thing well and compose through text — applied to an agent instead of a shell pipeline. The agent is the shell, except it can read a manual and make judgment calls.
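The other half of the arrangement is whatever plays the shell. Below is a minimal sketch of that harness side, assuming Python and assuming each tool prints JSON on success. call_tool runs one command, returns the parsed output, and raises on any non-zero exit, so a failure can never be mistaken for success. The names call_tool and ToolError are illustrative, not taken from any particular agent framework.

```python
import json
import subprocess
import sys


class ToolError(Exception):
    """Raised when a tool exits non-zero; carries its exit status and stderr."""

    def __init__(self, argv: list[str], code: int, stderr: str):
        super().__init__(f"{argv[0]} exited {code}: {stderr.strip()}")
        self.code = code
        self.stderr = stderr


def call_tool(argv: list[str], timeout: float = 30.0) -> dict:
    """Run one tool without a shell and return its parsed JSON stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # Never assume success: hand the failure back to the agent verbatim.
        raise ToolError(argv, proc.returncode, proc.stderr)
    return json.loads(proc.stdout)
```

With the tools on the PATH, call_tool(["ebay-list-orders", "--status", "awaiting-shipment", "--json"]) would return the order list as a Python dict. Passing argv as a list rather than a command string means no shell is involved, so agent-chosen arguments cannot be interpreted as pipes, redirects, or extra commands.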
| Property | Why it matters to an agent |
|---|---|
| Single purpose | One verb, one object. ebay-refund, not ebay-manage-order with a mode flag. Easy for the agent to choose correctly. |
| Self-documenting | --help explains what it does, its arguments, and its side effects, in plain language the agent can read. |
| Structured output | A --json mode so the agent parses results reliably instead of scraping prose. |
| Read/write honesty | The name and help text make clear whether the tool only reads or also changes the world. Read tools are safe to run freely; write tools are not. |
| Dry-run for writes | --dry-run prints what would happen without doing it — lets the agent (and you) preview before committing. |
| Policy enforcement | Hard limits live in the tool, not in a hope that the agent behaves. The tool refuses out-of-policy requests with a clear error. |
| Loud, structured errors | On failure it exits non-zero and prints why, so the agent can react instead of assuming success. |
| Idempotency where possible | Running "acknowledge order 4471" twice should not send two messages. Tools that can be safely retried are tools an agent can use confidently. |
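Idempotency is the property most easily skipped, so here is one way to get it, sketched in Python. acknowledge_order, send_acknowledgement, and the JSON ledger file are hypothetical, and the stub only records that it was called. A local ledger is a simplification: a crash between sending and writing the ledger could still send twice, and concurrent runs would need a lock. Where the platform can report whether a message was already sent, checking that directly is sturdier.

```python
import json
from pathlib import Path

# Records every acknowledgement actually sent, so the behaviour is observable.
SENT: list[str] = []


def send_acknowledgement(order: str) -> None:
    """Hypothetical stand-in for the real 'message the buyer' API call."""
    SENT.append(order)


def acknowledge_order(order: str, ledger: Path) -> dict:
    """Acknowledge an order at most once, however many times this is called.

    A small JSON ledger of already-acknowledged orders turns a retry (after a
    timeout, or a repeated agent decision) into a harmless no-op.
    """
    done = set(json.loads(ledger.read_text())) if ledger.exists() else set()
    if order in done:
        return {"ok": True, "order": order, "sent": False, "note": "already acknowledged"}
    send_acknowledgement(order)
    done.add(order)
    ledger.write_text(json.dumps(sorted(done)))
    return {"ok": True, "order": order, "sent": True}
```

Because the second call reports sent: false instead of failing, the agent can retry after any ambiguous outcome without worrying about spamming the buyer.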
Why command-line tools at all, when the agent ecosystem offers richer integration points than shelling out? It is a fair question. CLI tools earn their place for three reasons:
The tool belt is the fence. Everything the agent can do to the outside world, it does through a tool you wrote. If a capability isn't in the belt, the agent doesn't have it. Adding a tool is a deliberate act of granting power.
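At its simplest, the fence is an allowlist the harness consults before running anything. A minimal Python sketch follows; the TOOL_BELT registry, the Tool record, and check_allowed are hypothetical names, the registry holds the four tools named earlier, and the writes flag mirrors the read/write honesty row of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool  # True if the tool changes the outside world


# The whole belt. Granting the agent a new power means adding an entry here,
# which is a reviewable change like any other.
TOOL_BELT = {
    t.name: t
    for t in (
        Tool("ebay-list-orders", writes=False),
        Tool("ebay-get-message", writes=False),
        Tool("ebay-reply-message", writes=True),
        Tool("ebay-refund", writes=True),
    )
}


def check_allowed(argv: list[str]) -> Tool:
    """Return the tool the agent asked for, or refuse if it is not in the belt."""
    if not argv:
        raise PermissionError("empty command")
    tool = TOOL_BELT.get(argv[0])
    if tool is None:
        raise PermissionError(f"{argv[0]!r} is not in the tool belt")
    return tool
```

A production harness would also map each name to a fixed executable path rather than searching the PATH, so a same-named program elsewhere on the system cannot stand in for a tool you reviewed.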
This is what makes the whole pattern safe enough to trust. The agent's autonomy is bounded not by its good intentions but by the finite, reviewable set of levers in front of it. In Part 3 we look at the most important property those levers carry: whose identity they act under.