by.waclaw.online / agent-operator / 04

Anatomy of an Orchestration Tool

Part 4 of 9 (Build): a concrete reference stack, and what every tool must expose to be a good citizen of the belt. We describe structure and contracts, not full source.

A reference stack (one opinionated choice)

To keep the guide concrete we pick one stack and stick with it. This is for clarity, not orthodoxy — every choice here has good alternatives, and the pattern is indifferent to the language.

Concern	Our choice	Why
Language	Python 3.11+	Ubiquitous, great HTTP/SDK ecosystem, readable. Go, Node, or Rust work just as well.
CLI framework	Typer (or Click)	Generates `--help` from type hints and docstrings — self-documentation almost for free.
Packaging	One installable package, many entry-point commands	Ships as a single `pipx install`; each tool becomes a command on `PATH`.
Secret storage	`keyring` → OS keychain	Tokens never touch disk in plaintext or the agent's context (Part 3).
Config	`~/.config/ebay-operator/config.toml`	Non-secret settings (store ID, defaults). Secrets stay in the keychain.
Output	Human text by default, `--json` on demand	Readable for you, parseable for the agent.
The agent	Claude Code (illustratively)	Any shell-capable agent — Codex, Pi — works identically; nothing below is agent-specific.

We chose this stack for simplicity. We commit to Python + Typer + keyring so the examples stay concrete and consistent. If your team lives in TypeScript, prefers a Go binary you can drop on any machine, or already has an internal SDK, use that instead. The contract a tool must honor (below) is what matters; the implementation language does not.

The repository layout

The whole operator lives in one version-controlled repository. This is the shareable unit (Part 6):

ebay-operator/
├── soul.md                 # the agent's operating instructions (Part 5)
├── README.md               # human setup: install, auth login
├── pyproject.toml          # declares every tool as a console entry point
├── src/ebay_operator/
│   ├── _client.py          # shared: auth, token refresh, rate limiting
│   ├── _policy.py          # shared: hard limits (max refund, etc.)
│   ├── orders.py           # ebay-list-orders, ebay-get-order, ...
│   ├── messages.py         # ebay-list-messages, ebay-reply-message
│   ├── returns.py          # ebay-list-returns, ebay-refund, ...
│   └── pricing.py          # ebay-reprice, ...
├── skills/                 # reusable procedures (Part 6)
│   ├── morning-routine.md
│   └── handle-not-as-described.md
└── tests/                  # each tool is unit-tested

Two ideas pay off immediately. First, shared internals (_client.py, _policy.py) mean authentication and policy are written once and every tool inherits them — a refund limit can't be enforced in one tool and forgotten in another. Second, the repo is the artifact you share, version, and review: tools, instructions, and skills travel together.
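To make "written once" concrete, here is a hedged sketch of what a shared token lookup in `_client.py` might look like. It uses the `keyring` package's `get_password(service_name, username)`, which returns `None` when nothing is stored. The service and entry names, the `NotLoggedIn` exception, and the injectable `lookup` parameter are hypothetical choices that let the sketch run without a real keychain.

```python
from typing import Callable, Optional

SERVICE = "ebay-operator"  # hypothetical keychain service name
ACCOUNT = "oauth-token"    # hypothetical keychain entry name


class NotLoggedIn(RuntimeError):
    """No token is stored; the human must run the login step from the README."""


def get_token(lookup: Optional[Callable[[str, str], Optional[str]]] = None) -> str:
    """Fetch the API token from the OS keychain without ever printing it.

    `lookup` defaults to keyring.get_password; tests can inject a fake.
    """
    if lookup is None:
        import keyring  # imported lazily so the module loads without a backend
        lookup = keyring.get_password
    token = lookup(SERVICE, ACCOUNT)
    if not token:
        raise NotLoggedIn(f"no token in the keychain for {SERVICE}; log in first")
    return token
```

Every tool calls `get_token()` instead of touching the keychain itself, so a change in how tokens are stored happens in exactly one place.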

The contract every tool exposes

Rather than show source, here is the contract — what the agent (or you) sees. This is what actually matters, because it is the interface the agent reasons about.

1. A self-describing `--help`

$ ebay-refund --help

Issue a refund against an order.  [WRITE — changes the world]

Refunds money to the buyer via the eBay API as the
authenticated seller. Subject to policy limits in _policy.py:
refunds above $100 require --confirm and will otherwise refuse.

Options:
  --order TEXT       Order ID to refund.            [required]
  --amount FLOAT     Amount in account currency.    [required]
  --reason TEXT      Reason code: item-not-received |
                     not-as-described | buyer-remorse [required]
  --confirm          Required for refunds over the policy limit.
  --dry-run          Print what would happen; do not call eBay.
  --json             Emit a machine-readable result.

The help text states, in the first line, that this is a WRITE tool and what it touches. An agent reading this knows it must be careful, knows the reason codes, and knows the confirmation rule before it ever runs the command.
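Typer would generate this help from type hints and docstrings. To keep the sketch free of third-party dependencies, the version below swaps in the standard library's `argparse` to declare the same contract. The option names and reason codes come from the help text above; `build_parser`, `POLICY_LIMIT`, and `REASONS` are illustrative names, not the guide's code.

```python
import argparse

POLICY_LIMIT = 100.00  # mirrors the $100 limit stated in the help text
REASONS = ["item-not-received", "not-as-described", "buyer-remorse"]


def build_parser() -> argparse.ArgumentParser:
    """Declare the ebay-refund contract: options, reason codes, and the WRITE warning."""
    parser = argparse.ArgumentParser(
        prog="ebay-refund",
        # First line of --help: what the tool does, and that it is a WRITE tool.
        description=(
            "Issue a refund against an order.  [WRITE: changes the world]\n\n"
            "Refunds money to the buyer via the eBay API as the authenticated seller.\n"
            f"Refunds above ${POLICY_LIMIT:.0f} require --confirm and will otherwise refuse."
        ),
        formatter_class=argparse.RawDescriptionHelpFormatter,  # keep our line breaks
    )
    parser.add_argument("--order", required=True, help="Order ID to refund.")
    parser.add_argument("--amount", required=True, type=float,
                        help="Amount in account currency.")
    # choices= lists the valid codes in --help and rejects anything else.
    parser.add_argument("--reason", required=True, choices=REASONS, help="Reason code.")
    parser.add_argument("--confirm", action="store_true",
                        help="Required for refunds over the policy limit.")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print what would happen; do not call eBay.")
    parser.add_argument("--json", action="store_true",
                        help="Emit a machine-readable result.")
    return parser
```

With `choices=`, an invalid reason code is rejected before any tool logic runs, and argparse exits with status 2.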

2. A dry-run that previews

$ ebay-refund --order 4471 --amount 18.00 --reason not-as-described --dry-run

DRY RUN — no refund issued.
Would refund $18.00 to buyer of order 4471 (reason: not-as-described).
Within policy limit ($100). No --confirm required.
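One way to keep a dry run honest is to compute the full plan first and render the preview from that same object, so the preview and the real call cannot drift apart. This is a sketch under that assumption: `RefundPlan` and `describe` are hypothetical names, and the wording mirrors the example output above.

```python
from dataclasses import dataclass

LIMIT = 100.00  # auto-approve limit, as in the examples above


@dataclass(frozen=True)
class RefundPlan:
    """Everything the write would do, computed before deciding whether to do it."""
    order: str
    amount: float
    reason: str

    def describe(self) -> str:
        # The preview is rendered from the same object the real call would use.
        lines = [
            "DRY RUN: no refund issued.",
            f"Would refund ${self.amount:.2f} to buyer of order {self.order} "
            f"(reason: {self.reason}).",
        ]
        if self.amount <= LIMIT:
            lines.append(f"Within policy limit (${LIMIT:.0f}). No --confirm required.")
        else:
            lines.append(f"Above policy limit (${LIMIT:.0f}); requires --confirm.")
        return "\n".join(lines)
```

In a real tool the non-dry-run path would pass the same `RefundPlan` to the API call, so whatever the preview says is what gets sent.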

3. Honest, structured output

$ ebay-refund --order 4471 --amount 18.00 --reason not-as-described --json
{"action": "refund", "order": "4471", "amount": 18.00,
 "currency": "USD", "status": "completed", "refund_id": "RF-90213"}

4. Refusal when policy says no

$ ebay-refund --order 9002 --amount 180.00 --reason buyer-remorse
ERROR: refund of $180.00 exceeds auto-approve limit ($100.00).
Re-run with --confirm to override, or escalate to a human.
exit status: 2

This last behavior is the crux. The limit is not a suggestion in soul.md that the agent might forget deep in a long context. It is enforced in code, in _policy.py, and the tool refuses. soul.md tells the agent to escalate rather than blindly add --confirm; the tool guarantees that even a confused agent cannot quietly overspend. Belt and suspenders: policy in the prose and in the code.
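A minimal sketch of the code-side half of "belt and suspenders", as `_policy.py` might implement it: a single check that raises, and a CLI wrapper that turns the refusal into exit status 2 with the error on stderr. `check_refund(amount, confirm)` matches the call in the schematic later in this part; `PolicyRefusal`, `main`, and the constants are hypothetical. The limit, the message, and the exit status come from the example above.

```python
import sys

AUTO_APPROVE_LIMIT = 100.00  # the $100 limit from the example above
EXIT_POLICY_REFUSAL = 2      # non-zero, so neither the shell nor the agent can miss it


class PolicyRefusal(Exception):
    """The requested action is outside policy; the agent should escalate, not retry."""


def check_refund(amount: float, confirm: bool) -> None:
    """Raise PolicyRefusal unless the amount is within the limit or explicitly confirmed."""
    if amount > AUTO_APPROVE_LIMIT and not confirm:
        raise PolicyRefusal(
            f"refund of ${amount:.2f} exceeds auto-approve limit "
            f"(${AUTO_APPROVE_LIMIT:.2f}).\n"
            "Re-run with --confirm to override, or escalate to a human."
        )


def main(amount: float, confirm: bool) -> int:
    """CLI wrapper: turn a refusal into an error on stderr and exit status 2."""
    try:
        check_refund(amount, confirm)
    except PolicyRefusal as err:
        print(f"ERROR: {err}", file=sys.stderr)
        return EXIT_POLICY_REFUSAL
    return 0  # in the real tool, the refund would proceed here
```

The real entry point would end with `sys.exit(main(...))` so the non-zero status reaches the shell and the agent.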

The schematic shape of a tool

We deliberately stop at the skeleton — the real body is provider-specific and ages quickly. Conceptually each write tool is the same five steps:

def refund(order, amount, reason, confirm=False, dry_run=False, json=False):
    1. load credentials   →  _client.get_token()      # from keychain
    2. check policy        →  _policy.check_refund(amount, confirm)
    3. if dry_run: print the plan and return          # no side effect
    4. perform the action  →  client.post(...refund...)  # the only write
    5. report the result   →  human text, or JSON if --json

flowchart TB
    A([Agent runs a write tool]) --> C[Load token from OS keychain]
    C --> P{Within policy?}
    P -->|no| R["Refuse: exit non-zero,<br/>explain why"]
    R --> ESC[Agent escalates per soul.md]
    P -->|yes| D{"--dry-run?"}
    D -->|yes| PL["Print the plan<br/>(no side effect)"]
    D -->|no| ACT["Perform the action<br/>the one API write"]
    ACT --> OUT["Report result<br/>text or --json"]

Every write tool follows the same five steps. The policy gate and dry-run branch are what make the agent's writes safe to trust.
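The five steps can be sketched as one runnable function. To keep it self-contained, credentials and the API write are injected as plain callables (`get_token`, `post`) standing in for `_client`; the request body and result fields are hypothetical, not eBay's actual refund API. The flag is named `as_json` rather than `json` so it does not shadow the standard-library module; the CLI layer would still expose it as `--json`.

```python
import json
from typing import Callable

LIMIT = 100.00  # auto-approve limit, as in the examples above


def refund(order: str, amount: float, reason: str, *,
           get_token: Callable[[], str],
           post: Callable[[str, dict], dict],
           confirm: bool = False, dry_run: bool = False,
           as_json: bool = False) -> tuple[int, str]:
    """Run the five steps in order and return (exit_status, output)."""
    token = get_token()                                   # 1. load credentials
    if amount > LIMIT and not confirm:                    # 2. policy gate: refuse, exit 2
        return 2, (f"ERROR: refund of ${amount:.2f} exceeds "
                   f"auto-approve limit (${LIMIT:.2f}).")
    if dry_run:                                           # 3. preview, no side effect
        return 0, (f"DRY RUN: would refund ${amount:.2f} on order {order} "
                   f"(reason: {reason}).")
    # 4. the one write; the body shape is hypothetical, not eBay's real API
    reply = post(token, {"order": order, "amount": amount, "reason": reason})
    result = {"action": "refund", "order": order, "amount": amount, **reply}
    # 5. report: the same dict feeds both renderings, so they cannot disagree
    if as_json:
        return 0, json.dumps(result)
    return 0, f"Refund {result.get('status')}: {result.get('refund_id')}"
```

Injecting `post` also makes the safety properties testable: a recording fake can confirm that refusals and dry runs never reach the write.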

Read tools are the same minus the policy and dry-run branches. Once you have written two or three this way, the rest are mechanical — which is exactly why the durable deliverable of this guide is the catalog of tools and their contracts (Part 8), not their interchangeable bodies.
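For comparison, a read tool in the same style keeps only credentials, one call, and the report. Again the callables stand in for `_client`, and `list_orders` with its `order`/`status` fields is a hypothetical example.

```python
import json
from typing import Callable


def list_orders(*, get_token: Callable[[], str],
                fetch: Callable[[str], list[dict]],
                as_json: bool = False) -> tuple[int, str]:
    """A read tool: load credentials, fetch, report. No policy gate, no dry-run."""
    orders = fetch(get_token())  # the only call, and it changes nothing
    if as_json:
        return 0, json.dumps(orders)
    lines = [f"{o['order']}  {o['status']}" for o in orders]
    return 0, "\n".join(lines) or "No orders."
```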

We now have a fence of well-behaved tools. Part 5 gives the agent the judgment to use them well.
