The agent that knows what things cost

Last release we gave the agent a wallet — /goal --budget put a spending cap on autonomy. This release goes further: 0.9.9 teaches the whole agent to think about money. It rations its budget like a road-tripper watching the fuel gauge, explains which model it's choosing and why, tells you your burn rate, and even fact-checks its own model registry against reality.

We're calling it the cost-aware release. It started, as good releases often do, with an honest look in the mirror.

First, a confession

Elyra has had smart routing for a while — the opt-in feature that picks cheap models for simple turns and powerful ones for hard work. Before this release, we audited it properly. The verdict: good idea, flawed execution. Four real problems:

It measured conversation length in raw messages — but every tool result is a message, so almost every session "became long" within three turns and escalated permanently to the most expensive tier. The cost-saving feature was quietly choosing the priciest model, always.
It never checked image support. A conversation with screenshots could be routed to a text-only model — and break.
Within a tier it picked the model with the biggest context window as "best value," which could crown a slow 1M-context flagship as your "fast" model. Exactly backwards.
Every model switch happened silently. You'd look up and be on a different model with no explanation.

All four are fixed in 0.9.9, with a 30-test suite to keep them fixed. We could have patched quietly. We'd rather tell you — because the trust this feature needs is exactly the kind you don't get by hiding the repair work.

You pick the models, it picks the moment

The biggest trust upgrade: you no longer have to accept the router's taste in models. Pin one per tier in ~/.elyra/agent/settings.json:

{
  "smartRouting": true,
  "smartRoutingModels": {
    "fast": "anthropic/claude-haiku-4-5",
    "balanced": "anthropic/claude-sonnet-4-6",
    "powerful": "anthropic/claude-fable-5"
  }
}

The division of labor becomes clean: the heuristic decides the tier, you decide the model. And every switch now announces itself as a dim line in the chat:

smart routing: fast — read-only tools with short user message → Claude Haiku 4.5 (was Claude Fable 5)

No more mystery. If you're ever curious what it's thinking, ask:

/route

Smart Routing
Next turn tier: fast — read-only tools with short user message
▶ fast:     Claude Haiku 4.5 (pinned)
balanced: Claude Sonnet 4.6 (pinned)
powerful: Claude Fable 5 (pinned)

The headline: routing that rations

Here's where it gets genuinely new. Set a budgeted goal:

/goal npm test --budget 2.50

The loop already stops at the cap — that was 0.9.8. What's new is what happens on the way there. The router now knows how much of the budget is spent, and it adapts:

Under 75% spent: full freedom — hard turns get the powerful model.
Over 75%: capped at balanced. The heavy thinking should be done by now.
Over 90%: fast tier only. Finishing touches on a budget.

smart routing: balanced — complex keywords in user message (capped to balanced: 78% of goal budget spent) → Claude Sonnet 4.6

Think of it as how you'd actually run a road trip: floor it on the open highway early, coast into the gas station. The agent spends its expensive reasoning when the problem is hardest and the budget freshest — and rations as funds run low, instead of hitting the wall at full price. Nobody else's agent does this.

Know your burn rate

/cost has always shown what a session cost. Now it shows where you're heading:

Estimated cost: $1.8420
Burn rate: $3.12/hour at the current pace

A small line with a real effect: spend becomes something you steer by, not something you discover at the end of the day. Pair it with a budgeted goal and you have a complete cost cockpit.

Trust, but verify: `elyra doctor models`

One more for the skeptics (we're among them). Model registries describe what providers claim — but providers retire models, change prices, and shift behavior. We learned this the hard way recently when a registry said a Bedrock model used one thinking mode and reality disagreed.

So now you can check, straight from the terminal:

elyra doctor models --provider anthropic

# Model Probe Report
Probed: 2 | OK: 1 | Failed: 1 | Mismatches: 0
¶Results

[anthropic] claude-3-haiku-20240307 -- FAILED (487ms): 404 not_found_error
[anthropic] claude-haiku-4-5 -- OK (893ms), thinking(minimal): observed

It makes minimal real calls to the cheapest models per provider and compares observed behavior — availability, reasoning, thinking mode — against what the registry claims. True story: during its own development, the very first run caught that an old Haiku model in our registry had been retired by Anthropic. The tool found real drift before it was even released. It exits non-zero on failures, so you can drop it straight into CI.

How to upgrade

npm install -g @elyracode/coding-agent

Or from a session: /update — which, since last release, will honestly tell you if there's nothing to do.

The short version

Budget-aware routing — tiers cap as your /goal budget depletes; spend early, ration late.
/route — see exactly what the router will do and why.
smartRoutingModels — pin your model per tier; switches are never silent.
Burn rate in /cost — $/hour at your current pace.
elyra doctor models — live-verify providers against the registry, CI-ready.
Fixed — the four routing flaws above, with tests.

An agent you can leave alone is only half the goal. An agent you can leave alone with your credit card — that's the bar this release aims for. Set a budget, pin your models, glance at /route when curious, and let it manage the money like it manages the code.

Happy building.