The Case for a Small System Prompt

Every AI coding agent begins each turn the same way: it sends a system prompt. This is the block of text that tells the model who it is, what tools it has, and how to behave. It's invisible to you, but it's there on every single request, shaping every response and costing tokens every time.

Most agents treat the system prompt as a kitchen sink. Claude Code, Cursor's agent, and others ship system prompts that run to thousands of tokens — exhaustive tool instructions, behavioral rules, formatting conventions, edge-case handling, the works. There's a logic to it: front-load everything the model might need, and you get consistent behavior.

Elyra takes the opposite bet. Its system prompt is small, assembled on the fly, and built around a simple idea: don't tell the model what it doesn't need to know right now.

Here's why that's not just a stylistic choice — it's a measurable efficiency win.

The base prompt is tiny

Elyra's core system prompt, stripped of dynamic parts, is about this:

You are an expert coding assistant operating inside elyra, a coding agent harness. You help users by reading files, executing commands, editing code, and writing new files.

Available tools:
read: Read file contents
bash: Execute a bash command
edit: Edit a single file using exact text replacement
write: Create or overwrite files

Guidelines:
Prefer grep/find/ls tools over bash for file exploration
Be concise in your responses
Show file paths clearly when working with files

Current working directory: /your/project

That's the spine. Roughly a hundred tokens as shown, before the dynamic parts are added. Compare that to the multi-thousand-token manifestos other agents lead with, and the difference is stark on the very first request. It also compounds, because the system prompt is resent on every turn of every conversation.

Tools live in the right channel

Notice what's not in that prompt: the detailed tool schemas. There's no paragraph explaining every parameter of the edit tool, no examples of valid bash invocations baked into the prose.

That's deliberate. Modern LLM APIs accept tools as a structured tools parameter, a typed channel separate from the system prompt. Elyra puts the full schemas there, where they belong, and keeps only a one-line summary of each tool in the prompt text.

The kitchen-sink approach often duplicates tool documentation in the prose and the schema. That's the same information, paid for twice, on every request. Elyra says it once, in the right place.
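
A minimal sketch of that split, assuming a request shape modeled on Anthropic's Messages API (a top-level system string plus a tools array whose entries carry a JSON Schema under input_schema). The edit tool's parameter names (path, oldText, newText), the model placeholder, and the buildRequest helper are hypothetical and not taken from Elyra's source:

```typescript
// Hypothetical types modeled on a Messages-style API; not Elyra's actual code.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

interface ChatRequest {
  model: string;
  system: string;
  tools: ToolDefinition[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Full parameter documentation lives in the schema (parameter names are assumed).
const editTool: ToolDefinition = {
  name: "edit",
  description: "Edit a single file using exact text replacement",
  input_schema: {
    type: "object",
    properties: {
      path: { type: "string", description: "File to edit" },
      oldText: { type: "string", description: "Exact text to find" },
      newText: { type: "string", description: "Replacement text" },
    },
    required: ["path", "oldText", "newText"],
  },
};

function buildRequest(tools: ToolDefinition[], userMessage: string): ChatRequest {
  // The prompt text carries only a one-line summary per tool.
  const toolLines = tools.map((t) => `${t.name}: ${t.description}`).join("\n");
  return {
    model: "example-model", // placeholder, not a real model id
    system: `You are an expert coding assistant.\n\nAvailable tools:\n${toolLines}`,
    tools, // full schemas travel in the structured channel
    messages: [{ role: "user", content: userMessage }],
  };
}

const request = buildRequest([editTool], "Rename foo to bar in src/index.ts");
console.log(request.system);
```

The parameter-level detail appears exactly once, in the schema. The prompt text never mentions oldText or newText.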

Only what's active appears

This is where Elyra's prompt gets genuinely clever. It's not a static file — it's assembled per session based on what's actually available.

The guidelines are conditional. From the actual code:

if (hasBash && (hasGrep || hasFind || hasLs)) {
  addGuideline("Prefer grep/find/ls tools over bash for file exploration (faster, respects .gitignore)");
}
if (tools.includes("skill_write")) {
  addGuideline("After solving a genuinely hard problem... use the skill_write tool to save it as a skill.");
}

If you're running with a restricted toolset that has no grep, find, or ls, the "prefer grep" guideline simply isn't there. If the skill_write tool isn't active, the model never reads instructions about saving skills. The prompt only ever contains advice that's relevant to the tools in front of it.

A static system prompt can't do this. It has to describe every tool and every behavior unconditionally, because it was written once and shipped to everyone. Elyra's prompt is tailored to the exact session you're in.
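
To make the conditional assembly concrete, here is a small self-contained sketch. The buildGuidelines function is a hypothetical stand-in for Elyra's real assembly code, and the skill_write wording is a placeholder because the original guideline text is elided in the excerpt above:

```typescript
// Sketch of per-session guideline assembly; buildGuidelines is an assumed name.
function buildGuidelines(tools: string[]): string[] {
  const has = (name: string): boolean => tools.includes(name);
  const guidelines: string[] = [];

  // Only advise preferring the search tools when bash and at least one of them exist.
  if (has("bash") && (has("grep") || has("find") || has("ls"))) {
    guidelines.push(
      "Prefer grep/find/ls tools over bash for file exploration (faster, respects .gitignore)"
    );
  }
  if (has("skill_write")) {
    // Placeholder wording; the real guideline is elided in the article.
    guidelines.push("Use the skill_write tool to save hard-won solutions as skills.");
  }

  // Guidelines from the base prompt that do not depend on the toolset.
  guidelines.push("Be concise in your responses");
  guidelines.push("Show file paths clearly when working with files");
  return guidelines;
}

const minimal = buildGuidelines(["read", "bash", "edit", "write"]);
const full = buildGuidelines(["read", "bash", "edit", "write", "grep", "skill_write"]);
console.log(minimal.length, full.length); // 2 4
```

The restricted session pays for two guidelines, the full session for four, and neither ever reads advice about a tool it cannot call.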

Skills use progressive disclosure

Here's the efficiency move I find most elegant.

Elyra has skills — markdown files that teach the agent how to do specific things. A project might have a dozen or more. The naive approach would be to paste all of them into the system prompt so the model "knows" them.

Elyra doesn't. It includes only the name, description, and file path of each skill:

<available_skills>
  <skill>
    <name>stripe-webhook-verification</name>
    <description>Use when verifying Stripe webhook signatures behind a proxy.</description>
    <location>/project/.elyra/skills/stripe-webhook-verification/SKILL.md</location>
  </skill>
  <skill>
    <name>railway-healthcheck-grace-period</name>
    <description>Use when the Railway load balancer kills the container during boot.</description>
    <location>/project/.elyra/skills/railway-healthcheck-grace-period/SKILL.md</location>
  </skill>
</available_skills>

Seventeen skills cost you seventeen short catalog entries, not seventeen full documents. The model reads the descriptions, decides which skill matches the task, and loads the full content on demand with the read tool, but only when it's actually relevant.

This is the difference between handing someone a library catalog and dumping every book on their desk. The catalog is enough to find what you need, and it's far lighter.
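
A rough sketch of how such a catalog could be rendered from skill metadata alone. The SkillMeta type and the renderSkillCatalog and escapeXml helpers are illustrative assumptions, not Elyra's actual API. The point is only that the skill bodies never enter the string:

```typescript
// Hypothetical metadata record; the full SKILL.md body is deliberately not part of it.
interface SkillMeta {
  name: string;
  description: string;
  location: string; // path the model can pass to the read tool later
}

// Escape characters that would otherwise break the XML-like wrapper.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderSkillCatalog(skills: SkillMeta[]): string {
  const entries = skills.map((s) =>
    [
      "  <skill>",
      `    <name>${escapeXml(s.name)}</name>`,
      `    <description>${escapeXml(s.description)}</description>`,
      `    <location>${escapeXml(s.location)}</location>`,
      "  </skill>",
    ].join("\n")
  );
  return ["<available_skills>", ...entries, "</available_skills>"].join("\n");
}

const catalog = renderSkillCatalog([
  {
    name: "stripe-webhook-verification",
    description: "Use when verifying Stripe webhook signatures behind a proxy.",
    location: "/project/.elyra/skills/stripe-webhook-verification/SKILL.md",
  },
  {
    name: "railway-healthcheck-grace-period",
    description: "Use when the Railway load balancer kills the container during boot.",
    location: "/project/.elyra/skills/railway-healthcheck-grace-period/SKILL.md",
  },
]);
console.log(catalog);
```

The catalog grows by one short entry per skill, however long each SKILL.md becomes.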

Documentation by reference, not by inclusion

The same principle applies to Elyra's own docs. When you ask the agent about Elyra's SDK or extension API, it doesn't have thousands of tokens of documentation pre-loaded. The prompt just tells it where the docs live:

- When asked about: extensions (docs/extensions.md), themes (docs/themes.md), skills (docs/skills.md), TUI components (docs/tui.md)...

Always read elyra .md files completely and follow links

Paths, not content. The model reads the relevant doc only when the conversation calls for it. Most sessions never touch Elyra's internals, so why pay to describe them every time?

The prompt is cache-stable

This one is subtle but it directly hits your bill.

Most major providers now offer prompt caching: if the prefix of your request matches a previous one, you pay a fraction of the normal price for the cached portion. The catch is that the prefix has to be stable. Change one token near the top and everything after it misses the cache.

Elyra's system prompt is deliberately structured to stay stable. There's a revealing comment in the code:

// Add working directory last (date is injected via
// transformContext for cache stability)

The current date, which obviously changes, is kept out of the system prompt and injected separately downstream. The system prompt itself contains nothing volatile, so within a given project its cache entry can be reused across turns and, for as long as the provider keeps the entry alive, across sessions. A system prompt with the date baked into the top would bust the cache on every new day, quietly costing you full price on content that never changed.

The context is also layered in a fixed order — stack info, then project context, then codebase memory, then skills, then working directory. Predictable structure means predictable caching.
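
The idea can be sketched as follows. The PromptParts shape and both function names are assumptions for illustration. In particular, withDate is only a stand-in for whatever transformContext really does, which the article does not show:

```typescript
// Hypothetical container for the layers named in the article.
interface PromptParts {
  base: string;
  stackInfo: string;
  projectContext: string;
  codebaseMemory: string;
  skillsCatalog: string;
  cwd: string;
}

// Fixed layering order; nothing time-dependent is allowed in here.
function buildSystemPrompt(p: PromptParts): string {
  return [
    p.base,
    p.stackInfo,
    p.projectContext,
    p.codebaseMemory,
    p.skillsCatalog,
    `Current working directory: ${p.cwd}`,
  ]
    .filter((section) => section.length > 0) // skip empty layers
    .join("\n\n");
}

// Volatile data rides along with the conversation instead (assumed mechanism).
function withDate(userMessage: string, now: Date): string {
  const isoDate = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `Current date: ${isoDate}\n\n${userMessage}`;
}

const parts: PromptParts = {
  base: "You are an expert coding assistant operating inside elyra.",
  stackInfo: "Stack: TypeScript, Node.js",
  projectContext: "Project: example web service",
  codebaseMemory: "",
  skillsCatalog: "<available_skills></available_skills>",
  cwd: "/your/project",
};

const monday = new Date("2025-01-06T09:00:00Z");
const tuesday = new Date("2025-01-07T09:00:00Z");

const systemPrompt = buildSystemPrompt(parts);
const mondayMessage = withDate("Fix the failing test", monday);
const tuesdayMessage = withDate("Fix the failing test", tuesday);
console.log(mondayMessage === tuesdayMessage); // false: only the message changes
```

Because buildSystemPrompt takes no clock input at all, the prefix is byte-identical from one day to the next, and only the later, cheaper-to-miss part of the request varies.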

Why this adds up

Put it together and the efficiency comes from several directions at once:

Fewer tokens per request. The system prompt is sent on every turn. A smaller one is cheaper on turn one and every turn after.

Better cache hit rates. A stable, volatile-free prefix means the provider's prompt cache actually works, cutting both cost and latency.

Less diluted attention. Models don't have infinite focus. A prompt that's ninety percent irrelevant instructions for this particular task makes the model work harder to find the ten percent that matters. A focused prompt points the model straight at the job.

Provider portability. Elyra runs on thirty-plus models across many providers. A system prompt tuned with model-specific tricks and quirks would behave inconsistently across them. A lean, general prompt travels well: it behaves much the same whether you're on a Claude Opus model, GPT-5, or a local model.
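
The first two points can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a billing calculator: every input is made up, the 0.1 cached-price fraction is an assumption rather than any provider's actual rate, and it ignores cache-write surcharges and cache expiry:

```typescript
interface CostInputs {
  systemPromptTokens: number;
  turns: number;
  // Fraction of the normal input price charged for cached tokens (assumed value).
  cachedPriceFraction: number;
}

// Returns the system prompt's cost over a conversation, in full-price token equivalents.
function systemPromptCost(c: CostInputs, cacheHits: boolean): number {
  if (c.turns <= 0) return 0;
  if (!cacheHits) return c.systemPromptTokens * c.turns;
  // First turn pays full price; later turns pay only the cached fraction.
  const firstTurn = c.systemPromptTokens;
  const laterTurns = c.systemPromptTokens * (c.turns - 1) * c.cachedPriceFraction;
  return firstTurn + laterTurns;
}

const turns = 40;
const large = systemPromptCost(
  { systemPromptTokens: 5000, turns, cachedPriceFraction: 0.1 },
  false
);
const small = systemPromptCost(
  { systemPromptTokens: 400, turns, cachedPriceFraction: 0.1 },
  false
);
const smallCached = systemPromptCost(
  { systemPromptTokens: 400, turns, cachedPriceFraction: 0.1 },
  true
);
console.log({ large, small, smallCached });
```

With these illustrative inputs, the large uncached prompt costs 200,000 token equivalents over forty turns, the small one 16,000, and the small cached one about 1,960. The exact figures matter less than the shape: size and cache stability multiply each other.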

The honest tradeoff

I won't pretend the big-prompt approach is wrong. It's a real design choice with real benefits. A multi-thousand-token system prompt front-loads consistency: the model behaves the same way every time because every behavior is spelled out. For a product serving millions of users who can't be trusted to configure anything, that predictability has value.

Elyra makes a different bet. It trusts that strong models, good tool design, and on-demand context loading produce better results than an exhaustive rulebook — and that the token savings are worth it. The prompt says what's needed, points to where the rest lives, and gets out of the way.

The result is an agent that costs less to run, caches better, stays sharp on the task in front of it, and behaves consistently across every model you might point it at. Not because it knows less — but because it knows when to know things.

A good system prompt isn't the one that says the most. It's the one that says exactly enough.