Elyra · · 4 min read

Git Bisect, But for Your AI's Reasoning - Elyra v0.9.3

Elyra's /bisect borrows git's binary-search debugging and points it at the agent's own turns — finding the exact one that introduced a regression.

Git Bisect, But for Your AI's Reasoning - Elyra v0.9.3

Elyra's /bisect — finding the exact turn that broke things

There's a special kind of frustration that comes from working with an AI agent across a long session. You started with working code. Twenty turns later, something is broken. The agent edited a dozen files across those turns, and somewhere in there — you don't know where — it introduced the bug.

Your options are bad. Scroll back through twenty turns of conversation hoping to spot it. Or git diff the whole mess and read every change. Or just /rewind to some earlier point and lose all the good work that came after the bug.

Git solved this exact problem for commits decades ago: git bisect. Binary-search your history, test at each step, and it tells you precisely which commit introduced the regression. Elyra's /bisect does the same thing — but over the agent's own turns.

As far as I know, no other coding agent can do this. And the reason is interesting.

Why only Elyra can do this

git bisect works because git has a commit at every point in history. To bisect, you need a restorable snapshot at each step.

Most agents don't have that. They edit files in place; the only record of "what the code looked like at turn 12" is gone the moment turn 13 overwrites it.

Elyra is different. Before every single turn, it quietly snapshots your working tree using a git primitive (git stash create) that captures the state without disturbing anything. This is the same machinery that powers /rewind. It means Elyra has a restorable filesystem snapshot tied to every point in the conversation.

That's the missing ingredient. Once you have a snapshot per turn, bisecting over them is just binary search.

How it works

Say your test suite passes at the start of a session and fails now. You run:

/bisect npm test

Elyra treats this exactly like git bisect run:

// 1. Confirm the current state is actually broken
if ((await runCommand()) === 0) {
    // "Current state passes — nothing to bisect."
}

// 2. Confirm the earliest checkpoint was good // 3. Binary search between good (lo) and bad (hi) while (hi - lo > 1) { const mid = Math.floor((lo + hi) / 2); this.session.restoreFilesToCheckpoint(points[mid].entryId); const good = (await runCommand()) === 0; // exit 0 = good if (good) lo = mid; else hi = mid; } // points[hi] is the first bad checkpoint

It restores the working tree to a checkpoint, runs npm test, and uses the exit code: zero means the bug wasn't there yet, non-zero means it was. Binary search halves the suspects each step. Twenty turns? Five tests. A hundred? Seven.

And then it tells you exactly which turn introduced the failure:

First bad checkpoint: "Refactor the payment validation to use the new schema"
The change introduced at this turn is the likely culprit. Use /rewind to return here.

No command? No problem:

/bisect

This runs it manually — Elyra restores the files to each midpoint and asks you to test (in another terminal or by inspecting) and mark it Good or Bad.

The safety net

Repeatedly restoring your working tree is exactly the kind of thing that could lose work if it goes wrong. So bisect snapshots your current state before it starts, and restores it in a finally block no matter how the search ends — found, aborted, or errored:

const snapshot = this.session.createWorkingTreeSnapshot();
try {
// ... the whole bisect search ...
} finally {
this.session.restoreFilesToSnapshot(snapshot);  // always
}

You end exactly where you started, plus the knowledge of which turn broke things. Then one /rewind puts you back at that point to fix it properly.

Why it matters

The longer you let an agent run, the more valuable this becomes — and the more agents are being pushed toward long, autonomous sessions. But long autonomy has a cost: when something breaks, the haystack is huge.

/bisect turns "somewhere in the last hundred turns" into "this exact turn, in seven tests." It's the debugging primitive that long-running agents have been missing, and it falls out naturally from an architecture that snapshots state at every step.

/bisect npm test