a model and me

Why your AI harness needs a circuit breaker

2026-05-22T00:00:00+00:00

Multi-agent systems fail in correlated ways. One bad tool call cascades into twenty retries, and the bill arrives Monday. A pattern from electrical engineering, ported badly into software.

The first time I shipped a multi-agent harness to anything resembling production, it took out our OpenAI quota in ninety seconds. Not because the prompts were expensive — they weren't — but because one agent's tool call returned malformed JSON, the planner agent retried, and the supervisor cheerfully scheduled the same broken plan eight more times.

The fix is older than I am. It's borrowed wholesale from electrical engineering: a circuit breaker. When the same failure pattern repeats, stop. Open the circuit. Let the system cool down before you try again.¹

A naive first attempt

My first cut was an int _failures field on the harness root. After three failures, throw. It worked for about a day.

// don't do this
public class NaiveBreaker
{
    private int _failures = 0;

    public async Task<T> InvokeAsync<T>(Func<Task<T>> op)
    {
        if (_failures >= 3)
            throw new CircuitOpenException();
        try { return await op(); }
        catch { _failures++; throw; }
    }
}

Note: A counter without a half-open state isn't a circuit breaker; it's a kill switch. The system never recovers on its own.

The real pattern has three states: closed (normal), open (failing fast), and half-open (cautiously probing). The transition between them is the entire trick.

The three states

State	What's happening	Transition out
Closed	Calls flow through. Failures are counted.	Failure threshold → Open
Open	Calls fail fast without invoking the dependency.	Cooldown elapses → Half-Open
Half-Open	One probe call is allowed.	Probe succeeds → Closed, fails → Open

The math for the cooldown matters more than I expected. Linear backoff is wrong for AI harnesses — model providers tend to recover in bursts. I've been running an exponential with jitter:

\[t_{cooldown} = \min\left(t_{max},\; t_{base} \cdot 2^{n}\right) \cdot \big(1 + U(-0.2, 0.2)\big)\]

where $n$ is the number of consecutive open transitions. The jitter keeps breakers that opened together from probing together: without it, parallel agents that share a downstream tool all leave the open state at the same instant and hit the recovering dependency as a thundering herd.
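The cooldown formula can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the code from the harness: the name Backoff.Cooldown and its parameters are hypothetical, and the jitter sample is passed in so a test can pin it.

```csharp
using System;

public static class Backoff
{
    // Cooldown before the next half-open probe.
    // consecutiveOpens is n in the formula: how many times in a row the breaker has opened.
    // jitterUnit is a uniform sample in [0, 1), supplied by the caller (assumption).
    public static TimeSpan Cooldown(
        int consecutiveOpens,
        TimeSpan baseDelay,
        TimeSpan maxDelay,
        double jitterUnit)
    {
        // Clamp the exponent so 2^n stays in a sane range for large n.
        var exponent = Math.Clamp(consecutiveOpens, 0, 30);
        var rawSeconds = baseDelay.TotalSeconds * Math.Pow(2, exponent);
        var cappedSeconds = Math.Min(maxDelay.TotalSeconds, rawSeconds);

        // Map [0, 1) onto [-0.2, 0.2) to get the U(-0.2, 0.2) term.
        var jitter = (jitterUnit * 0.4) - 0.2;
        return TimeSpan.FromSeconds(cappedSeconds * (1 + jitter));
    }
}
```

In use, the sample would come from something like Random.Shared.NextDouble(). Because the jitter is applied after the cap, as in the formula, the result can exceed the maximum by up to 20%.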

Implementation in C# 13

A working version, with the half-open state and an injectable clock for testing. To stay short it uses a fixed cooldown rather than the exponential schedule above:

public sealed class CircuitBreaker(
    int failureThreshold = 5,
    TimeSpan? cooldown = null,
    TimeProvider? time = null)
{
    private readonly TimeSpan _cooldown = cooldown ?? TimeSpan.FromSeconds(30);
    private readonly TimeProvider _time = time ?? TimeProvider.System;
    private readonly Lock _gate = new();

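    private int _failures;
    private CircuitState _state = CircuitState.Closed;
    private DateTimeOffset _openedAt;
    private bool _probeInFlight;   // true while the single half-open probe is running

    public async Task<T> InvokeAsync<T>(
        Func<CancellationToken, Task<T>> op,
        CancellationToken ct = default)
    {
        var isProbe = EnsureCanProceed();
        try
        {
            var result = await op(ct).ConfigureAwait(false);
            OnSuccess();
            return result;
        }
        catch when (!ct.IsCancellationRequested)
        {
            // Cancellation by the caller is not counted as a dependency failure.
            OnFailure();
            throw;
        }
        finally
        {
            // Free the probe slot, including when the probe was cancelled.
            if (isProbe) ReleaseProbe();
        }
    }

    // Returns true when this call is the half-open probe.
    private bool EnsureCanProceed()
    {
        lock (_gate)
        {
            if (_state is CircuitState.Closed)
                return false;
            if (_state is CircuitState.Open)
            {
                if (_time.GetUtcNow() - _openedAt < _cooldown)
                    throw new CircuitOpenException();
                _state = CircuitState.HalfOpen;
            }
            // Half-open: admit one probe at a time; every other caller fails fast.
            if (_probeInFlight)
                throw new CircuitOpenException();
            _probeInFlight = true;
            return true;
        }
    }

    private void ReleaseProbe() { lock (_gate) { _probeInFlight = false; } }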

    private void OnSuccess()  { lock (_gate) { _failures = 0; _state = CircuitState.Closed; } }
    private void OnFailure()
    {
        lock (_gate)
        {
            _failures++;
            if (_failures >= failureThreshold || _state is CircuitState.HalfOpen)
            {
                _state = CircuitState.Open;
                _openedAt = _time.GetUtcNow();
            }
        }
    }
}

public enum CircuitState { Closed, Open, HalfOpen }
public sealed class CircuitOpenException : Exception;

A few notes on this version:

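The Lock type is System.Threading.Lock, new in .NET 9; the C# 13 lock statement recognizes it and uses its scoped enter/exit instead of Monitor. Reads and writes to the state fields are short, so a single lock is fine; if you're seeing contention, you've got bigger problems than the breaker.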
TimeProvider is injectable so the test suite can advance time deterministically. Don't use DateTime.UtcNow directly — you'll regret it.
ConfigureAwait(false) because this is library-ish code.
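The point about TimeProvider is easier to see with a fake clock in hand. The sketch below is illustrative only: ManualTimeProvider and CooldownCheck are hypothetical names, and the check mirrors the comparison the breaker makes before allowing a probe. Microsoft also ships a ready-made FakeTimeProvider in the Microsoft.Extensions.TimeProvider.Testing package, which is probably the better choice in a real test suite.

```csharp
using System;

// Hypothetical test double: a clock that only moves when told to.
public sealed class ManualTimeProvider(DateTimeOffset start) : TimeProvider
{
    private DateTimeOffset _now = start;

    public override DateTimeOffset GetUtcNow() => _now;

    public void Advance(TimeSpan delta) => _now += delta;
}

public static class CooldownCheck
{
    // True once the cooldown has fully elapsed since the breaker opened.
    public static bool Elapsed(TimeProvider time, DateTimeOffset openedAt, TimeSpan cooldown)
        => time.GetUtcNow() - openedAt >= cooldown;
}
```

A test can then open the breaker, call Advance(TimeSpan.FromSeconds(30)), and assert the next call is allowed through, with no sleeping and no flakiness.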

Tip: In production, prefer Polly's ResiliencePipelineBuilder with AddCircuitBreaker. The version above is for teaching; Polly handles the edges (timeouts inside the breaker, isolation between breakers, telemetry) that a hand-rolled version misses.

Tuning the thresholds

Three knobs, in order of how often I touch them:

Failure threshold. Start at 5 for chatty providers, 3 for ones you pay per call. Lower for cold paths.
Cooldown base. 10s is fine for most providers; 30s if you're seeing rate-limit-and-recover patterns.
Sliding window vs. consecutive count. Consecutive is simpler and surprisingly good. Switch to a sliding window only if you're seeing intermittent failures that should trip the breaker but don't.
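For the third knob, here is a rough sketch of what a sliding-window counter might look like. It is an illustration under assumptions, not the counter used in the breaker above: SlidingWindowFailures and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlidingWindowFailures(TimeSpan window, int threshold, TimeProvider? time = null)
{
    private readonly TimeProvider _time = time ?? TimeProvider.System;
    private readonly Queue<DateTimeOffset> _failures = new();
    private readonly object _gate = new();

    // Records a failure and returns true when the breaker should trip.
    public bool RecordFailure()
    {
        lock (_gate)
        {
            var now = _time.GetUtcNow();
            _failures.Enqueue(now);

            // Drop failures that have aged out of the window.
            while (_failures.Count > 0 && now - _failures.Peek() > window)
                _failures.Dequeue();

            return _failures.Count >= threshold;
        }
    }
}
```

The difference from a consecutive count is that a success in between does not reset anything. Three failures inside the window trip the breaker even if they alternate with successes, which is the intermittent case described above.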

Warning: Don't share a single breaker across logically distinct dependencies. One bad tool shouldn't blackhole the entire agent. Scope breakers to the narrowest unit that makes sense, usually (tool_id, provider).
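One way to get that scoping is a small registry keyed by the (tool_id, provider) pair. A minimal sketch, assuming breakers are cheap to create; BreakerRegistry is a hypothetical name, and it is generic so it works with any breaker type.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class BreakerRegistry<TBreaker>(Func<TBreaker> factory)
{
    private readonly ConcurrentDictionary<(string ToolId, string Provider), TBreaker> _breakers = new();

    // Returns the breaker for this (tool, provider) pair, creating it on first use.
    // Under a race the factory may run more than once, but only one instance is kept.
    public TBreaker For(string toolId, string provider)
        => _breakers.GetOrAdd((toolId, provider), _ => factory());
}
```

A call site would then look something like registry.For("search", "openai").InvokeAsync(...), so a failing search tool on one provider never opens the circuit for anything else.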

What it doesn't fix

Circuit breakers stop cascades; they don't stop bad plans. If your agent is asking the wrong question, the breaker will dutifully stop you from asking it twenty times — and then the agent will pick the next-most-confident question and keep going. That's a separate problem, and a more interesting one. I'll write it up next.

¹ Michael Nygard, Release It! (2007). The book that put this pattern in front of a generation of services engineers. The original Hystrix docs at Netflix are also worth reading; the project itself is retired but the concepts hold.

Multi-agent orchestration is just distributed systems with worse error messages

2026-05-14T00:00:00+00:00

Every "multi-agent framework" I've used eventually rediscovers a paper from

Sometimes badly. Here's a tour of what the field has been calling new that isn't.

I spent last weekend reading the changelogs of four popular multi-agent frameworks. Three of them shipped, in the same quarter, a feature they each called something different but which was, structurally, a vector clock.¹ One of them had a tutorial blog post explaining the design as "novel." It is not novel. Lamport's logical clocks are from 1978, and vector clocks proper followed in 1988 (Fidge and Mattern, independently).

This isn't a complaint. It's an observation about where the field is: we are collectively re-deriving the entire distributed systems literature, with language-model latency added.

The same problems, slightly relabelled

A short table of correspondences I keep on my desk:

Multi-agent thing	Distributed systems thing
Agent loop	Event loop / actor
Tool dispatch	RPC
Shared memory / blackboard	Distributed cache
"Plan revision"	Optimistic concurrency control
Supervisor agent	Cluster coordinator (Raft, Zab, …)
"Long-term memory"	Replicated log + materialised views
Hand-off	Message passing with explicit channels

The mapping is not 1:1. Some of these have genuinely new properties because agents are non-deterministic in ways processes are not. But the shape of the problems — ordering, consistency, recovery, partial failure — is identical.

Three orderings

A worked example: ordering. Suppose two agents both update the shared blackboard concurrently. Which write wins?

Three answers, each with a long history outside AI:

Last-write-wins                   ← lossy; fine for caches, never for plans
Lamport timestamps                ← logical clock; orders causally-related events
CRDT (e.g., G-Counter, OR-Set)    ← order doesn't matter; merge is deterministic

I've watched a popular framework reinvent option 1 (silent overwrites), get burned, ship option 2 (calling it "agent-aware versioning"), get burned again on concurrent merges, and finally land on option 3 (calling it "convergent state"). The cycle took eight months. The original paper is from 1986.²
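For reference, option 2 fits in a few lines. This is a sketch of a textbook Lamport clock, not any framework's "agent-aware versioning"; the class and method names are illustrative.

```csharp
using System;

public sealed class LamportClock
{
    private long _counter;
    private readonly object _gate = new();

    // Local event or outgoing write: advance the clock and return the stamp.
    public long Tick()
    {
        lock (_gate) { return ++_counter; }
    }

    // Incoming write: move past the sender's timestamp, then advance.
    public long Observe(long remoteTimestamp)
    {
        lock (_gate)
        {
            _counter = Math.Max(_counter, remoteTimestamp) + 1;
            return _counter;
        }
    }
}
```

The stamps respect causality: if one write could have influenced another, it gets the smaller number. They cannot tell you that two writes were concurrent, which is why this option still gets burned on concurrent merges.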

Note: If you're building a harness, read the CRDT survey by Shapiro et al. before you implement your shared-memory layer. Retrofitting consistency is much harder than starting with it.
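To make "order doesn't matter" concrete, here is a minimal G-Counter, the simplest CRDT in that survey. It is a sketch for illustration: each agent increments only its own slot, and merging takes the per-agent maximum. The agent ids are made up.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

public sealed record GCounter(ImmutableDictionary<string, long> Counts)
{
    public static GCounter Empty { get; } = new(ImmutableDictionary<string, long>.Empty);

    // The counter's value is the sum of every agent's slot.
    public long Value => Counts.Values.Sum();

    // An agent only ever increments its own slot.
    public GCounter Increment(string agentId)
        => new(Counts.SetItem(agentId, Get(agentId) + 1));

    // Merge takes the per-agent maximum, so it is commutative,
    // associative, and idempotent: replicas converge in any order.
    public GCounter Merge(GCounter other)
    {
        var merged = Counts;
        foreach (var (agentId, count) in other.Counts)
        {
            var current = merged.TryGetValue(agentId, out var n) ? n : 0;
            merged = merged.SetItem(agentId, Math.Max(current, count));
        }
        return new GCounter(merged);
    }

    private long Get(string agentId) => Counts.TryGetValue(agentId, out var n) ? n : 0;
}
```

Merging a with b and b with a gives the same value, and merging a replica with itself changes nothing. Those properties are what a shared blackboard needs if agents write concurrently.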

What's actually new

Some things really are new:

The dependency is non-deterministic. RPC to a service returns the same answer for the same input (modulo state). RPC to an LLM does not. This breaks retry semantics in subtle ways — a "deterministic retry" isn't.
The cost model is bizarre. Latency you can model; per-token cost with caching makes the "is this retry free?" question much harder.
State is partially in natural language. You cannot diff two agents' worldviews with git diff. You can with embeddings, approximately, but the tooling is bad.

These deserve their own literature. But the other 80% is just systems engineering done in a louder room.

What to read

If you only read three things before writing your next harness:

Lamport, Time, Clocks, and the Ordering of Events in a Distributed System (1978). The original.
Shapiro et al., A comprehensive study of Convergent and Commutative Replicated Data Types (2011). The CRDT survey.
Nygard, Release It! (2007). Failure modes. The circuit-breaker chapter alone is worth the cover price.

I'm not against frameworks. I'm against pretending we're inventing what we're rediscovering.

¹ A vector clock is a list of per-process counters that lets you establish a partial ordering of events in a distributed system. If that sentence sounds like every "agent context" feature you've seen shipped in 2025, you understand my point.
² Strictly: the foundational CRDT-ish work is from 1986 (Wuu & Bernstein on a replicated dictionary); the modern formalization is Shapiro et al. (2011).

Building a deterministic agent loop in C# 13

2026-04-29T00:00:00+00:00

A walk-through of the agent loop I've been running in production, with the parts that surprised me marked.

Primary constructors (available since C# 12) and the field keyword (a preview feature in C# 13) make this kind of code shorter than it used to be. Here's the minimum viable shape of an agent loop, fully deterministic given a fixed model temperature and a seedable tool layer.

The contract

An agent loop, at minimum, is:

public interface IAgent
{
    Task<AgentResult> RunAsync(
        AgentRequest request,
        CancellationToken ct = default);
}

Everything else is implementation detail. The request carries the seed context; the result carries the final output plus a trace.

The loop

public sealed class Agent(
    IModelClient model,
    IToolRegistry tools,
    AgentPolicy policy) : IAgent
{
    public async Task<AgentResult> RunAsync(
        AgentRequest request,
        CancellationToken ct = default)
    {
        var trace = new AgentTrace(request.Id);
        var conversation = request.SeedMessages.ToList();

        for (var step = 0; step < policy.MaxSteps; step++)
        {
            ct.ThrowIfCancellationRequested();

            var response = await model.ChatAsync(conversation, tools.Schema, ct);
            conversation.Add(response.Message);
            trace.Record(step, response);

            if (response.Message.ToolCalls is not { Count: > 0 } calls)
                return AgentResult.Final(response.Message.Content, trace);

            foreach (var call in calls)
            {
                var observation = await tools.InvokeAsync(call, ct);
                conversation.Add(ToolMessage.From(call, observation));
                trace.Record(call, observation);
            }
        }

        return AgentResult.Exhausted(trace);
    }
}

Note: Determinism comes from three places: (1) temperature=0 on the model client, (2) tools that are deterministic for a given input, and (3) a finite policy.MaxSteps. Drop any one and you've got nondeterminism back.

What surprised me

Three things, in increasing order of how long it took me to notice.

1. Tool ordering inside a single turn matters

The loop above iterates tool calls in the order the model returned them. If a model returns [search, calculator] in one turn and the search depends on a value the calculator produces, you've lost. I've been splitting these across turns (one tool per response) and getting better behaviour, at the cost of latency.

2. The trace is the product

The thing I wish I'd known on day one: the trace — not the final answer — is what you debug from, what you replay from, what you cache against. Make it a first-class object from the start. Mine has shape:

public sealed record AgentTrace(Guid RunId)
{
    public ImmutableList<TraceEntry> Entries { get; init; } = [];
    public DateTimeOffset StartedAt { get; init; } = DateTimeOffset.UtcNow;
    public DateTimeOffset? FinishedAt { get; init; }
}

I serialize one of these per run. They're cheap, they replay exactly, and they're the only way I've found to debug a failed run two weeks later.
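Replaying from a trace can be as simple as swapping the model client for one that reads recorded responses back in order. The sketch below is hypothetical and not the replay code behind the trace above: RecordedResponse and ReplayModelClient are stand-in names, and a real version would implement the same interface as the live client.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one recorded model response in a trace.
public sealed record RecordedResponse(int Step, string Content);

public sealed class ReplayModelClient(IReadOnlyList<RecordedResponse> recorded)
{
    private int _next;

    // Returns the response recorded for the next step instead of calling a model.
    public RecordedResponse Next()
    {
        if (_next >= recorded.Count)
            throw new InvalidOperationException(
                $"Replay diverged: the run asked for step {_next}, but only {recorded.Count} were recorded.");
        return recorded[_next++];
    }
}
```

Failing loudly when the run asks for more steps than were recorded is deliberate. A replay that diverges from the trace usually means a tool or the loop itself has changed since the original run.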

3. IAsyncEnumerable is right, but later

You'll be tempted to make RunAsync an IAsyncEnumerable so callers can stream. Resist for v1. Streaming complicates cancellation, cancellation complicates the breaker (yesterday's post), the breaker complicates everything. Ship the Task version first. Stream when the product team asks for it twice.

Tip: If you're going to expose streaming, do it as a separate RunStreamingAsync method that internally calls into the same primitives. Keep the simple path simple.
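A rough sketch of that split, under loud assumptions: StepEvent, StreamingAgent, and the runStep delegate are hypothetical stand-ins for the real model-and-tools step. Both entry points call the same primitive, and the Task path never touches the enumerator.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public sealed record StepEvent(int Step, string Content, bool IsFinal);

public sealed class StreamingAgent(
    Func<int, CancellationToken, Task<StepEvent>> runStep,
    int maxSteps)
{
    // Simple path: a plain loop that returns the last event (null if maxSteps is 0).
    public async Task<StepEvent?> RunAsync(CancellationToken ct = default)
    {
        StepEvent? last = null;
        for (var step = 0; step < maxSteps; step++)
        {
            ct.ThrowIfCancellationRequested();
            last = await runStep(step, ct).ConfigureAwait(false);
            if (last.IsFinal) break;
        }
        return last;
    }

    // Streaming path: the same primitive, yielding each event as it arrives.
    public async IAsyncEnumerable<StepEvent> RunStreamingAsync(
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        for (var step = 0; step < maxSteps; step++)
        {
            ct.ThrowIfCancellationRequested();
            var evt = await runStep(step, ct).ConfigureAwait(false);
            yield return evt;
            if (evt.IsFinal) yield break;
        }
    }
}
```

The two loops are duplicated on purpose. Building RunAsync on top of the stream would save a few lines, but it would route the simple path through enumerator cancellation, which is the complexity the tip is trying to keep out.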

Where this goes

Next post: tool dispatch under contention — what happens when two agents in the same harness want the same tool at the same time, and how deterministic ordering survives.